
Commit 8d0baad

Pushing the docs to dev/ for branch: master, commit 61feb5ea2728ac69554fe778523b41d8c821ac6c
1 parent 1ee5825 commit 8d0baad

1,206 files changed (+3718 / -3704 lines changed)


dev/_downloads/d55388904f5399e98ed36e971c4da3cf/plot_rbf_parameters.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-
"\n# RBF SVM parameters\n\nThis example illustrates the effect of the parameters ``gamma`` and ``C`` of\nthe Radial Basis Function (RBF) kernel SVM.\n\nIntuitively, the ``gamma`` parameter defines how far the influence of a single\ntraining example reaches, with low values meaning 'far' and high values meaning\n'close'. The ``gamma`` parameters can be seen as the inverse of the radius of\ninfluence of samples selected by the model as support vectors.\n\nThe ``C`` parameter trades off correct classification of training examples\nagainst maximization of the decision function's margin. For larger values of\n``C``, a smaller margin will be accepted if the decision function is better at\nclassifying all training points correctly. A lower ``C`` will encourage a\nlarger margin, therefore a simpler decision function, at the cost of training\naccuracy. In other words ``C`` behaves as a regularization parameter in the\nSVM.\n\nThe first plot is a visualization of the decision function for a variety of\nparameter values on a simplified classification problem involving only 2 input\nfeatures and 2 possible target classes (binary classification). Note that this\nkind of plot is not possible to do for problems with more features or target\nclasses.\n\nThe second plot is a heatmap of the classifier's cross-validation accuracy as a\nfunction of ``C`` and ``gamma``. For this example we explore a relatively large\ngrid for illustration purposes. In practice, a logarithmic grid from\n$10^{-3}$ to $10^3$ is usually sufficient. If the best parameters\nlie on the boundaries of the grid, it can be extended in that direction in a\nsubsequent search.\n\nNote that the heat map plot has a special colorbar with a midpoint value close\nto the score values of the best performing models so as to make it easy to tell\nthem apart in the blink of an eye.\n\nThe behavior of the model is very sensitive to the ``gamma`` parameter. If\n``gamma`` is too large, the radius of the area of influence of the support\nvectors only includes the support vector itself and no amount of\nregularization with ``C`` will be able to prevent overfitting.\n\nWhen ``gamma`` is very small, the model is too constrained and cannot capture\nthe complexity or \"shape\" of the data. The region of influence of any selected\nsupport vector would include the whole training set. The resulting model will\nbehave similarly to a linear model with a set of hyperplanes that separate the\ncenters of high density of any pair of two classes.\n\nFor intermediate values, we can see on the second plot that good models can\nbe found on a diagonal of ``C`` and ``gamma``. Smooth models (lower ``gamma``\nvalues) can be made more complex by increasing the importance of classifying\neach point correctly (larger ``C`` values) hence the diagonal of good\nperforming models.\n\nFinally one can also observe that for some intermediate values of ``gamma`` we\nget equally performing models when ``C`` becomes very large: it is not\nnecessary to regularize by enforcing a larger margin. The radius of the RBF\nkernel alone acts as a good structural regularizer. In practice though it\nmight still be interesting to simplify the decision function with a lower\nvalue of ``C`` so as to favor models that use less memory and that are faster\nto predict.\n\nWe should also note that small differences in scores results from the random\nsplits of the cross-validation procedure. 
Those spurious variations can be\nsmoothed out by increasing the number of CV iterations ``n_splits`` at the\nexpense of compute time. Increasing the value number of ``C_range`` and\n``gamma_range`` steps will increase the resolution of the hyper-parameter heat\nmap.\n"
+
"\n# RBF SVM parameters\n\nThis example illustrates the effect of the parameters ``gamma`` and ``C`` of\nthe Radial Basis Function (RBF) kernel SVM.\n\nIntuitively, the ``gamma`` parameter defines how far the influence of a single\ntraining example reaches, with low values meaning 'far' and high values meaning\n'close'. The ``gamma`` parameters can be seen as the inverse of the radius of\ninfluence of samples selected by the model as support vectors.\n\nThe ``C`` parameter trades off correct classification of training examples\nagainst maximization of the decision function's margin. For larger values of\n``C``, a smaller margin will be accepted if the decision function is better at\nclassifying all training points correctly. A lower ``C`` will encourage a\nlarger margin, therefore a simpler decision function, at the cost of training\naccuracy. In other words ``C`` behaves as a regularization parameter in the\nSVM.\n\nThe first plot is a visualization of the decision function for a variety of\nparameter values on a simplified classification problem involving only 2 input\nfeatures and 2 possible target classes (binary classification). Note that this\nkind of plot is not possible to do for problems with more features or target\nclasses.\n\nThe second plot is a heatmap of the classifier's cross-validation accuracy as a\nfunction of ``C`` and ``gamma``. For this example we explore a relatively large\ngrid for illustration purposes. In practice, a logarithmic grid from\n$10^{-3}$ to $10^3$ is usually sufficient. If the best parameters\nlie on the boundaries of the grid, it can be extended in that direction in a\nsubsequent search.\n\nNote that the heat map plot has a special colorbar with a midpoint value close\nto the score values of the best performing models so as to make it easy to tell\nthem apart in the blink of an eye.\n\nThe behavior of the model is very sensitive to the ``gamma`` parameter. If\n``gamma`` is too large, the radius of the area of influence of the support\nvectors only includes the support vector itself and no amount of\nregularization with ``C`` will be able to prevent overfitting.\n\nWhen ``gamma`` is very small, the model is too constrained and cannot capture\nthe complexity or \"shape\" of the data. The region of influence of any selected\nsupport vector would include the whole training set. The resulting model will\nbehave similarly to a linear model with a set of hyperplanes that separate the\ncenters of high density of any pair of two classes.\n\nFor intermediate values, we can see on the second plot that good models can\nbe found on a diagonal of ``C`` and ``gamma``. Smooth models (lower ``gamma``\nvalues) can be made more complex by increasing the importance of classifying\neach point correctly (larger ``C`` values) hence the diagonal of good\nperforming models.\n\nFinally, one can also observe that for some intermediate values of ``gamma`` we\nget equally performing models when ``C`` becomes very large. This suggests that\nthe set of support vectors does not change anymore. The radius of the RBF\nkernel alone acts as a good structural regularizer. Increasing ``C`` further\ndoesn't help, likely because there are no more training points in violation\n(inside the margin or wrongly classified), or at least no better solution can\nbe found. 
Scores being equal, it may make sense to use the smaller ``C``\nvalues, since very high ``C`` values typically increase fitting time.\n\nOn the other hand, lower ``C`` values generally lead to more support vectors,\nwhich may increase prediction time. Therefore, lowering the value of ``C``\ninvolves a trade-off between fitting time and prediction time.\n\nWe should also note that small differences in scores results from the random\nsplits of the cross-validation procedure. Those spurious variations can be\nsmoothed out by increasing the number of CV iterations ``n_splits`` at the\nexpense of compute time. Increasing the value number of ``C_range`` and\n``gamma_range`` steps will increase the resolution of the hyper-parameter heat\nmap.\n"
   ]
  },
  {
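The changed notebook text above describes searching ``C`` and ``gamma`` of an RBF kernel SVM on a logarithmic grid and plotting the cross-validation accuracy as a heatmap. A minimal sketch of that kind of search follows; it is not the example's own code, and the dataset, grid bounds (taken from the text's 1e-3 to 1e3 suggestion) and split settings are illustrative assumptions.

# Rough sketch only (not the example's exact code): a logarithmic grid search
# over C and gamma for an RBF kernel SVM, scored by cross-validation, with the
# mean scores reshaped into the (C, gamma) grid that a heatmap would display.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# The text suggests a logarithmic grid from 1e-3 to 1e3 as a practical default.
C_range = np.logspace(-3, 3, 7)
gamma_range = np.logspace(-3, 3, 7)
param_grid = {"C": C_range, "gamma": gamma_range}

# More n_splits smooths out spurious CV score variations at extra compute cost.
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=cv)
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy: %.3f" % grid.best_score_)

# Mean test scores laid out as rows = C values, columns = gamma values.
scores = grid.cv_results_["mean_test_score"].reshape(len(C_range), len(gamma_range))
plt.imshow(scores, interpolation="nearest", cmap=plt.cm.hot)
plt.xlabel("gamma")
plt.ylabel("C")
plt.xticks(np.arange(len(gamma_range)), ["%.0e" % g for g in gamma_range], rotation=45)
plt.yticks(np.arange(len(C_range)), ["%.0e" % c for c in C_range])
plt.colorbar()
plt.title("Validation accuracy")
plt.show()

As the text notes, increasing n_splits, or the number of steps in C_range and gamma_range, trades compute time for smoother and finer-grained heatmaps.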

dev/_downloads/ea8b449d4699d078ef9cc5cded54cc67/plot_rbf_parameters.py

Lines changed: 12 additions & 7 deletions
@@ -53,13 +53,18 @@
 each point correctly (larger ``C`` values) hence the diagonal of good
 performing models.
 
-Finally one can also observe that for some intermediate values of ``gamma`` we
-get equally performing models when ``C`` becomes very large: it is not
-necessary to regularize by enforcing a larger margin. The radius of the RBF
-kernel alone acts as a good structural regularizer. In practice though it
-might still be interesting to simplify the decision function with a lower
-value of ``C`` so as to favor models that use less memory and that are faster
-to predict.
+Finally, one can also observe that for some intermediate values of ``gamma`` we
+get equally performing models when ``C`` becomes very large. This suggests that
+the set of support vectors does not change anymore. The radius of the RBF
+kernel alone acts as a good structural regularizer. Increasing ``C`` further
+doesn't help, likely because there are no more training points in violation
+(inside the margin or wrongly classified), or at least no better solution can
+be found. Scores being equal, it may make sense to use the smaller ``C``
+values, since very high ``C`` values typically increase fitting time.
+
+On the other hand, lower ``C`` values generally lead to more support vectors,
+which may increase prediction time. Therefore, lowering the value of ``C``
+involves a trade-off between fitting time and prediction time.
 
 We should also note that small differences in scores results from the random
 splits of the cross-validation procedure. Those spurious variations can be
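The added paragraphs above attribute the cost of very large ``C`` mainly to fitting time, and the cost of small ``C`` to the larger set of support vectors that must be evaluated at prediction time. A small, hypothetical timing sketch of that trade-off is below; the dataset, ``gamma`` and ``C`` values are arbitrary choices for illustration, not part of the example.

# Illustrative sketch (not part of the example): how the choice of C trades
# fitting time against the number of support vectors, and hence prediction
# time, for an RBF SVM.
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for C in [0.1, 1, 10, 100, 1000]:
    clf = SVC(kernel="rbf", gamma=0.1, C=C)

    t0 = time.time()
    clf.fit(X, y)            # very large C typically makes fitting slower
    fit_time = time.time() - t0

    t0 = time.time()
    clf.predict(X)           # more support vectors -> slower prediction
    predict_time = time.time() - t0

    n_sv = clf.n_support_.sum()  # total number of support vectors kept
    print(f"C={C:>7}: {n_sv:4d} support vectors, "
          f"fit {fit_time:.3f}s, predict {predict_time:.3f}s")

Exact timings vary by dataset, but the count reported by n_support_ typically shrinks as ``C`` grows, which is what makes low-``C`` models slower to evaluate at prediction time.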

dev/_downloads/scikit-learn-docs.pdf

1.71 KB

dev/_images/binder_badge_logo.png (0 Bytes)

dev/_images/iris.png (0 Bytes)

Other image size changes: -349 Bytes, -349 Bytes, -167 Bytes
