Commit 2d63e19

Pushing the docs to dev/ for branch: main, commit ad537bca627dddd502b6539ae8c7c349c89bfc69
1 parent c00e5fa commit 2d63e19

File tree

1,221 files changed (+4930 / -4572 lines changed)

dev/_downloads/d55388904f5399e98ed36e971c4da3cf/plot_rbf_parameters.ipynb

Lines changed: 134 additions & 1 deletion
Original file line number · Diff line number · Diff line change
@@ -18,6 +18,139 @@
1818
"\n# RBF SVM parameters\n\nThis example illustrates the effect of the parameters ``gamma`` and ``C`` of\nthe Radial Basis Function (RBF) kernel SVM.\n\nIntuitively, the ``gamma`` parameter defines how far the influence of a single\ntraining example reaches, with low values meaning 'far' and high values meaning\n'close'. The ``gamma`` parameters can be seen as the inverse of the radius of\ninfluence of samples selected by the model as support vectors.\n\nThe ``C`` parameter trades off correct classification of training examples\nagainst maximization of the decision function's margin. For larger values of\n``C``, a smaller margin will be accepted if the decision function is better at\nclassifying all training points correctly. A lower ``C`` will encourage a\nlarger margin, therefore a simpler decision function, at the cost of training\naccuracy. In other words ``C`` behaves as a regularization parameter in the\nSVM.\n\nThe first plot is a visualization of the decision function for a variety of\nparameter values on a simplified classification problem involving only 2 input\nfeatures and 2 possible target classes (binary classification). Note that this\nkind of plot is not possible to do for problems with more features or target\nclasses.\n\nThe second plot is a heatmap of the classifier's cross-validation accuracy as a\nfunction of ``C`` and ``gamma``. For this example we explore a relatively large\ngrid for illustration purposes. In practice, a logarithmic grid from\n$10^{-3}$ to $10^3$ is usually sufficient. If the best parameters\nlie on the boundaries of the grid, it can be extended in that direction in a\nsubsequent search.\n\nNote that the heat map plot has a special colorbar with a midpoint value close\nto the score values of the best performing models so as to make it easy to tell\nthem apart in the blink of an eye.\n\nThe behavior of the model is very sensitive to the ``gamma`` parameter. If\n``gamma`` is too large, the radius of the area of influence of the support\nvectors only includes the support vector itself and no amount of\nregularization with ``C`` will be able to prevent overfitting.\n\nWhen ``gamma`` is very small, the model is too constrained and cannot capture\nthe complexity or \"shape\" of the data. The region of influence of any selected\nsupport vector would include the whole training set. The resulting model will\nbehave similarly to a linear model with a set of hyperplanes that separate the\ncenters of high density of any pair of two classes.\n\nFor intermediate values, we can see on the second plot that good models can\nbe found on a diagonal of ``C`` and ``gamma``. Smooth models (lower ``gamma``\nvalues) can be made more complex by increasing the importance of classifying\neach point correctly (larger ``C`` values) hence the diagonal of good\nperforming models.\n\nFinally, one can also observe that for some intermediate values of ``gamma`` we\nget equally performing models when ``C`` becomes very large. This suggests that\nthe set of support vectors does not change anymore. The radius of the RBF\nkernel alone acts as a good structural regularizer. Increasing ``C`` further\ndoesn't help, likely because there are no more training points in violation\n(inside the margin or wrongly classified), or at least no better solution can\nbe found. 
Scores being equal, it may make sense to use the smaller ``C``\nvalues, since very high ``C`` values typically increase fitting time.\n\nOn the other hand, lower ``C`` values generally lead to more support vectors,\nwhich may increase prediction time. Therefore, lowering the value of ``C``\ninvolves a trade-off between fitting time and prediction time.\n\nWe should also note that small differences in scores results from the random\nsplits of the cross-validation procedure. Those spurious variations can be\nsmoothed out by increasing the number of CV iterations ``n_splits`` at the\nexpense of compute time. Increasing the value number of ``C_range`` and\n``gamma_range`` steps will increase the resolution of the hyper-parameter heat\nmap.\n"
1919
]
2020
},
21+
{
22+
"cell_type": "markdown",
23+
"metadata": {},
24+
"source": [
25+
"Utility class to move the midpoint of a colormap to be around\nthe values of interest.\n\n"
26+
]
27+
},
28+
{
29+
"cell_type": "code",
30+
"execution_count": null,
31+
"metadata": {
32+
"collapsed": false
33+
},
34+
"outputs": [],
35+
"source": [
36+
"import numpy as np\nfrom matplotlib.colors import Normalize\n\n\nclass MidpointNormalize(Normalize):\n def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):\n self.midpoint = midpoint\n Normalize.__init__(self, vmin, vmax, clip)\n\n def __call__(self, value, clip=None):\n x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]\n return np.ma.masked_array(np.interp(value, x, y))"
37+
]
38+
},
39+
{
40+
"cell_type": "markdown",
41+
"metadata": {},
42+
"source": [
43+
"## Load and prepare data set\n\ndataset for grid search\n\n"
44+
]
45+
},
46+
{
47+
"cell_type": "code",
48+
"execution_count": null,
49+
"metadata": {
50+
"collapsed": false
51+
},
52+
"outputs": [],
53+
"source": [
54+
"from sklearn.datasets import load_iris\n\niris = load_iris()\nX = iris.data\ny = iris.target"
55+
]
56+
},
57+
{
58+
"cell_type": "markdown",
59+
"metadata": {},
60+
"source": [
61+
"Dataset for decision function visualization: we only keep the first two\nfeatures in X and sub-sample the dataset to keep only 2 classes and\nmake it a binary classification problem.\n\n"
62+
]
63+
},
64+
{
65+
"cell_type": "code",
66+
"execution_count": null,
67+
"metadata": {
68+
"collapsed": false
69+
},
70+
"outputs": [],
71+
"source": [
72+
"X_2d = X[:, :2]\nX_2d = X_2d[y > 0]\ny_2d = y[y > 0]\ny_2d -= 1"
73+
]
74+
},
75+
{
76+
"cell_type": "markdown",
77+
"metadata": {},
78+
"source": [
79+
"It is usually a good idea to scale the data for SVM training.\nWe are cheating a bit in this example in scaling all of the data,\ninstead of fitting the transformation on the training set and\njust applying it on the test set.\n\n"
80+
]
81+
},
82+
{
83+
"cell_type": "code",
84+
"execution_count": null,
85+
"metadata": {
86+
"collapsed": false
87+
},
88+
"outputs": [],
89+
"source": [
90+
"from sklearn.preprocessing import StandardScaler\n\nscaler = StandardScaler()\nX = scaler.fit_transform(X)\nX_2d = scaler.fit_transform(X_2d)"
91+
]
92+
},
93+
{
94+
"cell_type": "markdown",
95+
"metadata": {},
96+
"source": [
97+
"## Train classifiers\n\nFor an initial search, a logarithmic grid with basis\n10 is often helpful. Using a basis of 2, a finer\ntuning can be achieved but at a much higher cost.\n\n"
98+
]
99+
},
100+
{
101+
"cell_type": "code",
102+
"execution_count": null,
103+
"metadata": {
104+
"collapsed": false
105+
},
106+
"outputs": [],
107+
"source": [
108+
"from sklearn.svm import SVC\nfrom sklearn.model_selection import StratifiedShuffleSplit\nfrom sklearn.model_selection import GridSearchCV\n\nC_range = np.logspace(-2, 10, 13)\ngamma_range = np.logspace(-9, 3, 13)\nparam_grid = dict(gamma=gamma_range, C=C_range)\ncv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)\ngrid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)\ngrid.fit(X, y)\n\nprint(\n \"The best parameters are %s with a score of %0.2f\"\n % (grid.best_params_, grid.best_score_)\n)"
109+
]
110+
},
111+
{
112+
"cell_type": "markdown",
113+
"metadata": {},
114+
"source": [
115+
"Now we need to fit a classifier for all parameters in the 2d version\n(we use a smaller set of parameters here because it takes a while to train)\n\n"
116+
]
117+
},
118+
{
119+
"cell_type": "code",
120+
"execution_count": null,
121+
"metadata": {
122+
"collapsed": false
123+
},
124+
"outputs": [],
125+
"source": [
126+
"C_2d_range = [1e-2, 1, 1e2]\ngamma_2d_range = [1e-1, 1, 1e1]\nclassifiers = []\nfor C in C_2d_range:\n for gamma in gamma_2d_range:\n clf = SVC(C=C, gamma=gamma)\n clf.fit(X_2d, y_2d)\n classifiers.append((C, gamma, clf))"
127+
]
128+
},
129+
{
130+
"cell_type": "markdown",
131+
"metadata": {},
132+
"source": [
133+
"## Visualization\n\ndraw visualization of parameter effects\n\n"
134+
]
135+
},
136+
{
137+
"cell_type": "code",
138+
"execution_count": null,
139+
"metadata": {
140+
"collapsed": false
141+
},
142+
"outputs": [],
143+
"source": [
144+
"import matplotlib.pyplot as plt\n\nplt.figure(figsize=(8, 6))\nxx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))\nfor (k, (C, gamma, clf)) in enumerate(classifiers):\n # evaluate decision function in a grid\n Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])\n Z = Z.reshape(xx.shape)\n\n # visualize decision function for these parameters\n plt.subplot(len(C_2d_range), len(gamma_2d_range), k + 1)\n plt.title(\"gamma=10^%d, C=10^%d\" % (np.log10(gamma), np.log10(C)), size=\"medium\")\n\n # visualize parameter's effect on decision function\n plt.pcolormesh(xx, yy, -Z, cmap=plt.cm.RdBu)\n plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_2d, cmap=plt.cm.RdBu_r, edgecolors=\"k\")\n plt.xticks(())\n plt.yticks(())\n plt.axis(\"tight\")\n\nscores = grid.cv_results_[\"mean_test_score\"].reshape(len(C_range), len(gamma_range))"
145+
]
146+
},
147+
{
148+
"cell_type": "markdown",
149+
"metadata": {},
150+
"source": [
151+
"Draw heatmap of the validation accuracy as a function of gamma and C\n\nThe score are encoded as colors with the hot colormap which varies from dark\nred to bright yellow. As the most interesting scores are all located in the\n0.92 to 0.97 range we use a custom normalizer to set the mid-point to 0.92 so\nas to make it easier to visualize the small variations of score values in the\ninteresting range while not brutally collapsing all the low score values to\nthe same color.\n\n"
152+
]
153+
},
21154
{
22155
"cell_type": "code",
23156
"execution_count": null,
@@ -26,7 +159,7 @@
26159
},
27160
"outputs": [],
28161
"source": [
29-
"import numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.colors import Normalize\n\nfrom sklearn.svm import SVC\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import StratifiedShuffleSplit\nfrom sklearn.model_selection import GridSearchCV\n\n\n# Utility function to move the midpoint of a colormap to be around\n# the values of interest.\n\n\nclass MidpointNormalize(Normalize):\n def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):\n self.midpoint = midpoint\n Normalize.__init__(self, vmin, vmax, clip)\n\n def __call__(self, value, clip=None):\n x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]\n return np.ma.masked_array(np.interp(value, x, y))\n\n\n# #############################################################################\n# Load and prepare data set\n#\n# dataset for grid search\n\n\niris = load_iris()\nX = iris.data\ny = iris.target\n\n# Dataset for decision function visualization: we only keep the first two\n# features in X and sub-sample the dataset to keep only 2 classes and\n# make it a binary classification problem.\n\nX_2d = X[:, :2]\nX_2d = X_2d[y > 0]\ny_2d = y[y > 0]\ny_2d -= 1\n\n# It is usually a good idea to scale the data for SVM training.\n# We are cheating a bit in this example in scaling all of the data,\n# instead of fitting the transformation on the training set and\n# just applying it on the test set.\n\nscaler = StandardScaler()\nX = scaler.fit_transform(X)\nX_2d = scaler.fit_transform(X_2d)\n\n# #############################################################################\n# Train classifiers\n#\n# For an initial search, a logarithmic grid with basis\n# 10 is often helpful. Using a basis of 2, a finer\n# tuning can be achieved but at a much higher cost.\n\nC_range = np.logspace(-2, 10, 13)\ngamma_range = np.logspace(-9, 3, 13)\nparam_grid = dict(gamma=gamma_range, C=C_range)\ncv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)\ngrid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)\ngrid.fit(X, y)\n\nprint(\n \"The best parameters are %s with a score of %0.2f\"\n % (grid.best_params_, grid.best_score_)\n)\n\n# Now we need to fit a classifier for all parameters in the 2d version\n# (we use a smaller set of parameters here because it takes a while to train)\n\nC_2d_range = [1e-2, 1, 1e2]\ngamma_2d_range = [1e-1, 1, 1e1]\nclassifiers = []\nfor C in C_2d_range:\n for gamma in gamma_2d_range:\n clf = SVC(C=C, gamma=gamma)\n clf.fit(X_2d, y_2d)\n classifiers.append((C, gamma, clf))\n\n# #############################################################################\n# Visualization\n#\n# draw visualization of parameter effects\n\nplt.figure(figsize=(8, 6))\nxx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))\nfor (k, (C, gamma, clf)) in enumerate(classifiers):\n # evaluate decision function in a grid\n Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])\n Z = Z.reshape(xx.shape)\n\n # visualize decision function for these parameters\n plt.subplot(len(C_2d_range), len(gamma_2d_range), k + 1)\n plt.title(\"gamma=10^%d, C=10^%d\" % (np.log10(gamma), np.log10(C)), size=\"medium\")\n\n # visualize parameter's effect on decision function\n plt.pcolormesh(xx, yy, -Z, cmap=plt.cm.RdBu)\n plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_2d, cmap=plt.cm.RdBu_r, edgecolors=\"k\")\n plt.xticks(())\n plt.yticks(())\n plt.axis(\"tight\")\n\nscores = grid.cv_results_[\"mean_test_score\"].reshape(len(C_range), len(gamma_range))\n\n# Draw 
heatmap of the validation accuracy as a function of gamma and C\n#\n# The score are encoded as colors with the hot colormap which varies from dark\n# red to bright yellow. As the most interesting scores are all located in the\n# 0.92 to 0.97 range we use a custom normalizer to set the mid-point to 0.92 so\n# as to make it easier to visualize the small variations of score values in the\n# interesting range while not brutally collapsing all the low score values to\n# the same color.\n\nplt.figure(figsize=(8, 6))\nplt.subplots_adjust(left=0.2, right=0.95, bottom=0.15, top=0.95)\nplt.imshow(\n scores,\n interpolation=\"nearest\",\n cmap=plt.cm.hot,\n norm=MidpointNormalize(vmin=0.2, midpoint=0.92),\n)\nplt.xlabel(\"gamma\")\nplt.ylabel(\"C\")\nplt.colorbar()\nplt.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)\nplt.yticks(np.arange(len(C_range)), C_range)\nplt.title(\"Validation accuracy\")\nplt.show()"
162+
"plt.figure(figsize=(8, 6))\nplt.subplots_adjust(left=0.2, right=0.95, bottom=0.15, top=0.95)\nplt.imshow(\n scores,\n interpolation=\"nearest\",\n cmap=plt.cm.hot,\n norm=MidpointNormalize(vmin=0.2, midpoint=0.92),\n)\nplt.xlabel(\"gamma\")\nplt.ylabel(\"C\")\nplt.colorbar()\nplt.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)\nplt.yticks(np.arange(len(C_range)), C_range)\nplt.title(\"Validation accuracy\")\nplt.show()"
30163
]
31164
}
32165
],
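The MidpointNormalize class defined in the new notebook cell above (and used later by the heatmap cell with vmin=0.2, midpoint=0.92) is easy to sanity-check in isolation. Below is a minimal sketch, not part of the commit, of what the normalizer does: a piecewise-linear map that sends vmin to 0, midpoint to 0.5 and vmax to 1, so the upper half of the colormap is reserved for the narrow band of interesting scores. The vmax=0.97 value is an illustrative assumption; in the example itself imshow infers vmax from the score matrix.

import numpy as np
from matplotlib.colors import Normalize


class MidpointNormalize(Normalize):
    # Same idea as the class in the cell above: a piecewise-linear mapping
    # vmin -> 0.0, midpoint -> 0.5, vmax -> 1.0.
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))


# vmax=0.97 is an assumed value for this standalone check only.
norm = MidpointNormalize(vmin=0.2, vmax=0.97, midpoint=0.92)
print(norm(np.array([0.20, 0.80, 0.92, 0.97])))
# roughly [0.0, 0.42, 0.5, 1.0]: scores below the midpoint are compressed into
# the lower half of the colormap, while the 0.92-0.97 band spans the upper half.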

dev/_downloads/ea8b449d4699d078ef9cc5cded54cc67/plot_rbf_parameters.py

Lines changed: 23 additions & 14 deletions
Original file line number · Diff line number · Diff line change
@@ -75,20 +75,13 @@
7575
7676
"""
7777

78+
# %%
79+
# Utility class to move the midpoint of a colormap to be around
80+
# the values of interest.
81+
7882
import numpy as np
79-
import matplotlib.pyplot as plt
8083
from matplotlib.colors import Normalize
8184

82-
from sklearn.svm import SVC
83-
from sklearn.preprocessing import StandardScaler
84-
from sklearn.datasets import load_iris
85-
from sklearn.model_selection import StratifiedShuffleSplit
86-
from sklearn.model_selection import GridSearchCV
87-
88-
89-
# Utility function to move the midpoint of a colormap to be around
90-
# the values of interest.
91-
9285

9386
class MidpointNormalize(Normalize):
9487
def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
@@ -100,16 +93,19 @@ def __call__(self, value, clip=None):
10093
return np.ma.masked_array(np.interp(value, x, y))
10194

10295

103-
# #############################################################################
96+
# %%
10497
# Load and prepare data set
98+
# -------------------------
10599
#
106100
# dataset for grid search
107101

102+
from sklearn.datasets import load_iris
108103

109104
iris = load_iris()
110105
X = iris.data
111106
y = iris.target
112107

108+
# %%
113109
# Dataset for decision function visualization: we only keep the first two
114110
# features in X and sub-sample the dataset to keep only 2 classes and
115111
# make it a binary classification problem.
@@ -119,22 +115,30 @@ def __call__(self, value, clip=None):
119115
y_2d = y[y > 0]
120116
y_2d -= 1
121117

118+
# %%
122119
# It is usually a good idea to scale the data for SVM training.
123120
# We are cheating a bit in this example in scaling all of the data,
124121
# instead of fitting the transformation on the training set and
125122
# just applying it on the test set.
126123

124+
from sklearn.preprocessing import StandardScaler
125+
127126
scaler = StandardScaler()
128127
X = scaler.fit_transform(X)
129128
X_2d = scaler.fit_transform(X_2d)
130129

131-
# #############################################################################
130+
# %%
132131
# Train classifiers
132+
# -----------------
133133
#
134134
# For an initial search, a logarithmic grid with basis
135135
# 10 is often helpful. Using a basis of 2, a finer
136136
# tuning can be achieved but at a much higher cost.
137137

138+
from sklearn.svm import SVC
139+
from sklearn.model_selection import StratifiedShuffleSplit
140+
from sklearn.model_selection import GridSearchCV
141+
138142
C_range = np.logspace(-2, 10, 13)
139143
gamma_range = np.logspace(-9, 3, 13)
140144
param_grid = dict(gamma=gamma_range, C=C_range)
@@ -147,6 +151,7 @@ def __call__(self, value, clip=None):
147151
% (grid.best_params_, grid.best_score_)
148152
)
149153

154+
# %%
150155
# Now we need to fit a classifier for all parameters in the 2d version
151156
# (we use a smaller set of parameters here because it takes a while to train)
152157

@@ -159,11 +164,14 @@ def __call__(self, value, clip=None):
159164
clf.fit(X_2d, y_2d)
160165
classifiers.append((C, gamma, clf))
161166

162-
# #############################################################################
167+
# %%
163168
# Visualization
169+
# -------------
164170
#
165171
# draw visualization of parameter effects
166172

173+
import matplotlib.pyplot as plt
174+
167175
plt.figure(figsize=(8, 6))
168176
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
169177
for (k, (C, gamma, clf)) in enumerate(classifiers):
@@ -184,6 +192,7 @@ def __call__(self, value, clip=None):
184192

185193
scores = grid.cv_results_["mean_test_score"].reshape(len(C_range), len(gamma_range))
186194

195+
# %%
187196
# Draw heatmap of the validation accuracy as a function of gamma and C
188197
#
189198
# The scores are encoded as colors with the hot colormap, which varies from dark
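The change to plot_rbf_parameters.py shown above is structural rather than behavioral: the old "# ####..." banner comments become "# %%" cell markers, the section titles gain a dashed underline, and each import moves into the section that first uses it, which lets sphinx-gallery render the script as separate notebook cells. As a rough illustration of the resulting layout (a sketch, not taken from the diff), with placeholder hyper-parameter values:

# %%
# Load and prepare data set
# -------------------------
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# %%
# Train classifiers
# -----------------
# C=1.0 and gamma=0.1 are placeholder values for this sketch, not values
# taken from the example's grid search.
from sklearn.svm import SVC

clf = SVC(C=1.0, gamma=0.1).fit(X, y)
print("training accuracy:", clf.score(X, y))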

dev/_downloads/scikit-learn-docs.zip

9.1 KB
Binary file not shown.

0 commit comments
