Commit f1a3fbe

committed
Pushing the docs to dev/ for branch: master, commit d873ed152aee501fafed5766014b88ee145ce02b
1 parent 1910722 commit f1a3fbe

1,208 files changed, +3677 -3666 lines changed

Some content is hidden

Large commits have some content hidden by default.

dev/_downloads/285b194a4740110cb23e241031123972/plot_johnson_lindenstrauss_bound.ipynb

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Theoretical bounds\n==================\nThe distortion introduced by a random projection `p` is asserted by\nthe fact that `p` is defining an eps-embedding with good probability\nas defined by:\n\n\\begin{align}(1 - eps) \\|u - v\\|^2 < \\|p(u) - p(v)\\|^2 < (1 + eps) \\|u - v\\|^2\\end{align}\n\nWhere u and v are any rows taken from a dataset of shape [n_samples,\nn_features] and p is a projection by a random Gaussian N(0, 1) matrix\nwith shape [n_components, n_features] (or a sparse Achlioptas matrix).\n\nThe minimum number of components to guarantees the eps-embedding is\ngiven by:\n\n\\begin{align}n\\_components >= 4 log(n\\_samples) / (eps^2 / 2 - eps^3 / 3)\\end{align}\n\n\nThe first plot shows that with an increasing number of samples ``n_samples``,\nthe minimal number of dimensions ``n_components`` increased logarithmically\nin order to guarantee an ``eps``-embedding.\n\n"
+"Theoretical bounds\n==================\nThe distortion introduced by a random projection `p` is asserted by\nthe fact that `p` is defining an eps-embedding with good probability\nas defined by:\n\n\\begin{align}(1 - eps) \\|u - v\\|^2 < \\|p(u) - p(v)\\|^2 < (1 + eps) \\|u - v\\|^2\\end{align}\n\nWhere u and v are any rows taken from a dataset of shape (n_samples,\nn_features) and p is a projection by a random Gaussian N(0, 1) matrix\nof shape (n_components, n_features) (or a sparse Achlioptas matrix).\n\nThe minimum number of components to guarantees the eps-embedding is\ngiven by:\n\n\\begin{align}n\\_components \\geq 4 log(n\\_samples) / (eps^2 / 2 - eps^3 / 3)\\end{align}\n\n\nThe first plot shows that with an increasing number of samples ``n_samples``,\nthe minimal number of dimensions ``n_components`` increased logarithmically\nin order to guarantee an ``eps``-embedding.\n\n"
 ]
 },
 {
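To make the eps-embedding definition in this cell concrete, here is a minimal pure-Python sketch. It is not the example's own code. It assumes the Gaussian entries are scaled by 1/sqrt(n_components). The text only says N(0, 1), and without that scaling squared distances grow by a factor of n_components. It takes n_components from the bound, with log read as the natural logarithm, and reports the share of row pairs whose squared-distance ratio lies strictly inside (1 - eps, 1 + eps). The helper names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def min_components(n_samples, eps):
    # Bound from the text, with log taken as the natural logarithm.
    return math.ceil(4 * math.log(n_samples) / (eps ** 2 / 2 - eps ** 3 / 3))


def gaussian_projection(n_components, n_features, rng):
    # N(0, 1) entries scaled by 1/sqrt(n_components) (assumption of this
    # sketch) so that squared distances are preserved in expectation.
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, row):
    # p(u): one dot product per output component.
    return [sum(m * x for m, x in zip(matrix_row, row)) for matrix_row in matrix]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def embedding_fraction(data, eps, seed=0):
    """Return n_components and the share of row pairs satisfying the eps bound."""
    rng = random.Random(seed)
    n_components = min_components(len(data), eps)
    matrix = gaussian_projection(n_components, len(data[0]), rng)
    projected = [project(matrix, row) for row in data]
    pairs = list(combinations(range(len(data)), 2))
    inside = 0
    for i, j in pairs:
        ratio = sq_dist(projected[i], projected[j]) / sq_dist(data[i], data[j])
        if 1 - eps < ratio < 1 + eps:
            inside += 1
    return n_components, inside / len(pairs)


# Toy data: 10 random rows with 500 features each.
data_rng = random.Random(42)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(10)]
n_components, fraction = embedding_fraction(data, eps=0.5)
print(n_components, fraction)
```

The guarantee is probabilistic ("with good probability"), so the sketch reports a fraction of pairs rather than asserting that every pair satisfies the bound.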

dev/_downloads/57163227aeb4c19ca4c69b87a8d1949c/plot_learning_curve.py

Lines changed: 21 additions & 20 deletions
@@ -34,27 +34,28 @@ def plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None,
 
     Parameters
     ----------
-    estimator : object type that implements the "fit" and "predict" methods
-        An object of that type which is cloned for each validation.
+    estimator : estimator instance
+        An estimator instance implementing `fit` and `predict` methods which
+        will be cloned for each validation.
 
-    title : string
+    title : str
         Title for the chart.
 
-    X : array-like, shape (n_samples, n_features)
-        Training vector, where n_samples is the number of samples and
-        n_features is the number of features.
+    X : array-like of shape (n_samples, n_features)
+        Training vector, where ``n_samples`` is the number of samples and
+        ``n_features`` is the number of features.
 
-    y : array-like, shape (n_samples) or (n_samples, n_features), optional
-        Target relative to X for classification or regression;
+    y : array-like of shape (n_samples) or (n_samples, n_features)
+        Target relative to ``X`` for classification or regression;
         None for unsupervised learning.
 
-    axes : array of 3 axes, optional (default=None)
+    axes : array-like of shape (3,), default=None
         Axes to use for plotting the curves.
 
-    ylim : tuple, shape (ymin, ymax), optional
-        Defines minimum and maximum yvalues plotted.
+    ylim : tuple of shape (2,), default=None
+        Defines minimum and maximum y-values plotted, e.g. (ymin, ymax).
 
-    cv : int, cross-validation generator or an iterable, optional
+    cv : int, cross-validation generator or an iterable, default=None
         Determines the cross-validation splitting strategy.
         Possible inputs for cv are:
 
@@ -70,20 +71,20 @@ def plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None,
         Refer :ref:`User Guide <cross_validation>` for the various
         cross-validators that can be used here.
 
-    n_jobs : int or None, optional (default=None)
+    n_jobs : int or None, default=None
         Number of jobs to run in parallel.
         ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
         ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
         for more details.
 
-    train_sizes : array-like, shape (n_ticks,), dtype float or int
+    train_sizes : array-like of shape (n_ticks,), dtype={int, float}
         Relative or absolute numbers of training examples that will be used to
-        generate the learning curve. If the dtype is float, it is regarded as a
-        fraction of the maximum size of the training set (that is determined
-        by the selected validation method), i.e. it has to be within (0, 1].
-        Otherwise it is interpreted as absolute sizes of the training sets.
-        Note that for classification the number of samples usually have to
-        be big enough to contain at least one sample from each class.
+        generate the learning curve. If the ``dtype`` is float, it is regarded
+        as a fraction of the maximum size of the training set (that is
+        determined by the selected validation method), i.e. it has to be within
+        (0, 1]. Otherwise it is interpreted as absolute sizes of the training
+        sets. Note that for classification the number of samples usually have
+        to be big enough to contain at least one sample from each class.
         (default: np.linspace(0.1, 1.0, 5))
     """
     if axes is None:
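The `train_sizes` rule in the docstring above says that floats are fractions in (0, 1] of the largest training set and that anything else is an absolute count. It can be sketched as follows. `to_absolute_sizes` is a hypothetical helper written for illustration and is not scikit-learn's internal implementation. Rounding down, clipping to at least one sample, and dropping duplicates are all assumptions of this sketch. The maximum training-set size of 1437 is an arbitrary example value.

```python
import math


def to_absolute_sizes(train_sizes, n_max_training_samples):
    """Hypothetical helper: turn relative or absolute sizes into sample counts."""
    if any(isinstance(s, float) for s in train_sizes):
        # Float input: fractions of the largest training set, within (0, 1].
        if any(not 0.0 < s <= 1.0 for s in train_sizes):
            raise ValueError("fractions must be within (0, 1]")
        sizes = [max(1, math.floor(s * n_max_training_samples))
                 for s in train_sizes]
    else:
        # Integer input: absolute sizes, which cannot exceed the available data.
        if any(not 0 < s <= n_max_training_samples for s in train_sizes):
            raise ValueError("absolute sizes must be within (0, n_max]")
        sizes = list(train_sizes)
    # Sort and drop duplicates so that each tick is a distinct training size.
    return sorted(set(int(s) for s in sizes))


print(to_absolute_sizes([0.1, 0.5, 1.0], 1437))
print(to_absolute_sizes([50, 100, 100, 200], 1437))
```

Treating a list as "float" as soon as any element is a float mimics how a NumPy array with mixed ints and floats ends up with a float dtype.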

dev/_downloads/ca0bfe2435d9b3fffe21c713e63d3a6f/plot_learning_curve.ipynb

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 },
 "outputs": [],
 "source": [
-"print(__doc__)\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.naive_bayes import GaussianNB\nfrom sklearn.svm import SVC\nfrom sklearn.datasets import load_digits\nfrom sklearn.model_selection import learning_curve\nfrom sklearn.model_selection import ShuffleSplit\n\n\ndef plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None,\n n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5)):\n \"\"\"\n Generate 3 plots: the test and training learning curve, the training\n samples vs fit times curve, the fit times vs score curve.\n\n Parameters\n ----------\n estimator : object type that implements the \"fit\" and \"predict\" methods\n An object of that type which is cloned for each validation.\n\n title : string\n Title for the chart.\n\n X : array-like, shape (n_samples, n_features)\n Training vector, where n_samples is the number of samples and\n n_features is the number of features.\n\n y : array-like, shape (n_samples) or (n_samples, n_features), optional\n Target relative to X for classification or regression;\n None for unsupervised learning.\n\n axes : array of 3 axes, optional (default=None)\n Axes to use for plotting the curves.\n\n ylim : tuple, shape (ymin, ymax), optional\n Defines minimum and maximum yvalues plotted.\n\n cv : int, cross-validation generator or an iterable, optional\n Determines the cross-validation splitting strategy.\n Possible inputs for cv are:\n\n - None, to use the default 5-fold cross-validation,\n - integer, to specify the number of folds.\n - :term:`CV splitter`,\n - An iterable yielding (train, test) splits as arrays of indices.\n\n For integer/None inputs, if ``y`` is binary or multiclass,\n :class:`StratifiedKFold` used. 
If the estimator is not a classifier\n or if ``y`` is neither binary nor multiclass, :class:`KFold` is used.\n\n Refer :ref:`User Guide <cross_validation>` for the various\n cross-validators that can be used here.\n\n n_jobs : int or None, optional (default=None)\n Number of jobs to run in parallel.\n ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.\n ``-1`` means using all processors. See :term:`Glossary <n_jobs>`\n for more details.\n\n train_sizes : array-like, shape (n_ticks,), dtype float or int\n Relative or absolute numbers of training examples that will be used to\n generate the learning curve. If the dtype is float, it is regarded as a\n fraction of the maximum size of the training set (that is determined\n by the selected validation method), i.e. it has to be within (0, 1].\n Otherwise it is interpreted as absolute sizes of the training sets.\n Note that for classification the number of samples usually have to\n be big enough to contain at least one sample from each class.\n (default: np.linspace(0.1, 1.0, 5))\n \"\"\"\n if axes is None:\n _, axes = plt.subplots(1, 3, figsize=(20, 5))\n\n axes[0].set_title(title)\n if ylim is not None:\n axes[0].set_ylim(*ylim)\n axes[0].set_xlabel(\"Training examples\")\n axes[0].set_ylabel(\"Score\")\n\n train_sizes, train_scores, test_scores, fit_times, _ = \\\n learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs,\n train_sizes=train_sizes,\n return_times=True)\n train_scores_mean = np.mean(train_scores, axis=1)\n train_scores_std = np.std(train_scores, axis=1)\n test_scores_mean = np.mean(test_scores, axis=1)\n test_scores_std = np.std(test_scores, axis=1)\n fit_times_mean = np.mean(fit_times, axis=1)\n fit_times_std = np.std(fit_times, axis=1)\n\n # Plot learning curve\n axes[0].grid()\n axes[0].fill_between(train_sizes, train_scores_mean - train_scores_std,\n train_scores_mean + train_scores_std, alpha=0.1,\n color=\"r\")\n axes[0].fill_between(train_sizes, test_scores_mean - test_scores_std,\n 
test_scores_mean + test_scores_std, alpha=0.1,\n color=\"g\")\n axes[0].plot(train_sizes, train_scores_mean, 'o-', color=\"r\",\n label=\"Training score\")\n axes[0].plot(train_sizes, test_scores_mean, 'o-', color=\"g\",\n label=\"Cross-validation score\")\n axes[0].legend(loc=\"best\")\n\n # Plot n_samples vs fit_times\n axes[1].grid()\n axes[1].plot(train_sizes, fit_times_mean, 'o-')\n axes[1].fill_between(train_sizes, fit_times_mean - fit_times_std,\n fit_times_mean + fit_times_std, alpha=0.1)\n axes[1].set_xlabel(\"Training examples\")\n axes[1].set_ylabel(\"fit_times\")\n axes[1].set_title(\"Scalability of the model\")\n\n # Plot fit_time vs score\n axes[2].grid()\n axes[2].plot(fit_times_mean, test_scores_mean, 'o-')\n axes[2].fill_between(fit_times_mean, test_scores_mean - test_scores_std,\n test_scores_mean + test_scores_std, alpha=0.1)\n axes[2].set_xlabel(\"fit_times\")\n axes[2].set_ylabel(\"Score\")\n axes[2].set_title(\"Performance of the model\")\n\n return plt\n\n\nfig, axes = plt.subplots(3, 2, figsize=(10, 15))\n\nX, y = load_digits(return_X_y=True)\n\ntitle = \"Learning Curves (Naive Bayes)\"\n# Cross validation with 100 iterations to get smoother mean test and train\n# score curves, each time with 20% data randomly selected as a validation set.\ncv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)\n\nestimator = GaussianNB()\nplot_learning_curve(estimator, title, X, y, axes=axes[:, 0], ylim=(0.7, 1.01),\n cv=cv, n_jobs=4)\n\ntitle = r\"Learning Curves (SVM, RBF kernel, $\\gamma=0.001$)\"\n# SVC is more expensive so we do a lower number of CV iterations:\ncv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)\nestimator = SVC(gamma=0.001)\nplot_learning_curve(estimator, title, X, y, axes=axes[:, 1], ylim=(0.7, 1.01),\n cv=cv, n_jobs=4)\n\nplt.show()"
+"print(__doc__)\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.naive_bayes import GaussianNB\nfrom sklearn.svm import SVC\nfrom sklearn.datasets import load_digits\nfrom sklearn.model_selection import learning_curve\nfrom sklearn.model_selection import ShuffleSplit\n\n\ndef plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None,\n n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5)):\n \"\"\"\n Generate 3 plots: the test and training learning curve, the training\n samples vs fit times curve, the fit times vs score curve.\n\n Parameters\n ----------\n estimator : estimator instance\n An estimator instance implementing `fit` and `predict` methods which\n will be cloned for each validation.\n\n title : str\n Title for the chart.\n\n X : array-like of shape (n_samples, n_features)\n Training vector, where ``n_samples`` is the number of samples and\n ``n_features`` is the number of features.\n\n y : array-like of shape (n_samples) or (n_samples, n_features)\n Target relative to ``X`` for classification or regression;\n None for unsupervised learning.\n\n axes : array-like of shape (3,), default=None\n Axes to use for plotting the curves.\n\n ylim : tuple of shape (2,), default=None\n Defines minimum and maximum y-values plotted, e.g. (ymin, ymax).\n\n cv : int, cross-validation generator or an iterable, default=None\n Determines the cross-validation splitting strategy.\n Possible inputs for cv are:\n\n - None, to use the default 5-fold cross-validation,\n - integer, to specify the number of folds.\n - :term:`CV splitter`,\n - An iterable yielding (train, test) splits as arrays of indices.\n\n For integer/None inputs, if ``y`` is binary or multiclass,\n :class:`StratifiedKFold` used. 
If the estimator is not a classifier\n or if ``y`` is neither binary nor multiclass, :class:`KFold` is used.\n\n Refer :ref:`User Guide <cross_validation>` for the various\n cross-validators that can be used here.\n\n n_jobs : int or None, default=None\n Number of jobs to run in parallel.\n ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.\n ``-1`` means using all processors. See :term:`Glossary <n_jobs>`\n for more details.\n\n train_sizes : array-like of shape (n_ticks,), dtype={int, float}\n Relative or absolute numbers of training examples that will be used to\n generate the learning curve. If the ``dtype`` is float, it is regarded\n as a fraction of the maximum size of the training set (that is\n determined by the selected validation method), i.e. it has to be within\n (0, 1]. Otherwise it is interpreted as absolute sizes of the training\n sets. Note that for classification the number of samples usually have\n to be big enough to contain at least one sample from each class.\n (default: np.linspace(0.1, 1.0, 5))\n \"\"\"\n if axes is None:\n _, axes = plt.subplots(1, 3, figsize=(20, 5))\n\n axes[0].set_title(title)\n if ylim is not None:\n axes[0].set_ylim(*ylim)\n axes[0].set_xlabel(\"Training examples\")\n axes[0].set_ylabel(\"Score\")\n\n train_sizes, train_scores, test_scores, fit_times, _ = \\\n learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs,\n train_sizes=train_sizes,\n return_times=True)\n train_scores_mean = np.mean(train_scores, axis=1)\n train_scores_std = np.std(train_scores, axis=1)\n test_scores_mean = np.mean(test_scores, axis=1)\n test_scores_std = np.std(test_scores, axis=1)\n fit_times_mean = np.mean(fit_times, axis=1)\n fit_times_std = np.std(fit_times, axis=1)\n\n # Plot learning curve\n axes[0].grid()\n axes[0].fill_between(train_sizes, train_scores_mean - train_scores_std,\n train_scores_mean + train_scores_std, alpha=0.1,\n color=\"r\")\n axes[0].fill_between(train_sizes, test_scores_mean - test_scores_std,\n 
test_scores_mean + test_scores_std, alpha=0.1,\n color=\"g\")\n axes[0].plot(train_sizes, train_scores_mean, 'o-', color=\"r\",\n label=\"Training score\")\n axes[0].plot(train_sizes, test_scores_mean, 'o-', color=\"g\",\n label=\"Cross-validation score\")\n axes[0].legend(loc=\"best\")\n\n # Plot n_samples vs fit_times\n axes[1].grid()\n axes[1].plot(train_sizes, fit_times_mean, 'o-')\n axes[1].fill_between(train_sizes, fit_times_mean - fit_times_std,\n fit_times_mean + fit_times_std, alpha=0.1)\n axes[1].set_xlabel(\"Training examples\")\n axes[1].set_ylabel(\"fit_times\")\n axes[1].set_title(\"Scalability of the model\")\n\n # Plot fit_time vs score\n axes[2].grid()\n axes[2].plot(fit_times_mean, test_scores_mean, 'o-')\n axes[2].fill_between(fit_times_mean, test_scores_mean - test_scores_std,\n test_scores_mean + test_scores_std, alpha=0.1)\n axes[2].set_xlabel(\"fit_times\")\n axes[2].set_ylabel(\"Score\")\n axes[2].set_title(\"Performance of the model\")\n\n return plt\n\n\nfig, axes = plt.subplots(3, 2, figsize=(10, 15))\n\nX, y = load_digits(return_X_y=True)\n\ntitle = \"Learning Curves (Naive Bayes)\"\n# Cross validation with 100 iterations to get smoother mean test and train\n# score curves, each time with 20% data randomly selected as a validation set.\ncv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)\n\nestimator = GaussianNB()\nplot_learning_curve(estimator, title, X, y, axes=axes[:, 0], ylim=(0.7, 1.01),\n cv=cv, n_jobs=4)\n\ntitle = r\"Learning Curves (SVM, RBF kernel, $\\gamma=0.001$)\"\n# SVC is more expensive so we do a lower number of CV iterations:\ncv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)\nestimator = SVC(gamma=0.001)\nplot_learning_curve(estimator, title, X, y, axes=axes[:, 1], ylim=(0.7, 1.01),\n cv=cv, n_jobs=4)\n\nplt.show()"
 ]
 }
 ],
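In the notebook source above, `plot_learning_curve` works with the `train_scores` and `test_scores` arrays that `learning_curve` returns. Each array has one row per training size and one column per CV split. The function averages each row and shades mean ± one standard deviation with `fill_between`. Below is a dependency-free sketch of that aggregation, which assumes the scores arrive as nested lists. It uses `statistics.pstdev` because `np.std` defaults to the population standard deviation (ddof=0). `score_bands` is a hypothetical name, and the score values are made up for illustration.

```python
from statistics import fmean, pstdev


def score_bands(scores):
    """Per training size: (mean - std, mean, mean + std) across CV splits."""
    bands = []
    for row in scores:  # one row per training size
        mean = fmean(row)
        std = pstdev(row)  # population std, like np.std(..., axis=1)
        bands.append((mean - std, mean, mean + std))
    return bands


# Made-up scores: 3 training sizes x 4 CV splits.
test_scores = [
    [0.70, 0.74, 0.72, 0.76],
    [0.80, 0.82, 0.81, 0.83],
    [0.90, 0.90, 0.90, 0.90],
]
bands = score_bands(test_scores)
for low, mean, high in bands:
    print(f"{low:.3f} {mean:.3f} {high:.3f}")
```

Using more CV splits, as the notebook does with `ShuffleSplit(n_splits=100, ...)` for Naive Bayes, makes each row's mean less noisy. It does not by itself shrink the shaded band, because the band shows the spread of the individual split scores.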

dev/_downloads/cba66f803bb263f8032bc4d46368e20b/plot_johnson_lindenstrauss_bound.py

Lines changed: 4 additions & 4 deletions
@@ -42,15 +42,15 @@
 # .. math::
 # (1 - eps) \|u - v\|^2 < \|p(u) - p(v)\|^2 < (1 + eps) \|u - v\|^2
 #
-# Where u and v are any rows taken from a dataset of shape [n_samples,
-# n_features] and p is a projection by a random Gaussian N(0, 1) matrix
-# with shape [n_components, n_features] (or a sparse Achlioptas matrix).
+# Where u and v are any rows taken from a dataset of shape (n_samples,
+# n_features) and p is a projection by a random Gaussian N(0, 1) matrix
+# of shape (n_components, n_features) (or a sparse Achlioptas matrix).
 #
 # The minimum number of components to guarantees the eps-embedding is
 # given by:
 #
 # .. math::
-# n\_components >= 4 log(n\_samples) / (eps^2 / 2 - eps^3 / 3)
+# n\_components \geq 4 log(n\_samples) / (eps^2 / 2 - eps^3 / 3)
 #
 #
 # The first plot shows that with an increasing number of samples ``n_samples``,
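The bound in this comment can be evaluated directly. The following is a minimal sketch that reads log as the natural logarithm and rounds up, because the inequality asks for at least that many components. scikit-learn exposes a helper for this bound, `sklearn.random_projection.johnson_lindenstrauss_min_dim`, which may round differently. The function name `jl_min_components`, the restriction of eps to (0, 1), and the sample sizes are all choices made for this illustration.

```python
import math


def jl_min_components(n_samples, eps):
    """Smallest integer n_components allowed by the bound quoted above."""
    if not 0 < eps < 1:
        raise ValueError("eps must be strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    # log is taken as the natural logarithm.
    return math.ceil(4 * math.log(n_samples) / denominator)


for eps in (0.5, 0.1):
    for n_samples in (10 ** 3, 10 ** 6, 10 ** 9):
        print(f"eps={eps}, n_samples={n_samples}: "
              f"n_components >= {jl_min_components(n_samples, eps)}")
```

Each thousand-fold increase in n_samples adds roughly the same amount, 4 ln(1000) / (eps^2 / 2 - eps^3 / 3). That constant increment is the logarithmic growth shown in the first plot. Shrinking eps from 0.5 to 0.1 increases the requirement by more than an order of magnitude.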

dev/_downloads/scikit-learn-docs.pdf

-4.92 KB
Binary file not shown.

dev/_images/iris.png

0 Bytes
53 Bytes
53 Bytes
