
Commit 8338ba6

Pushing the docs to dev/ for branch: master, commit 997a69eb37736e9abf4bcfa070eafee8314c0fb6
1 parent de94154 commit 8338ba6

1,750 files changed: +5782 / -5811 lines


dev/_downloads/006fc185672e58b056a5c134db26935c/plot_coin_segmentation.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Segmenting the picture of greek coins in regions\n\n\nThis example uses `spectral_clustering` on a graph created from\nvoxel-to-voxel difference on an image to break this image into multiple\npartly-homogeneous regions.\n\nThis procedure (spectral clustering on an image) is an efficient\napproximate solution for finding normalized graph cuts.\n\nThere are two options to assign labels:\n\n* with 'kmeans' spectral clustering will cluster samples in the embedding space\n using a kmeans algorithm\n* whereas 'discrete' will iteratively search for the closest partition\n space to the embedding space.\n"
+"\n# Segmenting the picture of greek coins in regions\n\nThis example uses `spectral_clustering` on a graph created from\nvoxel-to-voxel difference on an image to break this image into multiple\npartly-homogeneous regions.\n\nThis procedure (spectral clustering on an image) is an efficient\napproximate solution for finding normalized graph cuts.\n\nThere are two options to assign labels:\n\n* with 'kmeans' spectral clustering will cluster samples in the embedding space\n using a kmeans algorithm\n* whereas 'discrete' will iteratively search for the closest partition\n space to the embedding space.\n"
 ]
 },
 {
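
For context, a minimal sketch of the `spectral_clustering` usage this notebook describes, using a small random image as a stand-in for the coins picture (the image and parameter values are illustrative assumptions, not the example's own):

import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction import image

# Illustrative stand-in for the coins image: a small random grayscale array.
img = np.random.RandomState(0).rand(20, 20)

# Build a graph from voxel-to-voxel differences, then turn gradients into affinities.
graph = image.img_to_graph(img)
graph.data = np.exp(-graph.data / graph.data.std())

# assign_labels="kmeans" clusters samples in the embedding space; the other
# option (spelled "discretize" in the API) searches for the closest partition.
labels = spectral_clustering(graph, n_clusters=4, assign_labels="kmeans", random_state=0)
labels = labels.reshape(img.shape)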

dev/_downloads/02f111fb3dd79805b161e14c564184fc/plot_sgd_comparison.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Comparing various online solvers\n\n\nAn example showing how different online solvers perform\non the hand-written digits dataset.\n"
+"\n# Comparing various online solvers\n\nAn example showing how different online solvers perform\non the hand-written digits dataset.\n"
 ]
 },
 {
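
A sketch of the kind of comparison this notebook runs, assuming a plain train/test split on digits rather than the example's exact setup:

from sklearn.datasets import load_digits
from sklearn.linear_model import PassiveAggressiveClassifier, Perceptron, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three online solvers fitted and scored the same way; hyperparameters left at defaults.
for clf in (SGDClassifier(), Perceptron(), PassiveAggressiveClassifier()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))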

dev/_downloads/0486bf9e537e44cedd2a236d034bcd90/plot_pcr_vs_pls.ipynb

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Principal Component Regression vs Partial Least Squares Regression\n\n\nThis example compares `Principal Component Regression\n<https://en.wikipedia.org/wiki/Principal_component_regression>`_ (PCR) and\n`Partial Least Squares Regression\n<https://en.wikipedia.org/wiki/Partial_least_squares_regression>`_ (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n"
+"\n# Principal Component Regression vs Partial Least Squares Regression\n\nThis example compares `Principal Component Regression\n<https://en.wikipedia.org/wiki/Principal_component_regression>`_ (PCR) and\n`Partial Least Squares Regression\n<https://en.wikipedia.org/wiki/Partial_least_squares_regression>`_ (PLS) on a\ntoy dataset. Our goal is to illustrate how PLS can outperform PCR when the\ntarget is strongly correlated with some directions in the data that have a\nlow variance.\n\nPCR is a regressor composed of two steps: first,\n:class:`~sklearn.decomposition.PCA` is applied to the training data, possibly\nperforming dimensionality reduction; then, a regressor (e.g. a linear\nregressor) is trained on the transformed samples. In\n:class:`~sklearn.decomposition.PCA`, the transformation is purely\nunsupervised, meaning that no information about the targets is used. As a\nresult, PCR may perform poorly in some datasets where the target is strongly\ncorrelated with *directions* that have low variance. Indeed, the\ndimensionality reduction of PCA projects the data into a lower dimensional\nspace where the variance of the projected data is greedily maximized along\neach axis. Despite them having the most predictive power on the target, the\ndirections with a lower variance will be dropped, and the final regressor\nwill not be able to leverage them.\n\nPLS is both a transformer and a regressor, and it is quite similar to PCR: it\nalso applies a dimensionality reduction to the samples before applying a\nlinear regressor to the transformed data. The main difference with PCR is\nthat the PLS transformation is supervised. Therefore, as we will see in this\nexample, it does not suffer from the issue we just mentioned.\n"
 ]
 },
 {
@@ -33,7 +33,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The data\n--------\n\nWe start by creating a simple dataset with two features. Before we even dive\ninto PCR and PLS, we fit a PCA estimator to display the two principal\ncomponents of this dataset, i.e. the two directions that explain the most\nvariance in the data.\n\n"
+"## The data\n\nWe start by creating a simple dataset with two features. Before we even dive\ninto PCR and PLS, we fit a PCA estimator to display the two principal\ncomponents of this dataset, i.e. the two directions that explain the most\nvariance in the data.\n\n"
 ]
 },
 {
@@ -69,7 +69,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Projection on one component and predictive power\n------------------------------------------------\n\nWe now create two regressors: PCR and PLS, and for our illustration purposes\nwe set the number of components to 1. Before feeding the data to the PCA step\nof PCR, we first standardize it, as recommended by good practice. The PLS\nestimator has built-in scaling capabilities.\n\nFor both models, we plot the projected data onto the first component against\nthe target. In both cases, this projected data is what the regressors will\nuse as training data.\n\n"
+"## Projection on one component and predictive power\n\nWe now create two regressors: PCR and PLS, and for our illustration purposes\nwe set the number of components to 1. Before feeding the data to the PCA step\nof PCR, we first standardize it, as recommended by good practice. The PLS\nestimator has built-in scaling capabilities.\n\nFor both models, we plot the projected data onto the first component against\nthe target. In both cases, this projected data is what the regressors will\nuse as training data.\n\n"
 ]
 },
 {
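
A minimal sketch of the PCR-vs-PLS comparison these cells describe, with a generic regression dataset standing in for the example's hand-built two-feature data:

from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in toy data (the example constructs its own two-feature dataset).
X, y = make_regression(n_samples=200, n_features=2, noise=1.0, random_state=0)

# PCR: unsupervised PCA projection, then a linear regressor; data standardized first.
pcr = make_pipeline(StandardScaler(), PCA(n_components=1), LinearRegression()).fit(X, y)
# PLS: supervised projection with built-in scaling.
pls = PLSRegression(n_components=1).fit(X, y)
print("PCR r2:", pcr.score(X, y))
print("PLS r2:", pls.score(X, y))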

dev/_downloads/055e8313e28f2f3b5fd508054dfe5fe0/plot_roc_crossval.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n=============================================================\nReceiver Operating Characteristic (ROC) with cross validation\n=============================================================\n\nExample of Receiver Operating Characteristic (ROC) metric to evaluate\nclassifier output quality using cross-validation.\n\nROC curves typically feature true positive rate on the Y axis, and false\npositive rate on the X axis. This means that the top left corner of the plot is\nthe \"ideal\" point - a false positive rate of zero, and a true positive rate of\none. This is not very realistic, but it does mean that a larger area under the\ncurve (AUC) is usually better.\n\nThe \"steepness\" of ROC curves is also important, since it is ideal to maximize\nthe true positive rate while minimizing the false positive rate.\n\nThis example shows the ROC response of different datasets, created from K-fold\ncross-validation. Taking all of these curves, it is possible to calculate the\nmean area under curve, and see the variance of the curve when the\ntraining set is split into different subsets. This roughly shows how the\nclassifier output is affected by changes in the training data, and how\ndifferent the splits generated by K-fold cross-validation are from one another.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>See also :func:`sklearn.metrics.roc_auc_score`,\n :func:`sklearn.model_selection.cross_val_score`,\n `sphx_glr_auto_examples_model_selection_plot_roc.py`,</p></div>\n"
+"\n# Receiver Operating Characteristic (ROC) with cross validation\n\nExample of Receiver Operating Characteristic (ROC) metric to evaluate\nclassifier output quality using cross-validation.\n\nROC curves typically feature true positive rate on the Y axis, and false\npositive rate on the X axis. This means that the top left corner of the plot is\nthe \"ideal\" point - a false positive rate of zero, and a true positive rate of\none. This is not very realistic, but it does mean that a larger area under the\ncurve (AUC) is usually better.\n\nThe \"steepness\" of ROC curves is also important, since it is ideal to maximize\nthe true positive rate while minimizing the false positive rate.\n\nThis example shows the ROC response of different datasets, created from K-fold\ncross-validation. Taking all of these curves, it is possible to calculate the\nmean area under curve, and see the variance of the curve when the\ntraining set is split into different subsets. This roughly shows how the\nclassifier output is affected by changes in the training data, and how\ndifferent the splits generated by K-fold cross-validation are from one another.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>See also :func:`sklearn.metrics.roc_auc_score`,\n :func:`sklearn.model_selection.cross_val_score`,\n `sphx_glr_auto_examples_model_selection_plot_roc.py`,</p></div>\n"
 ]
 },
 {
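
A sketch of the per-fold AUC idea this notebook plots, assuming synthetic data and a logistic regression classifier in place of the example's own setup:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, random_state=0)

# One ROC AUC per K-fold split; the spread across folds shows how sensitive
# the classifier output is to the particular training subset.
for i, (train, test) in enumerate(StratifiedKFold(n_splits=5).split(X, y)):
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    auc = roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1])
    print(f"fold {i}: AUC = {auc:.3f}")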

dev/_downloads/05ca8a4e90b4cc2acd69f9e24b4a1f3a/plot_classifier_chain_yeast.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Classifier Chain\n\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
+"\n# Classifier Chain\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
 ]
 },
 {
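
A sketch of the randomly ordered chain ensemble this notebook describes, using synthetic multilabel data as a stand-in for the yeast dataset it fetches from OpenML:

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.multioutput import ClassifierChain

# Stand-in multilabel data (the example uses the 14-label yeast dataset).
X, Y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)

# Ten randomly ordered chains, ensembled by averaging the binary predictions
# and thresholding at 0.5, as the description above outlines.
chains = [ClassifierChain(LogisticRegression(max_iter=1000), order="random",
                          random_state=i).fit(X, Y) for i in range(10)]
Y_ens = np.mean([chain.predict(X) for chain in chains], axis=0) >= 0.5
print("ensemble Jaccard:", jaccard_score(Y, Y_ens, average="samples"))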

dev/_downloads/061854726c268bcdae5cd1c330cf8c75/plot_sgd_penalties.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n==============\nSGD: Penalties\n==============\n\nContours of where the penalty is equal to 1\nfor the three penalties L1, L2 and elastic-net.\n\nAll of the above are supported by :class:`~sklearn.linear_model.SGDClassifier`\nand :class:`~sklearn.linear_model.SGDRegressor`.\n"
+"\n# SGD: Penalties\n\nContours of where the penalty is equal to 1\nfor the three penalties L1, L2 and elastic-net.\n\nAll of the above are supported by :class:`~sklearn.linear_model.SGDClassifier`\nand :class:`~sklearn.linear_model.SGDRegressor`.\n"
 ]
 },
 {
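
For reference, one common parameterization of the three penalty terms the plot contours; the 1/2 factor on the squared L2 term and the l1_ratio value are assumptions for illustration, not taken from the example:

import numpy as np

w = np.array([0.6, 0.8])       # an example weight vector
l1 = np.abs(w).sum()           # L1 penalty
l2 = 0.5 * np.sum(w ** 2)      # squared L2 penalty, with an assumed 1/2 factor
l1_ratio = 0.15                # assumed elastic-net mixing parameter
elastic_net = l1_ratio * l1 + (1 - l1_ratio) * l2
print(l1, l2, elastic_net)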

dev/_downloads/067cd5d39b097d2c49dd98f563dac13a/plot_iterative_imputer_variants_comparison.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Imputing missing values with variants of IterativeImputer\n\n\n.. currentmodule:: sklearn\n\nThe :class:`~impute.IterativeImputer` class is very flexible - it can be\nused with a variety of estimators to do round-robin regression, treating every\nvariable as an output in turn.\n\nIn this example we compare some estimators for the purpose of missing feature\nimputation with :class:`~impute.IterativeImputer`:\n\n* :class:`~linear_model.BayesianRidge`: regularized linear regression\n* :class:`~tree.DecisionTreeRegressor`: non-linear regression\n* :class:`~ensemble.ExtraTreesRegressor`: similar to missForest in R\n* :class:`~neighbors.KNeighborsRegressor`: comparable to other KNN\n imputation approaches\n\nOf particular interest is the ability of\n:class:`~impute.IterativeImputer` to mimic the behavior of missForest, a\npopular imputation package for R. In this example, we have chosen to use\n:class:`~ensemble.ExtraTreesRegressor` instead of\n:class:`~ensemble.RandomForestRegressor` (as in missForest) due to its\nincreased speed.\n\nNote that :class:`~neighbors.KNeighborsRegressor` is different from KNN\nimputation, which learns from samples with missing values by using a distance\nmetric that accounts for missing values, rather than imputing them.\n\nThe goal is to compare different estimators to see which one is best for the\n:class:`~impute.IterativeImputer` when using a\n:class:`~linear_model.BayesianRidge` estimator on the California housing\ndataset with a single value randomly removed from each row.\n\nFor this particular pattern of missing values we see that\n:class:`~ensemble.ExtraTreesRegressor` and\n:class:`~linear_model.BayesianRidge` give the best results.\n"
+"\n# Imputing missing values with variants of IterativeImputer\n\n.. currentmodule:: sklearn\n\nThe :class:`~impute.IterativeImputer` class is very flexible - it can be\nused with a variety of estimators to do round-robin regression, treating every\nvariable as an output in turn.\n\nIn this example we compare some estimators for the purpose of missing feature\nimputation with :class:`~impute.IterativeImputer`:\n\n* :class:`~linear_model.BayesianRidge`: regularized linear regression\n* :class:`~tree.DecisionTreeRegressor`: non-linear regression\n* :class:`~ensemble.ExtraTreesRegressor`: similar to missForest in R\n* :class:`~neighbors.KNeighborsRegressor`: comparable to other KNN\n imputation approaches\n\nOf particular interest is the ability of\n:class:`~impute.IterativeImputer` to mimic the behavior of missForest, a\npopular imputation package for R. In this example, we have chosen to use\n:class:`~ensemble.ExtraTreesRegressor` instead of\n:class:`~ensemble.RandomForestRegressor` (as in missForest) due to its\nincreased speed.\n\nNote that :class:`~neighbors.KNeighborsRegressor` is different from KNN\nimputation, which learns from samples with missing values by using a distance\nmetric that accounts for missing values, rather than imputing them.\n\nThe goal is to compare different estimators to see which one is best for the\n:class:`~impute.IterativeImputer` when using a\n:class:`~linear_model.BayesianRidge` estimator on the California housing\ndataset with a single value randomly removed from each row.\n\nFor this particular pattern of missing values we see that\n:class:`~ensemble.ExtraTreesRegressor` and\n:class:`~linear_model.BayesianRidge` give the best results.\n"
 ]
 },
 {
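
A minimal sketch of swapping estimators into IterativeImputer as this notebook does, with random data standing in for California housing and only two of the four estimators it compares:

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required opt-in
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
X_full = rng.rand(100, 4)                     # stand-in data, not California housing
X_missing = X_full.copy()
X_missing[rng.choice(100, 20, replace=False), rng.randint(0, 4, 20)] = np.nan

# Round-robin imputation, each feature regressed on the others in turn.
for est in (BayesianRidge(), ExtraTreesRegressor(n_estimators=10, random_state=0)):
    imputed = IterativeImputer(estimator=est, random_state=0).fit_transform(X_missing)
    print(type(est).__name__, "MAE:", np.abs(imputed - X_full).mean())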

dev/_downloads/06cfc926acb27652fb2aa5bfc583e7cb/plot_hashing_vs_dict_vectorizer.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# FeatureHasher and DictVectorizer Comparison\n\n\nCompares FeatureHasher and DictVectorizer by using both to vectorize\ntext documents.\n\nThe example demonstrates syntax and speed only; it doesn't actually do\nanything useful with the extracted vectors. See the example scripts\n{document_classification_20newsgroups,clustering}.py for actual learning\non text documents.\n\nA discrepancy between the number of terms reported for DictVectorizer and\nfor FeatureHasher is to be expected due to hash collisions.\n"
+"\n# FeatureHasher and DictVectorizer Comparison\n\nCompares FeatureHasher and DictVectorizer by using both to vectorize\ntext documents.\n\nThe example demonstrates syntax and speed only; it doesn't actually do\nanything useful with the extracted vectors. See the example scripts\n{document_classification_20newsgroups,clustering}.py for actual learning\non text documents.\n\nA discrepancy between the number of terms reported for DictVectorizer and\nfor FeatureHasher is to be expected due to hash collisions.\n"
 ]
 },
 {
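
A sketch of the two vectorizers this notebook compares, using toy token-count dicts instead of the real text documents it processes:

from sklearn.feature_extraction import DictVectorizer, FeatureHasher

# Toy token-count dicts; the example vectorizes actual text documents.
docs = [{"dog": 1, "cat": 2}, {"dog": 2, "run": 5}]

dv = DictVectorizer()
X_dv = dv.fit_transform(docs)      # learns an explicit vocabulary
fh = FeatureHasher(n_features=8)   # stateless: hashes feature names, collisions possible
X_fh = fh.transform(docs)
print(X_dv.shape, X_fh.shape)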

dev/_downloads/06ffeb4f0ded6447302acd5a712f8490/plot_nearest_centroid.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Nearest Centroid Classification\n\n\nSample usage of Nearest Centroid classification.\nIt will plot the decision boundaries for each class.\n"
+"\n# Nearest Centroid Classification\n\nSample usage of Nearest Centroid classification.\nIt will plot the decision boundaries for each class.\n"
 ]
 },
 {
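
A minimal sketch of the NearestCentroid usage this notebook demonstrates, assuming the full iris dataset rather than the example's exact feature subset:

from sklearn.datasets import load_iris
from sklearn.neighbors import NearestCentroid

X, y = load_iris(return_X_y=True)

# shrink_threshold=None is plain nearest centroid; a positive value shrinks
# per-class centroids toward the overall mean.
for shrink in (None, 0.2):
    clf = NearestCentroid(shrink_threshold=shrink).fit(X, y)
    print("shrink_threshold:", shrink, "train accuracy:", clf.score(X, y))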
