Commit 392ad42

Pushing the docs to dev/ for branch: master, commit 6bd8df099f37205864eab1759f77b6070577f6e7
1 parent 70eaae3 commit 392ad42

File tree

1,211 files changed: +3809 −3785 lines changed


dev/_downloads/0bb7c3d746d5ca362b6fa8a462f13098/plot_faces_decomposition.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Faces dataset decompositions
 ============================
 
-This example applies to :ref:`olivetti_faces` different unsupervised
+This example applies to :ref:`olivetti_faces_dataset` different unsupervised
 matrix decomposition (dimension reduction) methods from the module
 :py:mod:`sklearn.decomposition` (see the documentation chapter
 :ref:`decompositions`) .

dev/_downloads/9481da9ed1cbf016109715bdf7e79a6f/plot_permutation_importance.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 
 .. topic:: References:
 
-   .. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
+   [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
    2001. https://doi.org/10.1023/A:1010933404324
 """
 print(__doc__)

dev/_downloads/9e4e8e1cf9e1bc7322177aeb4a2af787/plot_permutation_importance.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n================================================================\nPermutation Importance vs Random Forest Feature Importance (MDI)\n================================================================\n\nIn this example, we will compare the impurity-based feature importance of\n:class:`~sklearn.ensemble.RandomForestClassifier` with the\npermutation importance on the titanic dataset using\n:func:`~sklearn.inspection.permutation_importance`. We will show that the\nimpurity-based feature importance can inflate the importance of numerical\nfeatures.\n\nFurthermore, the impurity-based feature importance of random forests suffers\nfrom being computed on statistics derived from the training dataset: the\nimportances can be high even for features that are not predictive of the target\nvariable, as long as the model has the capacity to use them to overfit.\n\nThis example shows how to use Permutation Importances as an alternative that\ncan mitigate those limitations.\n\n.. topic:: References:\n\n .. [1] L. Breiman, \"Random Forests\", Machine Learning, 45(1), 5-32,\n 2001. https://doi.org/10.1023/A:1010933404324\n"
+    "\n================================================================\nPermutation Importance vs Random Forest Feature Importance (MDI)\n================================================================\n\nIn this example, we will compare the impurity-based feature importance of\n:class:`~sklearn.ensemble.RandomForestClassifier` with the\npermutation importance on the titanic dataset using\n:func:`~sklearn.inspection.permutation_importance`. We will show that the\nimpurity-based feature importance can inflate the importance of numerical\nfeatures.\n\nFurthermore, the impurity-based feature importance of random forests suffers\nfrom being computed on statistics derived from the training dataset: the\nimportances can be high even for features that are not predictive of the target\nvariable, as long as the model has the capacity to use them to overfit.\n\nThis example shows how to use Permutation Importances as an alternative that\ncan mitigate those limitations.\n\n.. topic:: References:\n\n [1] L. Breiman, \"Random Forests\", Machine Learning, 45(1), 5-32,\n 2001. https://doi.org/10.1023/A:1010933404324\n"
    ]
   },
   {
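The notebook source above contrasts impurity-based (MDI) importances, computed on training-set statistics, with permutation importance on held-out data. A minimal sketch of that comparison, using synthetic data plus an uninformative random feature instead of the Titanic dataset used in the real example:

```python
# Sketch: MDI importances can credit a purely random feature the model
# used to overfit; permutation importance on a test set mitigates this.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
# Append a random feature that carries no information about the target.
rng = np.random.RandomState(0)
X = np.hstack([X, rng.rand(X.shape[0], 1)])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Impurity-based importances, derived from the training data.
print("MDI importances:", clf.feature_importances_)

# Permutation importance evaluated on held-out data.
result = permutation_importance(clf, X_test, y_test, n_repeats=10,
                                random_state=0)
print("Permutation importances:", result.importances_mean)
```

With a capacity-rich forest, the random sixth feature typically receives a nonzero MDI importance but a permutation importance near zero.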

dev/_downloads/da18402e0026ebbd327b461785c9bb39/plot_faces_decomposition.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n# Faces dataset decompositions\n\n\nThis example applies to `olivetti_faces` different unsupervised\nmatrix decomposition (dimension reduction) methods from the module\n:py:mod:`sklearn.decomposition` (see the documentation chapter\n`decompositions`) .\n"
+    "\n# Faces dataset decompositions\n\n\nThis example applies to `olivetti_faces_dataset` different unsupervised\nmatrix decomposition (dimension reduction) methods from the module\n:py:mod:`sklearn.decomposition` (see the documentation chapter\n`decompositions`) .\n"
    ]
   },
   {
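The faces example applies several `sklearn.decomposition` estimators to the Olivetti faces. A minimal sketch of that pattern, with random data standing in for the faces to avoid a dataset download (the real example uses `fetch_olivetti_faces`):

```python
# Sketch: fit two decomposition methods from sklearn.decomposition on
# flattened image-like data; components_ holds the learned image atoms.
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.RandomState(0)
faces = rng.rand(100, 64 * 64)  # 100 flattened 64x64 "images"

estimators = [
    ("Eigenfaces (PCA)", PCA(n_components=6, whiten=True)),
    ("Non-negative components (NMF)",
     NMF(n_components=6, init="nndsvda", max_iter=200)),
]
for name, est in estimators:
    est.fit(faces)
    # Each row of components_ is one learned component, image-shaped.
    print(name, est.components_.shape)
```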

dev/_downloads/f4a723cdf97babef6136bfcb0f7491ca/plot_classifier_chain_yeast.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n# Classifier Chain\n\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
+    "\n# Classifier Chain\n\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_similarity_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n"
    ]
   },
   {

dev/_downloads/fcc43934ce8cf7dccbc140cdf13c05d3/plot_classifier_chain_yeast.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 data point has at least one label. As a baseline we first train a logistic
 regression classifier for each of the 14 labels. To evaluate the performance of
 these classifiers we predict on a held-out test set and calculate the
-:ref:`jaccard score <jaccard_score>` for each sample.
+:ref:`jaccard score <jaccard_similarity_score>` for each sample.
 
 Next we create 10 classifier chains. Each classifier chain contains a
 logistic regression model for each of the 14 labels. The models in each
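The docstring above describes building randomly ordered classifier chains, averaging their binary predictions, and thresholding at 0.5. A minimal sketch of that ensemble, on a small synthetic multilabel problem rather than the yeast dataset:

```python
# Sketch: an ensemble of randomly ordered classifier chains, scored with
# the per-sample Jaccard similarity as in the example's description.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=300, n_classes=5,
                                      random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Ten chains, each with a different random label ordering; each model in
# a chain sees the predictions of the preceding models as extra features.
chains = [ClassifierChain(LogisticRegression(), order="random",
                          random_state=i) for i in range(10)]
for chain in chains:
    chain.fit(X_train, Y_train)

# Average the chains' predicted probabilities and threshold at 0.5.
Y_prob = np.mean([chain.predict_proba(X_test) for chain in chains], axis=0)
Y_ensemble = Y_prob >= 0.5

print("ensemble Jaccard:",
      jaccard_score(Y_test, Y_ensemble, average="samples"))
```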

dev/_downloads/scikit-learn-docs.pdf

-5.99 KB
Binary file not shown.

dev/_images/iris.png

0 Bytes
