
Commit 26d2a80

Pushing the docs to dev/ for branch: master, commit 76e82ba00a3418b19b90a59ec1c7c7f39c2dc26f
1 parent 9527fbe commit 26d2a80

File tree

1,234 files changed (+6460, -4747 lines)


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: ebb9ea68698aea15799d10fa0216399e
+config: 133687f2af5eb2f75adf185bdaa6b0c9
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Release Highlights for scikit-learn 0.24\n\n.. currentmodule:: sklearn\n\nWe are pleased to announce the release of scikit-learn 0.24! Many bug fixes\nand improvements were added, as well as some new key features. We detail\nbelow a few of the major features of this release. **For an exhaustive list of\nall the changes**, please refer to the `release notes <changes_0_24>`.\n\nTo install the latest version (with pip)::\n\n pip install --upgrade scikit-learn\n\nor with conda::\n\n conda install -c conda-forge scikit-learn\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Successive Halving estimators for tuning hyper-parameters\nSuccessive Halving, a state of the art method, is now available to\nexplore the space of the parameters and identify their best combination.\n:class:`~sklearn.model_selection.HalvingGridSearchCV` and\n:class:`~sklearn.model_selection.HalvingRandomSearchCV` can be\nused as drop-in replacement for\n:class:`~sklearn.model_selection.GridSearchCV` and\n:class:`~sklearn.model_selection.RandomizedSearchCV`.\nSuccessive Halving is an iterative selection process illustrated in the\nfigure below. The first iteration is run with a small amount of resources,\nwhere the resource typically corresponds to the number of training samples,\nbut can also be an arbitrary integer parameter such as `n_estimators` in a\nrandom forest. Only a subset of the parameter candidates are selected for the\nnext iteration, which will be run with an increasing amount of allocated\nresources. Only a subset of candidates will last until the end of the\niteration process, and the best parameter candidate is the one that has the\nhighest score on the last iteration.\n\nRead more in the `User Guide <successive_halving_user_guide>` (note:\nthe Successive Halving estimators are still :term:`experimental\n<experimental>`).\n\n.. figure:: ../model_selection/images/sphx_glr_plot_successive_halving_iterations_001.png\n :target: ../model_selection/plot_successive_halving_iterations.html\n :align: center\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nfrom scipy.stats import randint\nfrom sklearn.experimental import enable_halving_search_cv # noqa\nfrom sklearn.model_selection import HalvingRandomSearchCV\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.datasets import make_classification\n\nrng = np.random.RandomState(0)\n\nX, y = make_classification(n_samples=700, random_state=rng)\n\nclf = RandomForestClassifier(n_estimators=10, random_state=rng)\n\nparam_dist = {\"max_depth\": [3, None],\n \"max_features\": randint(1, 11),\n \"min_samples_split\": randint(2, 11),\n \"bootstrap\": [True, False],\n \"criterion\": [\"gini\", \"entropy\"]}\n\nrsh = HalvingRandomSearchCV(estimator=clf, param_distributions=param_dist,\n factor=2, random_state=rng)\nrsh.fit(X, y)\nrsh.best_params_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Native support for categorical features in HistGradientBoosting estimators\n:class:`~sklearn.ensemble.HistGradientBoostingClassifier` and\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` now have native\nsupport for categorical features: they can consider splits on non-ordered,\ncategorical data. Read more in the `User Guide\n<categorical_support_gbdt>`.\n\n.. figure:: ../ensemble/images/sphx_glr_plot_gradient_boosting_categorical_001.png\n :target: ../ensemble/plot_gradient_boosting_categorical.html\n :align: center\n\nThe plot shows that the new native support for categorical features leads to\nfitting times that are comparable to models where the categories are treated\nas ordered quantities, i.e. simply ordinal-encoded. Native support is also\nmore expressive than both one-hot encoding and ordinal encoding. However, to\nuse the new `categorical_features` parameter, it is still required to\npreprocess the data within a pipeline as demonstrated in this `example\n<sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py>`.\n\n"
]
},
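{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Illustrative addition, not part of the upstream highlights example.)* A minimal sketch of the new `categorical_features` parameter: here the categorical column is already integer-coded, so no encoder is needed; with string categories you would ordinal-encode inside a pipeline as in the gallery example linked above. The toy data and parameter values below are assumptions made for illustration.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nfrom sklearn.experimental import enable_hist_gradient_boosting  # noqa\nfrom sklearn.ensemble import HistGradientBoostingClassifier\n\n# Toy data (illustration only): column 0 is numeric, column 1 holds integer\n# codes of an unordered category in [0, n_categories).\nrng = np.random.RandomState(0)\nX = np.c_[rng.randn(200), rng.randint(0, 3, size=200)]\ny = (X[:, 1] == 0).astype(int)\n\n# Flag column 1 as categorical so that splits treat its codes as unordered.\nclf = HistGradientBoostingClassifier(categorical_features=[1], random_state=0)\nclf.fit(X, y).score(X, y)"
]
},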
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Improved performances of HistGradientBoosting estimators\nThe memory footprint of :class:`ensemble.HistGradientBoostingRegressor` and\n:class:`ensemble.HistGradientBoostingClassifier` has been significantly\nimproved during calls to `fit`. In addition, histogram initialization is now\ndone in parallel which results in slight speed improvements.\nSee more in the `Benchmark page\n<https://scikit-learn.org/scikit-learn-benchmarks/>`_.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New self-training meta-estimator\nA new self-training implementation, based on `Yarowski's algorithm\n<https://doi.org/10.3115/981658.981684>`_ can now be used with any\nclassifier that implements :term:`predict_proba`. The sub-classifier\nwill behave as a\nsemi-supervised classifier, allowing it to learn from unlabeled data.\nRead more in the `User guide <self_training>`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nfrom sklearn import datasets\nfrom sklearn.semi_supervised import SelfTrainingClassifier\nfrom sklearn.svm import SVC\n\nrng = np.random.RandomState(42)\niris = datasets.load_iris()\nrandom_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3\niris.target[random_unlabeled_points] = -1\nsvc = SVC(probability=True, gamma=\"auto\")\nself_training_model = SelfTrainingClassifier(svc)\nself_training_model.fit(iris.data, iris.target)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New SequentialFeatureSelector transformer\nA new iterative transformer to select features is available:\n:class:`~sklearn.feature_selection.SequentialFeatureSelector`.\nSequential Feature Selection can add features one at a time (forward\nselection) or remove features from the list of the available features\n(backward selection), based on a cross-validated score maximization.\nSee the `User Guide <sequential_feature_selection>`.\n\n"
76+
]
77+
},
78+
{
79+
"cell_type": "code",
80+
"execution_count": null,
81+
"metadata": {
82+
"collapsed": false
83+
},
84+
"outputs": [],
85+
"source": [
86+
"from sklearn.feature_selection import SequentialFeatureSelector\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.datasets import load_iris\n\nX, y = load_iris(return_X_y=True, as_frame=True)\nfeature_names = X.columns\nknn = KNeighborsClassifier(n_neighbors=3)\nsfs = SequentialFeatureSelector(knn, n_features_to_select=2)\nsfs.fit(X, y)\nprint(\"Features selected by forward sequential selection: \"\n f\"{feature_names[sfs.get_support().tolist()]}\")"
]
},
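{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Illustrative addition, not part of the upstream highlights example.)* The cell above performs forward selection; backward selection, mentioned in the description, only requires `direction=\"backward\"`. A minimal sketch on the same iris data:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.feature_selection import SequentialFeatureSelector\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.datasets import load_iris\n\n# Same setup as above, but features are removed one at a time instead.\nX, y = load_iris(return_X_y=True, as_frame=True)\nknn = KNeighborsClassifier(n_neighbors=3)\nsfs_backward = SequentialFeatureSelector(knn, n_features_to_select=2,\n                                         direction=\"backward\")\nsfs_backward.fit(X, y)\nprint(\"Features selected by backward sequential selection: \"\n      f\"{X.columns[sfs_backward.get_support()]}\")"
]
},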
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New PolynomialCountSketch kernel approximation function\nThe new :class:`~sklearn.kernel_approximation.PolynomialCountSketch`\napproximates a polynomial expansion of a feature space when used with linear\nmodels, but uses much less memory than\n:class:`~sklearn.preprocessing.PolynomialFeatures`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_covtype\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import MinMaxScaler\nfrom sklearn.kernel_approximation import PolynomialCountSketch\nfrom sklearn.linear_model import LogisticRegression\n\nX, y = fetch_covtype(return_X_y=True)\npipe = make_pipeline(MinMaxScaler(),\n PolynomialCountSketch(degree=2, n_components=300),\n LogisticRegression(max_iter=1000))\nX_train, X_test, y_train, y_test = train_test_split(X, y, train_size=5000,\n test_size=10000,\n random_state=42)\npipe.fit(X_train, y_train).score(X_test, y_test)\n\n# ##############################################################################\n# # For comparison, here is the score of a linear baseline for the same data:\n\nlinear_baseline = make_pipeline(MinMaxScaler(),\n LogisticRegression(max_iter=1000))\nlinear_baseline.fit(X_train, y_train).score(X_test, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Individual Conditional Expectation plots\nA new kind of partial dependence plot is available: the Individual\nConditional Expectation (ICE) plot. ICE plots visualize the dependence of the\nprediction on a feature for each sample separately, with one line per sample.\nSee the `User Guide <individual_conditional>`\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.ensemble import RandomForestRegressor\nfrom sklearn.datasets import fetch_california_housing\nfrom sklearn.inspection import plot_partial_dependence\n\nX, y = fetch_california_housing(return_X_y=True, as_frame=True)\nfeatures = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms']\nest = RandomForestRegressor(n_estimators=10)\nest.fit(X, y)\ndisplay = plot_partial_dependence(\n est, X, features, kind=\"individual\", subsample=50,\n n_jobs=3, grid_resolution=20, random_state=0\n)\ndisplay.figure_.suptitle(\n 'Partial dependence of house value on non-___location features\\n'\n 'for the California housing dataset, with BayesianRidge'\n)\ndisplay.figure_.subplots_adjust(hspace=0.3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New Poisson splitting criterion for DecisionTreeRegressor\nThe integration of Poisson regression estimation continues from version 0.23.\n:class:`~sklearn.tree.DecisionTreeRegressor` now supports a new `'poisson'`\nsplitting criterion. Setting `criterion=\"poisson\"` might be a good choice\nif your target is a count or a frequency.\n\n"
130+
]
131+
},
132+
{
133+
"cell_type": "code",
134+
"execution_count": null,
135+
"metadata": {
136+
"collapsed": false
137+
},
138+
"outputs": [],
139+
"source": [
140+
"from sklearn.tree import DecisionTreeRegressor\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\n\nn_samples, n_features = 1000, 20\nrng = np.random.RandomState(0)\nX = rng.randn(n_samples, n_features)\n# positive integer target correlated with X[:, 5] with many zeros:\ny = rng.poisson(lam=np.exp(X[:, 5]) / 2)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)\nregressor = DecisionTreeRegressor(criterion='poisson', random_state=0)\nregressor.fit(X_train, y_train)"
]
},
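{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Illustrative addition, not part of the upstream highlights example.)* One way to check whether the Poisson criterion pays off on count data is to compare the held-out mean Poisson deviance with a tree grown using the default squared-error criterion. A minimal sketch reusing the variables from the previous cell:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nfrom sklearn.metrics import mean_poisson_deviance\nfrom sklearn.tree import DecisionTreeRegressor\n\n# Baseline tree with the default (squared error) criterion on the same split.\nbaseline = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)\n\n# mean_poisson_deviance needs strictly positive predictions, so clip any\n# zero-valued leaves before scoring (illustrative safeguard).\nfor name, model in [(\"poisson\", regressor), (\"squared error\", baseline)]:\n    pred = np.clip(model.predict(X_test), 1e-6, None)\n    print(name, mean_poisson_deviance(y_test, pred))"
]
},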
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## New documentation improvements\n\nNew examples and documentation pages have been added, in a continuous effort\nto improve the understanding of machine learning practices:\n\n- a new section about `common pitfalls and recommended\n practices <common_pitfalls>`,\n- an example illustrating how to `statistically compare the performance of\n models <sphx_glr_auto_examples_model_selection_plot_grid_search_stats.py>`\n evaluated using :class:`~sklearn.model_selection.GridSearchCV`,\n- an example on how to `interpret coefficients of linear models\n <sphx_glr_auto_examples_inspection_plot_linear_model_coefficient_interpretation.py>`,\n- an `example\n <sphx_glr_auto_examples_cross_decomposition_plot_pcr_vs_pls.py>`\n comparing Principal Component Regression and Partial Least Squares.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
