"\n# Release Highlights for scikit-learn 0.24\n\n.. currentmodule:: sklearn\n\nWe are pleased to announce the release of scikit-learn 0.24! Many bug fixes\nand improvements were added, as well as some new key features. We detail\nbelow a few of the major features of this release. **For an exhaustive list of\nall the changes**, please refer to the `release notes <changes_0_24>`.\n\nTo install the latest version (with pip)::\n\n pip install --upgrade scikit-learn\n\nor with conda::\n\n conda install -c conda-forge scikit-learn\n"
"## Successive Halving estimators for tuning hyper-parameters\nSuccessive Halving, a state of the art method, is now available to\nexplore the space of the parameters and identify their best combination.\n:class:`~sklearn.model_selection.HalvingGridSearchCV` and\n:class:`~sklearn.model_selection.HalvingRandomSearchCV` can be\nused as drop-in replacement for\n:class:`~sklearn.model_selection.GridSearchCV` and\n:class:`~sklearn.model_selection.RandomizedSearchCV`.\nSuccessive Halving is an iterative selection process illustrated in the\nfigure below. The first iteration is run with a small amount of resources,\nwhere the resource typically corresponds to the number of training samples,\nbut can also be an arbitrary integer parameter such as `n_estimators` in a\nrandom forest. Only a subset of the parameter candidates are selected for the\nnext iteration, which will be run with an increasing amount of allocated\nresources. Only a subset of candidates will last until the end of the\niteration process, and the best parameter candidate is the one that has the\nhighest score on the last iteration.\n\nRead more in the `User Guide <successive_halving_user_guide>` (note:\nthe Successive Halving estimators are still :term:`experimental\n<experimental>`).\n\n.. figure:: ../model_selection/images/sphx_glr_plot_successive_halving_iterations_001.png\n :target: ../model_selection/plot_successive_halving_iterations.html\n :align: center\n\n"
"## Native support for categorical features in HistGradientBoosting estimators\n:class:`~sklearn.ensemble.HistGradientBoostingClassifier` and\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` now have native\nsupport for categorical features: they can consider splits on non-ordered,\ncategorical data. Read more in the `User Guide\n<categorical_support_gbdt>`.\n\n.. figure:: ../ensemble/images/sphx_glr_plot_gradient_boosting_categorical_001.png\n :target: ../ensemble/plot_gradient_boosting_categorical.html\n :align: center\n\nThe plot shows that the new native support for categorical features leads to\nfitting times that are comparable to models where the categories are treated\nas ordered quantities, i.e. simply ordinal-encoded. Native support is also\nmore expressive than both one-hot encoding and ordinal encoding. However, to\nuse the new `categorical_features` parameter, it is still required to\npreprocess the data within a pipeline as demonstrated in this `example\n<sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py>`.\n\n"
"## Improved performances of HistGradientBoosting estimators\nThe memory footprint of :class:`ensemble.HistGradientBoostingRegressor` and\n:class:`ensemble.HistGradientBoostingClassifier` has been significantly\nimproved during calls to `fit`. In addition, histogram initialization is now\ndone in parallel which results in slight speed improvements.\nSee more in the `Benchmark page\n<https://scikit-learn.org/scikit-learn-benchmarks/>`_.\n\n"
"## New self-training meta-estimator\nA new self-training implementation, based on `Yarowski's algorithm\n<https://doi.org/10.3115/981658.981684>`_ can now be used with any\nclassifier that implements :term:`predict_proba`. The sub-classifier\nwill behave as a\nsemi-supervised classifier, allowing it to learn from unlabeled data.\nRead more in the `User guide <self_training>`.\n\n"
"## New SequentialFeatureSelector transformer\nA new iterative transformer to select features is available:\n:class:`~sklearn.feature_selection.SequentialFeatureSelector`.\nSequential Feature Selection can add features one at a time (forward\nselection) or remove features from the list of the available features\n(backward selection), based on a cross-validated score maximization.\nSee the `User Guide <sequential_feature_selection>`.\n\n"
"## New PolynomialCountSketch kernel approximation function\nThe new :class:`~sklearn.kernel_approximation.PolynomialCountSketch`\napproximates a polynomial expansion of a feature space when used with linear\nmodels, but uses much less memory than\n:class:`~sklearn.preprocessing.PolynomialFeatures`.\n\n"
"from sklearn.datasets import fetch_covtype\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import MinMaxScaler\nfrom sklearn.kernel_approximation import PolynomialCountSketch\nfrom sklearn.linear_model import LogisticRegression\n\nX, y = fetch_covtype(return_X_y=True)\npipe = make_pipeline(MinMaxScaler(),\n PolynomialCountSketch(degree=2, n_components=300),\n LogisticRegression(max_iter=1000))\nX_train, X_test, y_train, y_test = train_test_split(X, y, train_size=5000,\n test_size=10000,\n random_state=42)\npipe.fit(X_train, y_train).score(X_test, y_test)\n\n# ##############################################################################\n# # For comparison, here is the score of a linear baseline for the same data:\n\nlinear_baseline = make_pipeline(MinMaxScaler(),\n LogisticRegression(max_iter=1000))\nlinear_baseline.fit(X_train, y_train).score(X_test, y_test)"
"## Individual Conditional Expectation plots\nA new kind of partial dependence plot is available: the Individual\nConditional Expectation (ICE) plot. ICE plots visualize the dependence of the\nprediction on a feature for each sample separately, with one line per sample.\nSee the `User Guide <individual_conditional>`\n\n"
"from sklearn.ensemble import RandomForestRegressor\nfrom sklearn.datasets import fetch_california_housing\nfrom sklearn.inspection import plot_partial_dependence\n\nX, y = fetch_california_housing(return_X_y=True, as_frame=True)\nfeatures = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms']\nest = RandomForestRegressor(n_estimators=10)\nest.fit(X, y)\ndisplay = plot_partial_dependence(\n est, X, features, kind=\"individual\", subsample=50,\n n_jobs=3, grid_resolution=20, random_state=0\n)\ndisplay.figure_.suptitle(\n 'Partial dependence of house value on non-___location features\\n'\n 'for the California housing dataset, with BayesianRidge'\n)\ndisplay.figure_.subplots_adjust(hspace=0.3)"
"## New Poisson splitting criterion for DecisionTreeRegressor\nThe integration of Poisson regression estimation continues from version 0.23.\n:class:`~sklearn.tree.DecisionTreeRegressor` now supports a new `'poisson'`\nsplitting criterion. Setting `criterion=\"poisson\"` might be a good choice\nif your target is a count or a frequency.\n\n"
"from sklearn.tree import DecisionTreeRegressor\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\n\nn_samples, n_features = 1000, 20\nrng = np.random.RandomState(0)\nX = rng.randn(n_samples, n_features)\n# positive integer target correlated with X[:, 5] with many zeros:\ny = rng.poisson(lam=np.exp(X[:, 5]) / 2)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)\nregressor = DecisionTreeRegressor(criterion='poisson', random_state=0)\nregressor.fit(X_train, y_train)"
"## New documentation improvements\n\nNew examples and documentation pages have been added, in a continuous effort\nto improve the understanding of machine learning practices:\n\n- a new section about `common pitfalls and recommended\n practices <common_pitfalls>`,\n- an example illustrating how to `statistically compare the performance of\n models <sphx_glr_auto_examples_model_selection_plot_grid_search_stats.py>`\n evaluated using :class:`~sklearn.model_selection.GridSearchCV`,\n- an example on how to `interpret coefficients of linear models\n <sphx_glr_auto_examples_inspection_plot_linear_model_coefficient_interpretation.py>`,\n- an `example\n <sphx_glr_auto_examples_cross_decomposition_plot_pcr_vs_pls.py>`\n comparing Principal Component Regression and Partial Least Squares.\n\n"