"\n# Release Highlights for scikit-learn 1.5\n\n.. currentmodule:: sklearn\n\nWe are pleased to announce the release of scikit-learn 1.5! Many bug fixes\nand improvements were added, as well as some key new features. Below we\ndetail the highlights of this release. **For an exhaustive list of\nall the changes**, please refer to the `release notes <release_notes_1_5>`.\n\nTo install the latest version (with pip)::\n\n pip install --upgrade scikit-learn\n\nor with conda::\n\n conda install -c conda-forge scikit-learn\n"
"## FixedThresholdClassifier: Setting the decision threshold of a binary classifier\nAll binary classifiers of scikit-learn use a fixed decision threshold of 0.5 to\nconvert probability estimates (i.e. output of `predict_proba`) into class\npredictions. However, 0.5 is almost never the desired threshold for a given problem.\n:class:`~model_selection.FixedThresholdClassifier` allows to wrap any binary\nclassifier and set a custom decision threshold.\n\n"
"Lowering the threshold, i.e. allowing more samples to be classified as the positive\nclass, increases the number of true positives at the cost of more false positives\n(as is well known from the concavity of the ROC curve).\n\n"
"## TunedThresholdClassifierCV: Tuning the decision threshold of a binary classifier\nThe decision threshold of a binary classifier can be tuned to optimize a given\nmetric, using :class:`~model_selection.TunedThresholdClassifierCV`.\n\n"
"from sklearn.metrics import balanced_accuracy_score\n\n# Due to the class imbalance, the balanced accuracy is not optimal for the default\n# threshold. The classifier tends to over predict the majority class.\nprint(f\"balanced accuracy: {balanced_accuracy_score(y, classifier.predict(X)):.2f}\")"
"Tuning the threshold to optimize the balanced accuracy gives a smaller threshold\nthat allows more samples to be classified as the positive class.\n\n"
":class:`~model_selection.TunedThresholdClassifierCV` also benefits from the\nmetadata routing support (`Metadata Routing User Guide<metadata_routing>`)\nallowing to optimze complex business metrics, detailed\nin `Post-tuning the decision threshold for cost-sensitive learning\n<sphx_glr_auto_examples_model_selection_plot_cost_sensitive_learning.py>`.\n\n"
"## Performance improvements in PCA\n:class:`~decomposition.PCA` has a new solver, \"covariance_eigh\", which is faster\nand more memory efficient than the other solvers for datasets with a large number\nof samples and a small number of features.\n\n"
"The \"full\" solver has also been improved to use less memory and allows to\ntransform faster. The \"auto\" option for the solver takes advantage of the\nnew solver and is now able to select an appropriate solver for sparse\ndatasets.\n\n"
"## ColumnTransformer is subscriptable\nThe transformers of a :class:`~compose.ColumnTransformer` can now be directly\naccessed using indexing by name.\n\n"
"## Custom imputation strategies for the SimpleImputer\n:class:`~impute.SimpleImputer` now supports custom strategies for imputation,\nusing a callable that computes a scalar value from the non missing values of\na column vector.\n\n"
"from sklearn.impute import SimpleImputer\n\nX = np.array(\n [\n [-1.1, 1.1, 1.1],\n [3.9, -1.2, np.nan],\n [np.nan, 1.3, np.nan],\n [-0.1, -1.4, -1.4],\n [-4.9, 1.5, -1.5],\n [np.nan, 1.6, 1.6],\n ]\n)\n\n\ndef smallest_abs(arr):\n\"\"\"Return the smallest absolute value of a 1D array.\"\"\"\n return np.min(np.abs(arr))\n\n\nimputer = SimpleImputer(strategy=smallest_abs)\n\nimputer.fit_transform(X)"
"## Pairwise distances with non-numeric arrays\n:func:`~metrics.pairwise_distances` can now compute distances between\nnon-numeric arrays using a callable metric.\n\n"
"from sklearn.metrics import pairwise_distances\n\nX = [\"cat\", \"dog\"]\nY = [\"cat\", \"fox\"]\n\n\ndef levenshtein_distance(x, y):\n\"\"\"Return the Levenshtein distance between two strings.\"\"\"\n if x == \"\" or y == \"\":\n return max(len(x), len(y))\n if x[0] == y[0]:\n return levenshtein_distance(x[1:], y[1:])\n return 1 + min(\n levenshtein_distance(x[1:], y),\n levenshtein_distance(x, y[1:]),\n levenshtein_distance(x[1:], y[1:]),\n )\n\n\npairwise_distances(X, Y, metric=levenshtein_distance)"