
Commit 099426e

Pushing the docs to dev/ for branch: main, commit 38a06e4be504f3971d109d6741b8b4c7192d7323
1 parent 84b9e07 commit 099426e

1,292 files changed: +6053 -5789 lines


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 5963acf6320151f44a6521a3488d2eb6
+config: 59053f3c78059526e40748d46d97a8a3
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/47f024d726d245e034c7690b4664721f/plot_classification.ipynb

Lines changed: 52 additions & 2 deletions
@@ -4,7 +4,32 @@
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "\n# Nearest Neighbors Classification\n\nSample usage of Nearest Neighbors classification.\nIt will plot the decision boundaries for each class.\n"
+        "\n# Nearest Neighbors Classification\n\nThis example shows how to use :class:`~sklearn.neighbors.KNeighborsClassifier`.\nWe train such a classifier on the iris dataset and observe the difference of the\ndecision boundary obtained with regards to the parameter `weights`.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Load the data\n\nIn this example, we use the iris dataset. We split the data into a train and test\ndataset.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "from sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\n\niris = load_iris(as_frame=True)\nX = iris.data[[\"sepal length (cm)\", \"sepal width (cm)\"]]\ny = iris.target\nX_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## K-nearest neighbors classifier\n\nWe want to use a k-nearest neighbors classifier considering a neighborhood of 11 data\npoints. Since our k-nearest neighbors model uses euclidean distance to find the\nnearest neighbors, it is therefore important to scale the data beforehand. Refer to\nthe example entitled\n`sphx_glr_auto_examples_preprocessing_plot_scaling_importance.py` for more\ndetailed information.\n\nThus, we use a :class:`~sklearn.pipeline.Pipeline` to chain a scaler before to use\nour classifier.\n\n"
       ]
     },
     {
@@ -15,7 +40,32 @@
       },
       "outputs": [],
       "source": [
-        "import matplotlib.pyplot as plt\nimport seaborn as sns\nfrom matplotlib.colors import ListedColormap\n\nfrom sklearn import datasets, neighbors\nfrom sklearn.inspection import DecisionBoundaryDisplay\n\nn_neighbors = 15\n\n# import some data to play with\niris = datasets.load_iris()\n\n# we only take the first two features. We could avoid this ugly\n# slicing by using a two-dim dataset\nX = iris.data[:, :2]\ny = iris.target\n\n# Create color maps\ncmap_light = ListedColormap([\"orange\", \"cyan\", \"cornflowerblue\"])\ncmap_bold = [\"darkorange\", \"c\", \"darkblue\"]\n\nfor weights in [\"uniform\", \"distance\"]:\n    # we create an instance of Neighbours Classifier and fit the data.\n    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)\n    clf.fit(X, y)\n\n    _, ax = plt.subplots()\n    DecisionBoundaryDisplay.from_estimator(\n        clf,\n        X,\n        cmap=cmap_light,\n        ax=ax,\n        response_method=\"predict\",\n        plot_method=\"pcolormesh\",\n        xlabel=iris.feature_names[0],\n        ylabel=iris.feature_names[1],\n        shading=\"auto\",\n    )\n\n    # Plot also the training points\n    sns.scatterplot(\n        x=X[:, 0],\n        y=X[:, 1],\n        hue=iris.target_names[y],\n        palette=cmap_bold,\n        alpha=1.0,\n        edgecolor=\"black\",\n    )\n    plt.title(\n        \"3-Class classification (k = %i, weights = '%s')\" % (n_neighbors, weights)\n    )\n\nplt.show()"
+        "from sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nclf = Pipeline(\n    steps=[(\"scaler\", StandardScaler()), (\"knn\", KNeighborsClassifier(n_neighbors=11))]\n)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Decision boundary\n\nNow, we fit two classifiers with different values of the parameter\n`weights`. We plot the decision boundary of each classifier as well as the original\ndataset to observe the difference.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import matplotlib.pyplot as plt\n\nfrom sklearn.inspection import DecisionBoundaryDisplay\n\n_, axs = plt.subplots(ncols=2, figsize=(12, 5))\n\nfor ax, weights in zip(axs, (\"uniform\", \"distance\")):\n    clf.set_params(knn__weights=weights).fit(X_train, y_train)\n    disp = DecisionBoundaryDisplay.from_estimator(\n        clf,\n        X_test,\n        response_method=\"predict\",\n        plot_method=\"pcolormesh\",\n        xlabel=iris.feature_names[0],\n        ylabel=iris.feature_names[1],\n        shading=\"auto\",\n        alpha=0.5,\n        ax=ax,\n    )\n    scatter = disp.ax_.scatter(X.iloc[:, 0], X.iloc[:, 1], c=y, edgecolors=\"k\")\n    disp.ax_.legend(\n        scatter.legend_elements()[0],\n        iris.target_names,\n        loc=\"lower left\",\n        title=\"Classes\",\n    )\n    _ = disp.ax_.set_title(\n        f\"3-Class classification\\n(k={clf[-1].n_neighbors}, weights={weights!r})\"\n    )\n\nplt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Conclusion\n\nWe observe that the parameter `weights` has an impact on the decision boundary. When\n`weights=\"unifom\"` all nearest neighbors will have the same impact on the decision.\nWhereas when `weights=\"distance\"` the weight given to each neighbor is proportional\nto the inverse of the distance from that neighbor to the query point.\n\nIn some cases, taking the distance into account might improve the model.\n\n"
       ]
     }
   ],
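
Note that the reworked notebook switches the `weights` parameter of the nested "knn" step through `clf.set_params(knn__weights=...)` instead of rebuilding the estimator each time. For readers unfamiliar with that pattern, here is a minimal sketch, not part of this commit, of scikit-learn's `step__parameter` convention for addressing parameters inside a Pipeline; the step names "scaler" and "knn" simply mirror the example above:

# Illustrative sketch, not from the commit: same pipeline shape as the example.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
clf = Pipeline(
    steps=[("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=11))]
)

# "knn__weights" targets the `weights` parameter of the step named "knn";
# set_params returns the pipeline itself, so it can be chained with fit.
for weights in ("uniform", "distance"):
    clf.set_params(knn__weights=weights).fit(X, y)
    print(weights, clf[-1].weights)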

dev/_downloads/8d0cc737ca20800f70d8aa80d8b8fb7d/plot_classification.py

Lines changed: 69 additions & 38 deletions
@@ -3,61 +3,92 @@
 Nearest Neighbors Classification
 ================================
 
-Sample usage of Nearest Neighbors classification.
-It will plot the decision boundaries for each class.
-
+This example shows how to use :class:`~sklearn.neighbors.KNeighborsClassifier`.
+We train such a classifier on the iris dataset and observe the difference of the
+decision boundary obtained with regards to the parameter `weights`.
 """
 
-import matplotlib.pyplot as plt
-import seaborn as sns
-from matplotlib.colors import ListedColormap
+# %%
+# Load the data
+# -------------
+#
+# In this example, we use the iris dataset. We split the data into a train and test
+# dataset.
+from sklearn.datasets import load_iris
+from sklearn.model_selection import train_test_split
 
-from sklearn import datasets, neighbors
-from sklearn.inspection import DecisionBoundaryDisplay
+iris = load_iris(as_frame=True)
+X = iris.data[["sepal length (cm)", "sepal width (cm)"]]
+y = iris.target
+X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
 
-n_neighbors = 15
+# %%
+# K-nearest neighbors classifier
+# ------------------------------
+#
+# We want to use a k-nearest neighbors classifier considering a neighborhood of 11 data
+# points. Since our k-nearest neighbors model uses euclidean distance to find the
+# nearest neighbors, it is therefore important to scale the data beforehand. Refer to
+# the example entitled
+# :ref:`sphx_glr_auto_examples_preprocessing_plot_scaling_importance.py` for more
+# detailed information.
+#
+# Thus, we use a :class:`~sklearn.pipeline.Pipeline` to chain a scaler before to use
+# our classifier.
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.pipeline import Pipeline
+from sklearn.preprocessing import StandardScaler
 
-# import some data to play with
-iris = datasets.load_iris()
+clf = Pipeline(
+    steps=[("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=11))]
+)
 
-# we only take the first two features. We could avoid this ugly
-# slicing by using a two-dim dataset
-X = iris.data[:, :2]
-y = iris.target
+# %%
+# Decision boundary
+# -----------------
+#
+# Now, we fit two classifiers with different values of the parameter
+# `weights`. We plot the decision boundary of each classifier as well as the original
+# dataset to observe the difference.
+import matplotlib.pyplot as plt
 
-# Create color maps
-cmap_light = ListedColormap(["orange", "cyan", "cornflowerblue"])
-cmap_bold = ["darkorange", "c", "darkblue"]
+from sklearn.inspection import DecisionBoundaryDisplay
 
-for weights in ["uniform", "distance"]:
-    # we create an instance of Neighbours Classifier and fit the data.
-    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
-    clf.fit(X, y)
+_, axs = plt.subplots(ncols=2, figsize=(12, 5))
 
-    _, ax = plt.subplots()
-    DecisionBoundaryDisplay.from_estimator(
+for ax, weights in zip(axs, ("uniform", "distance")):
+    clf.set_params(knn__weights=weights).fit(X_train, y_train)
+    disp = DecisionBoundaryDisplay.from_estimator(
         clf,
-        X,
-        cmap=cmap_light,
-        ax=ax,
+        X_test,
         response_method="predict",
         plot_method="pcolormesh",
         xlabel=iris.feature_names[0],
         ylabel=iris.feature_names[1],
         shading="auto",
+        alpha=0.5,
+        ax=ax,
     )
-
-    # Plot also the training points
-    sns.scatterplot(
-        x=X[:, 0],
-        y=X[:, 1],
-        hue=iris.target_names[y],
-        palette=cmap_bold,
-        alpha=1.0,
-        edgecolor="black",
+    scatter = disp.ax_.scatter(X.iloc[:, 0], X.iloc[:, 1], c=y, edgecolors="k")
+    disp.ax_.legend(
+        scatter.legend_elements()[0],
+        iris.target_names,
+        loc="lower left",
+        title="Classes",
    )
-    plt.title(
-        "3-Class classification (k = %i, weights = '%s')" % (n_neighbors, weights)
+    _ = disp.ax_.set_title(
+        f"3-Class classification\n(k={clf[-1].n_neighbors}, weights={weights!r})"
    )
 
 plt.show()
+
+# %%
+# Conclusion
+# ----------
+#
+# We observe that the parameter `weights` has an impact on the decision boundary. When
+# `weights="unifom"` all nearest neighbors will have the same impact on the decision.
+# Whereas when `weights="distance"` the weight given to each neighbor is proportional
+# to the inverse of the distance from that neighbor to the query point.
+#
+# In some cases, taking the distance into account might improve the model.
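
The conclusion notes that distance weighting might improve the model in some cases. Because the reworked example now holds out a test set, one way to check is to compare held-out accuracy for the two settings. A hedged sketch, not part of the commit, reusing the example's `clf`, `X_train`, `y_train`, `X_test`, and `y_test`:

# Sketch only, not from the commit: compare the two weighting schemes on the
# held-out split defined earlier in the example.
for weights in ("uniform", "distance"):
    clf.set_params(knn__weights=weights).fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)  # mean accuracy of the pipeline on the test set
    print(f"weights={weights!r}: test accuracy = {accuracy:.3f}")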

dev/_downloads/scikit-learn-docs.zip

295 Bytes
Binary file not shown.
