
Commit bd2fa37

Pushing the docs to dev/ for branch: main, commit e718c763fde3777aa05fe06c158ce4d6d1e85991
1 parent 8a2224d commit bd2fa37

File tree

1,316 files changed: +8016 / -5848 lines


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 43379054ed6539d6d19e38b82e5dd49b
+config: 057ad4f143b34077d93d9e3f613f8448
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
"""
===============================================
Overview of multiclass training meta-estimators
===============================================

In this example, we discuss the problem of classification when the target
variable is composed of more than two classes. This is called multiclass
classification.

In scikit-learn, all estimators support multiclass classification out of the
box: the most sensible strategy is already implemented for the end user. The
:mod:`sklearn.multiclass` module implements various strategies that one can
use for experimenting or for developing third-party estimators that only
support binary classification.

:mod:`sklearn.multiclass` includes OvO/OvR strategies used to train a
multiclass classifier by fitting a set of binary classifiers (the
:class:`~sklearn.multiclass.OneVsOneClassifier` and
:class:`~sklearn.multiclass.OneVsRestClassifier` meta-estimators). This
example reviews them.
"""

# %%
# The Yeast UCI dataset
# ---------------------
#
# In this example, we use a UCI dataset [1]_, generally referred to as the
# Yeast dataset. We use the :func:`sklearn.datasets.fetch_openml` function to
# load the dataset from OpenML.
from sklearn.datasets import fetch_openml

X, y = fetch_openml(data_id=181, as_frame=True, return_X_y=True, parser="pandas")

# %%
# To know the type of data science problem we are dealing with, we can check
# the target for which we want to build a predictive model.
y.value_counts().sort_index()

# %%
# We see that the target is discrete and composed of 10 classes. We therefore
# deal with a multiclass classification problem.
#
# Strategies comparison
# ---------------------
#
# In the following experiment, we use a
# :class:`~sklearn.tree.DecisionTreeClassifier` and a
# :class:`~sklearn.model_selection.RepeatedStratifiedKFold` cross-validation
# with 3 splits and 5 repetitions.
#
# We compare the following strategies:
#
# * :class:`~sklearn.tree.DecisionTreeClassifier` can handle multiclass
#   classification without needing any special adjustments. It works by
#   recursively splitting the training data into smaller subsets and
#   predicting the most common class in each subset, so it can natively
#   assign samples to any of the classes.
# * :class:`~sklearn.multiclass.OneVsOneClassifier` trains a set of binary
#   classifiers where each classifier is trained to distinguish between
#   two classes.
# * :class:`~sklearn.multiclass.OneVsRestClassifier` trains a set of binary
#   classifiers where each classifier is trained to distinguish between
#   one class and the rest of the classes.
# * :class:`~sklearn.multiclass.OutputCodeClassifier` trains a set of binary
#   classifiers where each classifier is trained to distinguish between
#   a set of classes and the rest of the classes. The sets of classes are
#   defined by a codebook, which is randomly generated in scikit-learn. This
#   method exposes a parameter `code_size` to control the size of the
#   codebook. We set it above one since we are not interested in compressing
#   the class representation.
import pandas as pd

from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.multiclass import (
    OneVsOneClassifier,
    OneVsRestClassifier,
    OutputCodeClassifier,
)
from sklearn.tree import DecisionTreeClassifier

cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
ovo_tree = OneVsOneClassifier(tree)
ovr_tree = OneVsRestClassifier(tree)
ecoc = OutputCodeClassifier(tree, code_size=2)

cv_results_tree = cross_validate(tree, X, y, cv=cv, n_jobs=2)
cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)
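
# %%
# As a quick sanity check, we can fit each meta-estimator once and count, via
# its fitted `estimators_` attribute, how many binary problems it solves
# internally: one-vs-one fits `n_classes * (n_classes - 1) / 2` classifiers,
# one-vs-rest fits `n_classes`, and the output-code strategy fits
# `int(n_classes * code_size)` classifiers.
for name, meta_clf in [("OvO", ovo_tree), ("OvR", ovr_tree), ("ECOC", ecoc)]:
    # Fitting here is only used to inspect the number of underlying binary
    # classifiers; the cross-validated scores above are left untouched.
    meta_clf.fit(X, y)
    print(f"{name}: {len(meta_clf.estimators_)} binary classifiers fitted")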

# %%
# We can now compare the statistical performance of the different strategies.
# We plot the distribution of their scores.
from matplotlib import pyplot as plt

scores = pd.DataFrame(
    {
        "DecisionTreeClassifier": cv_results_tree["test_score"],
        "OneVsOneClassifier": cv_results_ovo["test_score"],
        "OneVsRestClassifier": cv_results_ovr["test_score"],
        "OutputCodeClassifier": cv_results_ecoc["test_score"],
    }
)
ax = scores.plot.kde(legend=True)
ax.set_xlabel("Accuracy score")
ax.set_xlim([0, 0.7])
_ = ax.set_title(
    "Density of the accuracy scores for the different multiclass strategies"
)

# %%
# At first glance, we can see that the built-in strategy of the decision tree
# classifier works quite well. The one-vs-one and the error-correcting output
# code strategies work even better. However, the one-vs-rest strategy does not
# work as well as the other strategies.
#
# These results reproduce what is reported in the literature, for instance in
# [2]_. However, the story is not as simple as it seems.
#
# The importance of hyperparameter search
# ----------------------------------------
#
# It was later shown in [3]_ that the multiclass strategies reach similar
# scores once the hyperparameters of the base classifiers are optimized.
#
# Here we try to reproduce such a result by at least optimizing the depth of
# the base decision tree.
from sklearn.model_selection import GridSearchCV

param_grid = {"max_depth": [3, 5, 8]}
tree_optimized = GridSearchCV(tree, param_grid=param_grid, cv=3)
ovo_tree = OneVsOneClassifier(tree_optimized)
ovr_tree = OneVsRestClassifier(tree_optimized)
ecoc = OutputCodeClassifier(tree_optimized, code_size=2)

cv_results_tree = cross_validate(tree_optimized, X, y, cv=cv, n_jobs=2)
cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)

scores = pd.DataFrame(
    {
        "DecisionTreeClassifier": cv_results_tree["test_score"],
        "OneVsOneClassifier": cv_results_ovo["test_score"],
        "OneVsRestClassifier": cv_results_ovr["test_score"],
        "OutputCodeClassifier": cv_results_ecoc["test_score"],
    }
)
ax = scores.plot.kde(legend=True)
ax.set_xlabel("Accuracy score")
ax.set_xlim([0, 0.7])
_ = ax.set_title(
    "Density of the accuracy scores for the different multiclass strategies"
)

plt.show()
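
# %%
# As a small complementary check, we can fit the grid search once on the full
# dataset and look at which depth gets selected for the base tree (here we
# simply inspect the `best_params_` attribute of the fitted search).
tree_optimized.fit(X, y)
print(f"Best parameters found for the base tree: {tree_optimized.best_params_}")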

# %%
# We can see that once the hyperparameters are optimized, all multiclass
# strategies have similar performance, as discussed in [3]_.
#
# Conclusion
# ----------
#
# We can get some intuition behind those results.
#
# First, the reason why one-vs-one and error-correcting output codes
# outperform the tree when the hyperparameters are not optimized lies in the
# fact that they ensemble a larger number of classifiers. The ensembling
# improves the generalization performance. This is somewhat similar to why a
# bagging classifier generally performs better than a single decision tree if
# no care is taken to optimize the hyperparameters.
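#
# As a rough illustration of this analogy, we can run a small additional
# experiment: cross-validate a bagging ensemble of unpruned trees and compare
# its scores with those of the single tree evaluated above.
from sklearn.ensemble import BaggingClassifier

# 50 unpruned trees trained on bootstrap samples of the training data.
bagged_trees = BaggingClassifier(tree, n_estimators=50, random_state=0)
cv_results_bagging = cross_validate(bagged_trees, X, y, cv=cv, n_jobs=2)
print(
    "Mean accuracy of the bagged trees: "
    f"{cv_results_bagging['test_score'].mean():.3f}"
)

# %%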
#
# Then, we see the importance of optimizing the hyperparameters. Indeed, it
# should be explored regularly when developing predictive models, even if
# techniques such as ensembling help reduce this impact.
#
# Finally, it is important to recall that the estimators in scikit-learn are
# developed with a specific strategy to handle multiclass classification out
# of the box. So for these estimators, there is no need to use different
# strategies. These strategies are mainly useful for third-party estimators
# supporting only binary classification. In all cases, we also show that the
# hyperparameters should be optimized.
#
# References
# ----------
#
# .. [1] https://archive.ics.uci.edu/ml/datasets/Yeast
#
# .. [2] `"Reducing multiclass to binary: A unifying approach for margin
#    classifiers." Allwein, Erin L., Robert E. Schapire, and Yoram Singer.
#    Journal of Machine Learning Research 1 (Dec 2000): 113-141.
#    <https://www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf>`_.
#
# .. [3] `"In defense of one-vs-all classification." Rifkin, Ryan, and
#    Aldebaro Klautau. Journal of Machine Learning Research 5 (Jan 2004):
#    101-141.
#    <https://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf>`_.
