"""
===============================================
Overview of multiclass training meta-estimators
===============================================

In this example, we discuss the problem of classification when the target
variable is composed of more than two classes. This is called multiclass
classification.

In scikit-learn, all estimators support multiclass classification out of the
box: the most sensible strategy is implemented for the end user. The
:mod:`sklearn.multiclass` module implements various strategies that one can
use for experimenting or for developing third-party estimators that only
support binary classification.

:mod:`sklearn.multiclass` includes OvO/OvR strategies used to train a
multiclass classifier by fitting a set of binary classifiers (the
:class:`~sklearn.multiclass.OneVsOneClassifier` and
:class:`~sklearn.multiclass.OneVsRestClassifier` meta-estimators). This
example will review them.
"""

# %%
# The Yeast UCI dataset
# ---------------------
#
# In this example, we use a UCI dataset [1]_, generally referred to as the
# Yeast dataset. We use the :func:`sklearn.datasets.fetch_openml` function to
# load the dataset from OpenML.
from sklearn.datasets import fetch_openml

X, y = fetch_openml(data_id=181, as_frame=True, return_X_y=True, parser="pandas")

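# %%
# Before looking at the target, we can quickly check the dimensions of the
# feature matrix returned by OpenML:
X.shape
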
# %%
# To know the type of data science problem we are dealing with, we can check
# the target for which we want to build a predictive model.
y.value_counts().sort_index()

# %%
# We see that the target is discrete and composed of 10 classes. We therefore
# deal with a multiclass classification problem.
#
# Strategies comparison
# ---------------------
#
# In the following experiment, we use a
# :class:`~sklearn.tree.DecisionTreeClassifier` and a
# :class:`~sklearn.model_selection.RepeatedStratifiedKFold` cross-validation
# with 3 splits and 5 repetitions.
#
# We compare the following strategies:
#
# * :class:`~sklearn.tree.DecisionTreeClassifier` can handle multiclass
#   classification without needing any special adjustment. It works by
#   recursively partitioning the training data into smaller subsets and
#   predicting the most common class within each subset. By repeating this
#   process, the model can accurately classify input data into multiple
#   different classes.
# * :class:`~sklearn.multiclass.OneVsOneClassifier` trains a set of binary
#   classifiers where each classifier is trained to distinguish between
#   two classes.
# * :class:`~sklearn.multiclass.OneVsRestClassifier` trains a set of binary
#   classifiers where each classifier is trained to distinguish between
#   one class and the rest of the classes.
# * :class:`~sklearn.multiclass.OutputCodeClassifier` trains a set of binary
#   classifiers where each classifier is trained to distinguish between a
#   set of classes and the rest of the classes. The set of classes is defined
#   by a codebook, which is randomly generated in scikit-learn. This method
#   exposes a parameter `code_size` to control the size of the codebook. We
#   set it above one since we are not interested in compressing the class
#   representation but rather in adding redundancy to the encoding. We take a
#   look at the generated codebook right after the cross-validation below.
import pandas as pd

from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.multiclass import (
    OneVsOneClassifier,
    OneVsRestClassifier,
    OutputCodeClassifier,
)
from sklearn.tree import DecisionTreeClassifier

cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=5, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
ovo_tree = OneVsOneClassifier(tree)
ovr_tree = OneVsRestClassifier(tree)
ecoc = OutputCodeClassifier(tree, code_size=2)

cv_results_tree = cross_validate(tree, X, y, cv=cv, n_jobs=2)
cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)

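# %%
# As a side note, the codebook used by the error-correcting output code
# strategy can be inspected after fitting through the `code_book_` attribute.
# Each class is encoded by one row and, with `code_size=2`, the codebook has
# twice as many columns as there are classes:
ecoc.fit(X, y)
ecoc.code_book_.shape
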
# %%
# We can now compare the statistical performance of the different strategies
# by plotting the distribution of their cross-validated accuracy scores.
from matplotlib import pyplot as plt

scores = pd.DataFrame(
    {
        "DecisionTreeClassifier": cv_results_tree["test_score"],
        "OneVsOneClassifier": cv_results_ovo["test_score"],
        "OneVsRestClassifier": cv_results_ovr["test_score"],
        "OutputCodeClassifier": cv_results_ecoc["test_score"],
    }
)
ax = scores.plot.kde(legend=True)
ax.set_xlabel("Accuracy score")
ax.set_xlim([0, 0.7])
_ = ax.set_title(
    "Density of the accuracy scores for the different multiclass strategies"
)

# %%
# At first glance, we can see that the built-in strategy of the decision tree
# classifier works quite well. The one-vs-one and error-correcting output
# code strategies work even better. However, the one-vs-rest strategy does
# not work as well as the other strategies.
#
# Indeed, these results reproduce findings reported in the literature, as in
# [2]_. However, the story is not as simple as it seems.
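
# %%
# To put numbers on this visual impression, we can aggregate the mean and the
# standard deviation of the cross-validated accuracy scores per strategy:
scores.agg(["mean", "std"]).round(3)

# %%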
#
# The importance of hyperparameter search
# ---------------------------------------
#
# It was later shown in [3]_ that the multiclass strategies show similar
# scores once the hyperparameters of the base classifiers are optimized.
#
# Here we try to reproduce such a result by at least optimizing the depth of
# the base decision tree.
from sklearn.model_selection import GridSearchCV

param_grid = {"max_depth": [3, 5, 8]}
tree_optimized = GridSearchCV(tree, param_grid=param_grid, cv=3)
ovo_tree = OneVsOneClassifier(tree_optimized)
ovr_tree = OneVsRestClassifier(tree_optimized)
ecoc = OutputCodeClassifier(tree_optimized, code_size=2)

cv_results_tree = cross_validate(tree_optimized, X, y, cv=cv, n_jobs=2)
cv_results_ovo = cross_validate(ovo_tree, X, y, cv=cv, n_jobs=2)
cv_results_ovr = cross_validate(ovr_tree, X, y, cv=cv, n_jobs=2)
cv_results_ecoc = cross_validate(ecoc, X, y, cv=cv, n_jobs=2)

scores = pd.DataFrame(
    {
        "DecisionTreeClassifier": cv_results_tree["test_score"],
        "OneVsOneClassifier": cv_results_ovo["test_score"],
        "OneVsRestClassifier": cv_results_ovr["test_score"],
        "OutputCodeClassifier": cv_results_ecoc["test_score"],
    }
)
ax = scores.plot.kde(legend=True)
ax.set_xlabel("Accuracy score")
ax.set_xlim([0, 0.7])
_ = ax.set_title(
    "Density of the accuracy scores for the different multiclass strategies"
)

plt.show()

# %%
# We can see that once the hyperparameters are optimized, all multiclass
# strategies have similar performance, as discussed in [3]_.
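
# %%
# Out of curiosity, we can also check which tree depth the grid search
# selects when it is refit on the full dataset. This refit is only
# illustrative: during the cross-validation above, the depth is re-selected
# independently on each training split.
tree_optimized.fit(X, y)
tree_optimized.best_params_

# %%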
#
# Conclusion
# ----------
#
# We can build some intuition for these results.
#
# First, the reason why one-vs-one and error-correcting output code
# outperform the tree when the hyperparameters are not optimized lies in the
# fact that they ensemble a larger number of classifiers. The ensembling
# improves the generalization performance. This is somewhat similar to why a
# bagging classifier generally performs better than a single decision tree
# when no care is taken to tune the hyperparameters.
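
# %%
# To illustrate the size of each ensemble, we can fit the meta-estimators
# once on the full dataset and count the binary classifiers they train; all
# three expose them through their `estimators_` attribute. One-vs-one trains
# one classifier per pair of classes, one-vs-rest one per class, and the
# output code approach `code_size` times the number of classes.
print(f"Number of classes: {y.nunique()}")
for name, est in [
    ("OneVsOneClassifier", OneVsOneClassifier(tree)),
    ("OneVsRestClassifier", OneVsRestClassifier(tree)),
    ("OutputCodeClassifier", OutputCodeClassifier(tree, code_size=2)),
]:
    est.fit(X, y)
    print(f"{name} trains {len(est.estimators_)} binary classifiers")

# %%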
#
# Then, we see the importance of optimizing the hyperparameters: they should
# be explored systematically when developing predictive models, even if
# techniques such as ensembling help reduce their impact.
#
# Finally, it is important to recall that the estimators in scikit-learn are
# developed with a specific strategy to handle multiclass classification out
# of the box. So for these estimators, there is no need to use different
# strategies. These strategies are mainly useful for third-party estimators
# supporting only binary classification. In all cases, we also showed that
# the hyperparameters should be optimized.
#
# References
# ----------
#
# .. [1] https://archive.ics.uci.edu/ml/datasets/Yeast
#
# .. [2] `"Reducing multiclass to binary: A unifying approach for margin
#    classifiers." Allwein, Erin L., Robert E. Schapire, and Yoram Singer.
#    Journal of Machine Learning Research 1 (2000): 113-141.
#    <https://www.jmlr.org/papers/volume1/allwein00a/allwein00a.pdf>`_.
#
# .. [3] `"In defense of one-vs-all classification." Rifkin, Ryan, and
#    Aldebaro Klautau. Journal of Machine Learning Research 5 (2004):
#    101-141.
#    <https://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf>`_.