
Commit 81950f5

Pushing the docs to dev/ for branch: main, commit 006ccdb0f744ed8447ba3b855059653e495ea2ab
1 parent f25cd50 commit 81950f5


1,316 files changed: +7198 −7175 lines


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 1318e898b1acb9ccaaee7efe1913e534
+config: 92b63cdbfd0ba0356b4514ff74477abb
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/b7e32fe54d613dce0d3c376377af061d/plot_outlier_detection_bench.py

Lines changed: 10 additions & 2 deletions
@@ -6,9 +6,11 @@
 This example compares two outlier detection algorithms, namely
 :ref:`local_outlier_factor` (LOF) and :ref:`isolation_forest` (IForest), on
 real-world datasets available in :class:`sklearn.datasets`. The goal is to show
-that different algorithms perform well on different datasets.
+that different algorithms perform well on different datasets and contrast their
+training speed and sensitivity to hyperparameters.
 
-The algorithms are trained in an outlier detection context:
+The algorithms are trained (without labels) on the whole dataset assumed to
+contain outliers.
 
 1. The ROC curves are computed using knowledge of the ground-truth labels
 and displayed using :class:`~sklearn.metrics.RocCurveDisplay`.
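
For readers skimming the diff, the protocol the updated docstring describes (unsupervised fit on the whole contaminated dataset, ROC-AUC computed afterwards against ground-truth labels) can be sketched roughly as follows. This is a minimal illustration on synthetic placeholder data, not the example's actual code; the dataset sizes and `n_neighbors` value are arbitrary assumptions.

# Minimal sketch of the protocol described above (synthetic placeholder data,
# not the example's code): fit without labels, score with ROC-AUC afterwards.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = np.vstack([rng.normal(size=(1000, 2)), rng.uniform(-6, 6, size=(50, 2))])
y = np.concatenate([np.zeros(1000), np.ones(50)])  # 1 = outlier, used only for scoring

# LOF in outlier-detection mode: fit on the whole contaminated dataset, no labels.
lof = LocalOutlierFactor(n_neighbors=35).fit(X)
lof_scores = -lof.negative_outlier_factor_  # higher = more abnormal

iforest = IsolationForest(random_state=42).fit(X)
iforest_scores = -iforest.decision_function(X)  # higher = more abnormal

print("LOF ROC-AUC:    ", roc_auc_score(y, lof_scores))
print("IForest ROC-AUC:", roc_auc_score(y, iforest_scores))
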
@@ -314,6 +316,12 @@ def fit_predict(estimator, X):
 # datasets. The score for IForest is slightly better for the SA dataset and LOF
 # performs considerably better on the Ames housing dataset than IForest.
 #
+# Recall however that Isolation Forest tends to train much faster than LOF on
+# datasets with a large number of samples. LOF needs to compute pairwise
+# distances to find nearest neighbors, which has a quadratic complexity with respect
+# to the number of observations. This can make this method prohibitive on large
+# datasets.
+#
 # Ablation study
 # ==============
 #
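
The training-cost note added in this hunk can be illustrated with a rough timing comparison. The sample sizes below are arbitrary assumptions and this is a sketch, not a rigorous benchmark.

# Rough illustration of the training-cost gap noted above (arbitrary sizes,
# not a rigorous benchmark): LOF's neighbor search grows much faster with the
# number of samples than IsolationForest's subsampled tree construction.
from time import perf_counter

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
for n_samples in (2_000, 20_000):
    X = rng.normal(size=(n_samples, 10))

    tic = perf_counter()
    IsolationForest(random_state=0).fit(X)
    t_iforest = perf_counter() - tic

    tic = perf_counter()
    LocalOutlierFactor(n_neighbors=20).fit(X)
    t_lof = perf_counter() - tic

    print(f"n_samples={n_samples}: IForest {t_iforest:.2f}s, LOF {t_lof:.2f}s")
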

dev/_downloads/eacb6a63c887dafcff02b3cee64854ef/plot_outlier_detection_bench.ipynb

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Evaluation of outlier detection estimators\n\nThis example compares two outlier detection algorithms, namely\n`local_outlier_factor` (LOF) and `isolation_forest` (IForest), on\nreal-world datasets available in :class:`sklearn.datasets`. The goal is to show\nthat different algorithms perform well on different datasets.\n\nThe algorithms are trained in an outlier detection context:\n\n1. The ROC curves are computed using knowledge of the ground-truth labels\nand displayed using :class:`~sklearn.metrics.RocCurveDisplay`.\n\n2. The performance is assessed in terms of the ROC-AUC.\n"
+"\n# Evaluation of outlier detection estimators\n\nThis example compares two outlier detection algorithms, namely\n`local_outlier_factor` (LOF) and `isolation_forest` (IForest), on\nreal-world datasets available in :class:`sklearn.datasets`. The goal is to show\nthat different algorithms perform well on different datasets and contrast their\ntraining speed and sensitivity to hyperparameters.\n\nThe algorithms are trained (without labels) on the whole dataset assumed to\ncontain outliers.\n\n1. The ROC curves are computed using knowledge of the ground-truth labels\nand displayed using :class:`~sklearn.metrics.RocCurveDisplay`.\n\n2. The performance is assessed in terms of the ROC-AUC.\n"
 ]
 },
 {
@@ -217,7 +217,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We observe that once the number of neighbors is tuned, LOF and IForest perform\nsimilarly in terms of ROC AUC for the forestcover and cardiotocography\ndatasets. The score for IForest is slightly better for the SA dataset and LOF\nperforms considerably better on the Ames housing dataset than IForest.\n\n## Ablation study\n\nIn this section we explore the impact of the hyperparameter `n_neighbors` and\nthe choice of scaling the numerical variables on the LOF model. Here we use\nthe `covtype_dataset` dataset as the binary encoded categories introduce\na natural scale of euclidean distances between 0 and 1. We then want a scaling\nmethod to avoid granting a privilege to non-binary features and that is robust\nenough to outliers so that the task of finding them does not become too\ndifficult.\n\n"
+"We observe that once the number of neighbors is tuned, LOF and IForest perform\nsimilarly in terms of ROC AUC for the forestcover and cardiotocography\ndatasets. The score for IForest is slightly better for the SA dataset and LOF\nperforms considerably better on the Ames housing dataset than IForest.\n\nRecall however that Isolation Forest tends to train much faster than LOF on\ndatasets with a large number of samples. LOF needs to compute pairwise\ndistances to find nearest neighbors, which has a quadratic complexity with respect\nto the number of observations. This can make this method prohibitive on large\ndatasets.\n\n## Ablation study\n\nIn this section we explore the impact of the hyperparameter `n_neighbors` and\nthe choice of scaling the numerical variables on the LOF model. Here we use\nthe `covtype_dataset` dataset as the binary encoded categories introduce\na natural scale of euclidean distances between 0 and 1. We then want a scaling\nmethod to avoid granting a privilege to non-binary features and that is robust\nenough to outliers so that the task of finding them does not become too\ndifficult.\n\n"
 ]
 },
 {
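
The ablation study described in the notebook cell above (impact of `n_neighbors` and of feature scaling on LOF) boils down to a loop of roughly this shape. The data, labels, grid of `n_neighbors` values, and the choice of RobustScaler are placeholders for illustration, not the covtype setup used in the example.

# Sketch of the ablation loop described above: vary n_neighbors and the scaling
# of the features for LOF, and compare ROC-AUC. Placeholder data, not covtype.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import RobustScaler

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(1000, 5)), rng.uniform(-8, 8, size=(30, 5))])
y = np.concatenate([np.zeros(1000), np.ones(30)])  # 1 = outlier, used only for scoring

for scaler_name, X_used in (("raw", X), ("RobustScaler", RobustScaler().fit_transform(X))):
    for n_neighbors in (5, 20, 50):
        lof = LocalOutlierFactor(n_neighbors=n_neighbors).fit(X_used)
        scores = -lof.negative_outlier_factor_  # higher = more abnormal
        print(f"{scaler_name:>12}, n_neighbors={n_neighbors:>2}: "
              f"ROC-AUC={roc_auc_score(y, scores):.3f}")
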

dev/_downloads/scikit-learn-docs.zip

-5.74 KB
Binary file not shown.
