
Commit 81950f5

Pushing the docs to dev/ for branch: main, commit 006ccdb0f744ed8447ba3b855059653e495ea2ab
1 parent f25cd50 commit 81950f5


1,316 files changed: +7198 −7175 lines


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 1318e898b1acb9ccaaee7efe1913e534
+config: 92b63cdbfd0ba0356b4514ff74477abb
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/b7e32fe54d613dce0d3c376377af061d/plot_outlier_detection_bench.py

Lines changed: 10 additions & 2 deletions
@@ -6,9 +6,11 @@
 This example compares two outlier detection algorithms, namely
 :ref:`local_outlier_factor` (LOF) and :ref:`isolation_forest` (IForest), on
 real-world datasets available in :class:`sklearn.datasets`. The goal is to show
-that different algorithms perform well on different datasets.
+that different algorithms perform well on different datasets and contrast their
+training speed and sensitivity to hyperparameters.
 
-The algorithms are trained in an outlier detection context:
+The algorithms are trained (without labels) on the whole dataset assumed to
+contain outliers.
 
 1. The ROC curves are computed using knowledge of the ground-truth labels
 and displayed using :class:`~sklearn.metrics.RocCurveDisplay`.
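
For readers skimming the diff, the protocol the updated docstring describes (unsupervised fit on the whole contaminated dataset, ROC-AUC computed afterwards against ground-truth labels) can be sketched roughly as follows. This is a minimal illustration on synthetic placeholder data, not the example's actual code; the dataset sizes and `n_neighbors` value are arbitrary assumptions.

# Minimal sketch of the protocol described above (synthetic placeholder data,
# not the example's code): fit without labels, score with ROC-AUC afterwards.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = np.vstack([rng.normal(size=(1000, 2)), rng.uniform(-6, 6, size=(50, 2))])
y = np.concatenate([np.zeros(1000), np.ones(50)])  # 1 = outlier, used only for scoring

# LOF in outlier-detection mode: fit on the whole contaminated dataset, no labels.
lof = LocalOutlierFactor(n_neighbors=35).fit(X)
lof_scores = -lof.negative_outlier_factor_  # higher = more abnormal

iforest = IsolationForest(random_state=42).fit(X)
iforest_scores = -iforest.decision_function(X)  # higher = more abnormal

print("LOF ROC-AUC:    ", roc_auc_score(y, lof_scores))
print("IForest ROC-AUC:", roc_auc_score(y, iforest_scores))
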
@@ -314,6 +316,12 @@ def fit_predict(estimator, X):
 # datasets. The score for IForest is slightly better for the SA dataset and LOF
 # performs considerably better on the Ames housing dataset than IForest.
 #
+# Recall however that Isolation Forest tends to train much faster than LOF on
+# datasets with a large number of samples. LOF needs to compute pairwise
+# distances to find nearest neighbors, which has a quadratic complexity with respect
+# to the number of observations. This can make this method prohibitive on large
+# datasets.
+#
 # Ablation study
 # ==============
 #
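
The training-cost note added in this hunk can be illustrated with a rough timing comparison. The sample sizes below are arbitrary assumptions and this is a sketch, not a rigorous benchmark.

# Rough illustration of the training-cost gap noted above (arbitrary sizes,
# not a rigorous benchmark): LOF's neighbor search grows much faster with the
# number of samples than IsolationForest's subsampled tree construction.
from time import perf_counter

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
for n_samples in (2_000, 20_000):
    X = rng.normal(size=(n_samples, 10))

    tic = perf_counter()
    IsolationForest(random_state=0).fit(X)
    t_iforest = perf_counter() - tic

    tic = perf_counter()
    LocalOutlierFactor(n_neighbors=20).fit(X)
    t_lof = perf_counter() - tic

    print(f"n_samples={n_samples}: IForest {t_iforest:.2f}s, LOF {t_lof:.2f}s")
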

dev/_downloads/eacb6a63c887dafcff02b3cee64854ef/plot_outlier_detection_bench.ipynb

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Evaluation of outlier detection estimators\n\nThis example compares two outlier detection algorithms, namely\n`local_outlier_factor` (LOF) and `isolation_forest` (IForest), on\nreal-world datasets available in :class:`sklearn.datasets`. The goal is to show\nthat different algorithms perform well on different datasets.\n\nThe algorithms are trained in an outlier detection context:\n\n1. The ROC curves are computed using knowledge of the ground-truth labels\nand displayed using :class:`~sklearn.metrics.RocCurveDisplay`.\n\n2. The performance is assessed in terms of the ROC-AUC.\n"
+"\n# Evaluation of outlier detection estimators\n\nThis example compares two outlier detection algorithms, namely\n`local_outlier_factor` (LOF) and `isolation_forest` (IForest), on\nreal-world datasets available in :class:`sklearn.datasets`. The goal is to show\nthat different algorithms perform well on different datasets and contrast their\ntraining speed and sensitivity to hyperparameters.\n\nThe algorithms are trained (without labels) on the whole dataset assumed to\ncontain outliers.\n\n1. The ROC curves are computed using knowledge of the ground-truth labels\nand displayed using :class:`~sklearn.metrics.RocCurveDisplay`.\n\n2. The performance is assessed in terms of the ROC-AUC.\n"
 ]
 },
 {
@@ -217,7 +217,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We observe that once the number of neighbors is tuned, LOF and IForest perform\nsimilarly in terms of ROC AUC for the forestcover and cardiotocography\ndatasets. The score for IForest is slightly better for the SA dataset and LOF\nperforms considerably better on the Ames housing dataset than IForest.\n\n## Ablation study\n\nIn this section we explore the impact of the hyperparameter `n_neighbors` and\nthe choice of scaling the numerical variables on the LOF model. Here we use\nthe `covtype_dataset` dataset as the binary encoded categories introduce\na natural scale of euclidean distances between 0 and 1. We then want a scaling\nmethod to avoid granting a privilege to non-binary features and that is robust\nenough to outliers so that the task of finding them does not become too\ndifficult.\n\n"
+"We observe that once the number of neighbors is tuned, LOF and IForest perform\nsimilarly in terms of ROC AUC for the forestcover and cardiotocography\ndatasets. The score for IForest is slightly better for the SA dataset and LOF\nperforms considerably better on the Ames housing dataset than IForest.\n\nRecall however that Isolation Forest tends to train much faster than LOF on\ndatasets with a large number of samples. LOF needs to compute pairwise\ndistances to find nearest neighbors, which has a quadratic complexity with respect\nto the number of observations. This can make this method prohibitive on large\ndatasets.\n\n## Ablation study\n\nIn this section we explore the impact of the hyperparameter `n_neighbors` and\nthe choice of scaling the numerical variables on the LOF model. Here we use\nthe `covtype_dataset` dataset as the binary encoded categories introduce\na natural scale of euclidean distances between 0 and 1. We then want a scaling\nmethod to avoid granting a privilege to non-binary features and that is robust\nenough to outliers so that the task of finding them does not become too\ndifficult.\n\n"
 ]
 },
 {
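
The ablation study described in the notebook cell above (impact of `n_neighbors` and of feature scaling on LOF) boils down to a loop of roughly this shape. The data, labels, grid of `n_neighbors` values, and the choice of RobustScaler are placeholders for illustration, not the covtype setup used in the example.

# Sketch of the ablation loop described above: vary n_neighbors and the scaling
# of the features for LOF, and compare ROC-AUC. Placeholder data, not covtype.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import RobustScaler

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(1000, 5)), rng.uniform(-8, 8, size=(30, 5))])
y = np.concatenate([np.zeros(1000), np.ones(30)])  # 1 = outlier, used only for scoring

for scaler_name, X_used in (("raw", X), ("RobustScaler", RobustScaler().fit_transform(X))):
    for n_neighbors in (5, 20, 50):
        lof = LocalOutlierFactor(n_neighbors=n_neighbors).fit(X_used)
        scores = -lof.negative_outlier_factor_  # higher = more abnormal
        print(f"{scaler_name:>12}, n_neighbors={n_neighbors:>2}: "
              f"ROC-AUC={roc_auc_score(y, scores):.3f}")
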

dev/_downloads/scikit-learn-docs.zip

-5.74 KB
Binary file not shown.
