
Commit e261b4a

Rebuild dev docs at master=425407b
1 parent 0347d7e commit e261b4a

File tree: 229 files changed, 1412 additions, 1402 deletions


dev/_sources/modules/outlier_detection.txt

Lines changed: 58 additions & 52 deletions
@@ -128,9 +128,53 @@ This strategy is illustrated below.
 
 .. [RD1999] Rousseeuw, P.J., Van Driessen, K. "A fast algorithm for the minimum
     covariance determinant estimator" Technometrics 41(3), 212 (1999)
+
+
+Isolation Forest
+----------------
+
+One efficient way of performing outlier detection in high-dimensional datasets
+is to use random forests.
+The :class:`ensemble.IsolationForest` 'isolates' observations by randomly selecting
+a feature and then randomly selecting a split value between the maximum and
+minimum values of the selected feature.
+
+Since recursive partitioning can be represented by a tree structure, the
+number of splittings required to isolate a sample is equivalent to the path
+length from the root node to the terminating node.
+
+This path length, averaged over a forest of such random trees, is a
+measure of abnormality and our decision function.
+
+Random partitioning produces noticeably shorter paths for anomalies.
+Hence, when a forest of random trees collectively produces shorter path
+lengths for particular samples, those samples are highly likely to be anomalies.
+
+This strategy is illustrated below.
+
+.. figure:: ../auto_examples/ensemble/images/plot_isolation_forest_001.png
+   :target: ../auto_examples/ensemble/plot_isolation_forest.html
+   :align: center
+   :scale: 75%
+
+.. topic:: Examples:
+
+   * See :ref:`example_ensemble_plot_isolation_forest.py` for
+     an illustration of the use of IsolationForest.
+
+   * See :ref:`example_covariance_plot_outlier_detection.py` for a
+     comparison of :class:`ensemble.IsolationForest` with
+     :class:`svm.OneClassSVM` (tuned to perform like an outlier detection
+     method) and a covariance-based outlier detection with
+     :class:`covariance.MinCovDet`.
+
+.. topic:: References:
+
+   .. [LTZ2008] Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. "Isolation forest."
+      Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.
 
-One-class SVM versus elliptic envelope
---------------------------------------
+One-class SVM versus Elliptic Envelope versus Isolation Forest
+--------------------------------------------------------------
 
 Strictly-speaking, the One-class SVM is not an outlier-detection method,
 but a novelty-detection method: its training set should not be
@@ -141,8 +185,8 @@ results in these situations.
 
 The examples below illustrate how the performance of the
 :class:`covariance.EllipticEnvelope` degrades as the data is less and
-less unimodal. :class:`svm.OneClassSVM` works better on data with
-multiple modes.
+less unimodal. The :class:`svm.OneClassSVM` works better on data with
+multiple modes, and :class:`ensemble.IsolationForest` performs well in every case.
 
 .. |outlier1| image:: ../auto_examples/covariance/images/plot_outlier_detection_001.png
    :target: ../auto_examples/covariance/plot_outlier_detection.html
@@ -166,70 +210,32 @@ multiple modes.
      fits a bit the outliers present in the training set. On the
      opposite, the decision rule based on fitting an
      :class:`covariance.EllipticEnvelope` learns an ellipse, which
-     fits well the inlier distribution.
+     fits well the inlier distribution. The :class:`ensemble.IsolationForest`
+     performs as well.
    - |outlier1|
 
 *
    - As the inlier distribution becomes bimodal, the
      :class:`covariance.EllipticEnvelope` does not fit well the
-     inliers. However, we can see that the :class:`svm.OneClassSVM`
+     inliers. However, we can see that both :class:`ensemble.IsolationForest`
+     and :class:`svm.OneClassSVM` have difficulty detecting the two modes,
+     and that the :class:`svm.OneClassSVM`
      tends to overfit: because it has no model of inliers, it
      interprets a region where, by chance, some outliers are
-     clustered, as inliers.
+     clustered, as inliers.
    - |outlier2|
 
 *
    - If the inlier distribution is strongly non-Gaussian, the
      :class:`svm.OneClassSVM` is able to recover a reasonable
-     approximation, whereas the :class:`covariance.EllipticEnvelope`
-     completely fails.
+     approximation, as does :class:`ensemble.IsolationForest`,
+     whereas the :class:`covariance.EllipticEnvelope` completely fails.
    - |outlier3|
 
 .. topic:: Examples:
 
    * See :ref:`example_covariance_plot_outlier_detection.py` for a
     comparison of the :class:`svm.OneClassSVM` (tuned to perform like
-    an outlier detection method) and a covariance-based outlier
+    an outlier detection method), the :class:`ensemble.IsolationForest`
+    and a covariance-based outlier
     detection with :class:`covariance.MinCovDet`.
-
-Isolation Forest
-----------------------------
-
-One efficient way of performing outlier detection in high-dimensional datasets
-is to use random forests.
-The :class:`ensemble.IsolationForest` 'isolates' observations by randomly selecting
-a feature and then randomly selecting a split value between the maximum and
-minimum values of the selected feature.
-
-Since recursive partitioning can be represented by a tree structure, the
-number of splittings required to isolate a sample is equivalent to the path
-length from the root node to the terminating node.
-
-This path length, averaged over a forest of such random trees, is a
-measure of abnormality and our decision function.
-
-Random partitioning produces noticeably shorter paths for anomalies.
-Hence, when a forest of random trees collectively produce shorter path
-lengths for particular samples, they are highly likely to be anomalies.
-
-This strategy is illustrated below.
-
-.. figure:: ../auto_examples/ensemble/images/plot_isolation_forest_001.png
-   :target: ../auto_examples/ensemble/plot_isolation_forest.html
-   :align: center
-   :scale: 75%
-
-.. topic:: Examples:
-
-   * See :ref:`example_ensemble_plot_isolation_forest.py` for
-     an illustration of the use of IsolationForest.
-
-   * See :ref:`example_covariance_plot_outlier_detection.py` for a
-     comparison of :class:`ensemble.IsolationForest` with
-     :class:`svm.OneClassSVM` (tuned to perform like an outlier detection
-     method) and a covariance-based outlier detection with
-     :class:`covariance.MinCovDet`.
-
-.. topic:: References:
-   .. [LTZ2008] Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. "Isolation forest."
-      Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.
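The new section describes the algorithm only in prose. As a minimal usage sketch (not part of the committed docs; the data and parameter values here are illustrative assumptions), the estimator it documents can be exercised like this:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# 100 inliers clustered near the origin, 10 outliers shifted far away
X_inliers = 0.3 * rng.randn(100, 2)
X_outliers = rng.uniform(low=4, high=6, size=(10, 2))
X = np.r_[X_inliers, X_outliers]

# Fit a forest of random isolation trees; a shorter average path
# length to isolate a sample means it is more anomalous
clf = IsolationForest(random_state=42)
clf.fit(X)

# decision_function: lower scores are more abnormal
scores = clf.decision_function(X)

# predict labels inliers as +1 and outliers as -1
labels = clf.predict(X)
```

Because the outliers are far from the inlier cluster, their average anomaly scores come out clearly lower than those of the inliers.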

dev/modules/generated/sklearn.base.BaseEstimator.html

Lines changed: 3 additions & 3 deletions
@@ -172,7 +172,7 @@
 <h1><a class="reference internal" href="../classes.html#module-sklearn.base" title="sklearn.base"><tt class="xref py py-mod docutils literal"><span class="pre">sklearn.base</span></tt></a>.BaseEstimator<a class="headerlink" href="#sklearn-base-baseestimator" title="Permalink to this headline"></a></h1>
 <dl class="class">
 <dt id="sklearn.base.BaseEstimator">
-<em class="property">class </em><tt class="descclassname">sklearn.base.</tt><tt class="descname">BaseEstimator</tt><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/e3a95bf/sklearn/base.py#L169"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator" title="Permalink to this definition"></a></dt>
+<em class="property">class </em><tt class="descclassname">sklearn.base.</tt><tt class="descname">BaseEstimator</tt><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/425407b/sklearn/base.py#L169"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator" title="Permalink to this definition"></a></dt>
 <dd><p>Base class for all estimators in scikit-learn</p>
 <p class="rubric">Notes</p>
 <p>All estimators should specify all the parameters that can be set
@@ -201,7 +201,7 @@ <h1><a class="reference internal" href="../classes.html#module-sklearn.base" tit
 
 <dl class="method">
 <dt id="sklearn.base.BaseEstimator.get_params">
-<tt class="descname">get_params</tt><big>(</big><em>deep=True</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/e3a95bf/sklearn/base.py#L206"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.get_params" title="Permalink to this definition"></a></dt>
+<tt class="descname">get_params</tt><big>(</big><em>deep=True</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/425407b/sklearn/base.py#L206"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.get_params" title="Permalink to this definition"></a></dt>
 <dd><p>Get parameters for this estimator.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
@@ -226,7 +226,7 @@ <h1><a class="reference internal" href="../classes.html#module-sklearn.base" tit
 
 <dl class="method">
 <dt id="sklearn.base.BaseEstimator.set_params">
-<tt class="descname">set_params</tt><big>(</big><em>**params</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/e3a95bf/sklearn/base.py#L243"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.set_params" title="Permalink to this definition"></a></dt>
+<tt class="descname">set_params</tt><big>(</big><em>**params</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/425407b/sklearn/base.py#L243"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.BaseEstimator.set_params" title="Permalink to this definition"></a></dt>
 <dd><p>Set the parameters of this estimator.</p>
 <p>The method works on simple estimators as well as on nested objects
 (such as pipelines). The former have parameters of the form
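The `get_params`/`set_params` pair whose docs this hunk touches is what makes nested estimators such as pipelines configurable via the `<component>__<parameter>` naming scheme. A small illustrative sketch (not taken from the commit; the step names are arbitrary):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A pipeline with two named steps, "scale" and "clf"
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC(C=1.0))])

# get_params(deep=True) also returns parameters of nested estimators,
# keyed as <component>__<parameter>
params = pipe.get_params(deep=True)

# set_params accepts the same double-underscore form, so nested
# parameters can be updated in place (e.g. by a grid search)
pipe.set_params(clf__C=10.0)
```

This double-underscore convention is the mechanism that lets tools like `GridSearchCV` reach parameters of components buried inside composite estimators.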

dev/modules/generated/sklearn.base.TransformerMixin.html

Lines changed: 2 additions & 2 deletions
@@ -172,7 +172,7 @@
 <h1><a class="reference internal" href="../classes.html#module-sklearn.base" title="sklearn.base"><tt class="xref py py-mod docutils literal"><span class="pre">sklearn.base</span></tt></a>.TransformerMixin<a class="headerlink" href="#sklearn-base-transformermixin" title="Permalink to this headline"></a></h1>
 <dl class="class">
 <dt id="sklearn.base.TransformerMixin">
-<em class="property">class </em><tt class="descclassname">sklearn.base.</tt><tt class="descname">TransformerMixin</tt><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/e3a95bf/sklearn/base.py#L435"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.TransformerMixin" title="Permalink to this definition"></a></dt>
+<em class="property">class </em><tt class="descclassname">sklearn.base.</tt><tt class="descname">TransformerMixin</tt><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/425407b/sklearn/base.py#L435"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.TransformerMixin" title="Permalink to this definition"></a></dt>
 <dd><p>Mixin class for all transformers in scikit-learn.</p>
 <p class="rubric">Methods</p>
 <table border="1" class="longtable docutils">
@@ -194,7 +194,7 @@ <h1><a class="reference internal" href="../classes.html#module-sklearn.base" tit
 
 <dl class="method">
 <dt id="sklearn.base.TransformerMixin.fit_transform">
-<tt class="descname">fit_transform</tt><big>(</big><em>X</em>, <em>y=None</em>, <em>**fit_params</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/e3a95bf/sklearn/base.py#L438"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.TransformerMixin.fit_transform" title="Permalink to this definition"></a></dt>
+<tt class="descname">fit_transform</tt><big>(</big><em>X</em>, <em>y=None</em>, <em>**fit_params</em><big>)</big><a class="reference external" href="https://github.com/scikit-learn/scikit-learn/blob/425407b/sklearn/base.py#L438"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#sklearn.base.TransformerMixin.fit_transform" title="Permalink to this definition"></a></dt>
 <dd><p>Fit to data, then transform it.</p>
 <p>Fits transformer to X and y with optional parameters fit_params
 and returns a transformed version of X.</p>
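The `fit_transform` method documented in this hunk is a convenience for calling `fit(X)` followed by `transform(X)`. A quick sketch using one concrete transformer (the data values here are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 10.0],
              [2.0, 20.0],
              [4.0, 30.0]])

# fit_transform(X) fits the scaler to X and returns the transformed X
# in a single call, equivalent to scaler.fit(X).transform(X)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

After this call each column of `X_scaled` has zero mean and unit variance, which is what `StandardScaler` fits and applies.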
