Commit 6361f01

Pushing the docs to dev/ for branch: master, commit 7bc20c0e5013bd79f257c0ea5abdf6d6d8d8f269
1 parent 7800e3d commit 6361f01

1,080 files changed

+3925
-3865
lines changed

11 Bytes
Binary file not shown.
11 Bytes
Binary file not shown.

dev/_downloads/plot_robust_vs_empirical_covariance.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Robust vs Empirical covariance estimate\n\n\nThe usual covariance maximum likelihood estimate is very sensitive to the\npresence of outliers in the data set. In such a case, it would be better to\nuse a robust estimator of covariance to guarantee that the estimation is\nresistant to \"erroneous\" observations in the data set.\n\nMinimum Covariance Determinant Estimator\n----------------------------------------\nThe Minimum Covariance Determinant estimator is a robust, high-breakdown point\n(i.e. it can be used to estimate the covariance matrix of highly contaminated\ndatasets, up to\n$\\frac{n_\\text{samples} - n_\\text{features}-1}{2}$ outliers) estimator of\ncovariance. The idea is to find\n$\\frac{n_\\text{samples} + n_\\text{features}+1}{2}$\nobservations whose empirical covariance has the smallest determinant, yielding\na \"pure\" subset of observations from which to compute standards estimates of\nlocation and covariance. After a correction step aiming at compensating the\nfact that the estimates were learned from only a portion of the initial data,\nwe end up with robust estimates of the data set location and covariance.\n\nThe Minimum Covariance Determinant estimator (MCD) has been introduced by\nP.J.Rousseuw in [1]_.\n\nEvaluation\n----------\nIn this example, we compare the estimation errors that are made when using\nvarious types of location and covariance estimates on contaminated Gaussian\ndistributed data sets:\n\n- The mean and the empirical covariance of the full dataset, which break\n down as soon as there are outliers in the data set\n- The robust MCD, that has a low error provided\n $n_\\text{samples} > 5n_\\text{features}$\n- The mean and the empirical covariance of the observations that are known\n to be good ones. This can be considered as a \"perfect\" MCD estimation,\n so one can trust our implementation by comparing to this case.\n\n\nReferences\n----------\n.. [1] P. J. Rousseeuw. Least median of squares regression. Journal of American\n Statistical Ass., 79:871, 1984.\n.. [2] Johanna Hardin, David M Rocke. The distribution of robust distances.\n Journal of Computational and Graphical Statistics. December 1, 2005,\n 14(4): 928-946.\n.. [3] Zoubir A., Koivunen V., Chakhchoukh Y. and Muma M. (2012). Robust\n estimation in signal processing: A tutorial-style treatment of\n fundamental concepts. IEEE Signal Processing Magazine 29(4), 61-80.\n\n\n"
+"\n# Robust vs Empirical covariance estimate\n\n\nThe usual covariance maximum likelihood estimate is very sensitive to the\npresence of outliers in the data set. In such a case, it would be better to\nuse a robust estimator of covariance to guarantee that the estimation is\nresistant to \"erroneous\" observations in the data set. [1]_, [2]_\n\nMinimum Covariance Determinant Estimator\n----------------------------------------\nThe Minimum Covariance Determinant estimator is a robust, high-breakdown point\n(i.e. it can be used to estimate the covariance matrix of highly contaminated\ndatasets, up to\n$\\frac{n_\\text{samples} - n_\\text{features}-1}{2}$ outliers) estimator of\ncovariance. The idea is to find\n$\\frac{n_\\text{samples} + n_\\text{features}+1}{2}$\nobservations whose empirical covariance has the smallest determinant, yielding\na \"pure\" subset of observations from which to compute standards estimates of\nlocation and covariance. After a correction step aiming at compensating the\nfact that the estimates were learned from only a portion of the initial data,\nwe end up with robust estimates of the data set location and covariance.\n\nThe Minimum Covariance Determinant estimator (MCD) has been introduced by\nP.J.Rousseuw in [3]_.\n\nEvaluation\n----------\nIn this example, we compare the estimation errors that are made when using\nvarious types of location and covariance estimates on contaminated Gaussian\ndistributed data sets:\n\n- The mean and the empirical covariance of the full dataset, which break\n down as soon as there are outliers in the data set\n- The robust MCD, that has a low error provided\n $n_\\text{samples} > 5n_\\text{features}$\n- The mean and the empirical covariance of the observations that are known\n to be good ones. This can be considered as a \"perfect\" MCD estimation,\n so one can trust our implementation by comparing to this case.\n\n\nReferences\n----------\n.. [1] Johanna Hardin, David M Rocke. The distribution of robust distances.\n Journal of Computational and Graphical Statistics. December 1, 2005,\n 14(4): 928-946.\n.. [2] Zoubir A., Koivunen V., Chakhchoukh Y. and Muma M. (2012). Robust\n estimation in signal processing: A tutorial-style treatment of\n fundamental concepts. IEEE Signal Processing Magazine 29(4), 61-80.\n.. [3] P. J. Rousseeuw. Least median of squares regression. Journal of American\n Statistical Ass., 79:871, 1984.\n\n\n"
 ]
 },
 {

dev/_downloads/plot_robust_vs_empirical_covariance.py

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@
 The usual covariance maximum likelihood estimate is very sensitive to the
 presence of outliers in the data set. In such a case, it would be better to
 use a robust estimator of covariance to guarantee that the estimation is
-resistant to "erroneous" observations in the data set.
+resistant to "erroneous" observations in the data set. [1]_, [2]_
 
 Minimum Covariance Determinant Estimator
 ----------------------------------------
@@ -23,7 +23,7 @@
 we end up with robust estimates of the data set location and covariance.
 
 The Minimum Covariance Determinant estimator (MCD) has been introduced by
-P.J.Rousseuw in [1]_.
+P.J.Rousseuw in [3]_.
 
 Evaluation
 ----------
@@ -42,14 +42,14 @@
 
 References
 ----------
-.. [1] P. J. Rousseeuw. Least median of squares regression. Journal of American
-Statistical Ass., 79:871, 1984.
-.. [2] Johanna Hardin, David M Rocke. The distribution of robust distances.
+.. [1] Johanna Hardin, David M Rocke. The distribution of robust distances.
 Journal of Computational and Graphical Statistics. December 1, 2005,
 14(4): 928-946.
-.. [3] Zoubir A., Koivunen V., Chakhchoukh Y. and Muma M. (2012). Robust
+.. [2] Zoubir A., Koivunen V., Chakhchoukh Y. and Muma M. (2012). Robust
 estimation in signal processing: A tutorial-style treatment of
 fundamental concepts. IEEE Signal Processing Magazine 29(4), 61-80.
+.. [3] P. J. Rousseeuw. Least median of squares regression. Journal of American
+Statistical Ass., 79:871, 1984.
 
 """
 print(__doc__)

dev/_downloads/scikit-learn-docs.pdf

-658 KB
Binary file not shown.

dev/_images/iris.png

0 Bytes
