
Commit b7cafae

Pushing the docs for revision for branch: master, commit abee34e299ed69fa135c639a2d24935e4cb52828
1 parent ba34356

873 files changed (+2542 / -2539 lines)


dev/_downloads/plot_outlier_detection_housing.ipynb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 },
 {
 "source": [
-"\n# Outlier detection on a real data set\n\n\nThis example illustrates the need for robust covariance estimation\non a real data set. It is useful both for outlier detection and for\na better understanding of the data structure.\n\nWe selected two sets of two variables from the Boston housing data set\nas an illustration of what kind of analysis can be done with several\noutlier detection tools. For the purpose of visualization, we are working\nwith two-dimensional examples, but one should be aware that things are\nnot so trivial in high-dimension, as it will be pointed out.\n\nIn both examples below, the main result is that the empirical covariance\nestimate, as a non-robust one, is highly influenced by the heterogeneous\nstructure of the observations. Although the robust covariance estimate is\nable to focus on the main mode of the data distribution, it sticks to the\nassumption that the data should be Gaussian distributed, yielding some biased\nestimation of the data structure, but yet accurate to some extent.\nThe One-Class SVM algorithm\n\nFirst example\n-------------\nThe first example illustrates how robust covariance estimation can help\nconcentrating on a relevant cluster when another one exists. Here, many\nobservations are confounded into one and break down the empirical covariance\nestimation.\nOf course, some screening tools would have pointed out the presence of two\nclusters (Support Vector Machines, Gaussian Mixture Models, univariate\noutlier detection, ...). But had it been a high-dimensional example, none\nof these could be applied that easily.\n\nSecond example\n--------------\nThe second example shows the ability of the Minimum Covariance Determinant\nrobust estimator of covariance to concentrate on the main mode of the data\ndistribution: the ___location seems to be well estimated, although the covariance\nis hard to estimate due to the banana-shaped distribution. Anyway, we can\nget rid of some outlying observations.\nThe One-Class SVM is able to capture the real data structure, but the\ndifficulty is to adjust its kernel bandwidth parameter so as to obtain\na good compromise between the shape of the data scatter matrix and the\nrisk of over-fitting the data.\n\n"
+"\n# Outlier detection on a real data set\n\n\nThis example illustrates the need for robust covariance estimation\non a real data set. It is useful both for outlier detection and for\na better understanding of the data structure.\n\nWe selected two sets of two variables from the Boston housing data set\nas an illustration of what kind of analysis can be done with several\noutlier detection tools. For the purpose of visualization, we are working\nwith two-dimensional examples, but one should be aware that things are\nnot so trivial in high-dimension, as it will be pointed out.\n\nIn both examples below, the main result is that the empirical covariance\nestimate, as a non-robust one, is highly influenced by the heterogeneous\nstructure of the observations. Although the robust covariance estimate is\nable to focus on the main mode of the data distribution, it sticks to the\nassumption that the data should be Gaussian distributed, yielding some biased\nestimation of the data structure, but yet accurate to some extent.\nThe One-Class SVM does not assume any parametric form of the data distribution\nand can therefore model the complex shape of the data much better.\n\nFirst example\n-------------\nThe first example illustrates how robust covariance estimation can help\nconcentrating on a relevant cluster when another one exists. Here, many\nobservations are confounded into one and break down the empirical covariance\nestimation.\nOf course, some screening tools would have pointed out the presence of two\nclusters (Support Vector Machines, Gaussian Mixture Models, univariate\noutlier detection, ...). But had it been a high-dimensional example, none\nof these could be applied that easily.\n\nSecond example\n--------------\nThe second example shows the ability of the Minimum Covariance Determinant\nrobust estimator of covariance to concentrate on the main mode of the data\ndistribution: the ___location seems to be well estimated, although the covariance\nis hard to estimate due to the banana-shaped distribution. Anyway, we can\nget rid of some outlying observations.\nThe One-Class SVM is able to capture the real data structure, but the\ndifficulty is to adjust its kernel bandwidth parameter so as to obtain\na good compromise between the shape of the data scatter matrix and the\nrisk of over-fitting the data.\n\n"
 ],
 "cell_type": "markdown",
 "metadata": {}

dev/_downloads/plot_outlier_detection_housing.py

Lines changed: 2 additions & 1 deletion
@@ -19,7 +19,8 @@
 able to focus on the main mode of the data distribution, it sticks to the
 assumption that the data should be Gaussian distributed, yielding some biased
 estimation of the data structure, but yet accurate to some extent.
-The One-Class SVM algorithm
+The One-Class SVM does not assume any parametric form of the data distribution
+and can therefore model the complex shape of the data much better.
 
 First example
 -------------
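
The hunk's point about the kernel bandwidth can be sketched the same way. In the snippet below, the banana-shaped data and the nu/gamma values are illustrative assumptions (the example's actual settings are not visible in this diff); gamma is the RBF bandwidth whose choice trades the shape of the fitted region against over-fitting:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# A curved ("banana-shaped"), non-Gaussian cloud plus a few scattered outliers.
t = rng.uniform(0.0, np.pi, 200)
X = np.c_[np.cos(t), np.sin(t)] + 0.1 * rng.randn(200, 2)
X = np.r_[X, rng.uniform(-2, 2, size=(10, 2))]

# A small gamma gives a smooth, nearly convex frontier; a large gamma wraps
# tightly around the training points and risks over-fitting.
for gamma in (0.1, 1.0, 10.0):
    clf = OneClassSVM(nu=0.1, gamma=gamma).fit(X)
    n_out = (clf.predict(X) == -1).sum()   # predict: +1 inlier, -1 outlier
    print(f"gamma={gamma}: {n_out} points flagged as outliers")

With nu fixed, roughly the same fraction of points is flagged at each gamma; what changes is the shape of the learned frontier, which is exactly the compromise the docstring says is hard to tune.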