Commit 607f631

Pushing the docs to dev/ for branch: master, commit d8483dfee068b082cff0829e9e1a7dce698bc4b3

1 parent 3c54c3e commit 607f631
File tree

981 files changed: +3051 -3038 lines changed


2 binary files changed (214 Bytes and 211 Bytes; binary files not shown)

dev/_downloads/plot_all_scaling.ipynb

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"PowerTransformer (Box-Cox)\n--------------------------\n\n``PowerTransformer`` applies a power transformation to each\nfeature to make the data more Gaussian-like. Currently,\n``PowerTransformer`` implements the Box-Cox transform. It differs from\nQuantileTransformer (Gaussian output) in that it does not map the\ndata to a zero-mean, unit-variance Gaussian distribution. Instead, Box-Cox\nfinds the optimal scaling factor to stabilize variance and mimimize skewness\nthrough maximum likelihood estimation. Note that Box-Cox can only be applied\nto positive, non-zero data. Income and number of households happen to be\nstrictly positive, but if negative values are present, a constant can be\nadded to each feature to shift it into the positive range - this is known as\nthe two-parameter Box-Cox transform.\n\n"
+"PowerTransformer (Box-Cox)\n--------------------------\n\n``PowerTransformer`` applies a power transformation to each\nfeature to make the data more Gaussian-like. Currently,\n``PowerTransformer`` implements the Box-Cox transform. The Box-Cox transform\nfinds the optimal scaling factor to stabilize variance and mimimize skewness\nthrough maximum likelihood estimation. By default, ``PowerTransformer`` also\napplies zero-mean, unit variance normalization to the transformed output.\nNote that Box-Cox can only be applied to positive, non-zero data. Income and\nnumber of households happen to be strictly positive, but if negative values\nare present, a constant can be added to each feature to shift it into the\npositive range - this is known as the two-parameter Box-Cox transform.\n\n"
 ]
 },
 {

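The description in the diff above says Box-Cox "finds the optimal scaling factor to stabilize variance and minimize skewness through maximum likelihood estimation." As a minimal NumPy sketch (not part of this commit), the transform is y = (x**lmbda - 1)/lmbda for lmbda != 0 and log(x) at lmbda = 0, with lmbda chosen to maximize the profile log-likelihood; the grid search below is a simplified stand-in for the numerical optimizer scikit-learn uses internally:

```python
import numpy as np


def boxcox(x, lmbda):
    """Box-Cox transform of strictly positive data x."""
    if abs(lmbda) < 1e-12:
        return np.log(x)
    return (x ** lmbda - 1.0) / lmbda


def boxcox_loglik(x, lmbda):
    """Profile log-likelihood of lmbda (up to an additive constant)."""
    y = boxcox(x, lmbda)
    n = x.size
    return -n / 2.0 * np.log(y.var()) + (lmbda - 1.0) * np.log(x).sum()


def fit_boxcox(x, grid=np.linspace(-2, 2, 81)):
    """Pick lmbda by a simple grid search over the log-likelihood."""
    lls = [boxcox_loglik(x, l) for l in grid]
    best = grid[int(np.argmax(lls))]
    return boxcox(x, best), best


rng = np.random.RandomState(304)
x = rng.lognormal(size=3000)     # heavily right-skewed, strictly positive data
x_trans, lmbda = fit_boxcox(x)   # for lognormal data, lmbda lands near 0 (log)
```

For lognormal input the fitted lambda is close to 0, i.e. Box-Cox recovers the log transform, and the transformed sample is far less skewed than the original.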
dev/_downloads/plot_all_scaling.py

Lines changed: 7 additions & 8 deletions
@@ -297,15 +297,14 @@ def make_plot(item_idx):
 #
 # ``PowerTransformer`` applies a power transformation to each
 # feature to make the data more Gaussian-like. Currently,
-# ``PowerTransformer`` implements the Box-Cox transform. It differs from
-# QuantileTransformer (Gaussian output) in that it does not map the
-# data to a zero-mean, unit-variance Gaussian distribution. Instead, Box-Cox
+# ``PowerTransformer`` implements the Box-Cox transform. The Box-Cox transform
 # finds the optimal scaling factor to stabilize variance and mimimize skewness
-# through maximum likelihood estimation. Note that Box-Cox can only be applied
-# to positive, non-zero data. Income and number of households happen to be
-# strictly positive, but if negative values are present, a constant can be
-# added to each feature to shift it into the positive range - this is known as
-# the two-parameter Box-Cox transform.
+# through maximum likelihood estimation. By default, ``PowerTransformer`` also
+# applies zero-mean, unit variance normalization to the transformed output.
+# Note that Box-Cox can only be applied to positive, non-zero data. Income and
+# number of households happen to be strictly positive, but if negative values
+# are present, a constant can be added to each feature to shift it into the
+# positive range - this is known as the two-parameter Box-Cox transform.

 make_plot(5)

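The default behaviour described in the updated text can be checked directly. This snippet is not part of the commit, only an illustration: with the default ``standardize=True``, ``PowerTransformer`` follows Box-Cox with zero-mean, unit-variance scaling, while ``standardize=False`` returns the raw Box-Cox output.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(304)
X = rng.lognormal(size=(1000, 1))   # strictly positive, right-skewed

# Default behaviour: Box-Cox followed by zero-mean, unit-variance scaling
pt = PowerTransformer(method='box-cox')
X_std = pt.fit_transform(X)

# Opt out of the normalization to keep the raw Box-Cox output
pt_raw = PowerTransformer(method='box-cox', standardize=False)
X_raw = pt_raw.fit_transform(X)

print(X_std.mean(), X_std.std())    # approximately 0 and 1
print(X_raw.mean(), X_raw.std())    # depends on the fitted lambda
```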
dev/_downloads/plot_power_transformer.ipynb

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Using PowerTransformer to apply the Box-Cox transformation\n\n\nThis example demonstrates the use of the Box-Cox transform through\n:class:`preprocessing.PowerTransformer` to map data from various distributions\nto a normal distribution.\n\nBox-Cox is useful as a transformation in modeling problems where\nhomoscedasticity and normality are desired. Below are examples of Box-Cox\napplied to six different probability distributions: Lognormal, Chi-squared,\nWeibull, Gaussian, Uniform, and Bimodal.\n\nNote that the transformation successfully maps the data to a normal\ndistribution when applied to certain datasets, but is ineffective with others.\nThis highlights the importance of visualizing the data before and after\ntransformation.\n\n"
+"\n# Using PowerTransformer to apply the Box-Cox transformation\n\n\nThis example demonstrates the use of the Box-Cox transform through\n:class:`preprocessing.PowerTransformer` to map data from various distributions\nto a normal distribution.\n\nBox-Cox is useful as a transformation in modeling problems where\nhomoscedasticity and normality are desired. Below are examples of Box-Cox\napplied to six different probability distributions: Lognormal, Chi-squared,\nWeibull, Gaussian, Uniform, and Bimodal.\n\nNote that the transformation successfully maps the data to a normal\ndistribution when applied to certain datasets, but is ineffective with others.\nThis highlights the importance of visualizing the data before and after\ntransformation. Also note that while the standardize option is set to False for\nthe plot examples, by default, :class:`preprocessing.PowerTransformer` also\napplies zero-mean, unit-variance standardization to the transformed outputs.\n\n"
 ]
 },
 {
@@ -26,7 +26,7 @@
 },
 "outputs": [],
 "source": [
-"# Author: Eric Chang <[email protected]>\n# License: BSD 3 clause\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom sklearn.preprocessing import PowerTransformer, minmax_scale\n\nprint(__doc__)\n\n\nN_SAMPLES = 3000\nFONT_SIZE = 6\nBINS = 100\n\n\npt = PowerTransformer(method='box-cox')\nrng = np.random.RandomState(304)\nsize = (N_SAMPLES, 1)\n\n\n# lognormal distribution\nX_lognormal = rng.lognormal(size=size)\n\n# chi-squared distribution\ndf = 3\nX_chisq = rng.chisquare(df=df, size=size)\n\n# weibull distribution\na = 50\nX_weibull = rng.weibull(a=a, size=size)\n\n# gaussian distribution\nloc = 100\nX_gaussian = rng.normal(loc=loc, size=size)\n\n# uniform distirbution\nX_uniform = rng.uniform(low=0, high=1, size=size)\n\n# bimodal distribution\nloc_a, loc_b = 100, 105\nX_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)\nX_bimodal = np.concatenate([X_a, X_b], axis=0)\n\n\n# create plots\ndistributions = [\n    ('Lognormal', X_lognormal),\n    ('Chi-squared', X_chisq),\n    ('Weibull', X_weibull),\n    ('Gaussian', X_gaussian),\n    ('Uniform', X_uniform),\n    ('Bimodal', X_bimodal)\n]\n\ncolors = ['firebrick', 'darkorange', 'goldenrod',\n          'seagreen', 'royalblue', 'darkorchid']\n\nfig, axes = plt.subplots(nrows=4, ncols=3)\naxes = axes.flatten()\naxes_idxs = [(0, 3), (1, 4), (2, 5), (6, 9), (7, 10), (8, 11)]\naxes_list = [(axes[i], axes[j]) for i, j in axes_idxs]\n\n\nfor distribution, color, axes in zip(distributions, colors, axes_list):\n    name, X = distribution\n    # scale all distributions to the range [0, 10]\n    X = minmax_scale(X, feature_range=(1e-10, 10))\n\n    # perform power transform\n    X_trans = pt.fit_transform(X)\n    lmbda = round(pt.lambdas_[0], 2)\n\n    ax_original, ax_trans = axes\n\n    ax_original.hist(X, color=color, bins=BINS)\n    ax_original.set_title(name, fontsize=FONT_SIZE)\n    ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n    ax_trans.hist(X_trans, color=color, bins=BINS)\n    ax_trans.set_title('{} after Box-Cox, $\\lambda$ = {}'.format(name, lmbda),\n                       fontsize=FONT_SIZE)\n    ax_trans.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n\nplt.tight_layout()\nplt.show()"
+"# Author: Eric Chang <[email protected]>\n# License: BSD 3 clause\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom sklearn.preprocessing import PowerTransformer, minmax_scale\n\nprint(__doc__)\n\n\nN_SAMPLES = 3000\nFONT_SIZE = 6\nBINS = 100\n\n\npt = PowerTransformer(method='box-cox', standardize=False)\nrng = np.random.RandomState(304)\nsize = (N_SAMPLES, 1)\n\n\n# lognormal distribution\nX_lognormal = rng.lognormal(size=size)\n\n# chi-squared distribution\ndf = 3\nX_chisq = rng.chisquare(df=df, size=size)\n\n# weibull distribution\na = 50\nX_weibull = rng.weibull(a=a, size=size)\n\n# gaussian distribution\nloc = 100\nX_gaussian = rng.normal(loc=loc, size=size)\n\n# uniform distribution\nX_uniform = rng.uniform(low=0, high=1, size=size)\n\n# bimodal distribution\nloc_a, loc_b = 100, 105\nX_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)\nX_bimodal = np.concatenate([X_a, X_b], axis=0)\n\n\n# create plots\ndistributions = [\n    ('Lognormal', X_lognormal),\n    ('Chi-squared', X_chisq),\n    ('Weibull', X_weibull),\n    ('Gaussian', X_gaussian),\n    ('Uniform', X_uniform),\n    ('Bimodal', X_bimodal)\n]\n\ncolors = ['firebrick', 'darkorange', 'goldenrod',\n          'seagreen', 'royalblue', 'darkorchid']\n\nfig, axes = plt.subplots(nrows=4, ncols=3)\naxes = axes.flatten()\naxes_idxs = [(0, 3), (1, 4), (2, 5), (6, 9), (7, 10), (8, 11)]\naxes_list = [(axes[i], axes[j]) for i, j in axes_idxs]\n\n\nfor distribution, color, axes in zip(distributions, colors, axes_list):\n    name, X = distribution\n    # scale all distributions to the range [0, 10]\n    X = minmax_scale(X, feature_range=(1e-10, 10))\n\n    # perform power transform\n    X_trans = pt.fit_transform(X)\n    lmbda = round(pt.lambdas_[0], 2)\n\n    ax_original, ax_trans = axes\n\n    ax_original.hist(X, color=color, bins=BINS)\n    ax_original.set_title(name, fontsize=FONT_SIZE)\n    ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n    ax_trans.hist(X_trans, color=color, bins=BINS)\n    ax_trans.set_title('{} after Box-Cox, $\\lambda$ = {}'.format(name, lmbda),\n                       fontsize=FONT_SIZE)\n    ax_trans.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n\nplt.tight_layout()\nplt.show()"
 ]
 }
 ],

dev/_downloads/plot_power_transformer.py

Lines changed: 5 additions & 3 deletions
@@ -15,7 +15,9 @@
 Note that the transformation successfully maps the data to a normal
 distribution when applied to certain datasets, but is ineffective with others.
 This highlights the importance of visualizing the data before and after
-transformation.
+transformation. Also note that while the standardize option is set to False for
+the plot examples, by default, :class:`preprocessing.PowerTransformer` also
+applies zero-mean, unit-variance standardization to the transformed outputs.
 """

 # Author: Eric Chang <[email protected]>
@@ -34,7 +36,7 @@
 BINS = 100


-pt = PowerTransformer(method='box-cox')
+pt = PowerTransformer(method='box-cox', standardize=False)
 rng = np.random.RandomState(304)
 size = (N_SAMPLES, 1)

@@ -54,7 +56,7 @@
 loc = 100
 X_gaussian = rng.normal(loc=loc, size=size)

-# uniform distirbution
+# uniform distribution
 X_uniform = rng.uniform(low=0, high=1, size=size)

 # bimodal distribution

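The text in this commit also mentions the two-parameter Box-Cox transform: when negative values are present, a constant can be added to shift the feature into the positive range. A minimal sketch (not part of the commit; the choice of shifting the minimum to 1.0 is an arbitrary illustration, not a scikit-learn convention):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(304)
X = rng.normal(loc=0, scale=2, size=(500, 1))   # contains negative values

# Box-Cox alone would raise an error here: the data must be positive
# and non-zero. Shift every value so the minimum lands at 1.0, then fit.
shift = 1.0 - X.min()
pt = PowerTransformer(method='box-cox')
X_trans = pt.fit_transform(X + shift)
```

Current scikit-learn releases also offer ``method='yeo-johnson'``, which handles zero and negative values directly without any shift.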
dev/_downloads/scikit-learn-docs.pdf

-10 KB
Binary file not shown.

0 commit comments
