Commit 607f631

Pushing the docs to dev/ for branch: master, commit d8483dfee068b082cff0829e9e1a7dce698bc4b3

1 parent 3c54c3e commit 607f631
File tree

981 files changed: +3051 -3038 lines changed


2 binary files changed (214 Bytes and 211 Bytes; binary files not shown)

dev/_downloads/plot_all_scaling.ipynb

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"PowerTransformer (Box-Cox)\n--------------------------\n\n``PowerTransformer`` applies a power transformation to each\nfeature to make the data more Gaussian-like. Currently,\n``PowerTransformer`` implements the Box-Cox transform. It differs from\nQuantileTransformer (Gaussian output) in that it does not map the\ndata to a zero-mean, unit-variance Gaussian distribution. Instead, Box-Cox\nfinds the optimal scaling factor to stabilize variance and mimimize skewness\nthrough maximum likelihood estimation. Note that Box-Cox can only be applied\nto positive, non-zero data. Income and number of households happen to be\nstrictly positive, but if negative values are present, a constant can be\nadded to each feature to shift it into the positive range - this is known as\nthe two-parameter Box-Cox transform.\n\n"
+"PowerTransformer (Box-Cox)\n--------------------------\n\n``PowerTransformer`` applies a power transformation to each\nfeature to make the data more Gaussian-like. Currently,\n``PowerTransformer`` implements the Box-Cox transform. The Box-Cox transform\nfinds the optimal scaling factor to stabilize variance and mimimize skewness\nthrough maximum likelihood estimation. By default, ``PowerTransformer`` also\napplies zero-mean, unit variance normalization to the transformed output.\nNote that Box-Cox can only be applied to positive, non-zero data. Income and\nnumber of households happen to be strictly positive, but if negative values\nare present, a constant can be added to each feature to shift it into the\npositive range - this is known as the two-parameter Box-Cox transform.\n\n"
 ]
 },
 {

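The description in the diff above says Box-Cox "finds the optimal scaling factor to stabilize variance and minimize skewness through maximum likelihood estimation." As a minimal NumPy sketch (not part of this commit), the transform is y = (x**lmbda - 1)/lmbda for lmbda != 0 and log(x) at lmbda = 0, with lmbda chosen to maximize the profile log-likelihood; the grid search below is a simplified stand-in for the numerical optimizer scikit-learn uses internally:

```python
import numpy as np


def boxcox(x, lmbda):
    """Box-Cox transform of strictly positive data x."""
    if abs(lmbda) < 1e-12:
        return np.log(x)
    return (x ** lmbda - 1.0) / lmbda


def boxcox_loglik(x, lmbda):
    """Profile log-likelihood of lmbda (up to an additive constant)."""
    y = boxcox(x, lmbda)
    n = x.size
    return -n / 2.0 * np.log(y.var()) + (lmbda - 1.0) * np.log(x).sum()


def fit_boxcox(x, grid=np.linspace(-2, 2, 81)):
    """Pick lmbda by a simple grid search over the log-likelihood."""
    lls = [boxcox_loglik(x, l) for l in grid]
    best = grid[int(np.argmax(lls))]
    return boxcox(x, best), best


rng = np.random.RandomState(304)
x = rng.lognormal(size=3000)     # heavily right-skewed, strictly positive data
x_trans, lmbda = fit_boxcox(x)   # for lognormal data, lmbda lands near 0 (log)
```

For lognormal input the fitted lambda is close to 0, i.e. Box-Cox recovers the log transform, and the transformed sample is far less skewed than the original.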
dev/_downloads/plot_all_scaling.py

Lines changed: 7 additions & 8 deletions
@@ -297,15 +297,14 @@ def make_plot(item_idx):
 #
 # ``PowerTransformer`` applies a power transformation to each
 # feature to make the data more Gaussian-like. Currently,
-# ``PowerTransformer`` implements the Box-Cox transform. It differs from
-# QuantileTransformer (Gaussian output) in that it does not map the
-# data to a zero-mean, unit-variance Gaussian distribution. Instead, Box-Cox
+# ``PowerTransformer`` implements the Box-Cox transform. The Box-Cox transform
 # finds the optimal scaling factor to stabilize variance and mimimize skewness
-# through maximum likelihood estimation. Note that Box-Cox can only be applied
-# to positive, non-zero data. Income and number of households happen to be
-# strictly positive, but if negative values are present, a constant can be
-# added to each feature to shift it into the positive range - this is known as
-# the two-parameter Box-Cox transform.
+# through maximum likelihood estimation. By default, ``PowerTransformer`` also
+# applies zero-mean, unit variance normalization to the transformed output.
+# Note that Box-Cox can only be applied to positive, non-zero data. Income and
+# number of households happen to be strictly positive, but if negative values
+# are present, a constant can be added to each feature to shift it into the
+# positive range - this is known as the two-parameter Box-Cox transform.

 make_plot(5)

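The default behaviour described in the updated text can be checked directly. This snippet is not part of the commit, only an illustration: with the default ``standardize=True``, ``PowerTransformer`` follows Box-Cox with zero-mean, unit-variance scaling, while ``standardize=False`` returns the raw Box-Cox output.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(304)
X = rng.lognormal(size=(1000, 1))   # strictly positive, right-skewed

# Default behaviour: Box-Cox followed by zero-mean, unit-variance scaling
pt = PowerTransformer(method='box-cox')
X_std = pt.fit_transform(X)

# Opt out of the normalization to keep the raw Box-Cox output
pt_raw = PowerTransformer(method='box-cox', standardize=False)
X_raw = pt_raw.fit_transform(X)

print(X_std.mean(), X_std.std())    # approximately 0 and 1
print(X_raw.mean(), X_raw.std())    # depends on the fitted lambda
```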
dev/_downloads/plot_power_transformer.ipynb

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Using PowerTransformer to apply the Box-Cox transformation\n\n\nThis example demonstrates the use of the Box-Cox transform through\n:class:`preprocessing.PowerTransformer` to map data from various distributions\nto a normal distribution.\n\nBox-Cox is useful as a transformation in modeling problems where\nhomoscedasticity and normality are desired. Below are examples of Box-Cox\napplied to six different probability distributions: Lognormal, Chi-squared,\nWeibull, Gaussian, Uniform, and Bimodal.\n\nNote that the transformation successfully maps the data to a normal\ndistribution when applied to certain datasets, but is ineffective with others.\nThis highlights the importance of visualizing the data before and after\ntransformation.\n\n"
+"\n# Using PowerTransformer to apply the Box-Cox transformation\n\n\nThis example demonstrates the use of the Box-Cox transform through\n:class:`preprocessing.PowerTransformer` to map data from various distributions\nto a normal distribution.\n\nBox-Cox is useful as a transformation in modeling problems where\nhomoscedasticity and normality are desired. Below are examples of Box-Cox\napplied to six different probability distributions: Lognormal, Chi-squared,\nWeibull, Gaussian, Uniform, and Bimodal.\n\nNote that the transformation successfully maps the data to a normal\ndistribution when applied to certain datasets, but is ineffective with others.\nThis highlights the importance of visualizing the data before and after\ntransformation. Also note that while the standardize option is set to False for\nthe plot examples, by default, :class:`preprocessing.PowerTransformer` also\napplies zero-mean, unit-variance standardization to the transformed outputs.\n\n"
 ]
 },
 {
@@ -26,7 +26,7 @@
 },
 "outputs": [],
 "source": [
-"# Author: Eric Chang <[email protected]>\n# License: BSD 3 clause\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom sklearn.preprocessing import PowerTransformer, minmax_scale\n\nprint(__doc__)\n\n\nN_SAMPLES = 3000\nFONT_SIZE = 6\nBINS = 100\n\n\npt = PowerTransformer(method='box-cox')\nrng = np.random.RandomState(304)\nsize = (N_SAMPLES, 1)\n\n\n# lognormal distribution\nX_lognormal = rng.lognormal(size=size)\n\n# chi-squared distribution\ndf = 3\nX_chisq = rng.chisquare(df=df, size=size)\n\n# weibull distribution\na = 50\nX_weibull = rng.weibull(a=a, size=size)\n\n# gaussian distribution\nloc = 100\nX_gaussian = rng.normal(loc=loc, size=size)\n\n# uniform distirbution\nX_uniform = rng.uniform(low=0, high=1, size=size)\n\n# bimodal distribution\nloc_a, loc_b = 100, 105\nX_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)\nX_bimodal = np.concatenate([X_a, X_b], axis=0)\n\n\n# create plots\ndistributions = [\n    ('Lognormal', X_lognormal),\n    ('Chi-squared', X_chisq),\n    ('Weibull', X_weibull),\n    ('Gaussian', X_gaussian),\n    ('Uniform', X_uniform),\n    ('Bimodal', X_bimodal)\n]\n\ncolors = ['firebrick', 'darkorange', 'goldenrod',\n          'seagreen', 'royalblue', 'darkorchid']\n\nfig, axes = plt.subplots(nrows=4, ncols=3)\naxes = axes.flatten()\naxes_idxs = [(0, 3), (1, 4), (2, 5), (6, 9), (7, 10), (8, 11)]\naxes_list = [(axes[i], axes[j]) for i, j in axes_idxs]\n\n\nfor distribution, color, axes in zip(distributions, colors, axes_list):\n    name, X = distribution\n    # scale all distributions to the range [0, 10]\n    X = minmax_scale(X, feature_range=(1e-10, 10))\n\n    # perform power transform\n    X_trans = pt.fit_transform(X)\n    lmbda = round(pt.lambdas_[0], 2)\n\n    ax_original, ax_trans = axes\n\n    ax_original.hist(X, color=color, bins=BINS)\n    ax_original.set_title(name, fontsize=FONT_SIZE)\n    ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n    ax_trans.hist(X_trans, color=color, bins=BINS)\n    ax_trans.set_title('{} after Box-Cox, $\\lambda$ = {}'.format(name, lmbda),\n                       fontsize=FONT_SIZE)\n    ax_trans.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n\nplt.tight_layout()\nplt.show()"
+"# Author: Eric Chang <[email protected]>\n# License: BSD 3 clause\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom sklearn.preprocessing import PowerTransformer, minmax_scale\n\nprint(__doc__)\n\n\nN_SAMPLES = 3000\nFONT_SIZE = 6\nBINS = 100\n\n\npt = PowerTransformer(method='box-cox', standardize=False)\nrng = np.random.RandomState(304)\nsize = (N_SAMPLES, 1)\n\n\n# lognormal distribution\nX_lognormal = rng.lognormal(size=size)\n\n# chi-squared distribution\ndf = 3\nX_chisq = rng.chisquare(df=df, size=size)\n\n# weibull distribution\na = 50\nX_weibull = rng.weibull(a=a, size=size)\n\n# gaussian distribution\nloc = 100\nX_gaussian = rng.normal(loc=loc, size=size)\n\n# uniform distribution\nX_uniform = rng.uniform(low=0, high=1, size=size)\n\n# bimodal distribution\nloc_a, loc_b = 100, 105\nX_a, X_b = rng.normal(loc=loc_a, size=size), rng.normal(loc=loc_b, size=size)\nX_bimodal = np.concatenate([X_a, X_b], axis=0)\n\n\n# create plots\ndistributions = [\n    ('Lognormal', X_lognormal),\n    ('Chi-squared', X_chisq),\n    ('Weibull', X_weibull),\n    ('Gaussian', X_gaussian),\n    ('Uniform', X_uniform),\n    ('Bimodal', X_bimodal)\n]\n\ncolors = ['firebrick', 'darkorange', 'goldenrod',\n          'seagreen', 'royalblue', 'darkorchid']\n\nfig, axes = plt.subplots(nrows=4, ncols=3)\naxes = axes.flatten()\naxes_idxs = [(0, 3), (1, 4), (2, 5), (6, 9), (7, 10), (8, 11)]\naxes_list = [(axes[i], axes[j]) for i, j in axes_idxs]\n\n\nfor distribution, color, axes in zip(distributions, colors, axes_list):\n    name, X = distribution\n    # scale all distributions to the range [0, 10]\n    X = minmax_scale(X, feature_range=(1e-10, 10))\n\n    # perform power transform\n    X_trans = pt.fit_transform(X)\n    lmbda = round(pt.lambdas_[0], 2)\n\n    ax_original, ax_trans = axes\n\n    ax_original.hist(X, color=color, bins=BINS)\n    ax_original.set_title(name, fontsize=FONT_SIZE)\n    ax_original.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n    ax_trans.hist(X_trans, color=color, bins=BINS)\n    ax_trans.set_title('{} after Box-Cox, $\\lambda$ = {}'.format(name, lmbda),\n                       fontsize=FONT_SIZE)\n    ax_trans.tick_params(axis='both', which='major', labelsize=FONT_SIZE)\n\n\nplt.tight_layout()\nplt.show()"
 ]
 }
 ],

dev/_downloads/plot_power_transformer.py

Lines changed: 5 additions & 3 deletions
@@ -15,7 +15,9 @@
 Note that the transformation successfully maps the data to a normal
 distribution when applied to certain datasets, but is ineffective with others.
 This highlights the importance of visualizing the data before and after
-transformation.
+transformation. Also note that while the standardize option is set to False for
+the plot examples, by default, :class:`preprocessing.PowerTransformer` also
+applies zero-mean, unit-variance standardization to the transformed outputs.
 """

 # Author: Eric Chang <[email protected]>
@@ -34,7 +36,7 @@
 BINS = 100


-pt = PowerTransformer(method='box-cox')
+pt = PowerTransformer(method='box-cox', standardize=False)
 rng = np.random.RandomState(304)
 size = (N_SAMPLES, 1)

@@ -54,7 +56,7 @@
 loc = 100
 X_gaussian = rng.normal(loc=loc, size=size)

-# uniform distirbution
+# uniform distribution
 X_uniform = rng.uniform(low=0, high=1, size=size)

 # bimodal distribution

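The text in this commit also mentions the two-parameter Box-Cox transform: when negative values are present, a constant can be added to shift the feature into the positive range. A minimal sketch (not part of the commit; the choice of shifting the minimum to 1.0 is an arbitrary illustration, not a scikit-learn convention):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(304)
X = rng.normal(loc=0, scale=2, size=(500, 1))   # contains negative values

# Box-Cox alone would raise an error here: the data must be positive
# and non-zero. Shift every value so the minimum lands at 1.0, then fit.
shift = 1.0 - X.min()
pt = PowerTransformer(method='box-cox')
X_trans = pt.fit_transform(X + shift)
```

Current scikit-learn releases also offer ``method='yeo-johnson'``, which handles zero and negative values directly without any shift.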
dev/_downloads/scikit-learn-docs.pdf

-10 KB
Binary file not shown.

0 commit comments
