
Commit c34f6b5

Pushing the docs to dev/ for branch: main, commit b948fdba24a4f5064485627ab1d2b934026312b6

1 parent: 0cd2340

File tree: 1,296 files changed (+5801 −5790 lines)


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: c65caea7dd935ad3829460e9f374daab
+config: bccc80489749706009befbc2448fc488
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/4f6558a73e0c79834afc005bac34dc13/plot_target_encoder_cross_val.py

Lines changed: 14 additions & 14 deletions
@@ -1,16 +1,16 @@
 """
-==========================================
-Target Encoder's Internal Cross Validation
-==========================================
+=======================================
+Target Encoder's Internal Cross fitting
+=======================================
 
 .. currentmodule:: sklearn.preprocessing
 
 The :class:`TargetEnocoder` replaces each category of a categorical feature with
 the mean of the target variable for that category. This method is useful
 in cases where there is a strong relationship between the categorical feature
 and the target. To prevent overfitting, :meth:`TargetEncoder.fit_transform` uses
-interval cross validation to encode the training data to be used by a downstream
-model. In this example, we demonstrate the importance of the cross validation
+an internal cross fitting scheme to encode the training data to be used by a
+downstream model. In this example, we demonstrate the importance of the cross fitting
 procedure to prevent overfitting.
 """

@@ -49,11 +49,11 @@
 
 # %%
 # The uninformative feature with high cardinality is generated so that is independent of
-# the target variable. We will show that target encoding without cross validation will
+# the target variable. We will show that target encoding without cross fitting will
 # cause catastrophic overfitting for the downstream regressor. These high cardinality
 # features are basically unique identifiers for samples which should generally be
 # removed from machine learning dataset. In this example, we generate them to show how
-# :class:`TargetEncoder`'s default cross validation behavior mitigates the overfitting
+# :class:`TargetEncoder`'s default cross fitting behavior mitigates the overfitting
 # issue automatically.
 X_near_unique_categories = rng.choice(
     int(0.9 * n_samples), size=n_samples, replace=True

@@ -79,7 +79,7 @@
 # ==========================
 # In this section, we train a ridge regressor on the dataset with and without
 # encoding and explore the influence of target encoder with and without the
-# interval cross validation. First, we see the Ridge model trained on the
+# internal cross fitting. First, we see the Ridge model trained on the
 # raw features will have low performance, because the order of the informative
 # feature is not informative:
 import sklearn

@@ -96,7 +96,7 @@
 
 # %%
 # Next, we create a pipeline with the target encoder and ridge model. The pipeline
-# uses :meth:`TargetEncoder.fit_transform` which uses cross validation. We see that
+# uses :meth:`TargetEncoder.fit_transform` which uses cross fitting. We see that
 # the model fits the data well and generalizes to the test set:
 from sklearn.pipeline import make_pipeline
 from sklearn.preprocessing import TargetEncoder

@@ -120,11 +120,11 @@
 _ = coefs_cv.plot(kind="barh")
 
 # %%
-# While :meth:`TargetEncoder.fit_transform` uses an interval cross validation,
-# :meth:`TargetEncoder.transform` itself does not perform any cross validation.
+# While :meth:`TargetEncoder.fit_transform` uses an internal cross fitting scheme,
+# :meth:`TargetEncoder.transform` itself does not perform any cross fitting.
 # It uses the aggregation of the complete training set to transform the categorical
 # features. Thus, we can use :meth:`TargetEncoder.fit` followed by
-# :meth:`TargetEncoder.transform` to disable the cross validation. This encoding
+# :meth:`TargetEncoder.transform` to disable the cross fitting. This encoding
 # is then passed to the ridge model.
 target_encoder = TargetEncoder(random_state=0)
 target_encoder.fit(X_train, y_train)

@@ -154,8 +154,8 @@
 # %%
 # Conclusion
 # ==========
-# This example demonstrates the importance of :class:`TargetEncoder`'s interval cross
-# validation. It is important to use :meth:`TargetEncoder.fit_transform` to encode
+# This example demonstrates the importance of :class:`TargetEncoder`'s internal cross
+# fitting. It is important to use :meth:`TargetEncoder.fit_transform` to encode
 # training data before passing it to a machine learning model. When a
 # :class:`TargetEncoder` is a part of a :class:`~sklearn.pipeline.Pipeline` and the
 # pipeline is fitted, the pipeline will correctly call

dev/_downloads/7b414ce0c39e11cf961fd4fa23008246/plot_target_encoder.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n# Comparing Target Encoder with Other Encoders\n\n.. currentmodule:: sklearn.preprocessing\n\nThe :class:`TargetEncoder` uses the value of the target to encode each\ncategorical feature. In this example, we will compare three different approaches\nfor handling categorical features: :class:`TargetEncoder`,\n:class:`OrdinalEncoder`, :class:`OneHotEncoder` and dropping the category.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>`fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a\n cross-validation scheme is used in `fit_transform` for encoding. See the\n `User Guide <target_encoder>`. for details.</p></div>\n"
+    "\n# Comparing Target Encoder with Other Encoders\n\n.. currentmodule:: sklearn.preprocessing\n\nThe :class:`TargetEncoder` uses the value of the target to encode each\ncategorical feature. In this example, we will compare three different approaches\nfor handling categorical features: :class:`TargetEncoder`,\n:class:`OrdinalEncoder`, :class:`OneHotEncoder` and dropping the category.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>`fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a\n cross fitting scheme is used in `fit_transform` for encoding. See the\n `User Guide <target_encoder>`. for details.</p></div>\n"
    ]
   },
   {

dev/_downloads/c3f95dc25241c64632f9c3378fd4e89b/plot_target_encoder_cross_val.ipynb

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "\n# Target Encoder's Internal Cross Validation\n\n.. currentmodule:: sklearn.preprocessing\n\nThe :class:`TargetEnocoder` replaces each category of a categorical feature with\nthe mean of the target variable for that category. This method is useful\nin cases where there is a strong relationship between the categorical feature\nand the target. To prevent overfitting, :meth:`TargetEncoder.fit_transform` uses\ninterval cross validation to encode the training data to be used by a downstream\nmodel. In this example, we demonstrate the importance of the cross validation\nprocedure to prevent overfitting.\n"
+    "\n# Target Encoder's Internal Cross fitting\n\n.. currentmodule:: sklearn.preprocessing\n\nThe :class:`TargetEnocoder` replaces each category of a categorical feature with\nthe mean of the target variable for that category. This method is useful\nin cases where there is a strong relationship between the categorical feature\nand the target. To prevent overfitting, :meth:`TargetEncoder.fit_transform` uses\nan internal cross fitting scheme to encode the training data to be used by a\ndownstream model. In this example, we demonstrate the importance of the cross fitting\nprocedure to prevent overfitting.\n"
    ]
   },
   {

@@ -47,7 +47,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The uninformative feature with high cardinality is generated so that is independent of\nthe target variable. We will show that target encoding without cross validation will\ncause catastrophic overfitting for the downstream regressor. These high cardinality\nfeatures are basically unique identifiers for samples which should generally be\nremoved from machine learning dataset. In this example, we generate them to show how\n:class:`TargetEncoder`'s default cross validation behavior mitigates the overfitting\nissue automatically.\n\n"
+    "The uninformative feature with high cardinality is generated so that is independent of\nthe target variable. We will show that target encoding without cross fitting will\ncause catastrophic overfitting for the downstream regressor. These high cardinality\nfeatures are basically unique identifiers for samples which should generally be\nremoved from machine learning dataset. In this example, we generate them to show how\n:class:`TargetEncoder`'s default cross fitting behavior mitigates the overfitting\nissue automatically.\n\n"
    ]
   },
   {

@@ -83,7 +83,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Training a Ridge Regressor\nIn this section, we train a ridge regressor on the dataset with and without\nencoding and explore the influence of target encoder with and without the\ninterval cross validation. First, we see the Ridge model trained on the\nraw features will have low performance, because the order of the informative\nfeature is not informative:\n\n"
+    "## Training a Ridge Regressor\nIn this section, we train a ridge regressor on the dataset with and without\nencoding and explore the influence of target encoder with and without the\ninternal cross fitting. First, we see the Ridge model trained on the\nraw features will have low performance, because the order of the informative\nfeature is not informative:\n\n"
    ]
   },
   {

@@ -101,7 +101,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Next, we create a pipeline with the target encoder and ridge model. The pipeline\nuses :meth:`TargetEncoder.fit_transform` which uses cross validation. We see that\nthe model fits the data well and generalizes to the test set:\n\n"
+    "Next, we create a pipeline with the target encoder and ridge model. The pipeline\nuses :meth:`TargetEncoder.fit_transform` which uses cross fitting. We see that\nthe model fits the data well and generalizes to the test set:\n\n"
    ]
   },
   {

@@ -137,7 +137,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "While :meth:`TargetEncoder.fit_transform` uses an interval cross validation,\n:meth:`TargetEncoder.transform` itself does not perform any cross validation.\nIt uses the aggregation of the complete training set to transform the categorical\nfeatures. Thus, we can use :meth:`TargetEncoder.fit` followed by\n:meth:`TargetEncoder.transform` to disable the cross validation. This encoding\nis then passed to the ridge model.\n\n"
+    "While :meth:`TargetEncoder.fit_transform` uses an internal cross fitting scheme,\n:meth:`TargetEncoder.transform` itself does not perform any cross fitting.\nIt uses the aggregation of the complete training set to transform the categorical\nfeatures. Thus, we can use :meth:`TargetEncoder.fit` followed by\n:meth:`TargetEncoder.transform` to disable the cross fitting. This encoding\nis then passed to the ridge model.\n\n"
    ]
   },
   {

@@ -191,7 +191,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Conclusion\nThis example demonstrates the importance of :class:`TargetEncoder`'s interval cross\nvalidation. It is important to use :meth:`TargetEncoder.fit_transform` to encode\ntraining data before passing it to a machine learning model. When a\n:class:`TargetEncoder` is a part of a :class:`~sklearn.pipeline.Pipeline` and the\npipeline is fitted, the pipeline will correctly call\n:meth:`TargetEncoder.fit_transform` and pass the encoding along.\n\n"
+    "## Conclusion\nThis example demonstrates the importance of :class:`TargetEncoder`'s internal cross\nfitting. It is important to use :meth:`TargetEncoder.fit_transform` to encode\ntraining data before passing it to a machine learning model. When a\n:class:`TargetEncoder` is a part of a :class:`~sklearn.pipeline.Pipeline` and the\npipeline is fitted, the pipeline will correctly call\n:meth:`TargetEncoder.fit_transform` and pass the encoding along.\n\n"
    ]
   }
  ],

dev/_downloads/c62ac915428f3a173ccfc19ab3de33bd/plot_target_encoder.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 
 .. note::
   `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
-  cross-validation scheme is used in `fit_transform` for encoding. See the
+  cross fitting scheme is used in `fit_transform` for encoding. See the
   :ref:`User Guide <target_encoder>`. for details.
 """
 
dev/_downloads/scikit-learn-docs.zip

804 Bytes changed (binary file not shown)

Further binary files changed: −20 Bytes, −23 Bytes (not shown)
