
Commit f6e96bc

Pushing the docs to dev/ for branch: master, commit b9403f62ac65e7e6575168ef74b43fb012010599
1 parent 5c7354c commit f6e96bc

File tree

1,194 files changed: +3717 -3690 lines


dev/_downloads/0ca65f327d0d82be7fdda748f857d5b4/plot_poisson_regression_non_normal_loss.ipynb

Lines changed: 3 additions & 3 deletions
@@ -123,7 +123,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"(Generalized) Linear models\n---------------------------\n\nWe start by modeling the target variable with the (l2 penalized) least\nsquares linear regression model, more commonly known as Ridge regression. We\nuse a low penalization `alpha`, as we expect such a linear model to under-fit\non such a large dataset.\n\n"
+"(Generalized) linear models\n---------------------------\n\nWe start by modeling the target variable with the (l2 penalized) least\nsquares linear regression model, more commonly known as Ridge regression. We\nuse a low penalization `alpha`, as we expect such a linear model to under-fit\non such a large dataset.\n\n"
 ]
 },
 {
@@ -159,7 +159,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next we fit the Poisson regressor on the target variable. We set the\nregularization strength ``alpha`` to approximately 1e-6 over number of\nsamples (i.e. `1e-12`) in order to mimic the Ridge regressor whose L2 penalty\nterm scales differently with the number of samples.\n\n"
+"Next we fit the Poisson regressor on the target variable. We set the\nregularization strength ``alpha`` to approximately 1e-6 over number of\nsamples (i.e. `1e-12`) in order to mimic the Ridge regressor whose L2 penalty\nterm scales differently with the number of samples.\n\nSince the Poisson regressor internally models the log of the expected target\nvalue instead of the expected value directly (log vs identity link function),\nthe relationship between X and y is not exactly linear anymore. Therefore the\nPoisson regressor is called a Generalized Linear Model (GLM) rather than a\nvanilla linear model as is the case for Ridge regression.\n\n"
 ]
 },
 {
@@ -177,7 +177,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Finally, we will consider a non-linear model, namely Gradient Boosting\nRegression Trees. Tree-based models do not require the categorical data to be\none-hot encoded: instead, we can encode each category label with an arbitrary\ninteger using :class:`~sklearn.preprocessing.OrdinalEncoder`. With this\nencoding, the trees will treat the categorical features as ordered features,\nwhich might not always be a desired behavior. However this effect is limited\nfor deep enough trees which are able to recover the categorical nature of the\nfeatures. The main advantage of the\n:class:`~sklearn.preprocessing.OrdinalEncoder` over the\n:class:`~sklearn.preprocessing.OneHotEncoder` is that it will make training\nfaster.\n\nGradient Boosting also gives the possibility to fit the trees with a Poisson\nloss (with an implicit log-link function) instead of the default\nleast-squares loss. Here we only fit trees with the Poisson loss to keep this\nexample concise.\n\n"
+"Gradient Boosting Regression Trees for Poisson regression\n---------------------------------------------------------\n\nFinally, we will consider a non-linear model, namely Gradient Boosting\nRegression Trees. Tree-based models do not require the categorical data to be\none-hot encoded: instead, we can encode each category label with an arbitrary\ninteger using :class:`~sklearn.preprocessing.OrdinalEncoder`. With this\nencoding, the trees will treat the categorical features as ordered features,\nwhich might not always be a desired behavior. However this effect is limited\nfor deep enough trees which are able to recover the categorical nature of the\nfeatures. The main advantage of the\n:class:`~sklearn.preprocessing.OrdinalEncoder` over the\n:class:`~sklearn.preprocessing.OneHotEncoder` is that it will make training\nfaster.\n\nGradient Boosting also gives the possibility to fit the trees with a Poisson\nloss (with an implicit log-link function) instead of the default\nleast-squares loss. Here we only fit trees with the Poisson loss to keep this\nexample concise.\n\n"
 ]
 },
 {
Binary file not shown.
Binary file not shown.

dev/_downloads/f686bae9e47a0517ddbf86ced97151b6/plot_poisson_regression_non_normal_loss.py

Lines changed: 10 additions & 1 deletion
@@ -184,7 +184,7 @@ def score_estimator(estimator, df_test):
 score_estimator(dummy, df_test)
 
 ##############################################################################
-# (Generalized) Linear models
+# (Generalized) linear models
 # ---------------------------
 #
 # We start by modeling the target variable with the (l2 penalized) least
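The Ridge baseline this hunk refers to can be sketched as follows. This is a minimal, hypothetical stand-in: the toy features and target below replace the example's actual insurance dataset, and only the low-`alpha` l2-penalized least-squares fit is illustrated.

```python
# Minimal sketch of the l2-penalized least-squares baseline (Ridge
# regression) with low penalization `alpha`. The toy data is a
# hypothetical stand-in for the dataset used in the example.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 3))
# A non-negative, roughly linear target.
y = np.maximum(0.0, X @ np.array([1.0, 2.0, 0.5]) + rng.normal(scale=0.1, size=100))

ridge = Ridge(alpha=1e-6)  # weak l2 penalty, as discussed in the diff
ridge.fit(X, y)
print(ridge.coef_)  # one coefficient per feature
```

A low `alpha` barely constrains the coefficients; the example expects the linear model to under-fit regardless, so heavier regularization would not help.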
@@ -217,6 +217,12 @@ def score_estimator(estimator, df_test):
 # regularization strength ``alpha`` to approximately 1e-6 over number of
 # samples (i.e. `1e-12`) in order to mimic the Ridge regressor whose L2 penalty
 # term scales differently with the number of samples.
+#
+# Since the Poisson regressor internally models the log of the expected target
+# value instead of the expected value directly (log vs identity link function),
+# the relationship between X and y is not exactly linear anymore. Therefore the
+# Poisson regressor is called a Generalized Linear Model (GLM) rather than a
+# vanilla linear model as is the case for Ridge regression.
 
 from sklearn.linear_model import PoissonRegressor
 
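The Poisson GLM fit described above can be sketched like this. The synthetic count target (whose log-mean is linear in X, matching the log link) is an illustrative assumption, not the example's data; the tiny `alpha=1e-12` mirrors the value discussed in the diff.

```python
# Minimal sketch of the Poisson GLM fit: a log link means the model
# learns log(E[y]) as a linear function of X. The synthetic counts are
# a hypothetical stand-in for the claims data.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))
# Counts drawn so that log(E[y]) is linear in X, matching the log link.
y = rng.poisson(lam=np.exp(X @ np.array([0.5, 1.0])))

glm = PoissonRegressor(alpha=1e-12, max_iter=300)
glm.fit(X, y)
pred = glm.predict(X)  # exp of the linear term, hence strictly positive
```

Because predictions pass through `exp`, they are always positive, which is what makes the Poisson GLM a natural fit for count targets where Ridge can predict negative values.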
@@ -233,6 +239,9 @@ def score_estimator(estimator, df_test):
 score_estimator(poisson_glm, df_test)
 
 ##############################################################################
+# Gradient Boosting Regression Trees for Poisson regression
+# ---------------------------------------------------------
+#
 # Finally, we will consider a non-linear model, namely Gradient Boosting
 # Regression Trees. Tree-based models do not require the categorical data to be
 # one-hot encoded: instead, we can encode each category label with an arbitrary

dev/_downloads/scikit-learn-docs.pdf

31 KB
Binary file not shown.

dev/_images/iris.png

0 Bytes
-276 Bytes
-4 Bytes

0 commit comments
