
Commit 29e2de7

committed
Pushing the docs to dev/ for branch: master, commit 1c69a8a55cc18b461b1befbd68c99a5020140363
1 parent 4a46af0 commit 29e2de7

File tree

1,186 files changed

+3697
-3700
lines changed


dev/_downloads/0ca65f327d0d82be7fdda748f857d5b4/plot_poisson_regression_non_normal_loss.ipynb

Lines changed: 3 additions & 3 deletions
@@ -195,7 +195,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Like the Ridge regression above, the gradient boosted trees model minimizes\nthe conditional squared error. However, because of a higher predictive power,\nit also results in a smaller Poisson deviance than the linear Poisson\nregression model.\n\nEvaluating models with a single train / test split is prone to random\nfluctuations. If computing resources allow, it should be verified that\ncross-validated performance metrics would lead to similar conclusions.\n\nThe qualitative difference between these models can also be visualized by\ncomparing the histogram of observed target values with that of predicted\nvalues:\n\n"
+"Like the Poisson GLM above, the gradient boosted trees model minimizes\nthe Poisson deviance. However, because of a higher predictive power,\nit reaches lower values of Poisson deviance.\n\nEvaluating models with a single train / test split is prone to random\nfluctuations. If computing resources allow, it should be verified that\ncross-validated performance metrics would lead to similar conclusions.\n\nThe qualitative difference between these models can also be visualized by\ncomparing the histogram of observed target values with that of predicted\nvalues:\n\n"
 ]
 },
 {
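The updated cell above evaluates models by their Poisson deviance. As a minimal, dependency-free sketch of that metric (it mirrors what `sklearn.metrics.mean_poisson_deviance` computes; the toy counts below are made up for illustration):

```python
import math

def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: (2/n) * sum(y*log(y/mu) - y + mu).

    The y*log(y/mu) term is taken as 0 when y == 0 (its limit);
    predictions mu must be strictly positive.
    """
    dev = 0.0
    for y, mu in zip(y_true, y_pred):
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - y + mu)
    return dev / len(y_true)

# Predictions closer to the true counts reach a lower deviance.
y = [0, 1, 2, 0, 3]
print(mean_poisson_deviance(y, [0.5, 1.0, 2.0, 0.5, 3.0]))  # → 0.4
print(mean_poisson_deviance(y, [1.5] * 5))  # larger (worse) deviance
```

This is why the more flexible gradient boosted trees model can "reach lower values of Poisson deviance" than the linear Poisson model: both optimize the same criterion, but the trees fit the conditional mean more closely.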
@@ -213,7 +213,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The experimental data presents a long tail distribution for ``y``. In all\nmodels, we predict the expected frequency of a random variable, so we will\nhave necessarily fewer extreme values than for the observed realizations of\nthat random variable. This explains that the mode of the histograms of model\npredictions doesn't necessarily correspond to the smallest value.\nAdditionally, the normal distribution used in ``Ridge`` has a constant\nvariance, while for the Poisson distribution used in ``PoissonRegressor`` and\n``HistGradientBoostingRegressor``, the variance is proportional to the\npredicted expected value.\n\nThus, among the considered estimators, ``PoissonRegressor`` and\n``HistGradientBoostingRegressor`` are a-priori better suited for modeling the\nlong tail distribution of the non-negative data as compared to the ``Ridge``\nmodel which makes a wrong assumption on the distribution of the target\nvariable.\n\nThe ``HistGradientBoostingRegressor`` estimator has the most flexibility and\nis able to predict higher expected values.\n\nNote that we could have used the least squares loss for the\n``HistGradientBoostingRegressor`` model. This would wrongly assume a normal\ndistribution the response variable as for the `Ridge` model, and possibly\nalso lead to slightly negative predictions. However the gradient boosted\ntrees would still perform relatively well and in particular better than\n``PoissonRegressor`` thanks to the flexibility of the trees combined with the\nlarge number of training samples.\n\nEvaluation of the calibration of predictions\n--------------------------------------------\n\nTo ensure that estimators yield reasonable predictions for different\npolicyholder types, we can bin test samples according to ``y_pred`` returned\nby each model. Then for each bin, we compare the mean predicted ``y_pred``,\nwith the mean observed target:\n\n"
+"The experimental data presents a long tail distribution for ``y``. In all\nmodels, we predict the expected frequency of a random variable, so we will\nhave necessarily fewer extreme values than for the observed realizations of\nthat random variable. This explains that the mode of the histograms of model\npredictions doesn't necessarily correspond to the smallest value.\nAdditionally, the normal distribution used in ``Ridge`` has a constant\nvariance, while for the Poisson distribution used in ``PoissonRegressor`` and\n``HistGradientBoostingRegressor``, the variance is proportional to the\npredicted expected value.\n\nThus, among the considered estimators, ``PoissonRegressor`` and\n``HistGradientBoostingRegressor`` are a-priori better suited for modeling the\nlong tail distribution of the non-negative data as compared to the ``Ridge``\nmodel which makes a wrong assumption on the distribution of the target\nvariable.\n\nThe ``HistGradientBoostingRegressor`` estimator has the most flexibility and\nis able to predict higher expected values.\n\nNote that we could have used the least squares loss for the\n``HistGradientBoostingRegressor`` model. This would wrongly assume a normal\ndistributed response variable as does the `Ridge` model, and possibly\nalso lead to slightly negative predictions. However the gradient boosted\ntrees would still perform relatively well and in particular better than\n``PoissonRegressor`` thanks to the flexibility of the trees combined with the\nlarge number of training samples.\n\nEvaluation of the calibration of predictions\n--------------------------------------------\n\nTo ensure that estimators yield reasonable predictions for different\npolicyholder types, we can bin test samples according to ``y_pred`` returned\nby each model. Then for each bin, we compare the mean predicted ``y_pred``,\nwith the mean observed target:\n\n"
 ]
 },
 {
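The calibration check described at the end of this cell — bin test samples by `y_pred`, then compare mean predicted vs mean observed per bin — can be sketched in a few lines. This is a simplified stand-in for the example's own binning code (which also weights by exposure); the toy values are made up:

```python
def calibration_bins(y_true, y_pred, n_bins=4):
    """Sort test samples by predicted value, cut them into equal-count
    bins, and return one (mean predicted, mean observed) pair per bin."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    out = []
    for b in range(n_bins):
        # integer arithmetic lets the last bin absorb any remainder
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        idx = order[lo:hi]
        out.append((sum(y_pred[i] for i in idx) / len(idx),
                    sum(y_true[i] for i in idx) / len(idx)))
    return out

# For a well-calibrated model, each pair lies near the diagonal.
print(calibration_bins([0, 1, 2, 3], [0.1, 0.9, 2.1, 2.9], n_bins=2))
```

Plotting mean observed against mean predicted per bin gives the calibration curve discussed in the commit: points on the diagonal indicate good calibration in that range of predicted risk.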
@@ -249,7 +249,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As expected, the dummy regressor is unable to correctly rank the samples and\ntherefore performs the worst on this plot.\n\nThe tree-based model is significantly better at ranking policyholders by risk\nwhile the two linear models perform similarly.\n\nAll three models are significantly better than chance but also very far from\nmaking perfect predictions.\n\nThis last point is expected due to the nature of the problem: the occurrence\nof accidents is mostly dominated by circumstantial causes that are not\ncaptured in the columns of the dataset and can indeed be considered as purely\nrandom.\n\nThe linear models assume no interactions between the input variables which\nlikely causes under-fitting. Inserting a polynomial feature extractor\n(:func:`~sklearn.preprocessing.PolynomialFeatures`) indeed increases their\ndiscrimative power by 2 points of Gini index. In particular it improves the\nability of the models to identify the top 5% riskiest profiles.\n\nMain takeaways\n--------------\n\n- The performance of the models can be evaluted by their ability to yield\n well-calibrated predictions and a good ranking.\n\n- The Gini index reflects the ability of a model to rank predictions\n irrespective of their absolute values, and therefore only assess their\n ranking power.\n\n- The calibration of the model can be assessed by plotting the mean observed\n value vs the mean predicted value on groups of test samples binned by\n predicted risk.\n\n- The least squares loss (along with the implicit use of the identity link\n function) of the Ridge regression model seems to cause this model to be\n badly calibrated. In particular, it tends to underestimate the risk and can\n even predict invalid negative frequencies.\n\n- Using the Poisson loss with a log-link can correct these problems and lead\n to a well-calibrated linear model.\n\n- Despite the improvement in calibration, the ranking power of both linear\n models are comparable and well below the ranking power of the Gradient\n Boosting Regression Trees.\n\n- The Poisson deviance computed as an evaluation metric reflects both the\n calibration and the ranking power of the model. It also makes a linear\n assumption on the ideal relationship between the expected value and the\n variance of the response variable. For the sake of conciseness we did not\n check whether this assumption holds.\n\n- Traditional regression metrics such as Mean Squared Error and Mean Absolute\n Error are hard to meaningfully interpret on count values with many zeros.\n\n"
+"As expected, the dummy regressor is unable to correctly rank the samples and\ntherefore performs the worst on this plot.\n\nThe tree-based model is significantly better at ranking policyholders by risk\nwhile the two linear models perform similarly.\n\nAll three models are significantly better than chance but also very far from\nmaking perfect predictions.\n\nThis last point is expected due to the nature of the problem: the occurrence\nof accidents is mostly dominated by circumstantial causes that are not\ncaptured in the columns of the dataset and can indeed be considered as purely\nrandom.\n\nThe linear models assume no interactions between the input variables which\nlikely causes under-fitting. Inserting a polynomial feature extractor\n(:func:`~sklearn.preprocessing.PolynomialFeatures`) indeed increases their\ndiscrimative power by 2 points of Gini index. In particular it improves the\nability of the models to identify the top 5% riskiest profiles.\n\nMain takeaways\n--------------\n\n- The performance of the models can be evaluated by their ability to yield\n well-calibrated predictions and a good ranking.\n\n- The calibration of the model can be assessed by plotting the mean observed\n value vs the mean predicted value on groups of test samples binned by\n predicted risk.\n\n- The least squares loss (along with the implicit use of the identity link\n function) of the Ridge regression model seems to cause this model to be\n badly calibrated. In particular, it tends to underestimate the risk and can\n even predict invalid negative frequencies.\n\n- Using the Poisson loss with a log-link can correct these problems and lead\n to a well-calibrated linear model.\n\n- The Gini index reflects the ability of a model to rank predictions\n irrespective of their absolute values, and therefore only assess their\n ranking power.\n\n- Despite the improvement in calibration, the ranking power of both linear\n models are comparable and well below the ranking power of the Gradient\n Boosting Regression Trees.\n\n- The Poisson deviance computed as an evaluation metric reflects both the\n calibration and the ranking power of the model. It also makes a linear\n assumption on the ideal relationship between the expected value and the\n variance of the response variable. For the sake of conciseness we did not\n check whether this assumption holds.\n\n- Traditional regression metrics such as Mean Squared Error and Mean Absolute\n Error are hard to meaningfully interpret on count values with many zeros.\n\n"
 ]
 },
 {
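One takeaway in this cell is that the identity link of least squares "can even predict invalid negative frequencies", while the Poisson loss with a log-link cannot. A tiny made-up 1-D example (not from the scikit-learn example itself) shows why:

```python
import math

# Toy counts decreasing with x; identity-link least squares extrapolates
# below zero, a log-link model exp(a + b*x) never can.
x = [0.0, 1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0, 0.0]  # non-negative counts

# Closed-form simple linear regression (identity link).
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

print(a + b * 4.0)          # extrapolated frequency at x=4: -1.0 (invalid)
print(math.exp(a + b * 4.0))  # a log-link prediction is always positive
```

This is exactly the pathology the commit's takeaways attribute to the `Ridge` model, and the reason `PoissonRegressor` (log-link, Poisson loss) stays well-calibrated on non-negative targets.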

dev/_downloads/f686bae9e47a0517ddbf86ced97151b6/plot_poisson_regression_non_normal_loss.py

Lines changed: 9 additions & 10 deletions
@@ -277,10 +277,9 @@ def score_estimator(estimator, df_test):


 ##############################################################################
-# Like the Ridge regression above, the gradient boosted trees model minimizes
-# the conditional squared error. However, because of a higher predictive power,
-# it also results in a smaller Poisson deviance than the linear Poisson
-# regression model.
+# Like the Poisson GLM above, the gradient boosted trees model minimizes
+# the Poisson deviance. However, because of a higher predictive power,
+# it reaches lower values of Poisson deviance.
 #
 # Evaluating models with a single train / test split is prone to random
 # fluctuations. If computing resources allow, it should be verified that
@@ -339,7 +338,7 @@ def score_estimator(estimator, df_test):
 #
 # Note that we could have used the least squares loss for the
 # ``HistGradientBoostingRegressor`` model. This would wrongly assume a normal
-# distribution the response variable as for the `Ridge` model, and possibly
+# distributed response variable as does the `Ridge` model, and possibly
 # also lead to slightly negative predictions. However the gradient boosted
 # trees would still perform relatively well and in particular better than
 # ``PoissonRegressor`` thanks to the flexibility of the trees combined with the
@@ -533,13 +532,9 @@ def lorenz_curve(y_true, y_pred, exposure):
 # Main takeaways
 # --------------
 #
-# - The performance of the models can be evaluted by their ability to yield
+# - The performance of the models can be evaluated by their ability to yield
 #   well-calibrated predictions and a good ranking.
 #
-# - The Gini index reflects the ability of a model to rank predictions
-#   irrespective of their absolute values, and therefore only assess their
-#   ranking power.
-#
 # - The calibration of the model can be assessed by plotting the mean observed
 #   value vs the mean predicted value on groups of test samples binned by
 #   predicted risk.
@@ -552,6 +547,10 @@ def lorenz_curve(y_true, y_pred, exposure):
 # - Using the Poisson loss with a log-link can correct these problems and lead
 #   to a well-calibrated linear model.
 #
+# - The Gini index reflects the ability of a model to rank predictions
+#   irrespective of their absolute values, and therefore only assess their
+#   ranking power.
+#
 # - Despite the improvement in calibration, the ranking power of both linear
 #   models are comparable and well below the ranking power of the Gradient
 #   Boosting Regression Trees.
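The hunks above reorder the Gini-index takeaway around the example's `lorenz_curve(y_true, y_pred, exposure)` helper. As a minimal sketch of that idea (unweighted, i.e. ignoring the `exposure` argument the real helper uses; the toy counts are made up):

```python
def gini_index(y_true, y_pred):
    """Gini index from a Lorenz curve: order samples by predicted risk
    (ascending), accumulate the share of observed claims, and compare the
    area under that curve with the 0.5 area of the random-ranking diagonal.
    Only the ranking of y_pred matters, not its absolute values."""
    order = sorted(range(len(y_true)), key=lambda i: y_pred[i])
    total = float(sum(y_true))  # must be > 0
    n = len(y_true)
    cum, area = 0.0, 0.0
    for i in order:
        cum += y_true[i] / total
        area += cum / n  # right Riemann approximation of the Lorenz area
    return 1.0 - 2.0 * area

y = [0, 0, 1, 3]
print(gini_index(y, y))             # oracle ranking: 0.375
print(gini_index(y, [3, 2, 1, 0]))  # worst ranking: -0.875
```

Because the metric depends only on the ordering of `y_pred`, rescaling all predictions leaves it unchanged — which is exactly why the takeaways stress that the Gini index "only assesses ranking power" and must be paired with a calibration check.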

dev/_downloads/scikit-learn-docs.pdf (1.38 KB)

Binary file not shown.

dev/_images/iris.png

Binary file not shown.

0 commit comments
