
Commit 29e2de7

committed
Pushing the docs to dev/ for branch: master, commit 1c69a8a55cc18b461b1befbd68c99a5020140363
1 parent 4a46af0 commit 29e2de7

File tree

1,186 files changed

+3697
-3700
lines changed


dev/_downloads/0ca65f327d0d82be7fdda748f857d5b4/plot_poisson_regression_non_normal_loss.ipynb

Lines changed: 3 additions & 3 deletions
@@ -195,7 +195,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Like the Ridge regression above, the gradient boosted trees model minimizes\nthe conditional squared error. However, because of a higher predictive power,\nit also results in a smaller Poisson deviance than the linear Poisson\nregression model.\n\nEvaluating models with a single train / test split is prone to random\nfluctuations. If computing resources allow, it should be verified that\ncross-validated performance metrics would lead to similar conclusions.\n\nThe qualitative difference between these models can also be visualized by\ncomparing the histogram of observed target values with that of predicted\nvalues:\n\n"
+"Like the Poisson GLM above, the gradient boosted trees model minimizes\nthe Poisson deviance. However, because of a higher predictive power,\nit reaches lower values of Poisson deviance.\n\nEvaluating models with a single train / test split is prone to random\nfluctuations. If computing resources allow, it should be verified that\ncross-validated performance metrics would lead to similar conclusions.\n\nThe qualitative difference between these models can also be visualized by\ncomparing the histogram of observed target values with that of predicted\nvalues:\n\n"
 ]
 },
 {
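The updated cell above evaluates models by their Poisson deviance. As a minimal, dependency-free sketch of that metric (it mirrors what `sklearn.metrics.mean_poisson_deviance` computes; the toy counts below are made up for illustration):

```python
import math

def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: (2/n) * sum(y*log(y/mu) - y + mu).

    The y*log(y/mu) term is taken as 0 when y == 0 (its limit);
    predictions mu must be strictly positive.
    """
    dev = 0.0
    for y, mu in zip(y_true, y_pred):
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - y + mu)
    return dev / len(y_true)

# Predictions closer to the true counts reach a lower deviance.
y = [0, 1, 2, 0, 3]
print(mean_poisson_deviance(y, [0.5, 1.0, 2.0, 0.5, 3.0]))  # → 0.4
print(mean_poisson_deviance(y, [1.5] * 5))  # larger (worse) deviance
```

This is why the more flexible gradient boosted trees model can "reach lower values of Poisson deviance" than the linear Poisson model: both optimize the same criterion, but the trees fit the conditional mean more closely.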
@@ -213,7 +213,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The experimental data presents a long tail distribution for ``y``. In all\nmodels, we predict the expected frequency of a random variable, so we will\nhave necessarily fewer extreme values than for the observed realizations of\nthat random variable. This explains that the mode of the histograms of model\npredictions doesn't necessarily correspond to the smallest value.\nAdditionally, the normal distribution used in ``Ridge`` has a constant\nvariance, while for the Poisson distribution used in ``PoissonRegressor`` and\n``HistGradientBoostingRegressor``, the variance is proportional to the\npredicted expected value.\n\nThus, among the considered estimators, ``PoissonRegressor`` and\n``HistGradientBoostingRegressor`` are a-priori better suited for modeling the\nlong tail distribution of the non-negative data as compared to the ``Ridge``\nmodel which makes a wrong assumption on the distribution of the target\nvariable.\n\nThe ``HistGradientBoostingRegressor`` estimator has the most flexibility and\nis able to predict higher expected values.\n\nNote that we could have used the least squares loss for the\n``HistGradientBoostingRegressor`` model. This would wrongly assume a normal\ndistribution the response variable as for the `Ridge` model, and possibly\nalso lead to slightly negative predictions. However the gradient boosted\ntrees would still perform relatively well and in particular better than\n``PoissonRegressor`` thanks to the flexibility of the trees combined with the\nlarge number of training samples.\n\nEvaluation of the calibration of predictions\n--------------------------------------------\n\nTo ensure that estimators yield reasonable predictions for different\npolicyholder types, we can bin test samples according to ``y_pred`` returned\nby each model. Then for each bin, we compare the mean predicted ``y_pred``,\nwith the mean observed target:\n\n"
+"The experimental data presents a long tail distribution for ``y``. In all\nmodels, we predict the expected frequency of a random variable, so we will\nhave necessarily fewer extreme values than for the observed realizations of\nthat random variable. This explains that the mode of the histograms of model\npredictions doesn't necessarily correspond to the smallest value.\nAdditionally, the normal distribution used in ``Ridge`` has a constant\nvariance, while for the Poisson distribution used in ``PoissonRegressor`` and\n``HistGradientBoostingRegressor``, the variance is proportional to the\npredicted expected value.\n\nThus, among the considered estimators, ``PoissonRegressor`` and\n``HistGradientBoostingRegressor`` are a-priori better suited for modeling the\nlong tail distribution of the non-negative data as compared to the ``Ridge``\nmodel which makes a wrong assumption on the distribution of the target\nvariable.\n\nThe ``HistGradientBoostingRegressor`` estimator has the most flexibility and\nis able to predict higher expected values.\n\nNote that we could have used the least squares loss for the\n``HistGradientBoostingRegressor`` model. This would wrongly assume a normal\ndistributed response variable as does the `Ridge` model, and possibly\nalso lead to slightly negative predictions. However the gradient boosted\ntrees would still perform relatively well and in particular better than\n``PoissonRegressor`` thanks to the flexibility of the trees combined with the\nlarge number of training samples.\n\nEvaluation of the calibration of predictions\n--------------------------------------------\n\nTo ensure that estimators yield reasonable predictions for different\npolicyholder types, we can bin test samples according to ``y_pred`` returned\nby each model. Then for each bin, we compare the mean predicted ``y_pred``,\nwith the mean observed target:\n\n"
 ]
 },
 {
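The calibration check described at the end of this cell — bin test samples by `y_pred`, then compare mean predicted vs mean observed per bin — can be sketched in a few lines. This is a simplified stand-in for the example's own binning code (which also weights by exposure); the toy values are made up:

```python
def calibration_bins(y_true, y_pred, n_bins=4):
    """Sort test samples by predicted value, cut them into equal-count
    bins, and return one (mean predicted, mean observed) pair per bin."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i])
    out = []
    for b in range(n_bins):
        # integer arithmetic lets the last bin absorb any remainder
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        idx = order[lo:hi]
        out.append((sum(y_pred[i] for i in idx) / len(idx),
                    sum(y_true[i] for i in idx) / len(idx)))
    return out

# For a well-calibrated model, each pair lies near the diagonal.
print(calibration_bins([0, 1, 2, 3], [0.1, 0.9, 2.1, 2.9], n_bins=2))
```

Plotting mean observed against mean predicted per bin gives the calibration curve discussed in the commit: points on the diagonal indicate good calibration in that range of predicted risk.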
@@ -249,7 +249,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"As expected, the dummy regressor is unable to correctly rank the samples and\ntherefore performs the worst on this plot.\n\nThe tree-based model is significantly better at ranking policyholders by risk\nwhile the two linear models perform similarly.\n\nAll three models are significantly better than chance but also very far from\nmaking perfect predictions.\n\nThis last point is expected due to the nature of the problem: the occurrence\nof accidents is mostly dominated by circumstantial causes that are not\ncaptured in the columns of the dataset and can indeed be considered as purely\nrandom.\n\nThe linear models assume no interactions between the input variables which\nlikely causes under-fitting. Inserting a polynomial feature extractor\n(:func:`~sklearn.preprocessing.PolynomialFeatures`) indeed increases their\ndiscrimative power by 2 points of Gini index. In particular it improves the\nability of the models to identify the top 5% riskiest profiles.\n\nMain takeaways\n--------------\n\n- The performance of the models can be evaluted by their ability to yield\n well-calibrated predictions and a good ranking.\n\n- The Gini index reflects the ability of a model to rank predictions\n irrespective of their absolute values, and therefore only assess their\n ranking power.\n\n- The calibration of the model can be assessed by plotting the mean observed\n value vs the mean predicted value on groups of test samples binned by\n predicted risk.\n\n- The least squares loss (along with the implicit use of the identity link\n function) of the Ridge regression model seems to cause this model to be\n badly calibrated. In particular, it tends to underestimate the risk and can\n even predict invalid negative frequencies.\n\n- Using the Poisson loss with a log-link can correct these problems and lead\n to a well-calibrated linear model.\n\n- Despite the improvement in calibration, the ranking power of both linear\n models are comparable and well below the ranking power of the Gradient\n Boosting Regression Trees.\n\n- The Poisson deviance computed as an evaluation metric reflects both the\n calibration and the ranking power of the model. It also makes a linear\n assumption on the ideal relationship between the expected value and the\n variance of the response variable. For the sake of conciseness we did not\n check whether this assumption holds.\n\n- Traditional regression metrics such as Mean Squared Error and Mean Absolute\n Error are hard to meaningfully interpret on count values with many zeros.\n\n"
+"As expected, the dummy regressor is unable to correctly rank the samples and\ntherefore performs the worst on this plot.\n\nThe tree-based model is significantly better at ranking policyholders by risk\nwhile the two linear models perform similarly.\n\nAll three models are significantly better than chance but also very far from\nmaking perfect predictions.\n\nThis last point is expected due to the nature of the problem: the occurrence\nof accidents is mostly dominated by circumstantial causes that are not\ncaptured in the columns of the dataset and can indeed be considered as purely\nrandom.\n\nThe linear models assume no interactions between the input variables which\nlikely causes under-fitting. Inserting a polynomial feature extractor\n(:func:`~sklearn.preprocessing.PolynomialFeatures`) indeed increases their\ndiscrimative power by 2 points of Gini index. In particular it improves the\nability of the models to identify the top 5% riskiest profiles.\n\nMain takeaways\n--------------\n\n- The performance of the models can be evaluated by their ability to yield\n well-calibrated predictions and a good ranking.\n\n- The calibration of the model can be assessed by plotting the mean observed\n value vs the mean predicted value on groups of test samples binned by\n predicted risk.\n\n- The least squares loss (along with the implicit use of the identity link\n function) of the Ridge regression model seems to cause this model to be\n badly calibrated. In particular, it tends to underestimate the risk and can\n even predict invalid negative frequencies.\n\n- Using the Poisson loss with a log-link can correct these problems and lead\n to a well-calibrated linear model.\n\n- The Gini index reflects the ability of a model to rank predictions\n irrespective of their absolute values, and therefore only assess their\n ranking power.\n\n- Despite the improvement in calibration, the ranking power of both linear\n models are comparable and well below the ranking power of the Gradient\n Boosting Regression Trees.\n\n- The Poisson deviance computed as an evaluation metric reflects both the\n calibration and the ranking power of the model. It also makes a linear\n assumption on the ideal relationship between the expected value and the\n variance of the response variable. For the sake of conciseness we did not\n check whether this assumption holds.\n\n- Traditional regression metrics such as Mean Squared Error and Mean Absolute\n Error are hard to meaningfully interpret on count values with many zeros.\n\n"
 ]
 },
 {
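One takeaway in this cell is that the identity link of least squares "can even predict invalid negative frequencies", while the Poisson loss with a log-link cannot. A tiny made-up 1-D example (not from the scikit-learn example itself) shows why:

```python
import math

# Toy counts decreasing with x; identity-link least squares extrapolates
# below zero, a log-link model exp(a + b*x) never can.
x = [0.0, 1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0, 0.0]  # non-negative counts

# Closed-form simple linear regression (identity link).
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

print(a + b * 4.0)          # extrapolated frequency at x=4: -1.0 (invalid)
print(math.exp(a + b * 4.0))  # a log-link prediction is always positive
```

This is exactly the pathology the commit's takeaways attribute to the `Ridge` model, and the reason `PoissonRegressor` (log-link, Poisson loss) stays well-calibrated on non-negative targets.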

dev/_downloads/f686bae9e47a0517ddbf86ced97151b6/plot_poisson_regression_non_normal_loss.py

Lines changed: 9 additions & 10 deletions
@@ -277,10 +277,9 @@ def score_estimator(estimator, df_test):


 ##############################################################################
-# Like the Ridge regression above, the gradient boosted trees model minimizes
-# the conditional squared error. However, because of a higher predictive power,
-# it also results in a smaller Poisson deviance than the linear Poisson
-# regression model.
+# Like the Poisson GLM above, the gradient boosted trees model minimizes
+# the Poisson deviance. However, because of a higher predictive power,
+# it reaches lower values of Poisson deviance.
 #
 # Evaluating models with a single train / test split is prone to random
 # fluctuations. If computing resources allow, it should be verified that
@@ -339,7 +338,7 @@ def score_estimator(estimator, df_test):
 #
 # Note that we could have used the least squares loss for the
 # ``HistGradientBoostingRegressor`` model. This would wrongly assume a normal
-# distribution the response variable as for the `Ridge` model, and possibly
+# distributed response variable as does the `Ridge` model, and possibly
 # also lead to slightly negative predictions. However the gradient boosted
 # trees would still perform relatively well and in particular better than
 # ``PoissonRegressor`` thanks to the flexibility of the trees combined with the
@@ -533,13 +532,9 @@ def lorenz_curve(y_true, y_pred, exposure):
 # Main takeaways
 # --------------
 #
-# - The performance of the models can be evaluted by their ability to yield
+# - The performance of the models can be evaluated by their ability to yield
 #   well-calibrated predictions and a good ranking.
 #
-# - The Gini index reflects the ability of a model to rank predictions
-#   irrespective of their absolute values, and therefore only assess their
-#   ranking power.
-#
 # - The calibration of the model can be assessed by plotting the mean observed
 #   value vs the mean predicted value on groups of test samples binned by
 #   predicted risk.
@@ -552,6 +547,10 @@ def lorenz_curve(y_true, y_pred, exposure):
 # - Using the Poisson loss with a log-link can correct these problems and lead
 #   to a well-calibrated linear model.
 #
+# - The Gini index reflects the ability of a model to rank predictions
+#   irrespective of their absolute values, and therefore only assess their
+#   ranking power.
+#
 # - Despite the improvement in calibration, the ranking power of both linear
 #   models are comparable and well below the ranking power of the Gradient
 #   Boosting Regression Trees.
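The hunks above reorder the Gini-index takeaway around the example's `lorenz_curve(y_true, y_pred, exposure)` helper. As a minimal sketch of that idea (unweighted, i.e. ignoring the `exposure` argument the real helper uses; the toy counts are made up):

```python
def gini_index(y_true, y_pred):
    """Gini index from a Lorenz curve: order samples by predicted risk
    (ascending), accumulate the share of observed claims, and compare the
    area under that curve with the 0.5 area of the random-ranking diagonal.
    Only the ranking of y_pred matters, not its absolute values."""
    order = sorted(range(len(y_true)), key=lambda i: y_pred[i])
    total = float(sum(y_true))  # must be > 0
    n = len(y_true)
    cum, area = 0.0, 0.0
    for i in order:
        cum += y_true[i] / total
        area += cum / n  # right Riemann approximation of the Lorenz area
    return 1.0 - 2.0 * area

y = [0, 0, 1, 3]
print(gini_index(y, y))             # oracle ranking: 0.375
print(gini_index(y, [3, 2, 1, 0]))  # worst ranking: -0.875
```

Because the metric depends only on the ordering of `y_pred`, rescaling all predictions leaves it unchanged — which is exactly why the takeaways stress that the Gini index "only assesses ranking power" and must be paired with a calibration check.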

dev/_downloads/scikit-learn-docs.pdf (1.38 KB)

Binary file not shown.

dev/_images/iris.png

Binary file not shown.

0 commit comments
