
Commit 50e8f09

Pushing the docs to dev/ for branch: main, commit 405a5a09503a7f2cd241f8a5b1a1d2368dcc2676
1 parent 3694a5e commit 50e8f09


1,302 files changed: +5729 / -5729 lines changed


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 75af389dc8646749d9412f4ba2379fbc
+config: a44190895cf0c1b55f66e6ef4779d0a1
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/7c8070aa163e648367101a22133c5711/plot_forest_hist_grad_boosting_comparison.ipynb

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Comparing Random Forests and Histogram Gradient Boosting models\n\nIn this example we compare the performance of Random Forest (RF) and Histogram\nGradient Boosting (HGBT) models in terms of score and computation time for a\nregression dataset, though **all the concepts here presented apply to\nclassification as well**.\n\nThe comparison is made by varying the parameters that control the number of\ntrees according to each estimator:\n\n- `n_estimators` controls the number of trees in the forest. It's a fixed number.\n- `max_iter` is the the maximum number of iterations in a gradient boosting\n based model. The number of iterations corresponds to the number of trees for\n regression and binary classification problems. Furthermore, the actual number\n of trees required by the model depends on the stopping criteria.\n\nHGBT uses gradient boosting to iteratively improve the model's performance by\nfitting each tree to the negative gradient of the loss function with respect to\nthe predicted value. RFs, on the other hand, are based on bagging and use a\nmajority vote to predict the outcome.\n\nFor more information on ensemble models, see the `User Guide <ensemble>`.\n"
+"\n# Comparing Random Forests and Histogram Gradient Boosting models\n\nIn this example we compare the performance of Random Forest (RF) and Histogram\nGradient Boosting (HGBT) models in terms of score and computation time for a\nregression dataset, though **all the concepts here presented apply to\nclassification as well**.\n\nThe comparison is made by varying the parameters that control the number of\ntrees according to each estimator:\n\n- `n_estimators` controls the number of trees in the forest. It's a fixed number.\n- `max_iter` is the maximum number of iterations in a gradient boosting\n based model. The number of iterations corresponds to the number of trees for\n regression and binary classification problems. Furthermore, the actual number\n of trees required by the model depends on the stopping criteria.\n\nHGBT uses gradient boosting to iteratively improve the model's performance by\nfitting each tree to the negative gradient of the loss function with respect to\nthe predicted value. RFs, on the other hand, are based on bagging and use a\nmajority vote to predict the outcome.\n\nFor more information on ensemble models, see the `User Guide <ensemble>`.\n"
 ]
 },
 {
@@ -112,7 +112,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Both HGBT and RF models improve when increasing the number of trees in the\nensemble. However, the scores reach a plateau where adding new trees just\nmakes fitting and scoring slower. The RF model reaches such plateau earlier\nand can never reach the test score of the largest HGBDT model.\n\nNote that the results shown on the above plot can change slightly across runs\nand even more significantly when running on other machines: try to run this\nexample on your own local machine.\n\nOverall, one should often observe that the Histogram-based gradient boosting\nmodels uniformly dominate the Random Forest models in the \"test score vs\ntraining speed trade-off\" (the HGBDT curve should be on the top left of the RF\ncurve, without ever crossing). The \"test score vs prediction speed\" trade-off\ncan also be more disputed but it's most often favorable to HGBDT. It's always\na good idea to check both kinds of model (with hyper-parameter tuning) and\ncompare their performance on your specific problem to determine which model is\nthe best fit but **HGBT almost always offers a more favorable speed-accuracy\ntrade-off than RF**, either with the default hyper-parameters or including the\nhyper-parameter tuning cost.\n\nThere is one exception to this rule of thumb though: when training a\nmulticlass classification model with a large number of possible classes, HGBDT\nfits internally one-tree per class at each boosting iteration while the trees\nused by the RF models are naturally multiclass which should improve the speed\naccuracy trade-off of the RF models in this case.\n\n"
+"Both HGBT and RF models improve when increasing the number of trees in the\nensemble. However, the scores reach a plateau where adding new trees just\nmakes fitting and scoring slower. The RF model reaches such plateau earlier\nand can never reach the test score of the largest HGBDT model.\n\nNote that the results shown on the above plot can change slightly across runs\nand even more significantly when running on other machines: try to run this\nexample on your own local machine.\n\nOverall, one should often observe that the Histogram-based gradient boosting\nmodels uniformly dominate the Random Forest models in the \"test score vs\ntraining speed trade-off\" (the HGBDT curve should be on the top left of the RF\ncurve, without ever crossing). The \"test score vs prediction speed\" trade-off\ncan also be more disputed, but it's most often favorable to HGBDT. It's always\na good idea to check both kinds of model (with hyper-parameter tuning) and\ncompare their performance on your specific problem to determine which model is\nthe best fit but **HGBT almost always offers a more favorable speed-accuracy\ntrade-off than RF**, either with the default hyper-parameters or including the\nhyper-parameter tuning cost.\n\nThere is one exception to this rule of thumb though: when training a\nmulticlass classification model with a large number of possible classes, HGBDT\nfits internally one-tree per class at each boosting iteration while the trees\nused by the RF models are naturally multiclass which should improve the speed\naccuracy trade-off of the RF models in this case.\n\n"
 ]
 }
 ],
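The `n_estimators` / `max_iter` distinction described in this notebook cell can be illustrated with a minimal sketch (hypothetical code, not part of this commit; it uses a synthetic `make_regression` dataset rather than the example's own data):

```python
# Hypothetical sketch, not the example's code: RF grows exactly `n_estimators`
# trees, while HGBT treats `max_iter` as an upper bound once early stopping is on.
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor

X, y = make_regression(n_samples=5_000, n_features=20, random_state=0)

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)
print(len(rf.estimators_))  # always 100: a fixed number of trees

hgbt = HistGradientBoostingRegressor(
    max_iter=100, early_stopping=True, random_state=0
).fit(X, y)
print(hgbt.n_iter_)  # may be < 100 if the validation loss stops improving
```

For regression, as here, each boosting iteration adds one tree, so `n_iter_` is also the number of trees actually grown.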

dev/_downloads/96b1c8376e660407547a7226ce8b3bab/plot_forest_hist_grad_boosting_comparison.py

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
 trees according to each estimator:
 
 - `n_estimators` controls the number of trees in the forest. It's a fixed number.
-- `max_iter` is the the maximum number of iterations in a gradient boosting
+- `max_iter` is the maximum number of iterations in a gradient boosting
   based model. The number of iterations corresponds to the number of trees for
   regression and binary classification problems. Furthermore, the actual number
   of trees required by the model depends on the stopping criteria.
@@ -210,7 +210,7 @@
 # models uniformly dominate the Random Forest models in the "test score vs
 # training speed trade-off" (the HGBDT curve should be on the top left of the RF
 # curve, without ever crossing). The "test score vs prediction speed" trade-off
-# can also be more disputed but it's most often favorable to HGBDT. It's always
+# can also be more disputed, but it's most often favorable to HGBDT. It's always
 # a good idea to check both kinds of model (with hyper-parameter tuning) and
 # compare their performance on your specific problem to determine which model is
 # the best fit but **HGBT almost always offers a more favorable speed-accuracy
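The "test score vs training speed trade-off" discussed in these comments can be sketched roughly as follows (hypothetical code, not from this commit; it uses a synthetic dataset and a single train/test split instead of the example's full benchmark):

```python
# Hypothetical sketch: time the fit and measure held-out R^2 for one RF and one
# HGBT configuration, i.e. the kind of (speed, score) pair the example plots.
from time import perf_counter

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("RF", RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)),
    ("HGBT", HistGradientBoostingRegressor(max_iter=100, random_state=0)),
]:
    tic = perf_counter()
    model.fit(X_train, y_train)
    fit_time = perf_counter() - tic
    print(f"{name}: fit time {fit_time:.2f}s, test R^2 {model.score(X_test, y_test):.3f}")
```

As for the multiclass exception noted in the notebook text: with k classes, HGBT grows one tree per class at every boosting iteration (up to `max_iter * k` trees in total), whereas each RF tree handles all classes at once, which can tilt the speed-accuracy balance back toward RF in that setting.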

dev/_downloads/scikit-learn-docs.zip

Binary file not shown (9.55 KB).
