
Commit 50e8f09

Pushing the docs to dev/ for branch: main, commit 405a5a09503a7f2cd241f8a5b1a1d2368dcc2676
1 parent 3694a5e commit 50e8f09


1,302 files changed: +5729 / -5729 lines changed


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 75af389dc8646749d9412f4ba2379fbc
+config: a44190895cf0c1b55f66e6ef4779d0a1
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/7c8070aa163e648367101a22133c5711/plot_forest_hist_grad_boosting_comparison.ipynb

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Comparing Random Forests and Histogram Gradient Boosting models\n\nIn this example we compare the performance of Random Forest (RF) and Histogram\nGradient Boosting (HGBT) models in terms of score and computation time for a\nregression dataset, though **all the concepts here presented apply to\nclassification as well**.\n\nThe comparison is made by varying the parameters that control the number of\ntrees according to each estimator:\n\n- `n_estimators` controls the number of trees in the forest. It's a fixed number.\n- `max_iter` is the the maximum number of iterations in a gradient boosting\n based model. The number of iterations corresponds to the number of trees for\n regression and binary classification problems. Furthermore, the actual number\n of trees required by the model depends on the stopping criteria.\n\nHGBT uses gradient boosting to iteratively improve the model's performance by\nfitting each tree to the negative gradient of the loss function with respect to\nthe predicted value. RFs, on the other hand, are based on bagging and use a\nmajority vote to predict the outcome.\n\nFor more information on ensemble models, see the `User Guide <ensemble>`.\n"
+"\n# Comparing Random Forests and Histogram Gradient Boosting models\n\nIn this example we compare the performance of Random Forest (RF) and Histogram\nGradient Boosting (HGBT) models in terms of score and computation time for a\nregression dataset, though **all the concepts here presented apply to\nclassification as well**.\n\nThe comparison is made by varying the parameters that control the number of\ntrees according to each estimator:\n\n- `n_estimators` controls the number of trees in the forest. It's a fixed number.\n- `max_iter` is the maximum number of iterations in a gradient boosting\n based model. The number of iterations corresponds to the number of trees for\n regression and binary classification problems. Furthermore, the actual number\n of trees required by the model depends on the stopping criteria.\n\nHGBT uses gradient boosting to iteratively improve the model's performance by\nfitting each tree to the negative gradient of the loss function with respect to\nthe predicted value. RFs, on the other hand, are based on bagging and use a\nmajority vote to predict the outcome.\n\nFor more information on ensemble models, see the `User Guide <ensemble>`.\n"
 ]
 },
 {
@@ -112,7 +112,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Both HGBT and RF models improve when increasing the number of trees in the\nensemble. However, the scores reach a plateau where adding new trees just\nmakes fitting and scoring slower. The RF model reaches such plateau earlier\nand can never reach the test score of the largest HGBDT model.\n\nNote that the results shown on the above plot can change slightly across runs\nand even more significantly when running on other machines: try to run this\nexample on your own local machine.\n\nOverall, one should often observe that the Histogram-based gradient boosting\nmodels uniformly dominate the Random Forest models in the \"test score vs\ntraining speed trade-off\" (the HGBDT curve should be on the top left of the RF\ncurve, without ever crossing). The \"test score vs prediction speed\" trade-off\ncan also be more disputed but it's most often favorable to HGBDT. It's always\na good idea to check both kinds of model (with hyper-parameter tuning) and\ncompare their performance on your specific problem to determine which model is\nthe best fit but **HGBT almost always offers a more favorable speed-accuracy\ntrade-off than RF**, either with the default hyper-parameters or including the\nhyper-parameter tuning cost.\n\nThere is one exception to this rule of thumb though: when training a\nmulticlass classification model with a large number of possible classes, HGBDT\nfits internally one-tree per class at each boosting iteration while the trees\nused by the RF models are naturally multiclass which should improve the speed\naccuracy trade-off of the RF models in this case.\n\n"
+"Both HGBT and RF models improve when increasing the number of trees in the\nensemble. However, the scores reach a plateau where adding new trees just\nmakes fitting and scoring slower. The RF model reaches such plateau earlier\nand can never reach the test score of the largest HGBDT model.\n\nNote that the results shown on the above plot can change slightly across runs\nand even more significantly when running on other machines: try to run this\nexample on your own local machine.\n\nOverall, one should often observe that the Histogram-based gradient boosting\nmodels uniformly dominate the Random Forest models in the \"test score vs\ntraining speed trade-off\" (the HGBDT curve should be on the top left of the RF\ncurve, without ever crossing). The \"test score vs prediction speed\" trade-off\ncan also be more disputed, but it's most often favorable to HGBDT. It's always\na good idea to check both kinds of model (with hyper-parameter tuning) and\ncompare their performance on your specific problem to determine which model is\nthe best fit but **HGBT almost always offers a more favorable speed-accuracy\ntrade-off than RF**, either with the default hyper-parameters or including the\nhyper-parameter tuning cost.\n\nThere is one exception to this rule of thumb though: when training a\nmulticlass classification model with a large number of possible classes, HGBDT\nfits internally one-tree per class at each boosting iteration while the trees\nused by the RF models are naturally multiclass which should improve the speed\naccuracy trade-off of the RF models in this case.\n\n"
 ]
 }
 ],
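The `n_estimators` / `max_iter` distinction described in this notebook cell can be illustrated with a minimal sketch (hypothetical code, not part of this commit; it uses a synthetic `make_regression` dataset rather than the example's own data):

```python
# Hypothetical sketch, not the example's code: RF grows exactly `n_estimators`
# trees, while HGBT treats `max_iter` as an upper bound once early stopping is on.
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor

X, y = make_regression(n_samples=5_000, n_features=20, random_state=0)

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)
print(len(rf.estimators_))  # always 100: a fixed number of trees

hgbt = HistGradientBoostingRegressor(
    max_iter=100, early_stopping=True, random_state=0
).fit(X, y)
print(hgbt.n_iter_)  # may be < 100 if the validation loss stops improving
```

For regression, as here, each boosting iteration adds one tree, so `n_iter_` is also the number of trees actually grown.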

dev/_downloads/96b1c8376e660407547a7226ce8b3bab/plot_forest_hist_grad_boosting_comparison.py

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
 trees according to each estimator:
 
 - `n_estimators` controls the number of trees in the forest. It's a fixed number.
-- `max_iter` is the the maximum number of iterations in a gradient boosting
+- `max_iter` is the maximum number of iterations in a gradient boosting
   based model. The number of iterations corresponds to the number of trees for
   regression and binary classification problems. Furthermore, the actual number
   of trees required by the model depends on the stopping criteria.
@@ -210,7 +210,7 @@
 # models uniformly dominate the Random Forest models in the "test score vs
 # training speed trade-off" (the HGBDT curve should be on the top left of the RF
 # curve, without ever crossing). The "test score vs prediction speed" trade-off
-# can also be more disputed but it's most often favorable to HGBDT. It's always
+# can also be more disputed, but it's most often favorable to HGBDT. It's always
 # a good idea to check both kinds of model (with hyper-parameter tuning) and
 # compare their performance on your specific problem to determine which model is
 # the best fit but **HGBT almost always offers a more favorable speed-accuracy
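The "test score vs training speed trade-off" discussed in these comments can be sketched roughly as follows (hypothetical code, not from this commit; it uses a synthetic dataset and a single train/test split instead of the example's full benchmark):

```python
# Hypothetical sketch: time the fit and measure held-out R^2 for one RF and one
# HGBT configuration, i.e. the kind of (speed, score) pair the example plots.
from time import perf_counter

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("RF", RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)),
    ("HGBT", HistGradientBoostingRegressor(max_iter=100, random_state=0)),
]:
    tic = perf_counter()
    model.fit(X_train, y_train)
    fit_time = perf_counter() - tic
    print(f"{name}: fit time {fit_time:.2f}s, test R^2 {model.score(X_test, y_test):.3f}")
```

As for the multiclass exception noted in the notebook text: with k classes, HGBT grows one tree per class at every boosting iteration (up to `max_iter * k` trees in total), whereas each RF tree handles all classes at once, which can tilt the speed-accuracy balance back toward RF in that setting.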

dev/_downloads/scikit-learn-docs.zip

Binary file not shown (9.55 KB).
