Commit 0ebeac1

FIX Update explanation regarding number of trees in GBDT (#799)
1 parent 528917e commit 0ebeac1

File tree

4 files changed: +34, -30 lines changed

notebooks/ensemble_ex_03.ipynb

Lines changed: 8 additions & 7 deletions

@@ -101,20 +101,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Both gradient boosting and random forest models improve when increasing the\n",
-"number of trees in the ensemble. However, the scores reach a plateau where\n",
-"adding new trees just makes fitting and scoring slower.\n",
+"Random forest models improve when increasing the number of trees in the\n",
+"ensemble. However, the scores reach a plateau where adding new trees just\n",
+"makes fitting and scoring slower.\n",
 "\n",
-"To avoid adding new unnecessary tree, unlike random-forest gradient-boosting\n",
+"Gradient boosting models overfit when the number of trees is too large. To\n",
+"avoid adding a new unnecessary tree, unlike random-forest gradient-boosting\n",
 "offers an early-stopping option. Internally, the algorithm uses an\n",
 "out-of-sample set to compute the generalization performance of the model at\n",
 "each addition of a tree. Thus, if the generalization performance is not\n",
 "improving for several iterations, it stops adding trees.\n",
 "\n",
 "Now, create a gradient-boosting model with `n_estimators=1_000`. This number\n",
-"of trees is certainly too large. Change the parameter `n_iter_no_change` such\n",
-"that the gradient boosting fitting stops after adding 5 trees that do not\n",
-"improve the overall generalization performance."
+"of trees is certainly too large. Change the parameter `n_iter_no_change`\n",
+"such that the gradient boosting fitting stops after adding 5 trees to avoid\n",
+"deterioration of the overall generalization performance."
 ]
 },
 {

notebooks/ensemble_sol_03.ipynb

Lines changed: 9 additions & 8 deletions

@@ -129,20 +129,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Both gradient boosting and random forest models improve when increasing the\n",
-"number of trees in the ensemble. However, the scores reach a plateau where\n",
-"adding new trees just makes fitting and scoring slower.\n",
+"Random forest models improve when increasing the number of trees in the\n",
+"ensemble. However, the scores reach a plateau where adding new trees just\n",
+"makes fitting and scoring slower.\n",
 "\n",
-"To avoid adding new unnecessary tree, unlike random-forest gradient-boosting\n",
+"Gradient boosting models overfit when the number of trees is too large. To\n",
+"avoid adding a new unnecessary tree, unlike random-forest gradient-boosting\n",
 "offers an early-stopping option. Internally, the algorithm uses an\n",
 "out-of-sample set to compute the generalization performance of the model at\n",
 "each addition of a tree. Thus, if the generalization performance is not\n",
 "improving for several iterations, it stops adding trees.\n",
 "\n",
 "Now, create a gradient-boosting model with `n_estimators=1_000`. This number\n",
-"of trees is certainly too large. Change the parameter `n_iter_no_change` such\n",
-"that the gradient boosting fitting stops after adding 5 trees that do not\n",
-"improve the overall generalization performance."
+"of trees is certainly too large. Change the parameter `n_iter_no_change`\n",
+"such that the gradient boosting fitting stops after adding 5 trees to avoid\n",
+"deterioration of the overall generalization performance."
 ]
 },
 {
@@ -167,7 +168,7 @@
 "source": [
 "We see that the number of trees used is far below 1000 with the current\n",
 "dataset. Training the gradient boosting model with the entire 1000 trees would\n",
-"have been useless."
+"have been detrimental."
 ]
 },
 {

python_scripts/ensemble_ex_03.py

Lines changed: 8 additions & 7 deletions

@@ -64,20 +64,21 @@
 # Write your code here.

 # %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
 #
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding a new unnecessary tree, unlike random-forest gradient-boosting
 # offers an early-stopping option. Internally, the algorithm uses an
 # out-of-sample set to compute the generalization performance of the model at
 # each addition of a tree. Thus, if the generalization performance is not
 # improving for several iterations, it stops adding trees.
 #
 # Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops after adding 5 trees to avoid
+# deterioration of the overall generalization performance.

 # %%
 # Write your code here.

python_scripts/ensemble_sol_03.py

Lines changed: 9 additions & 8 deletions

@@ -86,20 +86,21 @@
 )

 # %% [markdown]
-# Both gradient boosting and random forest models improve when increasing the
-# number of trees in the ensemble. However, the scores reach a plateau where
-# adding new trees just makes fitting and scoring slower.
+# Random forest models improve when increasing the number of trees in the
+# ensemble. However, the scores reach a plateau where adding new trees just
+# makes fitting and scoring slower.
 #
-# To avoid adding new unnecessary tree, unlike random-forest gradient-boosting
+# Gradient boosting models overfit when the number of trees is too large. To
+# avoid adding a new unnecessary tree, unlike random-forest gradient-boosting
 # offers an early-stopping option. Internally, the algorithm uses an
 # out-of-sample set to compute the generalization performance of the model at
 # each addition of a tree. Thus, if the generalization performance is not
 # improving for several iterations, it stops adding trees.
 #
 # Now, create a gradient-boosting model with `n_estimators=1_000`. This number
-# of trees is certainly too large. Change the parameter `n_iter_no_change` such
-# that the gradient boosting fitting stops after adding 5 trees that do not
-# improve the overall generalization performance.
+# of trees is certainly too large. Change the parameter `n_iter_no_change`
+# such that the gradient boosting fitting stops after adding 5 trees to avoid
+# deterioration of the overall generalization performance.

 # %%
 # solution
@@ -110,7 +111,7 @@
 # %% [markdown] tags=["solution"]
 # We see that the number of trees used is far below 1000 with the current
 # dataset. Training the gradient boosting model with the entire 1000 trees would
-# have been useless.
+# have been detrimental.

 # %% [markdown]
 # Estimate the generalization performance of this model again using the
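The early-stopping mechanism this commit documents can be sketched as follows. This is a minimal illustration, not part of the commit: it uses a synthetic `make_regression` dataset rather than the course's data, but the `n_estimators=1_000` / `n_iter_no_change=5` settings match the exercise text.

```python
# Sketch of early stopping in scikit-learn's GradientBoostingRegressor.
# Synthetic data stands in for the course dataset (an assumption).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# n_estimators=1_000 is certainly too large; with n_iter_no_change=5 the
# model holds out an internal validation set (validation_fraction, 10% by
# default) and stops once 5 consecutive trees fail to improve its score.
model = GradientBoostingRegressor(
    n_estimators=1_000, n_iter_no_change=5, random_state=0
)
model.fit(X, y)

# n_estimators_ reports how many trees were actually fitted; it is below
# 1_000 whenever early stopping triggered.
print(model.n_estimators_)
```

If early stopping fires, `model.n_estimators_` is far below the requested 1_000, which is the point the updated explanation makes: the extra trees would not merely be useless, they would degrade generalization.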
