Commit ea9aef8

ArturoAmorQ committed

Partial merge with INRIA#755

1 parent 3a9b423 commit ea9aef8

File tree

1 file changed: +60, -49 lines

python_scripts/parameter_tuning_grid_search.py

Lines changed: 60 additions & 49 deletions
@@ -116,81 +116,92 @@
 # %% [markdown]
 # ## Tuning using a grid-search
 #
-# In the previous exercise we used one `for` loop for each hyperparameter to
-# find the best combination over a fixed grid of values. `GridSearchCV` is a
-# scikit-learn class that implements a very similar logic with less repetitive
-# code.
+# In the previous exercise (M3.01) we used two nested `for` loops (one for each
+# hyperparameter) to test different combinations over a fixed grid of
+# hyperparameter values. In each iteration of the loop, we used
+# `cross_val_score` to compute the mean score (as averaged across
+# cross-validation splits), and compared those mean scores to select the best
+# combination. `GridSearchCV` is a scikit-learn class that implements a very
+# similar logic with less repetitive code. The suffix `CV` refers to the
+# cross-validation it runs internally (instead of the `cross_val_score` we
+# "hard" coded).
 #
-# Let's see how to use the `GridSearchCV` estimator for doing such search. Since
-# the grid-search is costly, we only explore the combination learning-rate and
-# the maximum number of nodes.
-### Cross-Validation Visualization
-![Grid Search CV](../figures/cross_validation_train_test_diagram.png)
-
-#This diagram illustrates how GridSearchCV uses cross-validation to split the dataset during hyperparameter tuning.
+# The `GridSearchCV` estimator takes a `param_grid` parameter which defines all
+# hyperparameters and their associated values. The grid-search is in charge of
+# creating all possible combinations and testing them.
+#
+# The number of combinations is equal to the product of the number of values to
+# explore for each parameter. Thus, adding new parameters with their associated
+# values to be explored rapidly becomes computationally expensive. Because of
+# that, here we only explore the combination learning-rate and the maximum
+# number of nodes for a total of 4 x 3 = 12 combinations.

 # %%time
-
 from sklearn.model_selection import GridSearchCV

 param_grid = {
-    "classifier__learning_rate": (0.01, 0.1, 1, 10),
-    "classifier__max_leaf_nodes": (3, 10, 30),
-}
+    "classifier__learning_rate": (0.01, 0.1, 1, 10),  # 4 possible values
+    "classifier__max_leaf_nodes": (3, 10, 30),  # 3 possible values
+}  # 12 unique combinations
 model_grid_search = GridSearchCV(model, param_grid=param_grid, n_jobs=2, cv=2)
 model_grid_search.fit(data_train, target_train)

 # %% [markdown]
-# Finally, we check the accuracy of our model using the test set.
+# You can access the best combination of hyperparameters found by the grid
+# search using the `best_params_` attribute.

 # %%
-accuracy = model_grid_search.score(data_test, target_test)
-print(
-    f"The test accuracy score of the grid-searched pipeline is: {accuracy:.2f}"
-)
-
-# %% [markdown]
-# ```{warning}
-# Be aware that the evaluation should normally be performed through
-# cross-validation by providing `model_grid_search` as a model to the
-# `cross_validate` function.
-#
-# Here, we used a single train-test split to evaluate `model_grid_search`. In
-# a future notebook will go into more detail about nested cross-validation, when
-# you use cross-validation both for hyperparameter tuning and model evaluation.
-# ```
+print(f"The best set of parameters is: {model_grid_search.best_params_}")

 # %% [markdown]
-# The `GridSearchCV` estimator takes a `param_grid` parameter which defines all
-# hyperparameters and their associated values. The grid-search is in charge
-# of creating all possible combinations and test them.
-#
-# The number of combinations are equal to the product of the number of values to
-# explore for each parameter (e.g. in our example 4 x 3 combinations). Thus,
-# adding new parameters with their associated values to be explored become
-# rapidly computationally expensive.
-#
-# Once the grid-search is fitted, it can be used as any other predictor by
-# calling `predict` and `predict_proba`. Internally, it uses the model with the
+# Once the grid-search is fitted, it can be used as any other estimator, i.e. it
+# has `predict` and `score` methods. Internally, it uses the model with the
 # best parameters found during `fit`.
 #
-# Get predictions for the 5 first samples using the estimator with the best
-# parameters.
+# Let's get the predictions for the 5 first samples using the estimator with the
+# best parameters:

 # %%
 model_grid_search.predict(data_test.iloc[0:5])

 # %% [markdown]
-# You can know about these parameters by looking at the `best_params_`
-# attribute.
+# Finally, we check the accuracy of our model using the test set.

 # %%
-print(f"The best set of parameters is: {model_grid_search.best_params_}")
+accuracy = model_grid_search.score(data_test, target_test)
+print(
+    f"The test accuracy score of the grid-search pipeline is: {accuracy:.2f}"
+)

 # %% [markdown]
-# The accuracy and the best parameters of the grid-searched pipeline are similar
+# The accuracy and the best parameters of the grid-search pipeline are similar
 # to the ones we found in the previous exercise, where we searched the best
-# parameters "by hand" through a double for loop.
+# parameters "by hand" through a double `for` loop.
+#
+# ## The need for a validation set
+#
+# In the previous section, the selection of the best hyperparameters was done
+# using the train set, coming from the initial train-test split. Then, we
+# evaluated the generalization performance of our tuned model on the left out
+# test set. This can be shown schematically as follows:
+#
+# ![Cross-validation tuning
+# diagram](../figures/cross_validation_train_test_diagram.png)
+#
+# ```{note}
+# This figure shows the particular case of **K-fold** cross-validation strategy
+# using `n_splits=5` to further split the train set coming from a train-test
+# split. For each cross-validation split, the procedure trains a model on all
+# the red samples, evaluates the score of a given set of hyperparameters on the
+# green samples. The best combination of hyperparameters `best_params` is selected
+# based on those intermediate scores.
+#
+# Then a final model is refitted using `best_params` on the concatenation of the
+# red and green samples and evaluated on the blue samples.
+#
+# The green samples are sometimes referred as the **validation set** to
+# differentiate them from the final test set in blue.
+# ```
 #
 # In addition, we can inspect all results which are stored in the attribute
 # `cv_results_` of the grid-search. We filter some specific columns from these
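The diff ends on the `cv_results_` context lines. For reference, the kind of inspection they describe could look like the following minimal sketch. It assumes the fitted `model_grid_search` from the diff above and scikit-learn's standard `cv_results_` column naming (`param_<parameter>`, `mean_test_score`, `std_test_score`, `rank_test_score`); the columns actually kept in the notebook may differ.

```python
import pandas as pd

# Collect all cross-validation results in a dataframe (one row per combination)
cv_results = pd.DataFrame(model_grid_search.cv_results_)

# Keep only the hyperparameter columns and the aggregated test scores
columns = [
    "param_classifier__learning_rate",
    "param_classifier__max_leaf_nodes",
    "mean_test_score",
    "std_test_score",
    "rank_test_score",
]
cv_results[columns].sort_values("mean_test_score", ascending=False)
```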

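Similarly, both the removed `{warning}` block and the new "The need for a validation set" section point towards evaluating the tuned pipeline with an outer cross-validation instead of a single train-test split. A minimal sketch of that nested cross-validation, assuming the full `data` and `target` objects loaded earlier in the notebook (not shown in this diff):

```python
from sklearn.model_selection import cross_validate

# Outer cross-validation: each split refits the grid-search (which runs its own
# inner cross-validation) on the train portion and scores the refitted best
# model on the held-out portion.
outer_cv_results = cross_validate(model_grid_search, data, target, cv=5, n_jobs=2)
scores = outer_cv_results["test_score"]
print(f"Nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because each outer split refits the grid-search from scratch, the reported scores account for the hyperparameter tuning itself rather than for a single fixed choice of parameters.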