diff --git a/notebooks/datasets_adult_census.ipynb b/notebooks/datasets_adult_census.ipynb index 139287829..ae274ebf5 100644 --- a/notebooks/datasets_adult_census.ipynb +++ b/notebooks/datasets_adult_census.ipynb @@ -105,7 +105,7 @@ " dimensions=plot_list,\n", " )\n", ")\n", - "fig.show()" + "fig.show(renderer=\"notebook\")" ] }, { diff --git a/notebooks/linear_models_feature_engineering_classification.ipynb b/notebooks/linear_models_feature_engineering_classification.ipynb index 6781ef734..5043c40ad 100644 --- a/notebooks/linear_models_feature_engineering_classification.ipynb +++ b/notebooks/linear_models_feature_engineering_classification.ipynb @@ -641,7 +641,7 @@ "- Transformers such as `KBinsDiscretizer` and `SplineTransformer` can be used\n", " to engineer non-linear features independently for each original feature.\n", "- As a result, these transformers cannot capture interactions between the\n", - " orignal features (and then would fail on the XOR classification task).\n", + " original features (and then would fail on the XOR classification task).\n", "- Despite this limitation they already augment the expressivity of the\n", " pipeline, which can be sufficient for some datasets.\n", "- They also favor axis-aligned decision boundaries, in particular in the low\n", diff --git a/notebooks/parameter_tuning_grid_search.ipynb b/notebooks/parameter_tuning_grid_search.ipynb index 7f0e0f61f..a7fc56994 100644 --- a/notebooks/parameter_tuning_grid_search.ipynb +++ b/notebooks/parameter_tuning_grid_search.ipynb @@ -198,29 +198,33 @@ "source": [ "## Tuning using a grid-search\n", "\n", - "In the previous exercise we used one `for` loop for each hyperparameter to\n", - "find the best combination over a fixed grid of values. 
`GridSearchCV` is a\n",
- "scikit-learn class that implements a very similar logic with less repetitive\n",
- "code.\n",
+ "In the previous exercise (M3.01) we used two nested `for` loops (one for each\n",
+ "hyperparameter) to test different combinations over a fixed grid of\n",
+ "hyperparameter values. In each iteration of the loop, we used\n",
+ "`cross_val_score` to compute the mean score (as averaged across\n",
+ "cross-validation splits), and compared those mean scores to select the best\n",
+ "combination. `GridSearchCV` is a scikit-learn class that implements a very\n",
+ "similar logic with less repetitive code. The suffix `CV` refers to the\n",
+ "cross-validation it runs internally (instead of the `cross_val_score` we\n",
+ "\"hard\" coded).\n",
+ "\n",
+ "The `GridSearchCV` estimator takes a `param_grid` parameter which defines all\n",
+ "hyperparameters and their associated values. The grid-search is in charge of\n",
+ "creating all possible combinations and testing them.\n",
+ "\n",
+ "The number of combinations is equal to the product of the number of values to\n",
+ "explore for each parameter. Thus, adding new parameters with their associated\n",
+ "values to be explored rapidly becomes computationally expensive. Because of\n",
+ "that, here we only explore the combination of the learning-rate and the\n",
+ "maximum number of nodes for a total of 4 x 3 = 12 combinations.\n",
"\n",
- "Let's see how to use the `GridSearchCV` estimator for doing such search. Since\n",
- "the grid-search is costly, we only explore the combination learning-rate and\n",
- "the maximum number of nodes."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ "%%time\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "param_grid = {\n", - " \"classifier__learning_rate\": (0.01, 0.1, 1, 10),\n", - " \"classifier__max_leaf_nodes\": (3, 10, 30),\n", - "}\n", + " \"classifier__learning_rate\": (0.01, 0.1, 1, 10), # 4 possible values\n", + " \"classifier__max_leaf_nodes\": (3, 10, 30), # 3 possible values\n", + "} # 12 unique combinations\n", "model_grid_search = GridSearchCV(model, param_grid=param_grid, n_jobs=2, cv=2)\n", "model_grid_search.fit(data_train, target_train)" ] @@ -229,7 +233,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Finally, we check the accuracy of our model using the test set." + "You can access the best combination of hyperparameters found by the grid\n", + "search using the `best_params_` attribute." ] }, { @@ -238,46 +243,19 @@ "metadata": {}, "outputs": [], "source": [ - "accuracy = model_grid_search.score(data_test, target_test)\n", - "print(\n", - " f\"The test accuracy score of the grid-searched pipeline is: {accuracy:.2f}\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
<div class=\"admonition warning alert alert-danger\">\n",
- "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Warning</p>\n",
- "<p>Be aware that the evaluation should normally be performed through\n",
- "cross-validation by providing <tt class=\"docutils literal\">model_grid_search</tt> as a model to the\n",
- "<tt class=\"docutils literal\">cross_validate</tt> function.</p>\n",
- "<p class=\"last\">Here, we used a single train-test split to evaluate <tt class=\"docutils literal\">model_grid_search</tt>. In\n",
- "a future notebook will go into more detail about nested cross-validation, when\n",
- "you use cross-validation both for hyperparameter tuning and model evaluation.</p>\n",
- "</div>" + "print(f\"The best set of parameters is: {model_grid_search.best_params_}\")"
] }, { @@ -303,16 +280,43 @@ "metadata": {}, "outputs": [], "source": [ - "print(f\"The best set of parameters is: {model_grid_search.best_params_}\")" + "accuracy = model_grid_search.score(data_test, target_test)\n", + "print(\n", + " f\"The test accuracy score of the grid-search pipeline is: {accuracy:.2f}\"\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The accuracy and the best parameters of the grid-searched pipeline are similar\n", + "The accuracy and the best parameters of the grid-search pipeline are similar\n", "to the ones we found in the previous exercise, where we searched the best\n", - "parameters \"by hand\" through a double for loop.\n", + "parameters \"by hand\" through a double `for` loop.\n", + "\n", + "## The need for a validation set\n", + "\n", + "In the previous section, the selection of the best hyperparameters was done\n", + "using the train set, coming from the initial train-test split. Then, we\n", + "evaluated the generalization performance of our tuned model on the left out\n", + "test set. This can be shown schematically as follows:\n", + "\n", + "![Cross-validation tuning\n", + "diagram](../figures/cross_validation_train_test_diagram.png)\n", + "\n", + "
<div class=\"admonition note alert alert-info\">\n",
+ "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n",
+ "<p>This figure shows the particular case of K-fold cross-validation strategy\n",
+ "using <tt class=\"docutils literal\">n_splits=5</tt> to further split the train set coming from a train-test\n",
+ "split. For each cross-validation split, the procedure trains a model on all\n",
+ "the red samples, evaluates the score of a given set of hyperparameters on the\n",
+ "green samples. The best combination of hyperparameters <tt class=\"docutils literal\">best_params</tt> is selected\n",
+ "based on those intermediate scores.</p>\n",
+ "<p>Then a final model is refitted using <tt class=\"docutils literal\">best_params</tt> on the concatenation of the\n",
+ "red and green samples and evaluated on the blue samples.</p>\n",
+ "<p class=\"last\">The green samples are sometimes referred to as the validation set to\n",
+ "differentiate them from the final test set in blue.</p>\n",
+ "</div>
\n", "\n", "In addition, we can inspect all results which are stored in the attribute\n", "`cv_results_` of the grid-search. We filter some specific columns from these\n", diff --git a/notebooks/parameter_tuning_parallel_plot.ipynb b/notebooks/parameter_tuning_parallel_plot.ipynb index 32f411b35..806bbd9f7 100644 --- a/notebooks/parameter_tuning_parallel_plot.ipynb +++ b/notebooks/parameter_tuning_parallel_plot.ipynb @@ -145,7 +145,7 @@ " color=\"mean_test_score\",\n", " color_continuous_scale=px.colors.sequential.Viridis,\n", ")\n", - "fig.show()" + "fig.show(renderer=\"notebook\")" ] }, { diff --git a/notebooks/parameter_tuning_sol_03.ipynb b/notebooks/parameter_tuning_sol_03.ipynb index d52b48176..37c8f15da 100644 --- a/notebooks/parameter_tuning_sol_03.ipynb +++ b/notebooks/parameter_tuning_sol_03.ipynb @@ -266,7 +266,7 @@ " dimensions=[\"n_neighbors\", \"centering\", \"scaling\", \"mean test score\"],\n", " color_continuous_scale=px.colors.diverging.Tealrose,\n", ")\n", - "fig.show()" + "fig.show(renderer=\"notebook\")" ] }, { diff --git a/python_scripts/datasets_adult_census.py b/python_scripts/datasets_adult_census.py index f86bf40ef..d3d36d88f 100644 --- a/python_scripts/datasets_adult_census.py +++ b/python_scripts/datasets_adult_census.py @@ -91,7 +91,7 @@ def generate_dict(col): dimensions=plot_list, ) ) -fig.show() +fig.show(renderer="notebook") # %% [markdown] # The `Parcoords` plot is quite similar to the parallel coordinates plot that we diff --git a/python_scripts/parameter_tuning_parallel_plot.py b/python_scripts/parameter_tuning_parallel_plot.py index 340e75dd0..1be534206 100644 --- a/python_scripts/parameter_tuning_parallel_plot.py +++ b/python_scripts/parameter_tuning_parallel_plot.py @@ -102,7 +102,7 @@ def shorten_param(param_name): color="mean_test_score", color_continuous_scale=px.colors.sequential.Viridis, ) -fig.show() +fig.show(renderer="notebook") # %% [markdown] # ```{note} diff --git a/python_scripts/parameter_tuning_sol_03.py 
b/python_scripts/parameter_tuning_sol_03.py index 1cdb01191..3f50c0adf 100644 --- a/python_scripts/parameter_tuning_sol_03.py +++ b/python_scripts/parameter_tuning_sol_03.py @@ -160,7 +160,7 @@ dimensions=["n_neighbors", "centering", "scaling", "mean test score"], color_continuous_scale=px.colors.diverging.Tealrose, ) -fig.show() +fig.show(renderer="notebook") # %% [markdown] tags=["solution"] # We recall that it is possible to select a range of results by clicking and