Commit 3cfb6fc

Update quizzes
1 parent f51c4e6 commit 3cfb6fc

4 files changed: +22 -22 lines changed

jupyter-book/predictive_modeling_pipeline/03_categorical_pipeline_quiz_m1_03.md

Lines changed: 9 additions & 9 deletions
@@ -13,24 +13,24 @@ _Select all answers that apply_
 +++
 
 ```{admonition} Question
-Ordinal variables are:
+An ordinal variable:
 
-- a) categorical variables with a large number of possible categories
-- b) typically represented by integers or string labels
-- c) categorical variables with a meaningful order
+- a) is a categorical variable with a large number of different categories;
+- b) can be represented by integers or string labels;
+- c) is a categorical variable with a meaningful order.
 
 _Select all answers that apply_
 ```
 
 +++
 
 ```{admonition} Question
-One-hot encoding will:
+One-hot encoding:
 
-- a) encode a single string-encoded column into a single integer coded column
-- b) transform a numerical variable into a categorical variable
-- c) create one additional column for each possible category
-- d) transform string variable onto numerical representation
+- a) encodes each column with string-labeled values into a single integer-coded column
+- b) transforms a numerical variable into a categorical variable
+- c) creates one additional column for each possible category
+- d) transforms string-labeled variables using a numerical representation
 
 _Select all answers that apply_
 ```

jupyter-book/trees/trees_quiz_m5_03.md

Lines changed: 3 additions & 4 deletions
@@ -14,11 +14,10 @@ _Select a single answer_
 +++
 
 ```{admonition} Question
-Decision trees are capable of:
+Decision tree regressors can predict:
 
-- a) interpolating and extrapolating
-- b) only interpolating
-- c) only extrapolating
+- a) any values, including values larger or smaller than those observed in `y_train`;
+- b) only values in the range from `np.min(y_train)` to `np.max(y_train)`.
 
 _Select a single answer_
 ```
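The revised answer (b) reflects that a decision tree predicts with averages of the training targets falling in each leaf, so it cannot extrapolate beyond them. A quick sketch on a synthetic linear trend:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Train on a simple noisy linear trend over [0, 10].
rng = np.random.RandomState(0)
X_train = np.linspace(0, 10, 100).reshape(-1, 1)
y_train = 3 * X_train.ravel() + rng.normal(scale=0.5, size=100)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Query points far outside the training range: each prediction is a leaf
# average of y_train, so it never leaves [np.min(y_train), np.max(y_train)].
predictions = tree.predict(np.array([[-100.0], [100.0]]))
print(predictions.min() >= y_train.min())  # True
print(predictions.max() <= y_train.max())  # True
```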

jupyter-book/trees/trees_wrap_up_quiz.md

Lines changed: 4 additions & 3 deletions
@@ -37,7 +37,8 @@ and evaluate them by 10-fold cross-validation.
 
 Thus, use `sklearn.linear_model.LinearRegression` and
 `sklearn.tree.DecisionTreeRegressor` to create the models. Use the default
-parameters for both models.
+parameters for the linear regression and set `random_state=0` for the decision
+tree.
 
 Be aware that a linear model requires to scale numerical features.
 Please use `sklearn.preprocessing.StandardScaler` so that your
@@ -108,8 +109,8 @@ columns. For the sake of simplicity, we will assume the following:
 
 - categorical columns can be selected if they have an `object` data type;
 - use an `OrdinalEncoder` to encode the categorical columns;
-- numerical columns can be selected if they do not have an `object` data type.
-It will be the complement of the numerical columns.
+- numerical columns should correspond to the `numerical_features` as defined above.
+This is a subset of the features that are not an `object` data type.
 
 In addition, set the `max_depth` of the decision tree to `7` (fixed, no need
 to tune it with a grid-search).
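The preprocessing described in this hunk could be sketched as follows (the dataframe, its column names, and the `numerical_features` list are hypothetical stand-ins for the quiz's own dataset):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data standing in for the quiz dataset.
df = pd.DataFrame({
    "Neighborhood": ["A", "B", "A", "C"],   # object dtype -> categorical
    "LotArea": [8450, 9600, 11250, 9550],   # numerical
    "YearBuilt": [2003, 1976, 2001, 1915],  # numerical
})
y = pd.Series([208500, 181500, 223500, 140000])

# Categorical columns: those with an `object` data type.
categorical_features = df.columns[df.dtypes == "object"]
# Numerical columns: an explicit subset of the non-object columns.
numerical_features = ["LotArea", "YearBuilt"]

preprocessor = ColumnTransformer([
    ("categorical", OrdinalEncoder(), categorical_features),
    ("numerical", "passthrough", numerical_features),
])

# Fixed max_depth=7 and random_state=0, as the instructions require.
model = make_pipeline(
    preprocessor,
    DecisionTreeRegressor(max_depth=7, random_state=0),
)
model.fit(df, y)
```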

jupyter-book/tuning/parameter_tuning_wrap_up_quiz.md

Lines changed: 6 additions & 6 deletions
@@ -163,8 +163,8 @@ Which of the following statements hold:
 
 - a) Looking at the individual cross-validation scores, the best ranked model using a
 `StandardScaler` is substantially better (at least 7 of the cross-validations scores are better)
-than using any other processor
-- b) Using any of the preprocessors has always a better ranking than using no processor, irrespective
+than using any other preprocessor
+- b) Using any of the preprocessors has always a better ranking than using no preprocessor, irrespective
 of the value `of n_neighbors`
 - c) Looking at the individual cross-validation scores, the model with `n_neighbors=5` and
 `StandardScaler` is substantially better (at least 7 of the cross-validations scores are better)
@@ -202,10 +202,10 @@ Explore the set of best parameters that the different grid search models found
 in each fold of the outer cross-validation. Remember that you can access them
 with the `best_params_` attribute of the estimator. Select all the statements that are true.
 
-- a) The tuned number of nearest neighbors is stable across all folds
-- b) The tuned number of nearest neighbors changes often across all folds
-- c) The optimal scaler is stable across all folds
-- d) The optimal scaler changes often across all folds
+- a) The tuned number of nearest neighbors is stable across folds
+- b) The tuned number of nearest neighbors changes often across folds
+- c) The optimal scaler is stable across folds
+- d) The optimal scaler changes often across folds
 
 _Select all answers that apply_
 
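One way to collect those per-fold best parameters is `cross_validate` with `return_estimator=True`; a sketch on synthetic data (the quiz's actual dataset and parameter grid may differ):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data standing in for the quiz dataset.
X, y = make_classification(n_samples=300, random_state=0)

pipe = Pipeline([
    ("preprocessor", StandardScaler()),
    ("classifier", KNeighborsClassifier()),
])
param_grid = {
    "preprocessor": [StandardScaler(), MinMaxScaler(), "passthrough"],
    "classifier__n_neighbors": [5, 51, 101],
}
search = GridSearchCV(pipe, param_grid=param_grid)

# Outer cross-validation; keep the fitted grid search of each fold so the
# parameters it selected can be inspected afterwards.
results = cross_validate(search, X, y, cv=5, return_estimator=True)
for fold_search in results["estimator"]:
    print(fold_search.best_params_)
```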

0 commit comments

Comments
 (0)