Commit 8a8bfc2

DOC Improve user guide on scoring parameter (scikit-learn#30316)
1 parent bcee404 commit 8a8bfc2

9 files changed: +102 additions, -72 deletions

doc/modules/classification_threshold.rst

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ a meaningful metric for their use case.
 the label of the class of interest (i.e. `pos_label`). Thus, if this label is not
 the right one for your application, you need to define a scorer and pass the right
 `pos_label` (and additional parameters) using the
-:func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get
+:func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring_callable` to get
 information to define your own scoring function. For instance, we show how to pass
 the information to the scorer that the label of interest is `0` when maximizing the
 :func:`~sklearn.metrics.f1_score`::
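(The literal block following `::` lies outside this hunk's context. Separately, a minimal sketch of the `pos_label` pattern described here; the data and estimator choices are placeholders, not part of the commit:)

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, make_scorer
    from sklearn.model_selection import TunedThresholdClassifierCV

    X, y = make_classification(random_state=0)
    # Extra keyword arguments to make_scorer are forwarded to f1_score,
    # so this scorer treats class 0 as the class of interest.
    f1_class0 = make_scorer(f1_score, pos_label=0)
    model = TunedThresholdClassifierCV(LogisticRegression(), scoring=f1_class0).fit(X, y)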

doc/modules/model_evaluation.rst

Lines changed: 87 additions & 58 deletions
@@ -148,13 +148,16 @@ predictions:
 
 * **Estimator score method**: Estimators have a ``score`` method providing a
   default evaluation criterion for the problem they are designed to solve.
-  This is not discussed on this page, but in each estimator's documentation.
+  Most commonly this is :ref:`accuracy <accuracy_score>` for classifiers and the
+  :ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
+  Details for each estimator can be found in its documentation.
 
-* **Scoring parameter**: Model-evaluation tools using
+* **Scoring parameter**: Model-evaluation tools that use
   :ref:`cross-validation <cross_validation>` (such as
-  :func:`model_selection.cross_val_score` and
-  :class:`model_selection.GridSearchCV`) rely on an internal *scoring* strategy.
-  This is discussed in the section :ref:`scoring_parameter`.
+  :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+  :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy.
+  This can be specified using the `scoring` parameter of that tool and is discussed
+  in the section :ref:`scoring_parameter`.
 
 * **Metric functions**: The :mod:`sklearn.metrics` module implements functions
   assessing prediction error for specific purposes. These metrics are detailed
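(A quick illustration of the default `score` criteria named by the new text; the dataset choices below are arbitrary:)

    from sklearn.datasets import load_diabetes, load_iris
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X, y = load_iris(return_X_y=True)
    print(LogisticRegression(max_iter=1000).fit(X, y).score(X, y))  # mean accuracy

    X, y = load_diabetes(return_X_y=True)
    print(LinearRegression().fit(X, y).score(X, y))  # coefficient of determination, R^2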
@@ -175,24 +178,39 @@ value of those metrics for random predictions.
 The ``scoring`` parameter: defining model evaluation rules
 ==========================================================
 
-Model selection and evaluation using tools, such as
-:class:`model_selection.GridSearchCV` and
-:func:`model_selection.cross_val_score`, take a ``scoring`` parameter that
+Model selection and evaluation tools that internally use
+:ref:`cross-validation <cross_validation>` (such as
+:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
 controls what metric they apply to the estimators evaluated.
 
-Common cases: predefined values
--------------------------------
+They can be specified in several ways:
+
+* `None`: the estimator's default evaluation criterion (i.e., the metric used in the
+  estimator's `score` method) is used.
+* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
+  name.
+* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a custom
+  metric callable (e.g., function).
+
+Some tools also accept multiple metric evaluation. See :ref:`multimetric_scoring`
+for details.
+
+.. _scoring_string_names:
+
+String name scorers
+-------------------
 
 For the most common use cases, you can designate a scorer object with the
-``scoring`` parameter; the table below shows all possible values.
+``scoring`` parameter via a string name; the table below shows all possible values.
 All scorer objects follow the convention that **higher return values are better
-than lower return values**.  Thus metrics which measure the distance between
+than lower return values**. Thus metrics which measure the distance between
 the model and the data, like :func:`metrics.mean_squared_error`, are
-available as neg_mean_squared_error which return the negated value
+available as 'neg_mean_squared_error' which return the negated value
 of the metric.
 
 ==================================== ============================================== ==================================
-Scoring                              Function                                       Comment
+Scoring string name                  Function                                       Comment
 ==================================== ============================================== ==================================
 **Classification**
 'accuracy'                           :func:`metrics.accuracy_score`
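(The string-name convention this hunk documents, in use; `Ridge` and the diabetes data are arbitrary choices:)

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)
    # Error metrics are exposed under negated names so that greater is
    # always better; these scores are therefore <= 0.
    scores = cross_val_score(Ridge(), X, y, scoring="neg_mean_squared_error")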
@@ -260,12 +278,23 @@ Usage examples:
 
 .. currentmodule:: sklearn.metrics
 
-.. _scoring:
+.. _scoring_callable:
+
+Callable scorers
+----------------
+
+For more complex use cases and more flexibility, you can pass a callable to
+the `scoring` parameter. This can be done by:
 
-Defining your scoring strategy from metric functions
------------------------------------------------------
+* :ref:`scoring_adapt_metric`
+* :ref:`scoring_custom` (most flexible)
 
-The following metrics functions are not implemented as named scorers,
+.. _scoring_adapt_metric:
+
+Adapting predefined metrics via `make_scorer`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following metric functions are not implemented as named scorers,
 sometimes because they require additional parameters, such as
 :func:`fbeta_score`. They cannot be passed to the ``scoring``
 parameters; instead their callable needs to be passed to
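(The pattern referred to here, sketched with `fbeta_score`; the estimator and grid values are arbitrary:)

    from sklearn.metrics import fbeta_score, make_scorer
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    # fbeta_score needs `beta`, so it has no string name; wrap it instead.
    ftwo_scorer = make_scorer(fbeta_score, beta=2)
    grid = GridSearchCV(LinearSVC(), param_grid={"C": [1, 10]}, scoring=ftwo_scorer)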
@@ -303,37 +332,44 @@ measuring a prediction error given ground truth and prediction:
   maximize, the higher the better.
 
 - functions ending with ``_error``, ``_loss``, or ``_deviance`` return a
-  value to minimize, the lower the better.  When converting
+  value to minimize, the lower the better. When converting
   into a scorer object using :func:`make_scorer`, set
   the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
   parameter description below).
 
+.. _scoring_custom:
+
+Creating a custom scorer object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can create your own custom scorer object using
+:func:`make_scorer` or, for the most flexibility, from scratch. See below for details.
 
-.. dropdown:: Custom scorer objects
+.. dropdown:: Custom scorer objects using `make_scorer`
 
-  The second use case is to build a completely custom scorer object
+  You can build a completely custom scorer object
   from a simple python function using :func:`make_scorer`, which can
   take several parameters:
 
   * the python function you want to use (``my_custom_loss_func``
     in the example below)
 
   * whether the python function returns a score (``greater_is_better=True``,
-    the default) or a loss (``greater_is_better=False``).  If a loss, the output
+    the default) or a loss (``greater_is_better=False``). If a loss, the output
     of the python function is negated by the scorer object, conforming to
     the cross validation convention that scorers return higher values for better models.
 
   * for classification metrics only: whether the python function you provided requires
     continuous decision certainties. If the scoring function only accepts probability
-    estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter
-    `response_method`, thus in this case `response_method="predict_proba"`. Some scoring
-    function do not necessarily require probability estimates but rather non-thresholded
-    decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a
-    list such as `response_method=["decision_function", "predict_proba"]`. In this case,
-    the scorer will use the first available method, in the order given in the list,
+    estimates (e.g. :func:`metrics.log_loss`), then one needs to set the parameter
+    `response_method="predict_proba"`. Some scoring
+    functions do not necessarily require probability estimates but rather non-thresholded
+    decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one can provide a
+    list (e.g., `response_method=["decision_function", "predict_proba"]`),
+    and the scorer will use the first available method, in the order given in the list,
     to compute the scores.
 
-  * any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`.
+  * any additional parameters of the scoring function, such as ``beta`` or ``labels``.
 
   Here is an example of building custom scorers, and of using the
   ``greater_is_better`` parameter::
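(The literal block following `parameter::` lies outside this hunk's context. Separately, a sketch of the `response_method` options described above, assuming a scikit-learn version where `make_scorer` accepts `response_method` (1.4+):)

    from sklearn.metrics import log_loss, make_scorer, roc_auc_score

    # log_loss consumes probabilities and is a loss, so it is negated:
    log_loss_scorer = make_scorer(
        log_loss, greater_is_better=False, response_method="predict_proba"
    )

    # roc_auc_score accepts non-thresholded decision values; the scorer
    # tries the listed methods in order and uses the first one available:
    roc_auc_scorer = make_scorer(
        roc_auc_score, response_method=["decision_function", "predict_proba"]
    )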
@@ -357,16 +393,10 @@ measuring a prediction error given ground truth and prediction:
     >>> score(clf, X, y)
     -0.69...
 
-.. _diy_scoring:
+.. dropdown:: Custom scorer objects from scratch
 
-Implementing your own scoring object
-------------------------------------
-
-You can generate even more flexible model scorers by constructing your own
-scoring object from scratch, without using the :func:`make_scorer` factory.
-
-
-.. dropdown:: How to build a scorer from scratch
+  You can generate even more flexible model scorers by constructing your own
+  scoring object from scratch, without using the :func:`make_scorer` factory.
 
   For a callable to be a scorer, it needs to meet the protocol specified by
   the following two rules:
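(The two rules themselves are outside this hunk's context, but they amount to the scorer signature used throughout these docstrings. A minimal from-scratch scorer satisfying that protocol; negated MAE is an arbitrary choice:)

    import numpy as np

    def neg_mae_scorer(estimator, X, y):
        # Called with (estimator, X, y); returns a single float where
        # greater means better, hence the negation of an error metric.
        return -np.mean(np.abs(y - estimator.predict(X)))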
@@ -389,24 +419,24 @@ scoring object from scratch, without using the :func:`make_scorer` factory.
   more details.
 
 
-.. note:: **Using custom scorers in functions where n_jobs > 1**
+.. dropdown:: Using custom scorers in functions where n_jobs > 1
 
-    While defining the custom scoring function alongside the calling function
-    should work out of the box with the default joblib backend (loky),
-    importing it from another module will be a more robust approach and work
-    independently of the joblib backend.
+  While defining the custom scoring function alongside the calling function
+  should work out of the box with the default joblib backend (loky),
+  importing it from another module will be a more robust approach and work
+  independently of the joblib backend.
 
-    For example, to use ``n_jobs`` greater than 1 in the example below,
-    ``custom_scoring_function`` function is saved in a user-created module
-    (``custom_scorer_module.py``) and imported::
+  For example, to use ``n_jobs`` greater than 1 in the example below,
+  ``custom_scoring_function`` function is saved in a user-created module
+  (``custom_scorer_module.py``) and imported::
 
-    >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
-    >>> cross_val_score(model,
-    ...                 X_train,
-    ...                 y_train,
-    ...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
-    ...                 cv=5,
-    ...                 n_jobs=-1) # doctest: +SKIP
+    >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
+    >>> cross_val_score(model,
+    ...                 X_train,
+    ...                 y_train,
+    ...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
+    ...                 cv=5,
+    ...                 n_jobs=-1) # doctest: +SKIP
 
 .. _multimetric_scoring:
 
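(The hunk above keeps the advice to place `custom_scoring_function` in its own module so joblib workers can import it. A sketch of what `custom_scorer_module.py` might contain; the function body is assumed, only the module and function names come from the diff:)

    # custom_scorer_module.py: importable by joblib worker processes
    import numpy as np

    def custom_scoring_function(y_true, y_pred):
        # A plain metric: ground truth and predictions in, one float out.
        return np.mean(np.abs(y_true - y_pred))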
@@ -3066,15 +3096,14 @@ display.
 .. _clustering_metrics:
 
 Clustering metrics
-======================
+==================
 
 .. currentmodule:: sklearn.metrics
 
 The :mod:`sklearn.metrics` module implements several loss, score, and utility
-functions. For more information see the :ref:`clustering_evaluation`
-section for instance clustering, and :ref:`biclustering_evaluation` for
-biclustering.
-
+functions to measure clustering performance. For more information see the
+:ref:`clustering_evaluation` section for instance clustering, and
+:ref:`biclustering_evaluation` for biclustering.
 
 .. _dummy_estimators:
 
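(One of the clustering metrics that section covers, for flavor:)

    from sklearn.metrics import adjusted_rand_score

    # ARI is invariant to label permutation: identical partitions score 1.0.
    adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])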
sklearn/feature_selection/_sequential.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ class SequentialFeatureSelector(SelectorMixin, MetaEstimatorMixin, BaseEstimator
 
     scoring : str or callable, default=None
         A single str (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
 
         NOTE that when using a custom scorer, it should return a single
        value.
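(The documented parameter in use; the estimator and selection size below are arbitrary:)

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    sfs = SequentialFeatureSelector(
        KNeighborsClassifier(), n_features_to_select=2, scoring="accuracy"
    ).fit(X, y)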

sklearn/inspection/_permutation_importance.py

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ def permutation_importance(
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
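(A sketch of the multiple-scores form this docstring distinguishes; the model and data are placeholders:)

    from sklearn.datasets import load_diabetes
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import Ridge

    X, y = load_diabetes(return_X_y=True)
    model = Ridge().fit(X, y)
    # A list of string names; the result is then a dict keyed by metric name.
    result = permutation_importance(
        model, X, y, scoring=["r2", "neg_mean_absolute_error"], random_state=0
    )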

sklearn/metrics/_scorer.py

Lines changed: 2 additions & 2 deletions
@@ -640,7 +640,7 @@ def make_scorer(
    The parameter `response_method` allows to specify which method of the estimator
    should be used to feed the scoring/loss function.
 
-    Read more in the :ref:`User Guide <scoring>`.
+    Read more in the :ref:`User Guide <scoring_callable>`.
 
    Parameters
    ----------
@@ -933,7 +933,7 @@ def check_scoring(estimator=None, scoring=None, *, allow_none=False, raise_exc=T
        Scorer to use. If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
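(`check_scoring` resolves any of these `scoring` forms into a scorer callable; a minimal sketch:)

    from sklearn.metrics import check_scoring
    from sklearn.svm import SVC

    scorer = check_scoring(SVC(), scoring="f1_macro")
    # `scorer` now follows the scorer protocol: scorer(fitted_estimator, X, y).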

sklearn/model_selection/_plot.py

Lines changed: 2 additions & 2 deletions
@@ -369,7 +369,7 @@ def from_estimator(
        scoring : str or callable, default=None
            A string (see :ref:`scoring_parameter`) or
            a scorer callable object / function with signature
-            `scorer(estimator, X, y)` (see :ref:`scoring`).
+            `scorer(estimator, X, y)` (see :ref:`scoring_callable`).
 
        exploit_incremental_learning : bool, default=False
            If the estimator supports incremental learning, this will be
@@ -752,7 +752,7 @@ def from_estimator(
        scoring : str or callable, default=None
            A string (see :ref:`scoring_parameter`) or
            a scorer callable object / function with signature
-            `scorer(estimator, X, y)` (see :ref:`scoring`).
+            `scorer(estimator, X, y)` (see :ref:`scoring_callable`).
 
        n_jobs : int, default=None
            Number of jobs to run in parallel. Training the estimator and

sklearn/model_selection/_search.py

Lines changed: 2 additions & 2 deletions
@@ -1247,7 +1247,7 @@ class GridSearchCV(BaseSearchCV):
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
@@ -1623,7 +1623,7 @@ class RandomizedSearchCV(BaseSearchCV):
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
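(The multiple-scores form mentioned in these docstrings, sketched; the parameter grid is arbitrary:)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    grid = GridSearchCV(
        SVC(),
        param_grid={"C": [1, 10]},
        scoring={"acc": "accuracy", "f1": "f1_macro"},
        refit="acc",  # with multiple metrics, refit must name one of them
    ).fit(X, y)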

sklearn/model_selection/_search_successive_halving.py

Lines changed: 2 additions & 2 deletions
@@ -480,7 +480,7 @@ class HalvingGridSearchCV(BaseSuccessiveHalving):
 
    scoring : str, callable, or None, default=None
        A single string (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
        If None, the estimator's score method is used.
 
    refit : bool, default=True
@@ -821,7 +821,7 @@ class HalvingRandomSearchCV(BaseSuccessiveHalving):
 
    scoring : str, callable, or None, default=None
        A single string (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
        If None, the estimator's score method is used.
 
    refit : bool, default=True

sklearn/model_selection/_validation.py

Lines changed: 4 additions & 3 deletions
@@ -170,12 +170,13 @@ def cross_validate(
    scoring : str, callable, list, tuple, or dict, default=None
        Strategy to evaluate the performance of the cross-validated model on
        the test set. If `None`, the
-        :ref:`default evaluation criterion <model_evaluation>` of the estimator is used.
+        :ref:`default evaluation criterion <scoring_api_overview>` of the estimator
+        is used.
 
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-        - a callable (see :ref:`scoring`) that returns a single value.
+        - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
@@ -1562,7 +1563,7 @@ def permutation_test_score(
 
    scoring : str or callable, default=None
        A single str (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
 
        If `None` the estimator's score method is used.
 
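(`cross_validate` with the list form of `scoring`; the estimator choice is arbitrary:)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_validate
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    cv_results = cross_validate(SVC(), X, y, scoring=["accuracy", "f1_macro"])
    sorted(cv_results)  # ['fit_time', 'score_time', 'test_accuracy', 'test_f1_macro']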
