Commit 8a8bfc2

DOC Improve user guide on scoring parameter (scikit-learn#30316)
1 parent bcee404 commit 8a8bfc2

9 files changed: +102 additions, -72 deletions

doc/modules/classification_threshold.rst

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ a meaningful metric for their use case.
 the label of the class of interest (i.e. `pos_label`). Thus, if this label is not
 the right one for your application, you need to define a scorer and pass the right
 `pos_label` (and additional parameters) using the
-:func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get
+:func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring_callable` to get
 information to define your own scoring function. For instance, we show how to pass
 the information to the scorer that the label of interest is `0` when maximizing the
 :func:`~sklearn.metrics.f1_score`::
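(The literal block following `::` lies outside this hunk's context. Separately, a minimal sketch of the `pos_label` pattern described here; the data and estimator choices are placeholders, not part of the commit:)

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, make_scorer
    from sklearn.model_selection import TunedThresholdClassifierCV

    X, y = make_classification(random_state=0)
    # Extra keyword arguments to make_scorer are forwarded to f1_score,
    # so this scorer treats class 0 as the class of interest.
    f1_class0 = make_scorer(f1_score, pos_label=0)
    model = TunedThresholdClassifierCV(LogisticRegression(), scoring=f1_class0).fit(X, y)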

doc/modules/model_evaluation.rst

Lines changed: 87 additions & 58 deletions
@@ -148,13 +148,16 @@ predictions:
 
 * **Estimator score method**: Estimators have a ``score`` method providing a
   default evaluation criterion for the problem they are designed to solve.
-  This is not discussed on this page, but in each estimator's documentation.
+  Most commonly this is :ref:`accuracy <accuracy_score>` for classifiers and the
+  :ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
+  Details for each estimator can be found in its documentation.
 
-* **Scoring parameter**: Model-evaluation tools using
+* **Scoring parameter**: Model-evaluation tools that use
   :ref:`cross-validation <cross_validation>` (such as
-  :func:`model_selection.cross_val_score` and
-  :class:`model_selection.GridSearchCV`) rely on an internal *scoring* strategy.
-  This is discussed in the section :ref:`scoring_parameter`.
+  :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+  :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy.
+  This can be specified using the `scoring` parameter of that tool and is discussed
+  in the section :ref:`scoring_parameter`.
 
 * **Metric functions**: The :mod:`sklearn.metrics` module implements functions
   assessing prediction error for specific purposes. These metrics are detailed
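(A quick illustration of the default `score` criteria named by the new text; the dataset choices below are arbitrary:)

    from sklearn.datasets import load_diabetes, load_iris
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X, y = load_iris(return_X_y=True)
    print(LogisticRegression(max_iter=1000).fit(X, y).score(X, y))  # mean accuracy

    X, y = load_diabetes(return_X_y=True)
    print(LinearRegression().fit(X, y).score(X, y))  # coefficient of determination, R^2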
@@ -175,24 +178,39 @@ value of those metrics for random predictions.
 The ``scoring`` parameter: defining model evaluation rules
 ==========================================================
 
-Model selection and evaluation using tools, such as
-:class:`model_selection.GridSearchCV` and
-:func:`model_selection.cross_val_score`, take a ``scoring`` parameter that
+Model selection and evaluation tools that internally use
+:ref:`cross-validation <cross_validation>` (such as
+:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
 controls what metric they apply to the estimators evaluated.
 
-Common cases: predefined values
--------------------------------
+They can be specified in several ways:
+
+* `None`: the estimator's default evaluation criterion (i.e., the metric used in the
+  estimator's `score` method) is used.
+* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
+  name.
+* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a custom
+  metric callable (e.g., function).
+
+Some tools also accept multiple metric evaluation. See :ref:`multimetric_scoring`
+for details.
+
+.. _scoring_string_names:
+
+String name scorers
+-------------------
 
 For the most common use cases, you can designate a scorer object with the
-``scoring`` parameter; the table below shows all possible values.
+``scoring`` parameter via a string name; the table below shows all possible values.
 All scorer objects follow the convention that **higher return values are better
-than lower return values**.  Thus metrics which measure the distance between
+than lower return values**. Thus metrics which measure the distance between
 the model and the data, like :func:`metrics.mean_squared_error`, are
-available as neg_mean_squared_error which return the negated value
+available as 'neg_mean_squared_error' which return the negated value
 of the metric.
 
 ==================================== ============================================== ==================================
-Scoring                              Function                                       Comment
+Scoring string name                  Function                                       Comment
 ==================================== ============================================== ==================================
 **Classification**
 'accuracy'                           :func:`metrics.accuracy_score`
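(The string-name convention this hunk documents, in use; `Ridge` and the diabetes data are arbitrary choices:)

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)
    # Error metrics are exposed under negated names so that greater is
    # always better; these scores are therefore <= 0.
    scores = cross_val_score(Ridge(), X, y, scoring="neg_mean_squared_error")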
@@ -260,12 +278,23 @@ Usage examples:
 
 .. currentmodule:: sklearn.metrics
 
-.. _scoring:
+.. _scoring_callable:
+
+Callable scorers
+----------------
+
+For more complex use cases and more flexibility, you can pass a callable to
+the `scoring` parameter. This can be done by:
 
-Defining your scoring strategy from metric functions
------------------------------------------------------
+* :ref:`scoring_adapt_metric`
+* :ref:`scoring_custom` (most flexible)
 
-The following metrics functions are not implemented as named scorers,
+.. _scoring_adapt_metric:
+
+Adapting predefined metrics via `make_scorer`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following metric functions are not implemented as named scorers,
 sometimes because they require additional parameters, such as
 :func:`fbeta_score`. They cannot be passed to the ``scoring``
 parameters; instead their callable needs to be passed to
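(The pattern referred to here, sketched with `fbeta_score`; the estimator and grid values are arbitrary:)

    from sklearn.metrics import fbeta_score, make_scorer
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    # fbeta_score needs `beta`, so it has no string name; wrap it instead.
    ftwo_scorer = make_scorer(fbeta_score, beta=2)
    grid = GridSearchCV(LinearSVC(), param_grid={"C": [1, 10]}, scoring=ftwo_scorer)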
@@ -303,37 +332,44 @@ measuring a prediction error given ground truth and prediction:
   maximize, the higher the better.
 
 - functions ending with ``_error``, ``_loss``, or ``_deviance`` return a
-  value to minimize, the lower the better.  When converting
+  value to minimize, the lower the better. When converting
   into a scorer object using :func:`make_scorer`, set
   the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
   parameter description below).
 
+.. _scoring_custom:
+
+Creating a custom scorer object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can create your own custom scorer object using
+:func:`make_scorer` or, for the most flexibility, from scratch. See below for details.
 
-.. dropdown:: Custom scorer objects
+.. dropdown:: Custom scorer objects using `make_scorer`
 
-  The second use case is to build a completely custom scorer object
+  You can build a completely custom scorer object
   from a simple python function using :func:`make_scorer`, which can
   take several parameters:
 
   * the python function you want to use (``my_custom_loss_func``
     in the example below)
 
   * whether the python function returns a score (``greater_is_better=True``,
-    the default) or a loss (``greater_is_better=False``).  If a loss, the output
+    the default) or a loss (``greater_is_better=False``). If a loss, the output
     of the python function is negated by the scorer object, conforming to
     the cross validation convention that scorers return higher values for better models.
 
   * for classification metrics only: whether the python function you provided requires
     continuous decision certainties. If the scoring function only accepts probability
-    estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter
-    `response_method`, thus in this case `response_method="predict_proba"`. Some scoring
-    function do not necessarily require probability estimates but rather non-thresholded
-    decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a
-    list such as `response_method=["decision_function", "predict_proba"]`. In this case,
-    the scorer will use the first available method, in the order given in the list,
+    estimates (e.g. :func:`metrics.log_loss`), then one needs to set the parameter
+    `response_method="predict_proba"`. Some scoring
+    functions do not necessarily require probability estimates but rather non-thresholded
+    decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one can provide a
+    list (e.g., `response_method=["decision_function", "predict_proba"]`),
+    and the scorer will use the first available method, in the order given in the list,
     to compute the scores.
 
-  * any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`.
+  * any additional parameters of the scoring function, such as ``beta`` or ``labels``.
 
   Here is an example of building custom scorers, and of using the
   ``greater_is_better`` parameter::
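(The literal block following `parameter::` lies outside this hunk's context. Separately, a sketch of the `response_method` options described above, assuming a scikit-learn version where `make_scorer` accepts `response_method` (1.4+):)

    from sklearn.metrics import log_loss, make_scorer, roc_auc_score

    # log_loss consumes probabilities and is a loss, so it is negated:
    log_loss_scorer = make_scorer(
        log_loss, greater_is_better=False, response_method="predict_proba"
    )

    # roc_auc_score accepts non-thresholded decision values; the scorer
    # tries the listed methods in order and uses the first one available:
    roc_auc_scorer = make_scorer(
        roc_auc_score, response_method=["decision_function", "predict_proba"]
    )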
@@ -357,16 +393,10 @@ measuring a prediction error given ground truth and prediction:
     >>> score(clf, X, y)
     -0.69...
 
-.. _diy_scoring:
+.. dropdown:: Custom scorer objects from scratch
 
-Implementing your own scoring object
-------------------------------------
-
-You can generate even more flexible model scorers by constructing your own
-scoring object from scratch, without using the :func:`make_scorer` factory.
-
-
-.. dropdown:: How to build a scorer from scratch
+  You can generate even more flexible model scorers by constructing your own
+  scoring object from scratch, without using the :func:`make_scorer` factory.
 
   For a callable to be a scorer, it needs to meet the protocol specified by
   the following two rules:
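(The two rules themselves are outside this hunk's context, but they amount to the scorer signature used throughout these docstrings. A minimal from-scratch scorer satisfying that protocol; negated MAE is an arbitrary choice:)

    import numpy as np

    def neg_mae_scorer(estimator, X, y):
        # Called with (estimator, X, y); returns a single float where
        # greater means better, hence the negation of an error metric.
        return -np.mean(np.abs(y - estimator.predict(X)))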
@@ -389,24 +419,24 @@ scoring object from scratch, without using the :func:`make_scorer` factory.
   more details.
 
 
-.. note:: **Using custom scorers in functions where n_jobs > 1**
+.. dropdown:: Using custom scorers in functions where n_jobs > 1
 
-    While defining the custom scoring function alongside the calling function
-    should work out of the box with the default joblib backend (loky),
-    importing it from another module will be a more robust approach and work
-    independently of the joblib backend.
+  While defining the custom scoring function alongside the calling function
+  should work out of the box with the default joblib backend (loky),
+  importing it from another module will be a more robust approach and work
+  independently of the joblib backend.
 
-    For example, to use ``n_jobs`` greater than 1 in the example below,
-    ``custom_scoring_function`` function is saved in a user-created module
-    (``custom_scorer_module.py``) and imported::
+  For example, to use ``n_jobs`` greater than 1 in the example below,
+  ``custom_scoring_function`` function is saved in a user-created module
+  (``custom_scorer_module.py``) and imported::
 
-    >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
-    >>> cross_val_score(model,
-    ...                 X_train,
-    ...                 y_train,
-    ...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
-    ...                 cv=5,
-    ...                 n_jobs=-1) # doctest: +SKIP
+    >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
+    >>> cross_val_score(model,
+    ...                 X_train,
+    ...                 y_train,
+    ...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
+    ...                 cv=5,
+    ...                 n_jobs=-1) # doctest: +SKIP
 
 .. _multimetric_scoring:
 
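(The hunk above keeps the advice to place `custom_scoring_function` in its own module so joblib workers can import it. A sketch of what `custom_scorer_module.py` might contain; the function body is assumed, only the module and function names come from the diff:)

    # custom_scorer_module.py: importable by joblib worker processes
    import numpy as np

    def custom_scoring_function(y_true, y_pred):
        # A plain metric: ground truth and predictions in, one float out.
        return np.mean(np.abs(y_true - y_pred))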
@@ -3066,15 +3096,14 @@ display.
 .. _clustering_metrics:
 
 Clustering metrics
-======================
+==================
 
 .. currentmodule:: sklearn.metrics
 
 The :mod:`sklearn.metrics` module implements several loss, score, and utility
-functions. For more information see the :ref:`clustering_evaluation`
-section for instance clustering, and :ref:`biclustering_evaluation` for
-biclustering.
-
+functions to measure clustering performance. For more information see the
+:ref:`clustering_evaluation` section for instance clustering, and
+:ref:`biclustering_evaluation` for biclustering.
 
 .. _dummy_estimators:
 
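(One of the clustering metrics that section covers, for flavor:)

    from sklearn.metrics import adjusted_rand_score

    # ARI is invariant to label permutation: identical partitions score 1.0.
    adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])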
sklearn/feature_selection/_sequential.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ class SequentialFeatureSelector(SelectorMixin, MetaEstimatorMixin, BaseEstimator
 
     scoring : str or callable, default=None
         A single str (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
 
         NOTE that when using a custom scorer, it should return a single
        value.
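(The documented parameter in use; the estimator and selection size below are arbitrary:)

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    sfs = SequentialFeatureSelector(
        KNeighborsClassifier(), n_features_to_select=2, scoring="accuracy"
    ).fit(X, y)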

sklearn/inspection/_permutation_importance.py

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ def permutation_importance(
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
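(A sketch of the multiple-scores form this docstring distinguishes; the model and data are placeholders:)

    from sklearn.datasets import load_diabetes
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import Ridge

    X, y = load_diabetes(return_X_y=True)
    model = Ridge().fit(X, y)
    # A list of string names; the result is then a dict keyed by metric name.
    result = permutation_importance(
        model, X, y, scoring=["r2", "neg_mean_absolute_error"], random_state=0
    )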

sklearn/metrics/_scorer.py

Lines changed: 2 additions & 2 deletions
@@ -640,7 +640,7 @@ def make_scorer(
    The parameter `response_method` allows to specify which method of the estimator
    should be used to feed the scoring/loss function.
 
-    Read more in the :ref:`User Guide <scoring>`.
+    Read more in the :ref:`User Guide <scoring_callable>`.
 
    Parameters
    ----------
@@ -933,7 +933,7 @@ def check_scoring(estimator=None, scoring=None, *, allow_none=False, raise_exc=T
        Scorer to use. If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
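(`check_scoring` resolves any of these `scoring` forms into a scorer callable; a minimal sketch:)

    from sklearn.metrics import check_scoring
    from sklearn.svm import SVC

    scorer = check_scoring(SVC(), scoring="f1_macro")
    # `scorer` now follows the scorer protocol: scorer(fitted_estimator, X, y).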

sklearn/model_selection/_plot.py

Lines changed: 2 additions & 2 deletions
@@ -369,7 +369,7 @@ def from_estimator(
        scoring : str or callable, default=None
            A string (see :ref:`scoring_parameter`) or
            a scorer callable object / function with signature
-            `scorer(estimator, X, y)` (see :ref:`scoring`).
+            `scorer(estimator, X, y)` (see :ref:`scoring_callable`).
 
        exploit_incremental_learning : bool, default=False
            If the estimator supports incremental learning, this will be
@@ -752,7 +752,7 @@ def from_estimator(
        scoring : str or callable, default=None
            A string (see :ref:`scoring_parameter`) or
            a scorer callable object / function with signature
-            `scorer(estimator, X, y)` (see :ref:`scoring`).
+            `scorer(estimator, X, y)` (see :ref:`scoring_callable`).
 
        n_jobs : int, default=None
            Number of jobs to run in parallel. Training the estimator and

sklearn/model_selection/_search.py

Lines changed: 2 additions & 2 deletions
@@ -1247,7 +1247,7 @@ class GridSearchCV(BaseSearchCV):
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
@@ -1623,7 +1623,7 @@ class RandomizedSearchCV(BaseSearchCV):
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-       - a callable (see :ref:`scoring`) that returns a single value.
+       - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
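(The multiple-scores form mentioned in these docstrings, sketched; the parameter grid is arbitrary:)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    grid = GridSearchCV(
        SVC(),
        param_grid={"C": [1, 10]},
        scoring={"acc": "accuracy", "f1": "f1_macro"},
        refit="acc",  # with multiple metrics, refit must name one of them
    ).fit(X, y)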

sklearn/model_selection/_search_successive_halving.py

Lines changed: 2 additions & 2 deletions
@@ -480,7 +480,7 @@ class HalvingGridSearchCV(BaseSuccessiveHalving):
 
    scoring : str, callable, or None, default=None
        A single string (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
        If None, the estimator's score method is used.
 
    refit : bool, default=True
@@ -821,7 +821,7 @@ class HalvingRandomSearchCV(BaseSuccessiveHalving):
 
    scoring : str, callable, or None, default=None
        A single string (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
        If None, the estimator's score method is used.
 
    refit : bool, default=True

sklearn/model_selection/_validation.py

Lines changed: 4 additions & 3 deletions
@@ -170,12 +170,13 @@ def cross_validate(
    scoring : str, callable, list, tuple, or dict, default=None
        Strategy to evaluate the performance of the cross-validated model on
        the test set. If `None`, the
-        :ref:`default evaluation criterion <model_evaluation>` of the estimator is used.
+        :ref:`default evaluation criterion <scoring_api_overview>` of the estimator
+        is used.
 
        If `scoring` represents a single score, one can use:
 
        - a single string (see :ref:`scoring_parameter`);
-        - a callable (see :ref:`scoring`) that returns a single value.
+        - a callable (see :ref:`scoring_callable`) that returns a single value.
 
        If `scoring` represents multiple scores, one can use:
 
@@ -1562,7 +1563,7 @@ def permutation_test_score(
 
    scoring : str or callable, default=None
        A single str (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
 
        If `None` the estimator's score method is used.
 
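(`cross_validate` with the list form of `scoring`; the estimator choice is arbitrary:)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_validate
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    cv_results = cross_validate(SVC(), X, y, scoring=["accuracy", "f1_macro"])
    sorted(cv_results)  # ['fit_time', 'score_time', 'test_accuracy', 'test_f1_macro']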
