
Commit d4d85a3

Metrics: make CRPS handle obs outside the fx support (#781)
* tests for CRPS with obs outside forecast support (WIP)

* CRPS: handle obs outside forecast support

  Revise the CRPS calculation to handle the case where the observation is outside the forecast support. For example, if Prob(Power <= 10 MW) = 0% but then the observation is 9 MW, or if Prob(Power <= 30 MW) = 100% but then the observation is 31 MW. This change required some subtle changes to how the vectorized calculation is performed. Also, this commit switches the integration from the rectangular rule to a quadrature rule, which seems to result in more accurate CRPS calculations when the number of forecast CDF intervals is low (e.g., 10 or fewer). This commit also updates the docstring of the CRPS function and the tests, including comparisons against examples where the forecast is defined by a continuous parametric distribution that allows calculating the CRPS analytically.

  Note: this branch still needs to validate the CRPS skill score calculation and related tests. It would also be good to include some "simpler" CRPS calculation examples (e.g., with 3 or 5 CDF intervals), but that may not be practical.

* CRPS: simplify integration using np.trapz

  Simplify the integration using the numpy trapezoidal function. The result is identical to the prior code (since underneath it's the same operations), but using np.trapz() makes it clearer what method is being used and lets users consult the numpy documentation for details of the integration method.

* Add tests for CRPS and CRPSS (skill score)

  Add additional tests for the CRPS and CRPSS (CRPS skill score) functions, including "simple" examples with 3 CDF intervals to help show the logic of the trapezoidal integration.

* CRPS: allow numpy broadcasting with n >= 2

  Modify how the forecast CDF is expanded and the observation indicator function is calculated to support numpy broadcasting when the number of samples is greater than or equal to two (i.e., n >= 2).

* tests/probabilistic: use np.array for obs

  Revert to assuming observations are provided as numpy arrays, including in the case where the observation is for a single sample. This matches the previous logic and helps prevent issues in other parts of the code base.

* tests/calculator: update CRPS examples

  Since the CRPS calculation now uses a more generalized numerical integration setup, some of the example CRPS values had to be adjusted: the "simple" CRPS examples in the calculator tests are not very realistic (e.g., only 3 forecast CDF intervals). Rather than complicate the examples in these tests, I instead corrected the CRPS values for the given examples. Also, this commit corrects a misspelling of the Brier score name in a code comment.

* Add whatsnew/1.0.13.rst with CRPS revision

* whatsnew: correct name

* Address reviewer comments and suggested revisions.
1 parent 881fbab commit d4d85a3
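To make the fix concrete, here is a minimal standalone sketch (numpy only) that mirrors the patched logic shown in the probabilistic.py diff below. The helper name crps_sketch is hypothetical; the real implementation is solarforecastarbiter.metrics.probabilistic.continuous_ranked_probability_score. It reproduces the two cases from the commit message: an observation below the forecast support and one above it.

import numpy as np

def crps_sketch(obs, fx, fx_prob):
    # obs: (n,) observations; fx: (n, d) CDF thresholds;
    # fx_prob: (n, d) CDF probabilities in percent
    obs = np.asarray(obs, dtype=float)
    fx = np.asarray(fx, dtype=float)
    fx_prob = np.asarray(fx_prob, dtype=float)
    n = len(fx)

    # extend the CDF on the left so the obs lies inside the forecast support
    fx_min = np.minimum(obs, fx[:, 0])
    fx = np.hstack([fx_min[:, np.newaxis], fx])
    fx_prob = np.hstack([np.zeros([n, 1]), fx_prob])

    # extend the CDF on the right for the same reason
    idx = (fx[:, -1] < obs)
    fx_max = np.maximum(obs, fx[:, -1])
    fx = np.hstack([fx, fx_max[:, np.newaxis]])
    fx_prob = np.hstack([fx_prob, np.full([n, 1], 100.0)])

    # step-function CDF of the obs: 0 left of the obs, 1 at and right of it
    o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)
    o[idx, -1] = 0.0  # obs beyond max fx: indicator stays 0 up to the obs

    f = fx_prob / 100.0
    # trapezoidal integration of (F - O)^2 per sample, then average
    return np.mean(np.trapz((f - o) ** 2, x=fx, axis=1))

# obs below the support: Prob(Power <= 10 MW) = 0% but obs = 9 MW.
# Trapezoids: [9, 10] -> (1 + 1)/2 * 1 = 1; [10, 20] -> (1 + 0)/2 * 10 = 5.
print(crps_sketch([9], [[10, 20]], [[0, 100]]))   # 6.0

# obs above the support: Prob(Power <= 30 MW) = 100% but obs = 31 MW.
# Trapezoids: [10, 30] -> (0 + 1)/2 * 20 = 10; [30, 31] -> (1 + 1)/2 * 1 = 1.
print(crps_sketch([31], [[10, 30]], [[0, 100]]))  # 11.0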

4 files changed (+259, −154 lines)

docs/source/whatsnew/1.0.13.rst

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+.. _whatsnew_1013:
+
+.. py:currentmodule:: solarforecastarbiter
+
+
+1.0.13 (February 10, 2022)
+--------------------------
+
+Fixed
+~~~~~~~~~~~~
+* Revised CRPS metric to handle observations outside the forecast support. (:pull:`781`, :issue:`779`)
+
+Contributors
+~~~~~~~~~~~~
+
+* Will Holmgren (:ghuser:`wholmgren`)
+* Leland Boeman (:ghuser:`lboeman`)
+* David Larson (:ghuser:`dplarson`)

solarforecastarbiter/metrics/probabilistic.py

Lines changed: 58 additions & 30 deletions
@@ -468,15 +468,19 @@ def sharpness(fx_lower, fx_upper):
 def continuous_ranked_probability_score(obs, fx, fx_prob):
     """Continuous Ranked Probability Score (CRPS).
 
-    CRPS = 1/n sum_{i=1}^n int (F_i - O_i)^2 dx
-
-    where F_i is the CDF of the forecast at time i and O_i is the CDF
-    associated with the observed value obs_i:
+    .. math::
 
-        O_{i, j} = 1 if obs_i <= fx_{i, j}, else O_{i, j} = 0
+        \\text{CRPS} = \\frac{1}{n} \\sum_{i=1}^n \\int_{-\\infty}^{\\infty}
+        (F_i(x) - \\mathbf{1} \\{x \\geq y_i \\})^2 dx
 
-    where obs_i is the observation at time i, and fx_{i, j} is the forecast at
-    time i for CDF interval j.
+    where :math:`F_i(x)` is the CDF of the forecast at time :math:`i`,
+    :math:`y_i` is the observation at time :math:`i`, and :math:`\\mathbf{1}`
+    is the indicator function that transforms the observation into a step
+    function (1 if :math:`x \\geq y`, 0 if :math:`x < y`). In other words, the
+    CRPS measures the difference between the forecast CDF and the empirical CDF
+    of the observation. The CRPS has the same units as the observation. Lower
+    CRPS values indicate more accurate forecasts, where a CRPS of 0 indicates a
+    perfect forecast. [1]_ [2]_ [3]_
 
     Parameters
     ----------
@@ -492,7 +496,8 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
     Returns
     -------
     crps : float
-        The Continuous Ranked Probability Score [unitless].
+        The Continuous Ranked Probability Score, with the same units as the
+        observation.
 
     Raises
     ------
@@ -502,22 +507,27 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
         array with d values or b) the forecasts are given as 2D arrays (n,d)
         but do not contain at least 2 CDF intervals (i.e. d < 2).
 
-    Examples
-    --------
-
-    Forecast probabilities of <= 10 MW and <= 20 MW:
-    >>> fx = np.array([[10, 20], [10, 20]])
-    >>> fx_prob = np.array([[30, 50], [65, 100]])
-    >>> obs = np.array([8, 12])
-    >>> continuous_ranked_probability_score(obs, fx, fx_prob)
-    4.5625
+    Notes
+    -----
+    The CRPS can be calculated analytically when the forecast CDF is of a
+    continuous parametric distribution, e.g., Gaussian distribution. However,
+    since the Solar Forecast Arbiter makes no assumptions regarding how a
+    probabilistic forecast was generated, the CRPS is instead calculated using
+    numerical integration of the discretized forecast CDF. Therefore, the
+    accuracy of the CRPS calculation is limited by the precision of the
+    forecast CDF. In practice, this means the forecast CDF should 1) consist of
+    at least 10 intervals and 2) cover probabilities from 0% to 100%.
 
-    Forecast thresholds for constant probabilities (25%, 75%):
-    >>> fx = np.array([[5, 15], [8, 14]])
-    >>> fx_prob = np.array([[25, 75], [25, 75]])
-    >>> obs = np.array([8, 10])
-    >>> continuous_ranked_probability_score(obs, fx, fx_prob)
-    0.5
+    References
+    ----------
+    .. [1] Matheson and Winkler (1976) "Scoring rules for continuous
+           probability distributions." Management Science, vol. 22, pp.
+           1087-1096. doi: 10.1287/mnsc.22.10.1087
+    .. [2] Hersbach (2000) "Decomposition of the continuous ranked probability
+           score for ensemble prediction systems." Weather Forecast, vol. 15,
+           pp. 559-570. doi: 10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2
+    .. [3] Wilks (2019) "Statistical Methods in the Atmospheric Sciences", 4th
+           ed. Oxford; Waltham, MA; Academic Press.
 
     """
 
@@ -528,19 +538,37 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
     elif np.shape(fx)[1] < 2:
         raise ValueError("forecasts must have d >= 2 CDF intervals "
                          f"(expected >= 2, got {np.shape(fx)[1]})")
-    else:
-        obs = np.tile(obs, (fx.shape[1], 1)).T
 
-    # event: 0=did not happen, 1=did happen
-    o = np.where(obs <= fx, 1.0, 0.0)
+    n = len(fx)
+
+    # extend CDF min to ensure obs within forecast support
+    # fx.shape = (n, d) ==> (n, d + 1)
+    fx_min = np.minimum(obs, fx[:, 0])
+    fx = np.hstack([fx_min[:, np.newaxis], fx])
+    fx_prob = np.hstack([np.zeros([n, 1]), fx_prob])
+
+    # extend CDF max to ensure obs within forecast support
+    # fx.shape = (n, d + 1) ==> (n, d + 2)
+    idx = (fx[:, -1] < obs)
+    fx_max = np.maximum(obs, fx[:, -1])
+    fx = np.hstack([fx, fx_max[:, np.newaxis]])
+    fx_prob = np.hstack([fx_prob, np.full([n, 1], 100)])
+
+    # indicator function:
+    # - left of the obs is 0.0
+    # - obs and right of the obs is 1.0
+    o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)
+
+    # correct behavior when obs > max fx:
+    # - should be 0 over range: max fx < x < obs
+    o[idx, -1] = 0.0
 
     # forecast probabilities [unitless]
     f = fx_prob / 100.0
 
     # integrate along each sample, then average all samples
-    integrand = (f - o) ** 2
-    dx = np.diff(fx, axis=1)
-    crps = np.mean(np.sum(integrand[:, :-1] * dx, axis=1))
+    crps = np.mean(np.trapz((f - o) ** 2, x=fx, axis=1))
+
     return crps
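The switch from the rectangular rule to np.trapz matters most when the forecast CDF has few intervals. Here is a small numerical illustration (not part of the commit; the uniform-forecast setup is chosen only because its exact CRPS is easy to derive by hand): for a forecast whose CDF rises linearly from 0% at 0 to 100% at 1 and an observation of 0, the obs step function is 1 on [0, 1] and the exact CRPS is the integral of (x - 1)^2 from 0 to 1, i.e., 1/3.

import numpy as np

for d in (2, 5, 11):
    x = np.linspace(0, 1, d)              # forecast CDF thresholds
    f = x                                 # uniform forecast: F(x) = x
    o = np.ones_like(x)                   # obs step function for obs = 0
    g = (f - o) ** 2                      # integrand
    rect = np.sum(g[:-1] * np.diff(x))    # old rectangular (left) rule
    trap = np.trapz(g, x=x)               # new trapezoidal rule
    print(d, rect, trap)

# d = 2:  rect = 1.0,     trap = 0.5
# d = 5:  rect = 0.46875, trap = 0.34375
# exact CRPS = 1/3, so the trapezoidal rule approaches the exact value
# noticeably faster on coarse CDFs, consistent with the commit message.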
solarforecastarbiter/metrics/tests/test_calculator.py

Lines changed: 9 additions & 9 deletions
@@ -361,12 +361,12 @@ def test_calculate_metrics_with_probablistic(single_observation,
         probabilistic._REQ_DIST)
 
     expected = {
-        0: ('total', 'crps', '0', 17.247),
-        2: ('year', 'crps', '2019', 17.247),
-        4: ('season', 'crps', 'JJA', 17.247),
-        6: ('month', 'crps', 'Aug', 17.247),
-        8: ('hour', 'crps', '0', 19.801000000000002),
-        9: ('hour', 'crps', '1', 19.405)
+        0: ('total', 'crps', '0', 21.41819),
+        2: ('year', 'crps', '2019', 21.41819),
+        4: ('season', 'crps', 'JJA', 21.41819),
+        6: ('month', 'crps', 'Aug', 21.41819),
+        8: ('hour', 'crps', '0', 28.103),
+        9: ('hour', 'crps', '1', 26.634375),
     }
     attr_order = ('category', 'metric', 'index', 'value')
     for k, expected_attrs in expected.items():
@@ -1006,7 +1006,7 @@ def test_apply_deterministic_bad_metric_func():
     ('bs', [], [], [], None, None, np.NaN),
     ('bs', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
 
-    # Briar Skill Score with no reference
+    # Brier Skill Score with no reference
     ('bss', [1, 1, 1], [100, 100, 100], [0, 0, 0],
      None, None, 1.),
     ('bss', [1, 1, 1], [100, 100, 100], [1, 1, 1],
@@ -1017,11 +1017,11 @@ def test_apply_deterministic_bad_metric_func():
     ('unc', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
 
     # CRPS single forecast
-    ('crps', [[1, 1]], [[100, 100]], [[0, 0]], None, None, 0.),
+    ('crps', [[1, 1]], [[100, 100]], [0], None, None, 0.5),
     # CRPS multiple forecasts
     ('crps', [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
      [[100, 100, 100], [100, 100, 100], [100, 100, 100]],
-     [0, 0, 0], None, None, 0.)
+     [0, 0, 0], None, None, 1.)
 ])
 def test_apply_probabilistic_metric_func(metric, fx, fx_prob, obs,
                                          ref_fx, ref_fx_prob, expect,