
Commit d4d85a3

Metrics: make CRPS handle obs outside the fx support (#781)
* tests for CRPS with obs outside forecast support (WIP)

* CRPS: handle obs outside forecast support

  Revise the CRPS calculation to handle the case where the observation is outside the forecast support. For example, if Prob(Power <= 10 MW) = 0% but then the observation is 9 MW, or if Prob(Power <= 30 MW) = 100% but then the observation is 31 MW. This change required some subtle changes to how the vectorized calculation is performed. Also, this commit switches the integration from the rectangular rule to a quadrature rule, which seems to result in more accurate CRPS calculations when the number of forecast CDF intervals is low (e.g., 10 or fewer). This commit also updates the docstring of the CRPS function and the tests, including comparisons against examples where the forecast is defined by a continuous parametric distribution that allows calculating the CRPS analytically.

  Note: this branch still needs to validate the CRPS skill score calculation and related tests. It would also be good to include some "simpler" CRPS calculation examples (e.g., with 3 or 5 CDF intervals), but that may not be practical.

* CRPS: simplify integration using np.trapz

  Simplify the integration using the numpy trapezoidal function. The result is identical to the prior code (since underneath it's the same operations), but using np.trapz() makes it clearer what method is being used and lets users consult the numpy documentation for details of the integration method.

* Add tests for CRPS and CRPSS (skill score)

  Add additional tests for the CRPS and CRPSS (CRPS skill score) functions, including "simple" examples with 3 CDF intervals to help show the logic of the trapezoidal integration.

* CRPS: allow numpy broadcasting with n >= 2

  Modify how the forecast CDF is expanded and the observation indicator function is calculated to support numpy broadcasting when the number of samples is greater than or equal to two (i.e., n >= 2).

* tests/probabilistic: use np.array for obs

  Revert to assuming observations are provided as numpy arrays, including in the case where the observation is for a single sample. This matches the previous logic and helps prevent issues in other parts of the code base.

* tests/calculator: update CRPS examples

  Since the CRPS calculation now uses a more generalized numerical integration setup, some of the example CRPS values had to be adjusted: the "simple" CRPS examples in the calculator tests are not very realistic (e.g., only 3 forecast CDF intervals). Rather than complicate the examples in these tests, I instead corrected the CRPS values for the given examples. Also, this commit corrects a misspelling of the Brier score name in a code comment.

* Add whatsnew/1.0.13.rst with CRPS revision

* whatsnew: correct name

* Address reviewer comments and suggested revisions.
1 parent 881fbab commit d4d85a3
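To make the fix concrete, here is a minimal standalone sketch (numpy only) that mirrors the patched logic shown in the probabilistic.py diff below. The helper name crps_sketch is hypothetical; the real implementation is solarforecastarbiter.metrics.probabilistic.continuous_ranked_probability_score. It reproduces the two cases from the commit message: an observation below the forecast support and one above it.

import numpy as np

def crps_sketch(obs, fx, fx_prob):
    # obs: (n,) observations; fx: (n, d) CDF thresholds;
    # fx_prob: (n, d) CDF probabilities in percent
    obs = np.asarray(obs, dtype=float)
    fx = np.asarray(fx, dtype=float)
    fx_prob = np.asarray(fx_prob, dtype=float)
    n = len(fx)

    # extend the CDF on the left so the obs lies inside the forecast support
    fx_min = np.minimum(obs, fx[:, 0])
    fx = np.hstack([fx_min[:, np.newaxis], fx])
    fx_prob = np.hstack([np.zeros([n, 1]), fx_prob])

    # extend the CDF on the right for the same reason
    idx = (fx[:, -1] < obs)
    fx_max = np.maximum(obs, fx[:, -1])
    fx = np.hstack([fx, fx_max[:, np.newaxis]])
    fx_prob = np.hstack([fx_prob, np.full([n, 1], 100.0)])

    # step-function CDF of the obs: 0 left of the obs, 1 at and right of it
    o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)
    o[idx, -1] = 0.0  # obs beyond max fx: indicator stays 0 up to the obs

    f = fx_prob / 100.0
    # trapezoidal integration of (F - O)^2 per sample, then average
    return np.mean(np.trapz((f - o) ** 2, x=fx, axis=1))

# obs below the support: Prob(Power <= 10 MW) = 0% but obs = 9 MW.
# Trapezoids: [9, 10] -> (1 + 1)/2 * 1 = 1; [10, 20] -> (1 + 0)/2 * 10 = 5.
print(crps_sketch([9], [[10, 20]], [[0, 100]]))   # 6.0

# obs above the support: Prob(Power <= 30 MW) = 100% but obs = 31 MW.
# Trapezoids: [10, 30] -> (0 + 1)/2 * 20 = 10; [30, 31] -> (1 + 1)/2 * 1 = 1.
print(crps_sketch([31], [[10, 30]], [[0, 100]]))  # 11.0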

4 files changed (+259, −154 lines)

docs/source/whatsnew/1.0.13.rst

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+.. _whatsnew_1013:
+
+.. py:currentmodule:: solarforecastarbiter
+
+
+1.0.13 (February 10, 2022)
+--------------------------
+
+Fixed
+~~~~~~~~~~~~
+* Revised CRPS metric to handle observations outside the forecast support. (:pull:`781`, :issue:`779`)
+
+Contributors
+~~~~~~~~~~~~
+
+* Will Holmgren (:ghuser:`wholmgren`)
+* Leland Boeman (:ghuser:`lboeman`)
+* David Larson (:ghuser:`dplarson`)

solarforecastarbiter/metrics/probabilistic.py

Lines changed: 58 additions & 30 deletions
@@ -468,15 +468,19 @@ def sharpness(fx_lower, fx_upper):
 def continuous_ranked_probability_score(obs, fx, fx_prob):
     """Continuous Ranked Probability Score (CRPS).
 
-    CRPS = 1/n sum_{i=1}^n int (F_i - O_i)^2 dx
-
-    where F_i is the CDF of the forecast at time i and O_i is the CDF
-    associated with the observed value obs_i:
+    .. math::
 
-        O_{i, j} = 1 if obs_i <= fx_{i, j}, else O_{i, j} = 0
+        \\text{CRPS} = \\frac{1}{n} \\sum_{i=1}^n \\int_{-\\infty}^{\\infty}
+        (F_i(x) - \\mathbf{1} \\{x \\geq y_i \\})^2 dx
 
-    where obs_i is the observation at time i, and fx_{i, j} is the forecast at
-    time i for CDF interval j.
+    where :math:`F_i(x)` is the CDF of the forecast at time :math:`i`,
+    :math:`y_i` is the observation at time :math:`i`, and :math:`\\mathbf{1}`
+    is the indicator function that transforms the observation into a step
+    function (1 if :math:`x \\geq y`, 0 if :math:`x < y`). In other words, the
+    CRPS measures the difference between the forecast CDF and the empirical CDF
+    of the observation. The CRPS has the same units as the observation. Lower
+    CRPS values indicate more accurate forecasts, where a CRPS of 0 indicates a
+    perfect forecast. [1]_ [2]_ [3]_
 
     Parameters
     ----------
@@ -492,7 +496,8 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
     Returns
     -------
     crps : float
-        The Continuous Ranked Probability Score [unitless].
+        The Continuous Ranked Probability Score, with the same units as the
+        observation.
 
     Raises
     ------
@@ -502,22 +507,27 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
         array with d values or b) the forecasts are given as 2D arrays (n,d)
         but do not contain at least 2 CDF intervals (i.e. d < 2).
 
-    Examples
-    --------
-
-    Forecast probabilities of <= 10 MW and <= 20 MW:
-    >>> fx = np.array([[10, 20], [10, 20]])
-    >>> fx_prob = np.array([[30, 50], [65, 100]])
-    >>> obs = np.array([8, 12])
-    >>> continuous_ranked_probability_score(obs, fx, fx_prob)
-    4.5625
+    Notes
+    -----
+    The CRPS can be calculated analytically when the forecast CDF is of a
+    continuous parametric distribution, e.g., Gaussian distribution. However,
+    since the Solar Forecast Arbiter makes no assumptions regarding how a
+    probabilistic forecast was generated, the CRPS is instead calculated using
+    numerical integration of the discretized forecast CDF. Therefore, the
+    accuracy of the CRPS calculation is limited by the precision of the
+    forecast CDF. In practice, this means the forecast CDF should 1) consist of
+    at least 10 intervals and 2) cover probabilities from 0% to 100%.
 
-    Forecast thresholds for constant probabilities (25%, 75%):
-    >>> fx = np.array([[5, 15], [8, 14]])
-    >>> fx_prob = np.array([[25, 75], [25, 75]])
-    >>> obs = np.array([8, 10])
-    >>> continuous_ranked_probability_score(obs, fx, fx_prob)
-    0.5
+    References
+    ----------
+    .. [1] Matheson and Winkler (1976) "Scoring rules for continuous
+           probability distributions." Management Science, vol. 22, pp.
+           1087-1096. doi: 10.1287/mnsc.22.10.1087
+    .. [2] Hersbach (2000) "Decomposition of the continuous ranked probability
+           score for ensemble prediction systems." Weather Forecast, vol. 15,
+           pp. 559-570. doi: 10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2
+    .. [3] Wilks (2019) "Statistical Methods in the Atmospheric Sciences", 4th
+           ed. Oxford; Waltham, MA; Academic Press.
 
     """
 
@@ -528,19 +538,37 @@ def continuous_ranked_probability_score(obs, fx, fx_prob):
     elif np.shape(fx)[1] < 2:
         raise ValueError("forecasts must have d >= 2 CDF intervals "
                          f"(expected >= 2, got {np.shape(fx)[1]})")
-    else:
-        obs = np.tile(obs, (fx.shape[1], 1)).T
 
-    # event: 0=did not happen, 1=did happen
-    o = np.where(obs <= fx, 1.0, 0.0)
+    n = len(fx)
+
+    # extend CDF min to ensure obs within forecast support
+    # fx.shape = (n, d) ==> (n, d + 1)
+    fx_min = np.minimum(obs, fx[:, 0])
+    fx = np.hstack([fx_min[:, np.newaxis], fx])
+    fx_prob = np.hstack([np.zeros([n, 1]), fx_prob])
+
+    # extend CDF max to ensure obs within forecast support
+    # fx.shape = (n, d + 1) ==> (n, d + 2)
+    idx = (fx[:, -1] < obs)
+    fx_max = np.maximum(obs, fx[:, -1])
+    fx = np.hstack([fx, fx_max[:, np.newaxis]])
+    fx_prob = np.hstack([fx_prob, np.full([n, 1], 100)])
+
+    # indicator function:
+    # - left of the obs is 0.0
+    # - obs and right of the obs is 1.0
+    o = np.where(fx >= obs[:, np.newaxis], 1.0, 0.0)
+
+    # correct behavior when obs > max fx:
+    # - should be 0 over range: max fx < x < obs
+    o[idx, -1] = 0.0
 
     # forecast probabilities [unitless]
     f = fx_prob / 100.0
 
     # integrate along each sample, then average all samples
-    integrand = (f - o) ** 2
-    dx = np.diff(fx, axis=1)
-    crps = np.mean(np.sum(integrand[:, :-1] * dx, axis=1))
+    crps = np.mean(np.trapz((f - o) ** 2, x=fx, axis=1))
+
     return crps
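The switch from the rectangular rule to np.trapz matters most when the forecast CDF has few intervals. Here is a small numerical illustration (not part of the commit; the uniform-forecast setup is chosen only because its exact CRPS is easy to derive by hand): for a forecast whose CDF rises linearly from 0% at 0 to 100% at 1 and an observation of 0, the obs step function is 1 on [0, 1] and the exact CRPS is the integral of (x - 1)^2 from 0 to 1, i.e., 1/3.

import numpy as np

for d in (2, 5, 11):
    x = np.linspace(0, 1, d)              # forecast CDF thresholds
    f = x                                 # uniform forecast: F(x) = x
    o = np.ones_like(x)                   # obs step function for obs = 0
    g = (f - o) ** 2                      # integrand
    rect = np.sum(g[:-1] * np.diff(x))    # old rectangular (left) rule
    trap = np.trapz(g, x=x)               # new trapezoidal rule
    print(d, rect, trap)

# d = 2:  rect = 1.0,     trap = 0.5
# d = 5:  rect = 0.46875, trap = 0.34375
# exact CRPS = 1/3, so the trapezoidal rule approaches the exact value
# noticeably faster on coarse CDFs, consistent with the commit message.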
solarforecastarbiter/metrics/tests/test_calculator.py

Lines changed: 9 additions & 9 deletions
@@ -361,12 +361,12 @@ def test_calculate_metrics_with_probablistic(single_observation,
         probabilistic._REQ_DIST)
 
     expected = {
-        0: ('total', 'crps', '0', 17.247),
-        2: ('year', 'crps', '2019', 17.247),
-        4: ('season', 'crps', 'JJA', 17.247),
-        6: ('month', 'crps', 'Aug', 17.247),
-        8: ('hour', 'crps', '0', 19.801000000000002),
-        9: ('hour', 'crps', '1', 19.405)
+        0: ('total', 'crps', '0', 21.41819),
+        2: ('year', 'crps', '2019', 21.41819),
+        4: ('season', 'crps', 'JJA', 21.41819),
+        6: ('month', 'crps', 'Aug', 21.41819),
+        8: ('hour', 'crps', '0', 28.103),
+        9: ('hour', 'crps', '1', 26.634375),
     }
     attr_order = ('category', 'metric', 'index', 'value')
     for k, expected_attrs in expected.items():
@@ -1006,7 +1006,7 @@ def test_apply_deterministic_bad_metric_func():
     ('bs', [], [], [], None, None, np.NaN),
     ('bs', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
 
-    # Briar Skill Score with no reference
+    # Brier Skill Score with no reference
     ('bss', [1, 1, 1], [100, 100, 100], [0, 0, 0],
      None, None, 1.),
     ('bss', [1, 1, 1], [100, 100, 100], [1, 1, 1],
@@ -1017,11 +1017,11 @@ def test_apply_deterministic_bad_metric_func():
     ('unc', [1, 1, 1], [100, 100, 100], [1, 1, 1], None, None, 0.),
 
     # CRPS single forecast
-    ('crps', [[1, 1]], [[100, 100]], [[0, 0]], None, None, 0.),
+    ('crps', [[1, 1]], [[100, 100]], [0], None, None, 0.5),
     # CRPS multiple forecasts
     ('crps', [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
      [[100, 100, 100], [100, 100, 100], [100, 100, 100]],
-     [0, 0, 0], None, None, 0.)
+     [0, 0, 0], None, None, 1.)
 ])
 def test_apply_probabilistic_metric_func(metric, fx, fx_prob, obs,
                                          ref_fx, ref_fx_prob, expect,