Skip to content

Commit f68418d

Browse files
seabbs-botclaudeseabbs
authored
Improve variogram docs, deduplicate point implementation, and update vignette (#1132)
* Improve variogram docs and deduplicate point implementation - Have variogram_score_multivariate_point() delegate to variogram_score_multivariate() with w = NULL instead of reimplementing the same loop - Use @inheritParams and @inherit to consolidate roxygen docs across the sample and point variogram functions - Improve parameter docs for w (fix dat -> predicted reference), w_vs (explain pairwise weighting), and p - Add proper descriptions, @return, and @references to energy_score_multivariate() - Fix energy score keyword from internal_input_check to metric Closes #1118, closes #1124 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update multivariate vignette with variogram score and point forecasts - Add section on variogram score alongside energy score with explanation of what each measures - Show how to customise variogram parameters via purrr::partial() - Add multivariate point forecast section using as_forecast_multivariate_point() - Add variogram score examples to the matrix-based section, including weight matrix (w_vs) and order parameter (p) Closes #1116 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Flesh out variogram score explanation in docs and vignette - Explain what the variogram score measures in concrete terms (pairwise differences between targets, not abstract dimensions) - Improve p parameter docs to explain how it scales differences - Use "targets" instead of "dimensions" throughout for clarity - Add note on p parameter and link to docs in vignette Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Flesh out variogram score explanation in roxygen - Explain what the score measures in concrete terms (pairwise differences between targets, not abstract dimensions) - Improve p parameter docs to explain how it scales differences - Use "targets" instead of "dimensions" throughout for clarity Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Reorder vignette text: customisation intro before p explanation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Remove matrix-based scoringRules comparison section from vignette The section compared scoringutils wrappers against raw scoringRules calls, which is an implementation detail that doesn't help users. The vignette now focuses on the scoringutils API via score(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Mention mv_group_id grouping in score function descriptions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
1 parent e6e6bda commit f68418d

File tree

6 files changed

+180
-129
lines changed

6 files changed

+180
-129
lines changed

R/metrics-multivariate-point.R

Lines changed: 17 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,16 @@
11
#' @title Variogram score for multivariate point forecasts
22
#' @description
3-
#' Compute the variogram score
4-
#' (see [scoringRules::vs_sample()])
5-
#' for each group defined by `mv_group_id`, treating each point
6-
#' forecast as a single-sample ensemble.
7-
#' @param observed Numeric vector of observed values.
3+
#' Compute the variogram score for multivariate point forecasts,
4+
#' treating each point forecast as a single-sample ensemble.
5+
#' This is a thin wrapper around
6+
#' [variogram_score_multivariate()] with `w = NULL`.
7+
#'
8+
#' See [variogram_score_multivariate()] for details on the
9+
#' variogram score and its parameters.
10+
#' @inheritParams variogram_score_multivariate
11+
#' @inherit variogram_score_multivariate return references
812
#' @param predicted Numeric matrix with one column, where each row
913
#' corresponds to a target within a multivariate group.
10-
#' @param mv_group_id Numeric vector of length equal to
11-
#' `length(observed)` with group identifiers.
12-
#' @param w_vs Numeric matrix of weights for the variogram score.
13-
#' See [scoringRules::vs_sample()] for details.
14-
#' @param p Numeric, order of the variogram score.
15-
#' Defaults to 0.5. See [scoringRules::vs_sample()] for details.
16-
#' @return A named numeric vector of scores, one per multivariate
17-
#' group. Lower values are better.
18-
#' @importFrom scoringRules vs_sample
1914
#' @importFrom checkmate assert_numeric
2015
#' @export
2116
#' @keywords metric
@@ -27,19 +22,13 @@ variogram_score_multivariate_point <- function(
2722
assert_numeric(observed, min.len = 1)
2823
assert_numeric(as.vector(predicted), min.len = 1)
2924
assert_numeric(mv_group_id, len = length(observed))
30-
unique_groups <- unique(mv_group_id)
31-
32-
vs <- vapply(unique_groups, function(group) {
33-
idx <- which(mv_group_id == group)
34-
scoringRules::vs_sample(
35-
y = observed[idx],
36-
dat = predicted[idx, , drop = FALSE],
37-
w_vs = w_vs,
38-
p = p
39-
)
40-
}, numeric(1))
41-
42-
names(vs) <- unique_groups
43-
return(vs)
25+
variogram_score_multivariate(
26+
observed = observed,
27+
predicted = predicted,
28+
mv_group_id = mv_group_id,
29+
w = NULL,
30+
w_vs = w_vs,
31+
p = p
32+
)
4433
}
4534
# nolint end

R/metrics-multivariate-sample.R

Lines changed: 44 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -29,13 +29,28 @@ assert_input_multivariate_sample <- function(observed, predicted, mv_group_id) {
2929

3030
#' @title Energy score for multivariate forecasts
3131
#' @description
32-
#' Compute the multivariate energy score
33-
#' (see \link[scoringRules:es_sample]{scoringRules::es_sample})
34-
#' for each group defined by `mv_group_id`.
32+
#' Compute the energy score (Gneiting et al., 2008) for each
33+
#' multivariate group defined by `mv_group_id`. The energy
34+
#' score is a multivariate generalisation of the CRPS that
35+
#' measures both calibration and sharpness of the forecast
36+
#' distribution.
37+
#'
38+
#' The score is computed using
39+
#' [scoringRules::es_sample()].
3540
#' @inheritParams ae_median_sample
3641
#' @inheritParams assert_input_multivariate_sample
37-
#' @inherit scoringRules::es_sample params
38-
#' @keywords internal_input_check
42+
#' @param w Optional numeric vector of weights for forecast samples
43+
#' (length equal to the number of columns of `predicted`).
44+
#' If `NULL` (the default), equal weights are used.
45+
#' @return A named numeric vector of scores, one per multivariate
46+
#' group. Lower values are better.
47+
#' @references
48+
#' Gneiting, T., Stanberry, L.I., Grimit, E.P., Held, L. and
49+
#' Johnson, N.A. (2008). Assessing probabilistic forecasts of
50+
#' multivariate quantities, with an application to ensemble
51+
#' predictions of surface winds.
52+
#' *TEST*, 17, 211-235.
53+
#' @keywords metric
3954
#' @export
4055
energy_score_multivariate <- function(observed, predicted, mv_group_id, w = NULL) {
4156
assert_input_multivariate_sample(observed, predicted, mv_group_id)
@@ -54,21 +69,35 @@ energy_score_multivariate <- function(observed, predicted, mv_group_id, w = NULL
5469
#' Variogram score for multivariate forecasts
5570
#'
5671
#' @description
57-
#' Compute the variogram score for multivariate forecasts.
58-
#' The variogram score (Scheuerer and Hamill, 2015) evaluates the
59-
#' dependence structure of multivariate forecasts by comparing
60-
#' predicted pairwise differences against observed pairwise
61-
#' differences.
72+
#' Compute the variogram score for each multivariate group
73+
#' defined by `mv_group_id`.
74+
#' The variogram score (Scheuerer and Hamill, 2015) assesses
75+
#' whether a forecast captures the correlation structure across
76+
#' the targets being forecast jointly (e.g. locations, age
77+
#' groups). For each pair of targets (i, j), it compares the
78+
#' observed absolute difference |y_i - y_j|^p against the
79+
#' expected absolute difference under the forecast distribution.
80+
#' A forecast that misspecifies correlations between targets
81+
#' will predict pairwise differences that do not match the
82+
#' observations, resulting in a higher score.
6283
#'
6384
#' The score is computed using
6485
#' [scoringRules::vs_sample()].
6586
#'
6687
#' @inheritParams energy_score_multivariate
67-
#' @param w_vs Optional non-negative weight matrix. If not `NULL`,
68-
#' must be a square matrix with dimensions equal to the number
69-
#' of targets within each multivariate group.
70-
#' @param p Numeric, order of the variogram score.
71-
#' Typical choices are 0.5 (default, more robust) and 1.
88+
#' @param w_vs Optional non-negative weight matrix for the
89+
#' pairwise comparisons between targets. Entry `w_vs[i, j]`
90+
#' controls the importance of the pair (i, j) in the score.
91+
#' Must be a symmetric square matrix with rows and columns
92+
#' equal to the number of targets within each multivariate
93+
#' group.
94+
#' If `NULL` (the default), all pairs are weighted equally.
95+
#' @param p Numeric, order of the variogram score. This controls
96+
#' how pairwise differences are scaled: the score compares
97+
#' |y_i - y_j|^p across targets. Lower values of `p` give
98+
#' less weight to large differences, making the score more
99+
#' robust to outliers. Typical choices are 0.5 (the default)
100+
#' and 1.
72101
#' @return A named numeric vector of scores, one per multivariate
73102
#' group. Lower values are better.
74103
#' @references

man/energy_score_multivariate.Rd

Lines changed: 23 additions & 5 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

man/variogram_score_multivariate.Rd

Lines changed: 27 additions & 11 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

man/variogram_score_multivariate_point.Rd

Lines changed: 32 additions & 11 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vignettes/scoring-multivariate-forecasts.Rmd

Lines changed: 37 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -83,71 +83,49 @@ The column `.mv_group_id` is created automatically and represents an identifier
8383
score(example_multiv)
8484
```
8585

86-
If, at any point, you want to score the same forecast using different groupings, you'd have create a new separate forecast object with a different grouping and score that new forecast object.
87-
88-
89-
## Univariate and multivariate scoring for matrices
90-
91-
Note: this section may only be relevant to you if you're planning to score forecasts in matrix format.
92-
93-
Let's construct a simple multivariate forecast:
94-
95-
```{r}
96-
# parameters for multivariate normal example
97-
set.seed(123)
98-
d <- 10 # number of dimensions
99-
m <- 50 # number of samples from multivariate forecast distribution
100-
101-
mu0 <- rep(0, d)
102-
mu <- rep(1, d)
103-
104-
S0 <- S <- diag(d)
105-
S0[S0 == 0] <- 0.2
106-
S[S == 0] <- 0.1
107-
108-
# generate samples from multivariate normal distributions
109-
obs <- drop(mu0 + rnorm(d) %*% chol(S0))
110-
fc_sample <- replicate(m, drop(mu + rnorm(d) %*% chol(S)))
111-
112-
obs2 <- drop(mu0 + rnorm(d) %*% chol(S0))
113-
fc_sample2 <- replicate(m, drop(mu + rnorm(d) %*% chol(S)))
114-
```
115-
116-
Now, we can compute the Energy Score. Let's compare the `scoringutils` implementation with that of the `scoringRules` package, on which the `scoringutils` implementation is based. The only difference is that `scoringRules` always expects a single multivariate `forecast`, while the `scoringutils` implementation can handle multiple multivariate forecasts together, identified via a grouping vector (assuming they all have the same dimension).
86+
By default, `score()` computes both the energy score and the variogram score for multivariate sample forecasts.
87+
The energy score is a multivariate generalisation of the CRPS that measures overall forecast accuracy.
88+
The variogram score (Scheuerer and Hamill, 2015) specifically targets the correlation structure between the targets being forecast jointly.
89+
For each pair of targets (e.g. two countries), it compares the observed absolute difference |y_i - y_j|^p against what the forecast distribution predicts for that difference.
90+
A forecast that gets the correlations between targets wrong will predict pairwise differences that do not match the observations, producing a higher score.
91+
This makes the variogram score more sensitive to misspecified correlations than the energy score.
92+
93+
You can customise parameters using `purrr::partial()`.
94+
The order parameter `p` controls how differences are scaled: `p = 0.5` (the default) is more robust to outliers, while `p = 1` gives a standard absolute difference.
95+
See `?variogram_score_multivariate` for full parameter documentation.
96+
For example, to use `p = 1`:
11797

11898
```{r}
119-
scoringRules::es_sample(y = obs, dat = fc_sample)
120-
# in the univariate case, Energy Score and CRPS are the same
121-
# illustration: Evaluate forecast sample for the first variable
122-
es_sr1 <- scoringRules::es_sample(y = obs, dat = fc_sample)
123-
es_sr2 <- scoringRules::es_sample(y = obs2, dat = fc_sample2)
124-
es_sr <- c(es_sr1, es_sr2)
125-
126-
es_su <- energy_score_multivariate(
127-
observed = c(obs, obs2),
128-
predicted = rbind(fc_sample, fc_sample2),
129-
mv_group_id = c(rep(1, d), rep(2, d))
99+
score(
100+
example_multiv,
101+
metrics = list(
102+
energy_score = energy_score_multivariate,
103+
variogram_score = purrr::partial(
104+
variogram_score_multivariate, p = 1
105+
)
106+
)
130107
)
131-
all.equal(es_sr, es_su, tolerance = 1e-6, check.attributes = FALSE)
132108
```
133109

134-
You can provide observation weights when computing the Energy Score.
110+
## Multivariate point forecasts
111+
112+
If you have point forecasts rather than samples, you can score them using the variogram score via `as_forecast_multivariate_point()`.
113+
This treats each point forecast as a single-sample ensemble.
135114

136115
```{r}
137-
# illustration of observation weights for Energy Score
138-
# example: equal weights for first half of draws; zero weights for other draws
139-
w <- rep(c(1, 0), each = 0.5 * m) / (0.5 * m)
140-
141-
es_sr1 <- scoringRules::es_sample(y = obs, dat = fc_sample, w = w)
142-
es_sr2 <- scoringRules::es_sample(y = obs2, dat = fc_sample2, w = w)
143-
es_sr <- c(es_sr1, es_sr2)
144-
145-
es_su <- energy_score_multivariate(
146-
observed = c(obs, obs2),
147-
predicted = rbind(fc_sample, fc_sample2),
148-
mv_group_id = c(rep(1, d), rep(2, d)),
149-
w = w
150-
)
116+
example_point_multi <- example_point[
117+
target_type == "Cases" &
118+
forecast_date == "2021-05-03" &
119+
target_end_date == "2021-05-15" &
120+
horizon == 2 &
121+
model == "EuroCOVIDhub-ensemble"
122+
]
151123
152-
all.equal(es_sr, es_su, tolerance = 1e-6, check.attributes = FALSE)
124+
example_mv_point <- as_forecast_multivariate_point(
125+
data = na.omit(example_point_multi),
126+
joint_across = c("location", "location_name")
127+
)
128+
score(example_mv_point)
153129
```
130+
131+
If, at any point, you want to score the same forecast using different groupings, you'd have create a new separate forecast object with a different grouping and score that new forecast object.

0 commit comments

Comments
 (0)