macpan-book/08-ensemble-forecasts.Rmd at main · canmod/macpan-book · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
# Ensemble Forecasts

```{r redo_obs_err_fit, echo = FALSE, eval = FALSE}
sir_with_obs_err = (flexmodel(
    params = c(beta = 0.1, gamma = 0.01, N = 100),
    state = c(S = 99, I = 1, R = 0),
    start_date = "2020-03-11",
    end_date = "2020-12-01",
    do_make_state = FALSE
  )
  %>% add_rate("S", "I", ~ (I) * (beta) * (1/N))
  %>% add_rate("I", "R", ~ (gamma))
  %>% update_loss_params(data.frame(
    Parameter = "nb_disp",
    Distribution = "negative_binomial",
    Variable = c("S", "I", "R")
  ))
  %>% update_params(c(
    nb_disp_S = 1e4,
    nb_disp_I = 1e4,
    nb_disp_R = 1e4
  ))
)
set.seed(1L)
noisy_data = (sir_with_obs_err
  %>% simulation_history(include_initial_date = FALSE, obs_error = TRUE)
  %>% select(Date, I)
  %>% pivot_longer(-Date, names_to = "var", values_to = "value")
  %>% rename(date = Date)
)
sir_to_calibrate = (sir_with_obs_err
  %>% update_observed(
    noisy_data,
    loss_params = loss_params(sir_with_obs_err)
  )
  %>% update_opt_params(
    log_beta ~ log_flat(0),
    log_nb_disp_S ~ log_normal(10, 0.01), # fix at ~Poisson error
    log_nb_disp_I ~ log_normal(0, 5),     # free parameter
    log_nb_disp_R ~ log_normal(10, 0.01)  # fix at ~Poisson error
  )
)
sir_calibrated = calibrate_flexmodel(sir_to_calibrate)
```

Good forecasts include measures of uncertainty associated with each forecasted data point. The `simulate_ensemble` function allows one to make such forecasts. In [Calibrating with Observation Error] we fitted a model to simulated data with observation error. Here we use this model to illustrate the use of the `simulate_ensemble` function.

The first thing to do before making a forecast is to extend the end date, so that we are actually forecasting the future.
```{r extend_end_date}
sir_obs_err_to_forecast = extend_end_date(
  sir_obs_err_calibrated,
  days_to_extend = 100
)
```

Next just pass the model that is ready for forecasting to the `simulate_ensemble` function.
```{r simulate_ensemble, eval = TRUE}
obs_err_forecasts = (simulate_ensemble(
    sir_obs_err_to_forecast,
    use_progress_bar = FALSE
  )
  %>% filter(var != "S_to_I")
)
```
```{r paged_obs_err_forecasts, echo = FALSE}
rmarkdown::paged_table(obs_err_forecasts)
```

The forecasts can be joined back to the observed data to make plots (TODO: fix the naming of the columns of these functions to be more consistent).
```{r join_obs_data_to_sims, eval = TRUE}
obs_err_fits = (obs_err_forecasts
  %>% left_join(
    noisy_data,
    c("Date" = "date", "var" = "var"),
    suffix = c("_fitted", "")
  )
  %>% mutate(
    var = factor(
      var,
      levels = topological_sort(sir_obs_err_to_forecast))
  )
)
(ggplot(obs_err_fits)
  + facet_wrap(~var, scales = 'free', nrow = 2)
  + geom_ribbon(aes(x = Date, ymax = upr, ymin = lwr), alpha = 0.5)
  + geom_point(aes(Date, value), alpha = 0.7)
  + geom_line(aes(Date, value_fitted), colour = "red")
)
```


## Time-Varying Ensemble Forecasts

Forecasting often involves exploring what-if-scenarios. For example, what if the transmission rate jumped to high levels in the forecast period due to a policy option under consideration? Such scenario exploration can be done in McMasterPandemic by adding piece-wise time-variation of parameters in the forecasting period using the `add_piece_wise` function.

```{r sir_tv_to_forecast}
sir_tv_to_forcast = (sir_cal_tv
  %>% extend_end_date(days_to_extend = 120)
  %>% add_piece_wise(
    data.frame(
      Date = "2021-01-01",
      Symbol = "beta",
      Value = 0.5,
      Type = "abs"
    )
  )
)
```

An ensemble forecast can be produced with this scenario model. Here we do just that, while joining back the observed data so that we may plot both the fit and the forecast.
```{r forecast_tv}
(sir_tv_to_forcast
  %>% simulate_ensemble(use_progress_bar = FALSE)
  %>% filter(var != "S_to_I")
  %>% left_join(
    sir_cal_tv$observed$data,
    c("Date" = "date", "var" = "var"),
    suffix = c("_fitted", "")
  )
  %>% mutate(var = factor(var, topological_sort(sir_cal_tv)))
  %>% ggplot
  + facet_wrap(~var, ncol = 2)
  + geom_line(aes(Date, value))
  + geom_line(aes(Date, value_fitted), colour = 'red')
  + geom_ribbon(aes(x = Date, ymax = upr, ymin = lwr), alpha = 0.5)
)
```

Notice the mini-peak in infected individuals caused by the jump in transmission rate in the forecast period, accompanied by rapid susceptible depletion.