You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: knitr/disease_transmission/boarding_school_case_study.Rmd
+31-26Lines changed: 31 additions & 26 deletions
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,6 @@
1
1
---
2
2
title: "Bayesian workflow for disease transmission modeling in Stan"
3
-
author: "Léo Grinsztajn^[[email protected]], Elizaveta Semenova, Charles C. Margossian, Julien Riou"
3
+
author: "Léo Grinsztajn^[École polytechnique, Palaiseau, France, [email protected]], Elizaveta Semenova^[Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK], Charles C. Margossian^[Department of Statistics, Columbia University, New York, NY, USA], Julien Riou^[Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland]"
4
4
link-citations: true
5
5
output:
6
6
html_document:
@@ -96,7 +96,7 @@ and scale up ODEs in Stan.
96
96
Throughout the tutorial, we use R as a scripting language^[Note that Stan can also be used with other langages such as Python or Julia, see [here](https://mc-stan.org/users/interfaces/) for the list of Stan interfaces],
97
97
and, while we review some elementary concepts,
98
98
assume the reader has basic familiarity with Bayesian inference
99
-
and Stan.
99
+
and Stan. The source code of this case study can be found on Github [here](https://github.com/stan-dev/example-models/tree/master/knitr/disease_transmission).
100
100
101
101
102
102
# 1 Simple SIR
@@ -163,7 +163,6 @@ Let's give some intuition behind these ODEs. The proportion of infected people a
163
163
The above model holds under several assumptions:
164
164
165
165
* births and deaths are not contributing to the dynamics and the total population $N=S+I+R$ remains constant,
166
-
167
166
* recovered individuals do not become susceptible again over time,
168
167
169
168
* the infection rate $\beta$ and recovery rate $\gamma$ are constant,
@@ -538,7 +537,8 @@ Here we see that the model gives a satisfying fit to the data,
538
537
and that the model uncertainty is able to capture the variation of the data.
Maybe we also want to access the true number of infected people at each time, and not just the number of students in bed. This is a latent variable for which we have an estimation.
552
552
```{r}
553
553
params <- lapply(t, function(i){sprintf("y[%s,2]", i)}) #number of infected for each day
pars = params, probs = c(0.05, 0.5, 0.95))$summary)
555
556
colnames(smr_y) <- make.names(colnames(smr_y)) # to remove % in the col names
556
557
557
558
ggplot(smr_y, mapping = aes(x = t)) +
@@ -686,7 +687,7 @@ ggplot(data = df_test) +
686
687
scale_x_log10()
687
688
```
688
689
689
-
We can do the same thing for R0 (again, on the log-scale), the loose bounds being 0.3 and 30.
690
+
We can do the same thing for $R_0$ (again, on the log-scale), the loose bounds being 0.3 and 30.
690
691
```{r}
691
692
df_test <- tibble(r = s_prior$R0)
692
693
ggplot(data = df_test) +
@@ -698,38 +699,37 @@ ggplot(data = df_test) +
698
699
```
699
700
700
701
We thus see that these distributions are coherent with domain knowledge. See [here](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations) for more recommendations on prior choice.
701
-
^[Previoulsy, we fitted the data with these priors and found a posteriori R0 ~ 3 and a recovery time of ~ 2 days. This is quite unexpected from our basic domain knowledge, but can probably be explained (R0 bigger among students? isolated students counts for recovered in the model? etc). This shows that the prior should not be too constraining in order to incorporate both prior knowledge and unexpected knowledge from the data.]
702
+
^[Previoulsy, we fitted the data with these priors and found a posteriori $R_0$ ~ 3 and a recovery time of ~ 2 days. This is quite unexpected from our basic domain knowledge, but can probably be explained ($R_0$ bigger among students? isolated students counts for recovered in the model? etc). This shows that the prior should not be too constraining in order to incorporate both prior knowledge and unexpected knowledge from the data.]
702
703
703
704
We can also plot trajectories of infection according to the prior,
704
705
that is the number of infected people at each time accoring to prior distributions of parameters.
(the number of infected stays below the total number of people depicted in red,
730
-
we see the number growing then decreasing etc.),
731
-
and quite diverse. Still, some of the curves look a little bit funky
732
-
and suggest we could refine our priors and make them more informative, although it may not be needed here.
732
+
It seems that most trajectories are reasonable and quite diverse. Still, some of the curves look a little bit funky and suggest we could refine our priors and make them more informative, although it may not be needed here.
733
733
734
734
Typically, we can get away with priors that do not capture all our *a priori* knowledge,
735
735
provided the data is informative enough.
@@ -799,8 +799,10 @@ These are all questions this simple test can help us tackle.
799
799
800
800
We take one arbitrary draw from the prior distribution
801
801
```{r}
802
-
draw <- 12 # one arbitrary draw from the prior distribution
803
-
cases_simu <- s_prior$pred_cases[draw,] # the number of predicted cases sampled from the prior distribution, which we will use as data
802
+
# one arbitrary draw from the prior distribution
803
+
draw <- 12
804
+
# the number of predicted cases sampled from the prior distribution, which we will use as data
805
+
cases_simu <- s_prior$pred_cases[draw,]
804
806
```
805
807
806
808
And use it as data which we fit to our model.
@@ -814,7 +816,8 @@ We can then examine the estimated posterior distribution.
0 commit comments