Calibration routine to first observed death#39
Calibration routine to first observed death#39KOVALW wants to merge 28 commits intopgv5_clinical_statusfrom
Conversation
abea006 to
1be24fd
Compare
erik-rosenstrom
left a comment
There was a problem hiding this comment.
I think its looking good. The framework is fairly easy to follow. Documentation will also be helpful.
experiments/phase1/README.md
Outdated
| Then, to render the analysis report, ensure that `tinytex` is installed with | ||
|
|
||
| ``` | ||
| quarto install tinyext |
There was a problem hiding this comment.
| quarto install tinyext | |
| quarto install tinytex |
| ) | ||
| ``` | ||
|
|
||
| Histograms of posterior samples compared to the probability density of each prior. To generate the histogram, we sample $n$ particles from the posterior population, where $n$ is the effective sample size of the population, in order to reflect the weight distribution of the posterior. For each plot, the prior is plotted as a blue line and the histogram of the posterior samples is plotted in orange with a kernel density estimator overlay. |
There was a problem hiding this comment.
Nit: it might be helpful to define what the weight distribution is here.
| "seed": 1234, | ||
| "max_time": 100.0, | ||
| "synth_population_file": "input/people_test.csv", | ||
| "symptomatic_reporting_prob": 0.5, |
There was a problem hiding this comment.
Was there any discussion on separating parameters more clearly by what part of the pipeline they belong to? It was unclear what symptomatic_reporting_prob was since its with all ixa-epi-covid model specific parameters. Only in the covid_model.rs does it become clear this is the single importation model parameter that is not tied to an ixa model parameter that is named something else. I think if the parameters are to be kept together this needs to be clarified in documentation
There was a problem hiding this comment.
I've landed some changes to wtk-dev that address this with a far cleaner particle reader. Parameters can now be specified in any structure, so long as they exist in the default values dictionary. To show this, I moved the reporting probability to the importation params, which I agree makes more sense while we have no surveillance in the transmission model
| #| echo: false | ||
| posterior_samples = results.sample_posterior_particles(n=int(results.ess)) | ||
|
|
||
| for param in results.fitted_params: |
There was a problem hiding this comment.
These types of plots seem like they would be of interest every time the model is calibrated. Is it possible to implement a method on CalibrationResults to produce these?
There was a problem hiding this comment.
We haven't landed CalibrationResults as a feature on main in calibrationtools, so I'm going to keep this longer function here for now to avoid over-complicating the dev branch with non-approved API. I've included an un-evaluated preferred API code block in this report though to be clear that this is not a final document and instead reflects the current API.
| ## Obtaining the importation time series and death incidence data frmaes for a random smaple form the posterior | ||
| # Re-generating a random sample of parameter sets from posterior | ||
| particle_count = 100 | ||
| particles = results.sample_posterior_particles(n=particle_count) |
There was a problem hiding this comment.
A similar point to the one above. This seems like something you will do with every calibrated model and is a substantial amount of code to produce for every model report. Is this something that can be moved into a CalibrationResults method
There was a problem hiding this comment.
Something like re-run particles is valuable and could be reasonable to include on the CalibrationResults methods. I was also over-complicating what needed to be done because the cleaner approach is to modify the return method of covid_model. It now can grab whichever file reports you want in a dictionary and the for loop is easier to read
| plt.title(f"Imported infections over time ({particle_count} posterior samples)") | ||
| plt.xlabel("Time (days since simulation start)") | ||
| plt.ylabel("Number of imported infections (daily)") | ||
| plt.tight_layout() |
There was a problem hiding this comment.
This plot needs a legend
| ) | ||
| ``` | ||
|
|
||
| For a final SMC step with threshold toelrance above zero, we will observe some variance in the date of first reported death. Here we show a sampled histogram of the timing of the first death across the posterior simulations. The vertical dashed line indicates the observed timing of the first death in Indiana (March 16, 2020, or 75 days after the start of the simulation on January 1, 2020). We can see that while although the distribution is centered around the observed timing, there is a higher weight on earlier first reported deaths, indicating that imported infections occur faster in the simulated model than they occured in real life. This may be due to the proprotional sampling of Indiana importations from the domestic level imported infections, when in fact they may have been less probable due to mobility flow patterns. |
There was a problem hiding this comment.
Are the importations proportion to Indiana (population = 6.7 million in 2020) when people test has a population of 1000? We should probably leave direct interpretation of results/limitations till that is fixed
There was a problem hiding this comment.
If this is true, and considering the long run time of modeling all of indiana, maybe we should switch to a slightly larger custom synthetic population, and ensure that the importations are proportional to its size. Once we can run on azure we can switch to the full synthetic population
| Finally, we can plot the same trajectories for number of infections observed in the model over time, which includes both imported infections and locally acquired infections. This figure shows the issue with calibratin the model with a small population size (1,000 people). The susceptible pool is rapidly depleted by local transmission, and a high proportion of the population is already recovered by the timepoint of evaluation comparing the model results to the first reported death in the data. | ||
|
|
||
| ```{python} | ||
| #| echo: false |
There was a problem hiding this comment.
Can this plot can be rotated? It's hard to examine now at its current size.
Calibration routine overview
The script
phase_1_calibration.pyruns an ABC-SMC routine with 6 steps.The distance measure is the time difference in days between the first observed death in the simulation and the first reported death in Indiana, with time t=0 representing January 1, 2020. The six step sizes were [30, 20, 10, 5, 2, 0], such that the last SMC step only filtered for simulations that recaptured exactly the same day of first observed death as that reported in the data.
The parameters being fit are the symptomatic reporting probability, the$\alpha$ density-dependent scaling factor for Home settings, the probability of developing symptoms, and the rate of infection attempt generation while individuals are infectious.
The priors are as follows:
symptomatic_reporting_prob~ Beta(5, 5)settings_properties.Home.alpha~ Uniform(0, 1)probability_mild_given_infect~ Beta(7, 3)infectiousness_rate_fn.Constant.rate~ Uniform(0.1, 2.0)These priors are illustrative, but are based on loosely held prior belief that the probability of reporting imported symptomatic infections was approximately 0.5 during this stage of the pandemic, and that approximately 70% of individuals would go on to develop at least mild symptoms.
The perturbation kernel variance was adapted on each SMC step according to the previous step's population variance.
Code description
In order to calibrate the transmission model to the day of first observed death, this PR introduces
MRP runner of importation and transmission
The MRP runner first uses the parameters specified for the ixa model to generate those required for the importation module. Namely, this is to calculate the overall case fatality ratio from clinical status transition probabilities, the asymptomatic ratio from the probability of developing symptoms, and introduces a new parameter for surveillance reporting.
The model then assumes categorical values (e.g. Indiana for the state model and 2020 as the year). These may be moved to a
model_selectioncategory in the inputs dictionary to generalize theCovidModel.simulate()function.The parameters are dumped to a json file and read by the ixa transmission model, with only incidence counts being returned. This may again be generalized by supplying user arguments to the config inputs section of the model inputs.
Calibration routine
The routine can be run using
uv run python scripts/phase_1_calibration.pyThe calibration routine uses tools from the
cfa-calibration-toolsrepo on thewtk-devbranch. The source code is branched off of thepgv5_clinical_statusbranch ofixa-epi-covid.The priors are read form a JSON file and supplied to the ABCSampler. The future default perturbation kernel (multivariate normal) and variance adapter are applied.
The outputs to distance function is generated inside the script as well, here calculating the difference between the observed time of the first death in the simulation and observed reported time in Indiana.
The results are then printed and stored.
Limitations
people_test.csvpopulation means that forward projections don't mean much for generalized results. This is resolvable by replacing the file with a larger population