Skip to content

Calibration routine to first observed death#39

Open
KOVALW wants to merge 28 commits intopgv5_clinical_statusfrom
wtk-phase1-calibrate
Open

Calibration routine to first observed death#39
KOVALW wants to merge 28 commits intopgv5_clinical_statusfrom
wtk-phase1-calibrate

Conversation

@KOVALW
Copy link
Collaborator

@KOVALW KOVALW commented Mar 6, 2026

Calibration routine overview

The script phase_1_calibration.py runs an ABC-SMC routine with 6 steps.

The distance measure is the time difference in days between the first observed death in the simulation and the first reported death in Indiana, with time t=0 representing January 1, 2020. The six step sizes were [30, 20, 10, 5, 2, 0], such that the last SMC step only filtered for simulations that recaptured exactly the same day of first observed death as that reported in the data.

The parameters being fit are the symptomatic reporting probability, the $\alpha$ density-dependent scaling factor for Home settings, the probability of developing symptoms, and the rate of infection attempt generation while individuals are infectious.

The priors are as follows:

  • symptomatic_reporting_prob ~ Beta(5, 5)
  • settings_properties.Home.alpha ~ Uniform(0, 1)
  • probability_mild_given_infect ~ Beta(7, 3)
  • infectiousness_rate_fn.Constant.rate ~ Uniform(0.1, 2.0)

These priors are illustrative, but are based on loosely held prior belief that the probability of reporting imported symptomatic infections was approximately 0.5 during this stage of the pandemic, and that approximately 70% of individuals would go on to develop at least mild symptoms.

The perturbation kernel variance was adapted on each SMC step according to the previous step's population variance.

Code description

In order to calibrate the transmission model to the day of first observed death, this PR introduces

  1. An MRP runner for the model
  2. A connection between case importation and the transmission model
  3. A script to generate the results object

MRP runner of importation and transmission

The MRP runner first uses the parameters specified for the ixa model to generate those required for the importation module. Namely, this is to calculate the overall case fatality ratio from clinical status transition probabilities, the asymptomatic ratio from the probability of developing symptoms, and introduces a new parameter for surveillance reporting.

The model then assumes categorical values (e.g. Indiana for the state model and 2020 as the year). These may be moved to a model_selection category in the inputs dictionary to generalize the CovidModel.simulate() function.

The parameters are dumped to a json file and read by the ixa transmission model, with only incidence counts being returned. This may again be generalized by supplying user arguments to the config inputs section of the model inputs.

Calibration routine

The routine can be run using uv run python scripts/phase_1_calibration.py

The calibration routine uses tools from the cfa-calibration-tools repo on the wtk-dev branch. The source code is branched off of the pgv5_clinical_status branch of ixa-epi-covid.

The priors are read form a JSON file and supplied to the ABCSampler. The future default perturbation kernel (multivariate normal) and variance adapter are applied.

The outputs to distance function is generated inside the script as well, here calculating the difference between the observed time of the first death in the simulation and observed reported time in Indiana.

The results are then printed and stored.

Limitations

  • Using the people_test.csv population means that forward projections don't mean much for generalized results. This is resolvable by replacing the file with a larger population
  • Death is only a symptom status, therefore dead people continue their plans, such as forecasted infections, which would otherwise not occur. The timescale to first death means that this is not an issue for the calibration portion of the routine, but projecting would be unreliable.
  • Importation data sourced from Perkins et al. 2020 only go up to March 12th 2020, but we are calibrating to the first reported death that occurred on March 16th. Importations that go at least 4 more days would be more reliable

@KOVALW KOVALW linked an issue Mar 6, 2026 that may be closed by this pull request
4 tasks
@KOVALW KOVALW marked this pull request as ready for review March 6, 2026 20:46
@eqmooring eqmooring force-pushed the pgv5_clinical_status branch from abea006 to 1be24fd Compare March 9, 2026 18:12
Copy link
Collaborator

@erik-rosenstrom erik-rosenstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its looking good. The framework is fairly easy to follow. Documentation will also be helpful.

Then, to render the analysis report, ensure that `tinytex` is installed with

```
quarto install tinyext
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
quarto install tinyext
quarto install tinytex

)
```

Histograms of posterior samples compared to the probability density of each prior. To generate the histogram, we sample $n$ particles from the posterior population, where $n$ is the effective sample size of the population, in order to reflect the weight distribution of the posterior. For each plot, the prior is plotted as a blue line and the histogram of the posterior samples is plotted in orange with a kernel density estimator overlay.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it might be helpful to define what the weight distribution is here.

"seed": 1234,
"max_time": 100.0,
"synth_population_file": "input/people_test.csv",
"symptomatic_reporting_prob": 0.5,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there any discussion on separating parameters more clearly by what part of the pipeline they belong to? It was unclear what symptomatic_reporting_prob was since its with all ixa-epi-covid model specific parameters. Only in the covid_model.rs does it become clear this is the single importation model parameter that is not tied to an ixa model parameter that is named something else. I think if the parameters are to be kept together this needs to be clarified in documentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've landed some changes to wtk-dev that address this with a far cleaner particle reader. Parameters can now be specified in any structure, so long as they exist in the default values dictionary. To show this, I moved the reporting probability to the importation params, which I agree makes more sense while we have no surveillance in the transmission model

#| echo: false
posterior_samples = results.sample_posterior_particles(n=int(results.ess))

for param in results.fitted_params:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types of plots seem like they would be of interest every time the model is calibrated. Is it possible to implement a method on CalibrationResults to produce these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't landed CalibrationResults as a feature on main in calibrationtools, so I'm going to keep this longer function here for now to avoid over-complicating the dev branch with non-approved API. I've included an un-evaluated preferred API code block in this report though to be clear that this is not a final document and instead reflects the current API.

## Obtaining the importation time series and death incidence data frmaes for a random smaple form the posterior
# Re-generating a random sample of parameter sets from posterior
particle_count = 100
particles = results.sample_posterior_particles(n=particle_count)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar point to the one above. This seems like something you will do with every calibrated model and is a substantial amount of code to produce for every model report. Is this something that can be moved into a CalibrationResults method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like re-run particles is valuable and could be reasonable to include on the CalibrationResults methods. I was also over-complicating what needed to be done because the cleaner approach is to modify the return method of covid_model. It now can grab whichever file reports you want in a dictionary and the for loop is easier to read

plt.title(f"Imported infections over time ({particle_count} posterior samples)")
plt.xlabel("Time (days since simulation start)")
plt.ylabel("Number of imported infections (daily)")
plt.tight_layout()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plot needs a legend

)
```

For a final SMC step with threshold toelrance above zero, we will observe some variance in the date of first reported death. Here we show a sampled histogram of the timing of the first death across the posterior simulations. The vertical dashed line indicates the observed timing of the first death in Indiana (March 16, 2020, or 75 days after the start of the simulation on January 1, 2020). We can see that while although the distribution is centered around the observed timing, there is a higher weight on earlier first reported deaths, indicating that imported infections occur faster in the simulated model than they occured in real life. This may be due to the proprotional sampling of Indiana importations from the domestic level imported infections, when in fact they may have been less probable due to mobility flow patterns.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the importations proportion to Indiana (population = 6.7 million in 2020) when people test has a population of 1000? We should probably leave direct interpretation of results/limitations till that is fixed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is true, and considering the long run time of modeling all of indiana, maybe we should switch to a slightly larger custom synthetic population, and ensure that the importations are proportional to its size. Once we can run on azure we can switch to the full synthetic population

Finally, we can plot the same trajectories for number of infections observed in the model over time, which includes both imported infections and locally acquired infections. This figure shows the issue with calibratin the model with a small population size (1,000 people). The susceptible pool is rapidly depleted by local transmission, and a high proportion of the population is already recovered by the timepoint of evaluation comparing the model results to the first reported death in the data.

```{python}
#| echo: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this plot can be rotated? It's hard to examine now at its current size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Run model projections given first death

3 participants