Calibration routine to first observed death by KOVALW · Pull Request #39 · CDCgov/ixa-epi-covid

KOVALW · 2026-03-06T19:24:21Z

Calibration routine overview

The script phase_1_calibration.py runs an ABC-SMC routine with 6 steps.

The distance measure is the time difference in days between the first observed death in the simulation and the first reported death in Indiana, with time t=0 representing January 1, 2020. The six step sizes were [30, 20, 10, 5, 2, 0], such that the last SMC step only filtered for simulations that recaptured exactly the same day of first observed death as that reported in the data.

The parameters being fit are the symptomatic reporting probability, the $\alpha$ density-dependent scaling factor for Home settings, the probability of developing symptoms, and the rate of infection attempt generation while individuals are infectious.

The priors are as follows:

symptomatic_reporting_prob ~ Beta(5, 5)
settings_properties.Home.alpha ~ Uniform(0, 1)
probability_mild_given_infect ~ Beta(7, 3)
infectiousness_rate_fn.Constant.rate ~ Uniform(0.1, 2.0)

These priors are illustrative, but are based on loosely held prior belief that the probability of reporting imported symptomatic infections was approximately 0.5 during this stage of the pandemic, and that approximately 70% of individuals would go on to develop at least mild symptoms.

The perturbation kernel variance was adapted on each SMC step according to the previous step's population variance.

Code description

In order to calibrate the transmission model to the day of first observed death, this PR introduces

An MRP runner for the model
A connection between case importation and the transmission model
A script to generate the results object

MRP runner of importation and transmission

The MRP runner first uses the parameters specified for the ixa model to generate those required for the importation module. Namely, this is to calculate the overall case fatality ratio from clinical status transition probabilities, the asymptomatic ratio from the probability of developing symptoms, and introduces a new parameter for surveillance reporting.

The model then assumes categorical values (e.g. Indiana for the state model and 2020 as the year). These may be moved to a model_selection category in the inputs dictionary to generalize the CovidModel.simulate() function.

The parameters are dumped to a json file and read by the ixa transmission model, with only incidence counts being returned. This may again be generalized by supplying user arguments to the config inputs section of the model inputs.

Calibration routine

The routine can be run using uv run python scripts/phase_1_calibration.py

The calibration routine uses tools from the cfa-calibration-tools repo on the wtk-dev branch. The source code is branched off of the pgv5_clinical_status branch of ixa-epi-covid.

The priors are read form a JSON file and supplied to the ABCSampler. The future default perturbation kernel (multivariate normal) and variance adapter are applied.

The outputs to distance function is generated inside the script as well, here calculating the difference between the observed time of the first death in the simulation and observed reported time in Indiana.

The results are then printed and stored.

Limitations

Using the people_test.csv population means that forward projections don't mean much for generalized results. This is resolvable by replacing the file with a larger population
Death is only a symptom status, therefore dead people continue their plans, such as forecasted infections, which would otherwise not occur. The timescale to first death means that this is not an issue for the calibration portion of the routine, but projecting would be unreliable.
Importation data sourced from Perkins et al. 2020 only go up to March 12th 2020, but we are calibrating to the first reported death that occurred on March 16th. Importations that go at least 4 more days would be more reliable

erik-rosenstrom

I think its looking good. The framework is fairly easy to follow. Documentation will also be helpful.

erik-rosenstrom · 2026-03-10T17:33:04Z

experiments/phase1/README.md

+Then, to render the analysis report, ensure that `tinytex` is installed with
+
+```
+quarto install tinyext


Suggested change

quarto install tinyext

quarto install tinytex

erik-rosenstrom · 2026-03-10T17:46:56Z

experiments/phase1/reports/calibration.qmd

+)
+```
+
+Histograms of posterior samples compared to the probability density of each prior. To generate the histogram, we sample $n$ particles from the posterior population, where $n$ is the effective sample size of the population, in order to reflect the weight distribution of the posterior. For each plot, the prior is plotted as a blue line and the histogram of the posterior samples is plotted in orange with a kernel density estimator overlay.


Nit: it might be helpful to define what the weight distribution is here.

erik-rosenstrom · 2026-03-10T18:20:58Z

experiments/phase1/input/default_params.json

+        "seed": 1234,
+        "max_time": 100.0,
+        "synth_population_file": "input/people_test.csv",
+        "symptomatic_reporting_prob": 0.5,


Was there any discussion on separating parameters more clearly by what part of the pipeline they belong to? It was unclear what symptomatic_reporting_prob was since its with all ixa-epi-covid model specific parameters. Only in the covid_model.rs does it become clear this is the single importation model parameter that is not tied to an ixa model parameter that is named something else. I think if the parameters are to be kept together this needs to be clarified in documentation

I've landed some changes to wtk-dev that address this with a far cleaner particle reader. Parameters can now be specified in any structure, so long as they exist in the default values dictionary. To show this, I moved the reporting probability to the importation params, which I agree makes more sense while we have no surveillance in the transmission model

erik-rosenstrom · 2026-03-10T18:35:55Z

experiments/phase1/reports/calibration.qmd

+#| echo: false
+posterior_samples = results.sample_posterior_particles(n=int(results.ess))
+
+for param in results.fitted_params:


These types of plots seem like they would be of interest every time the model is calibrated. Is it possible to implement a method on CalibrationResults to produce these?

We haven't landed CalibrationResults as a feature on main in calibrationtools, so I'm going to keep this longer function here for now to avoid over-complicating the dev branch with non-approved API. I've included an un-evaluated preferred API code block in this report though to be clear that this is not a final document and instead reflects the current API.

erik-rosenstrom · 2026-03-10T18:37:45Z

experiments/phase1/reports/calibration.qmd

+## Obtaining the importation time series and death incidence data frmaes for a random smaple form the posterior
+# Re-generating a random sample of parameter sets from posterior
+particle_count = 100
+particles = results.sample_posterior_particles(n=particle_count)


A similar point to the one above. This seems like something you will do with every calibrated model and is a substantial amount of code to produce for every model report. Is this something that can be moved into a CalibrationResults method

Something like re-run particles is valuable and could be reasonable to include on the CalibrationResults methods. I was also over-complicating what needed to be done because the cleaner approach is to modify the return method of covid_model. It now can grab whichever file reports you want in a dictionary and the for loop is easier to read

erik-rosenstrom · 2026-03-10T18:43:02Z

experiments/phase1/reports/calibration.qmd

+plt.title(f"Imported infections over time ({particle_count} posterior samples)")
+plt.xlabel("Time (days since simulation start)")
+plt.ylabel("Number of imported infections (daily)")
+plt.tight_layout()


This plot needs a legend

erik-rosenstrom · 2026-03-10T18:56:40Z

experiments/phase1/reports/calibration.qmd

+)
+```
+
+For a final SMC step with threshold toelrance above zero, we will observe some variance in the date of first reported death. Here we show a sampled histogram of the timing of the first death across the posterior simulations. The vertical dashed line indicates the observed timing of the first death in Indiana (March 16, 2020, or 75 days after the start of the simulation on January 1, 2020). We can see that while although the distribution is centered around the observed timing, there is a higher weight on earlier first reported deaths, indicating that imported infections occur faster in the simulated model than they occured in real life. This may be due to the proprotional sampling of Indiana importations from the domestic level imported infections, when in fact they may have been less probable due to mobility flow patterns.


Are the importations proportion to Indiana (population = 6.7 million in 2020) when people test has a population of 1000? We should probably leave direct interpretation of results/limitations till that is fixed

If this is true, and considering the long run time of modeling all of indiana, maybe we should switch to a slightly larger custom synthetic population, and ensure that the importations are proportional to its size. Once we can run on azure we can switch to the full synthetic population

erik-rosenstrom · 2026-03-10T19:51:34Z

experiments/phase1/reports/calibration.qmd

+Finally, we can plot the same trajectories for number of infections observed in the model over time, which includes both imported infections and locally acquired infections. This figure shows the issue with calibratin the model with a small population size (1,000 people). The susceptible pool is rapidly depleted by local transmission, and a high proportion of the population is already recovered by the timepoint of evaluation comparing the model results to the first reported death in the data.
+
+```{python}
+#| echo: false


Can this plot can be rotated? It's hard to examine now at its current size.

…a inputs

eqmooring and others added 6 commits February 28, 2026 08:05

initial commit symptom status manager

71100f7

revise symptom manager and make reports

ed06e88

modify prevalence report to show combos

dad1fc0

revise tests for report generation

2312e54

fix additional test and pre-commit issues

9a5233a

working calibration routine

1b2dd4a

KOVALW linked an issue Mar 6, 2026 that may be closed by this pull request

Run model projections given first death #22

Open

4 tasks

KOVALW added 3 commits March 6, 2026 20:01

repair zero probability nan occurence importation

f1bbf68

precommit

2d0a7c6

Store results and print select diagnostics

6c7a5af

KOVALW added this to the Phase 1: No cases reported or just a few milestone Mar 6, 2026

KOVALW added 2 commits March 6, 2026 20:15

update gitignore

96b88c0

updated printing

28a7000

KOVALW marked this pull request as ready for review March 6, 2026 20:46

KOVALW added 2 commits March 9, 2026 14:24

report and minor importation fix

c56240b

remove unnecessary run file and create visualizations

c30556f

eqmooring force-pushed the pgv5_clinical_status branch from abea006 to 1be24fd Compare March 9, 2026 18:12

KOVALW added 3 commits March 9, 2026 18:46

add prior distribution figs to plot calibration script

1b5f0ca

Port plotting from script into qmd report

c1fcf78

added readme for recreating results

2d4802c

KOVALW requested review from confunguido, eqmooring and erik-rosenstrom March 10, 2026 14:27

KOVALW added 3 commits March 10, 2026 15:04

update importation timeseries plot

d335275

clean up param for loop

788b45c

updated qmd with a preferred api

1a8d1b5

erik-rosenstrom requested changes Mar 10, 2026

View reviewed changes

KOVALW added 3 commits March 10, 2026 20:57

readme typo

a376732

rebase calibrationtools and remove symptomatic_reporting_prob from ix…

4b594a9

…a inputs

precommit

5bfd950

KOVALW added 6 commits March 12, 2026 18:58

override population

b37ad1f

reduce run time workarounds

fe6ec72

update model output return

4ee8307

fix test string for report

542da1a

comma missing

158208c

figure updates

96737c9

Conversation

KOVALW commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Calibration routine overview

Code description

MRP runner of importation and transmission

Calibration routine

Limitations

Uh oh!

erik-rosenstrom left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

KOVALW commented Mar 6, 2026 •

edited

Loading