Maximum likelihood estimator of BD, BD-CT(1) and BDSKY model parameters from phylogenetic trees and a non-parametric CT detection test.
Anna Zhukova, Olivier Gascuel (2025) Accounting for contact tracing in epidemiological birth-death models. PLOS Comput Biol 21(5): e1012461; doi:10.1371/journal.pcbi.1012461
The birth-death (BD) model with incomplete sampling was introduced by Stadler [J Theor Biol 2009]. Under this model, infected individuals can transmit their pathogen with a constant rate λ, get removed (become non-infectious) with a constant rate ψ, and their pathogen can be sampled upon removal with a constant probability ρ.
The BD model therefore has 3 parameters:
- λ -- transmission rate
- ψ -- removal rate
- ρ -- sampling probability upon removal.
These parameters can be expressed in terms of the following epidemiological parameters:
- R0=λ/ψ -- reproduction number
- 1/ψ -- infectious time.
For identifiability, the BD model requires one of its parameters (λ, ψ, ρ) to be fixed.
The BD-CT(1) model extends the classical BD model (see above) by adding contact tracing (CT): At the moment of sampling the sampled individual might notify their most recent contact with a constant probability υ. Upon notification, the contact is removed almost instantaneously and their pathogen is systematically sampled upon removal (modeled via a constant notified sampling rate φ >> ψ).
The BD-CT(1) model therefore has 5 parameters:
- λ -- transmission rate
- ψ -- removal rate
- ρ -- sampling probability upon removal
- υ -- probability to notify contacts upon sampling
- φ -- notified contact removal and sampling rate: φ >> ψi ∀i (1 ≤ i ≤ m). The pathogen of a notified contact is sampled automatically (with a probability of 1) upon removal.
These parameters can be expressed in terms of the following epidemiological parameters:
- R0=λ/ψ -- reproduction number
- 1/ψ -- infectious time
- 1/φ -- notified contact removal time
The BD-CT(1) model makes 3 assumptions:
- only observed individuals can notify (instead of any removed individual);
- notified contacts are always observed upon removal;
- only the most recent contact can get notified.
For identifiability, BD-CT(1) model requires one of the three BD model parameters (λ, ψ, ρ) to be fixed.
The BD Skyline (BDSKY) model was introduced by Skyline was introduced by Stadler et al. [PNAS 2013]. It extends the classical BD model (see above) with piece-wise constant parameter value changes.
A BDSKY model with k intervals has 3k + (k-1) parameters:
- λ1, ..., λk -- k transmission rates, one per skyline interval
- ψ1, ..., ψk -- k removal rates, one per skyline interval
- ρ1, ..., ρk -- k sampling probabilities upon removal, one per skyline interval
- t1, ..., tk-1 -- k-1 skyline interval changing times, where 0 < t1 < ... < tk-1 < T, where T is the end of the sampling period.
For identifiability, the BDSKY model requires one of the three BD model parameters (λi, ψi, ρi) to be fixed for each time interval i (0 ≤ i ≤ k).
The bdct package provides a classical BD, BD-CT(1), and BDSKY model maximum-likelihood parameter estimators from a user-supplied time-scaled phylogenetic tree (or a forest of trees). User must also provide a value for at least one of the three BD model parameters (λ, ψ, or ρ).
In the BDSKY case the values of one of the three BD model parameters need to be provided for all the skyline intervals. On top of that, providing interval changing times is highly recommended (but optional).
For a parameter to be fixed, we recommend providing the sampling probability ρ, which could be approximated as the number of tree tips divided by the number of declared cases for the same time period.
If the user-supplied file contains several trees, their starting times (beginning of the root branches) are by default assumed to be equal (t=0). However, this can be changed by providing starting times for each tree with the --start_times option.
The bdct package provides a non-parametric test detecting presence/absence of contact tracing in a user-supplied time-scaled phylogenetic tree/forest. It outputs a p-value and the number of cherries in the tree.
One needs to supply a time-scaled phylogenetic tree in newick or nexus format, or a collection of trees (one per line in the newick file), which will be treated as all belonging to the same epidemic (i.e., having the same BD/BD-CT(1)/BDSKY model parameters). In the examples below we will use an HIV tree reconstructed from 200 sequences, published in [Rasmussen et al. PLoS Comput. Biol. 2017], which you can find at PairTree GitHub and in hiv_zurich/Zurich.nwk.
There are 4 alternative ways to run bdct on your computer: with docker, apptainer, in Python3, or via command line (requires installation with Python3).
You could either install python (version 3.10 or higher) system-wide and then install bdct via pip:
sudo apt install -y python3 python3-pip python3-setuptools python3-distutils
pip3 install bdctor alternatively, you could install python (version 3.10 or higher) and bdct via conda (make sure that conda is installed first). Here we will create a conda environment called bdctenv:
conda create --name bdctenv python=3.10
conda activate bdctenv
pip install bdctIf you installed bdct in a conda environment (here named bdctenv), do not forget to first activate it, e.g.
conda activate bdctenvRun the following commands to check for the presence of contact tracing and estimate BD-CT(1) model parameters. The first command applies the CT test to a tree Zurich.nwk and saves the CT-test value to the file cherry_test.txt The second command estimated the BD-CT(1) parameters and their 95% CIs for this tree, assuming the sampling probability of 0.25, and saves the estimated parameters to a comma-separated file estimates_bdct.csv. The third command estimated the classical BD parameters and their 95% CIs for this tree, assuming the sampling probability of 0.25, and saves the estimated parameters to a comma-separated file estimates_bd.csv. The fourth command estimated the BDSKY parameters and their 95% CIs for this tree, assuming two time intervals, with a parameter change around the first tip sampling (21.68 years after the root time), a very low sampling probability for the first interval (0.001) and a higher one for the second one (0.25), and saves the estimated parameters to a comma-separated file estimates_bdsky.csv. Note that if the skyline changing times are not provided, they will be estimated. However, it is recommended to provide them when possible.
ct_test --nwk Zurich.nwk --log cherry_test.txt
bdct_infer --nwk Zurich.nwk --ci --p 0.25 --log estimates_bdct.csv
bd_infer --nwk Zurich.nwk --ci --p 0.25 --log estimates_bd.csv
bdsky_infer --nwk Zurich.nwk --ci --p 0.001 0.25 --skyline_times 21.68 --log estimates_bdsky.csvTo see detailed options, run:
ct_test --help
bdct_infer --help
bd_infer --help
bdsky_infer --helpOnce docker is installed, run the following command to estimate BD-CT(1) model parameters:
docker run -v <path_to_the_folder_containing_the_tree>:/data:rw -t evolbioinfo/bdct --nwk /data/Zurich.nwk --ci --p 0.25 --log /data/estimates.csvThis will produce a comma-separated file estimates.csv in the <path_to_the_folder_containing_the_tree> folder, containing the estimated parameter values and their 95% CIs (can be viewed with a text editor, Excel or Libre Office Calc).
To see advanced options, run
docker run -t evolbioinfo/bdct -hOnce apptainer is installed, run the following command to estimate BD-CT(1) model parameters (from the folder where the Zurich.nwk tree is contained):
apptainer run docker://evolbioinfo/bdct --nwk Zurich.nwk --ci --p 0.25 --log estimates.csvThis will produce a comma-separated file estimates.csv, containing the estimated parameter values and their 95% CIs (can be viewed with a text editor, Excel or Libre Office Calc).
To see advanced options, run
apptainer run docker://evolbioinfo/bdct -h