
marie3003/effect_population_size_tmrca


Impact of molecular signal and population model assumptions on the estimation of node heights in time-resolved phylogenetic trees.

Time-calibrated phylogenetic trees provide critical insights into pathogen evolution, transmission, and epidemiology, but their accuracy depends on both the available mutational signal and the choice of coalescent prior. In this study, we investigated how assumptions about the population size model affect estimates of node heights in dated phylogenies. We derive analytic posteriors for the two-sequence tMRCA under constant size, exponential growth, exponential decline, and bottleneck models, and quantify prior influence using summary-based metrics (mean shift, median shift, and mode shift), together with the Wasserstein-1 distance and the coalescent information ratio $\Omega$. By mapping parameter regimes to exemplary pathogens (influenza, with high mutational signal, and Mycobacterium tuberculosis, with low signal), we show theoretically that prior impact declines with increasing mutational signal and effective population size. Yet prior influence can be substantial when signal is weak or when population dynamics depart from constancy: exponential growth can shift the posterior towards earlier or later times depending on the growth rate, whereas bottlenecks induce non-monotonic effects and secondary posterior peaks. We then simulate trees under constant and exponential growth population models, generate sequences, and re-estimate node times on fixed topologies in BEAST using either a constant coalescent prior or a piecewise-constant skyline prior. The skyline prior recovers temporal changes in population size and slightly reduces root-height and deep-branch errors under low signal with fast growth, whereas both priors perform similarly when signal is medium to high or when the true population dynamics are close to constant. Overall, demographic misspecification can bias time estimates in bacteria-like settings with limited signal.
When the true demography is uncertain, flexible skyline priors are preferable, while constant priors remain adequate for high-signal viral analyses or for large effective population sizes in near-constant settings.
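Under a constant-size prior, the two-sequence posterior has a closed form. The sketch below rests on our own assumptions, which need not match the parameterization in src. The number of differences k between the two sequences is taken to be Poisson with mean θT, where θ = 2μL for L sites and per-site rate μ. Time is measured in units where the pairwise coalescence rate is 1/N_e, so T ~ Exp(1/N_e). Then p(T | k) ∝ T^k e^{-θT} e^{-T/N_e}, which is Gamma(k + 1, θ + 1/N_e). The mean, median, and mode shifts are computed against a flat-prior reference, Gamma(k + 1, θ). This is one plausible baseline, not necessarily the one used in the project. All function names are hypothetical.

```python
import math


def erlang_cdf(t: float, shape: int, rate: float) -> float:
    """CDF of a Gamma(shape, rate) distribution with integer shape (an Erlang law)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    log_x = math.log(x)
    # P(T <= t) = 1 - sum_{j<shape} exp(-x) x^j / j!, summed in log space to avoid overflow
    tail = sum(math.exp(-x + j * log_x - math.lgamma(j + 1)) for j in range(shape))
    return max(0.0, 1.0 - tail)


def erlang_median(shape: int, rate: float, rel_tol: float = 1e-12) -> float:
    """Median of Gamma(shape, rate), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, rate) < 0.5:  # bracket the median
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, rate) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def summarize(shape: int, rate: float) -> dict:
    """Mean, median and mode of Gamma(shape, rate)."""
    return {
        "mean": shape / rate,
        "median": erlang_median(shape, rate),
        "mode": (shape - 1) / rate,
    }


def prior_shifts(k: int, mu: float, seq_len: int, ne: float) -> dict:
    """Posterior summaries under a constant-size coalescent prior minus a flat-prior reference.

    Assumed (hypothetical) model: k | T ~ Poisson(theta * T) with theta = 2 * mu * seq_len,
    and T ~ Exponential(rate = 1 / ne). Negative values mean the prior pulls the
    tMRCA towards the present.
    """
    theta = 2.0 * mu * seq_len
    with_prior = summarize(k + 1, theta + 1.0 / ne)  # Gamma(k + 1, theta + 1/ne)
    flat = summarize(k + 1, theta)                   # Gamma(k + 1, theta)
    return {key: with_prior[key] - flat[key] for key in with_prior}


if __name__ == "__main__":
    # Hypothetical numbers: 1000 sites, mu = 1e-3 per site per time unit, Ne = 1.
    print(prior_shifts(k=3, mu=1e-3, seq_len=1000, ne=1.0))
```

Both distributions share the shape k + 1 and differ only in rate, so they are stochastically ordered. Under these assumptions, the Wasserstein-1 distance between them therefore equals the absolute mean shift, (k + 1)/θ − (k + 1)/(θ + 1/N_e). For fixed k, this shrinks as θ or N_e grows.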

Part 1: Theoretical analysis

This repository contains the code for the first part of this project. The plot_population_size_impact.ipynb notebook summarizes all results obtained in this part, and the underlying code can be found in the src folder.
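The posteriors for non-constant models can be written analytically, but a simple grid evaluation is a convenient cross-check. The sketch below assumes a Poisson likelihood with mean θT. It also assumes a two-lineage coalescent prior p(T) = λ(T) exp(−Λ(T)), where λ(t) = 1/N(t), t runs backwards in time, and Λ is the integral of λ. Exponential change is parameterized as N(t) = N_0 e^{−rt}, with r > 0 for growth and r < 0 for decline. The bottleneck is modeled as a single piecewise-constant dip. These parameterizations, the flat-prior baseline, and all function names are hypothetical and need not match the code in src. The Wasserstein-1 distance is computed as the integral of the absolute difference between the two CDFs.

```python
import math
from typing import Callable, List, Tuple


def exp_growth_log_prior(t: float, n0: float, r: float) -> float:
    """Log density of the two-lineage coalescence time when N(t) = n0 * exp(-r * t).

    Hazard 1/N(t) = exp(r t) / n0; cumulative hazard (exp(r t) - 1) / (r n0).
    For r < 0 the cumulative hazard is bounded, so this prior is defective
    (integrates to less than one); the likelihood still keeps the posterior proper.
    """
    if r == 0.0:
        return -math.log(n0) - t / n0
    rt = r * t
    if rt > 700.0:  # cumulative hazard is astronomically large: density is effectively zero
        return -math.inf
    return rt - math.log(n0) - math.expm1(rt) / (r * n0)


def bottleneck_log_prior(t: float, n0: float, n_b: float, t_start: float, t_end: float) -> float:
    """Piecewise-constant size: n0, then n_b on [t_start, t_end), then n0 again."""
    if t < t_start:
        return -math.log(n0) - t / n0
    if t < t_end:
        return -math.log(n_b) - (t_start / n0 + (t - t_start) / n_b)
    cum = t_start / n0 + (t_end - t_start) / n_b + (t - t_end) / n0
    return -math.log(n0) - cum


def grid_posterior(k: int, theta: float, log_prior: Callable[[float], float],
                   t_max: float, n: int = 20_000) -> Tuple[List[float], List[float]]:
    """Normalized posterior masses of T at grid midpoints, with k | T ~ Poisson(theta * T)."""
    dt = t_max / n
    ts = [(i + 0.5) * dt for i in range(n)]  # midpoints avoid log(0)
    log_w = [k * math.log(t) - theta * t + log_prior(t) for t in ts]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    z = sum(w)
    return ts, [v / z for v in w]


def posterior_mean(ts: List[float], ps: List[float]) -> float:
    return sum(t * p for t, p in zip(ts, ps))


def wasserstein1(ps: List[float], qs: List[float], dt: float) -> float:
    """W1 between two distributions on the same uniform grid: sum of |F - G| * dt."""
    f = g = total = 0.0
    for p, q in zip(ps, qs):
        f += p
        g += q
        total += abs(f - g) * dt
    return total


if __name__ == "__main__":
    # Hypothetical regime: k = 3 differences, theta = 2 per time unit, N0 = 1.
    k, theta, n0, n = 3, 2.0, 1.0, 20_000
    t_max = 20.0 * (k + 1) / theta  # far into the likelihood tail
    dt = t_max / n
    ts, flat = grid_posterior(k, theta, lambda t: 0.0, t_max, n)
    priors = {
        "constant": lambda t: exp_growth_log_prior(t, n0, 0.0),
        "growth r=5": lambda t: exp_growth_log_prior(t, n0, 5.0),
        "decline r=-1": lambda t: exp_growth_log_prior(t, n0, -1.0),
        "bottleneck": lambda t: bottleneck_log_prior(t, n0, 0.05, 0.5, 0.6),
    }
    for name, log_prior in priors.items():
        _, post = grid_posterior(k, theta, log_prior, t_max, n)
        shift = posterior_mean(ts, post) - posterior_mean(ts, flat)
        print(f"{name}: mean shift {shift:+.3f}, W1 {wasserstein1(post, flat, dt):.3f}")
```

With r = 0, or with n_b equal to n0, both priors reduce to the constant-size case. This gives a convenient consistency check against the closed-form Gamma(k + 1, θ + 1/N_0) posterior.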
