Recombinants vs case numbers #407

spotto · 2025-04-13T01:43:30Z

I explored the timing of recombination events (929 events, dated by date_added). The recombinants (red) generally occur before the major rise in global cases (blue):

The shift to earlier time points is easier to see on a cumulative plot:

Squaring the number of cases doesn't help the fit (purple), and assuming that the first case happened earlier makes it worse (not shown). [The maximum likelihood fit is actually to cases^0.6, i.e., to a slightly more uniform distribution than the cases themselves.]
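A minimal sketch of how a cases^alpha fit like this could be computed. It assumes each recombinant's date is an independent draw from a distribution over days with probability proportional to cases^alpha, and that daily case counts are strictly positive. The function names and the alpha grid are illustrative; this is not the code behind the plots above.

```python
import numpy as np

def loglik_case_power(event_days, daily_cases, alpha):
    """Log-likelihood of event days under P(day t) proportional to cases[t]**alpha.

    event_days: integer indices into daily_cases, one per recombinant
    daily_cases: strictly positive case counts, one per day (assumption)
    """
    log_c = np.log(np.asarray(daily_cases, dtype=float))
    x = alpha * log_c
    # Stable log-sum-exp gives log(sum_t cases[t]**alpha), the normalising constant
    m = x.max()
    log_norm = m + np.log(np.exp(x - m).sum())
    idx = np.asarray(event_days, dtype=int)
    return float(x[idx].sum() - idx.size * log_norm)

def mle_case_power(event_days, daily_cases, grid=None):
    """Grid search for the exponent alpha that maximises the log-likelihood."""
    if grid is None:
        grid = np.linspace(0.0, 3.0, 31)  # step 0.1; alpha = 2 is "squared cases"
    ll = np.array([loglik_case_power(event_days, daily_cases, a) for a in grid])
    return float(grid[int(np.argmax(ll))]), ll

# Toy check: events spread exactly in proportion to cases, so alpha should be 1
cases = np.array([1, 2, 4, 8])
events = np.repeat(np.arange(4), [1, 2, 4, 8])
best_alpha, loglik = mle_case_power(events, cases)
```

Comparing the log-likelihood at alpha = 1 and alpha = 2 is the "squaring the cases" comparison. The log-likelihood is concave in alpha, so a reasonably fine grid is enough to locate the maximum.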

The peak in cases (red) happens during the Delta wave, which is well known to have been undercounted, with massive numbers of cases in India going unreported. Indeed, looking at deaths (black) paints an entirely different picture, with the recombinants now falling after the majority of deaths.

Death data are problematic, though: deaths per case rose with Alpha and Delta and then fell with Omicron and vaccines, making deaths a very imperfect proxy for cases.

Lacking solid case data, I'm unsure how much weight to put on any of this.

One remaining possibility that might be worth exploring is comparing the timing of recombinants with diversity (e.g., phylogenetic diversity among the sequences added each day). This is less about when we expect recombination to happen, and more about the ability to detect those recombinants happening.
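One cheap stand-in for per-day diversity, sketched under assumptions: rather than true phylogenetic diversity (which would need the tree), this computes the mean pairwise proportion of differing sites among aligned sequences added on each day, skipping sites where either sequence is not A/C/G/T. The input shape (a dict mapping each day to a list of aligned sequence strings) and the function names are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")

def pairwise_distance(s, t):
    """Proportion of differing sites, counting only sites called as A/C/G/T in both."""
    compared = diffs = 0
    for a, b in zip(s, t):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            diffs += a != b
    return diffs / compared if compared else float("nan")

def daily_diversity(seqs_by_day):
    """Mean pairwise distance among the sequences added on each day."""
    result = {}
    for day, seqs in seqs_by_day.items():
        dists = [pairwise_distance(s, t) for s, t in combinations(seqs, 2)]
        dists = [d for d in dists if d == d]  # drop NaN (no comparable sites)
        result[day] = sum(dists) / len(dists) if dists else float("nan")
    return result

# Hypothetical toy input: aligned sequences keyed by the day they were added
example = daily_diversity({
    "2022-01-01": ["AAAA", "AAAT", "AATT"],
    "2022-01-02": ["ANAA", "AAAT"],
})
```

All-pairs comparison is quadratic in the number of sequences per day, so on busy days a random subsample of sequences would likely be needed.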

jeromekelleher · 2025-04-14T08:30:50Z

Maintainer

Interesting... I wonder if it's worth trying this again with a more trustworthy set of recombinants? I think there's a large set of correlated recombinants around the same time that are very likely due to bioinformatics issues, and given that the numbers are quite small, these could be skewing the signal significantly. I think a combination of max_run_length and averted_mutations is the best way to get at a reliable set, but I don't really know what values to suggest.

0 replies

spotto · 2025-04-15T16:02:41Z

Author

A somewhat busy plot showing recombinants over time, shaded by run length (red is any max_run_length; purple is max_run_length = 0). The recombinants are tabulated as weekly counts (maximum of 44 in a week), and the cases (blue) are scaled to the same height.
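The weekly tabulation and rescaling described above can be sketched with pandas, assuming recombinant dates are available as a list and daily cases as a date-indexed Series (both input shapes are hypothetical). The "W" alias bins into weeks ending on Sunday.

```python
import pandas as pd

def weekly_counts_and_scaled_cases(event_dates, daily_cases):
    """Weekly recombinant counts, plus weekly cases rescaled to the same peak height.

    event_dates: one date per recombinant
    daily_cases: pd.Series of case counts with a DatetimeIndex
    """
    events = pd.Series(1, index=pd.to_datetime(list(event_dates)))
    weekly_events = events.resample("W").sum()  # weeks ending Sunday
    weekly_cases = daily_cases.resample("W").sum()
    # Scale cases so their weekly maximum equals the recombinant weekly maximum
    scaled_cases = weekly_cases * (weekly_events.max() / weekly_cases.max())
    return weekly_events, scaled_cases

# Toy example (hypothetical data): two weeks starting Monday 2022-01-03
cases = pd.Series(range(1, 15), index=pd.date_range("2022-01-03", periods=14, freq="D"))
weekly_events, scaled_cases = weekly_counts_and_scaled_cases(
    ["2022-01-03", "2022-01-04", "2022-01-11"], cases)
```

Because the two series are binned independently, their week labels only line up over the date range both cover; reindexing one onto the other before plotting keeps the bars aligned.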

0 replies

jeromekelleher · 2025-04-22T12:45:10Z

Maintainer

Filtering out the recombinants where the breakpoint is near large chunks of missing data and plotting this next to the diversity gives a slightly different perspective I think. Here's what we get when we focus on the subset of 576 recombinants where break_near_missing_run_12_7 is False:

I've not done any stats here, but it looks to me like the peak in recombinants detected occurs roughly where BA.1 and BA.2 are maximally coexisting?

In contrast, here's what we get when we look at the whole thing:

The data for the sample composition is here. The columns of interest are date, scorpio and total (the total number of samples processed).

2 replies

jeromekelleher Apr 22, 2025
Maintainer

Zooming in on the interesting bit and showing the sample composition as a fraction, we get this

This looks like the peak occurs pretty much exactly when BA.1 and BA.2 are 50/50?
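A sketch of how the per-date composition fractions and the BA.1/BA.2 "50/50" date could be computed. It assumes each row of the sample-composition file gives the number of samples (total) for one scorpio call on one date; if the file is laid out differently, the pivot will need adjusting. "BA.1" and "BA.2" below are placeholder labels for whatever strings the scorpio column actually contains.

```python
import pandas as pd

def composition_fractions(samples):
    """Per-date fraction of samples in each scorpio category.

    Assumes one row per (date, scorpio) with the sample count in `total`.
    """
    counts = samples.pivot_table(index="date", columns="scorpio", values="total",
                                 aggfunc="sum", fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)

def balance_date(fractions, a, b):
    """Date on which lineage a makes up closest to half of the combined a + b share."""
    share_a = fractions[a] / (fractions[a] + fractions[b])
    return (share_a - 0.5).abs().idxmin()

# Toy data; "BA.1"/"BA.2" are placeholders for the labels used in the scorpio column
samples = pd.DataFrame({
    "date": ["2022-01-08"] * 3 + ["2022-01-15"] * 3 + ["2022-01-22"] * 3,
    "scorpio": ["BA.1", "BA.2", "Delta"] * 3,
    "total": [90, 10, 5, 50, 50, 5, 10, 90, 5],
})
fractions = composition_fractions(samples)
crossover = balance_date(fractions, "BA.1", "BA.2")
```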

hyanwong Apr 24, 2025
Collaborator

This is nice, but I have also identified a few other probably spurious patterns that together might account for another hundred or so recombinants, e.g. see #420. It would be good to remove these (somehow?) before jumping to too many conclusions.

These dodgy recombinants tend to be clustered in time, so would have an effect on plots such as the one above.

jeromekelleher · 2025-04-28T10:17:31Z

Maintainer

I've updated the plots in #423, and @hyanwong was right about the nice peak at BA.1 and BA.2 being artefactual. Here's the updated plot for all(ish) 386 high-quality recombinants:

Here's the follow-up zoomed in and with the relative fractions of the Scorpio lineages:

11 replies

szhan May 3, 2025
Collaborator

Maybe we should choose a slightly customized palette when there aren't enough colors, as in the first plot.

spotto May 3, 2025
Author

But we should standardize the colours used for the variants throughout, right? We could use the CoVariants palette (https://covariants.org/variants) or the Nextstrain palette. I've attached the one we use in Duotang, which has more distinctive colours for adjacent variants; I'll use it for now.
vocvoi.csv

spotto May 3, 2025
Author

Plots using this colour palette: vocvoi.csv.

Scaled to cases:

Scaled to one:

Scaled to sample size by date (like the old figure, except that I tabulated by week; the one above might be smoothed by month?):

The y-axis for recombinant events hasn't been added yet.

jeromekelleher May 6, 2025
Maintainer

Following up on this one @spotto:

Quoting your question: "In daily_stats.csv, I wanted to confirm that the actual number of a lineage observed on a particular day is 'exact_matches' (not 'total', which includes inferred nodes)."

The total field here is the total number of sequences that passed QC (i.e., were within the missingness and deletion thresholds) and for which we ran the HMM (so, the total number of HMM matches performed). The exact_matches field is the number of these that were exact matches (i.e., required no mutations or recombinations) of an existing node in the ARG.

So, the total here would be quite a rough proxy for overall prevalence. We can easily add the actual total number of sequences from Viridian if that would be useful (but I think the case numbers are better, right?).

jeromekelleher May 6, 2025
Maintainer

Also, the latest version of the "high quality set" is based on the net_min_supporting_loci_lft_rgt_ge_4 field in the CSV. The rows in which this is "True" correspond to what we think are pretty solid recombinants.
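A small pandas sketch for pulling out the high-quality set from the CSV, and, with the same helper, the earlier subset where break_near_missing_run_12_7 is False. The helper name and the "recombinant" id column are hypothetical; the flag is compared as text so it works whether read_csv parsed it as a bool or left it as a string.

```python
import io
import pandas as pd

def filter_flag(recombinants, column, keep=True):
    """Rows whose boolean flag column equals `keep` (missing values count as False)."""
    flag = recombinants[column].astype(str).str.strip().str.lower().eq("true")
    return recombinants[flag if keep else ~flag]

# Hypothetical excerpt of the recombinants CSV (the "recombinant" id column is illustrative)
csv_text = """recombinant,net_min_supporting_loci_lft_rgt_ge_4,break_near_missing_run_12_7
1,True,False
2,False,False
3,True,True
"""
recombinants = pd.read_csv(io.StringIO(csv_text))
high_quality = filter_flag(recombinants, "net_min_supporting_loci_lft_rgt_ge_4")
away_from_missing = filter_flag(recombinants, "break_near_missing_run_12_7", keep=False)
```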

spotto · 2025-05-19T19:17:16Z

Author

The graphs have now been updated in the manuscript and supporting text added, both to the results and to the Star Methods.

0 replies

spotto · 2025-05-21T14:04:51Z

Author

@hyanwong - Here is the regression against het*cases:

for comparison to the current figure:

Let's wait until you read it over to decide about adding a third regression figure or not. I did add that the p-values were also assessed by randomly permuting the predictor, to avoid concerns about outliers.
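A sketch of the kind of permutation test described here: the predictor is shuffled many times, and the p-value is the fraction of shuffles whose absolute least-squares slope is at least as large as the observed one. The predictor (e.g. a per-bin het * cases array) and the response (recombinant counts per bin) are assumed to be precomputed; how het is defined isn't stated in this thread, and the function names are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def permutation_pvalue(x, y, n_perm=9999, seed=0):
    """Two-sided permutation p-value for the slope, shuffling the predictor x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = ols_slope(x, y)
    exceed = sum(
        abs(ols_slope(rng.permutation(x), y)) >= abs(observed) for _ in range(n_perm)
    )
    # The +1 counts the observed ordering itself, so p is never exactly zero
    return observed, (exceed + 1) / (n_perm + 1)

# Toy check with a perfect linear relationship (hypothetical data)
predictor = np.arange(10)      # e.g. het * cases per time bin
response = 2 * predictor + 1   # e.g. recombinant counts per time bin
slope, p_value = permutation_pvalue(predictor, response, n_perm=999)
```

Because the null distribution comes from reshuffling the observed values rather than from a normality assumption, a single extreme bin cannot by itself produce a small p-value.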

@jeromekelleher - let me know when you have a minute to add the estimated date of the recombination event to the csv, and I'll rerun.

Thanks!

5 replies

hyanwong May 21, 2025
Collaborator

Thanks. I agree about waiting. It's not terribly different really. Only worth adding something if it's helpful to back up the conclusions made in the text. I think it looks a bit more robust than the other two, but only marginally.

jeromekelleher May 21, 2025
Maintainer

@spotto - the estimated dates are in the CSV now, I think Shing sent an email about this?

spotto May 21, 2025
Author

Sorry. My email and I are having a trial separation; it's not going very well.

I reran all of the analyses with "date_tsdate". There are slight shifts, but no substantive change to the figure or stats. I think that the tsdate is the better one to report, right? If I don't hear any objections, I'll swap things out.
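To put a number on the "slight shifts", a sketch that assumes the recombinants CSV carries both date_added and date_tsdate as parseable date columns (the function name is hypothetical):

```python
import pandas as pd

def date_shift_days(df, observed="date_added", estimated="date_tsdate"):
    """Days between the observed date and the tsdate-estimated date for each recombinant."""
    delta = pd.to_datetime(df[observed]) - pd.to_datetime(df[estimated])
    return delta.dt.days

# Hypothetical two-row example
df = pd.DataFrame({
    "date_added": ["2022-01-20", "2022-02-01"],
    "date_tsdate": ["2022-01-10", "2022-01-29"],
})
shifts = date_shift_days(df)
```

Summaries such as shifts.median() or shifts.describe() then give a compact way to report how much the dates moved.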

jeromekelleher May 21, 2025
Maintainer

Yep, agree the tsdate estimate is the better thing to report. Great to hear not much changed, as expected.

spotto May 21, 2025
Author

Figures and text updated with tsdate!

Replies: 6 comments · 18 replies
