Updates to recombinants_summary dataframe #508

Merged
jeromekelleher merged 2 commits into main from
new-recombinants-summary
Jun 22, 2025

@jeromekelleher
Member

No description provided.

Used to be arbitrary; now it is one of the samples that has the fewest
mutations in its HMM copying path.
@jeromekelleher
Member Author

Two significant changes here @szhan and @hyanwong

The first change switches the choice of representative sample for a recombinant to one of those with the minimum number of mutations in the HMM match. Previously the choice was arbitrary, so this may slightly change some results for the X lineages. We'll need to check Shing's hand-made figure carefully against the final data to make sure that no errors have crept in.
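The selection rule described above can be sketched as follows. This is a minimal illustration and not the code in this PR: the function name `pick_representative`, the `(sample_id, num_mutations)` pair layout, and the tie-break on lowest sample ID are all assumptions made for the example.

```python
def pick_representative(samples):
    """Return the ID of a sample with the fewest HMM mutations.

    `samples` is a list of (sample_id, num_mutations) pairs for one
    recombinant. Ties are broken by lowest sample_id, which is an
    assumption made for this sketch.
    """
    if not samples:
        raise ValueError("recombinant has no samples")
    sample_id, _ = min(samples, key=lambda s: (s[1], s[0]))
    return sample_id


# Hypothetical data: three samples descending from one recombinant.
candidates = [(60, 2), (53, 0), (57, 0)]
representative = pick_representative(candidates)
```

Making the tie-break explicit keeps the choice deterministic across runs, which is the property the arbitrary choice lacked.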

The second change is to be more systematic about the list of metadata fields included for each node of interest. We now have (e.g.):

num_descendant_samples                        2
num_samples                                   2
distinct_sample_pango                         1
interval_left                              3788
interval_right                            11083
num_mutations                                 0
Viridian_amplicon_scheme                Unknown
Artic_primer_version                    Unknown
date_added                           2020-02-15
group_id                             fc5a70591c
group_size                                    3
recombinant                                  55
recombinant_pango                       Unknown
recombinant_scorpio                     Unknown
recombinant_time                       0.000001
recombinant_date            2020-02-15 00:00:00
sample                                       53
sample_pango                            Unknown
sample_scorpio                          Unknown
sample_time                                 0.0
sample_date                 2020-02-15 00:00:00
parent_left                                  31
parent_left_pango                             B
parent_left_scorpio                           .
parent_left_time                           11.0
parent_left_date            2020-02-04 00:00:00
parent_right                                 46
parent_right_pango                      Unknown
parent_right_scorpio                    Unknown
parent_right_time                         14.25
parent_right_date           2020-02-01 00:00:00
parent_mrca                                   1
parent_mrca_pango                       Unknown
parent_mrca_scorpio                     Unknown
parent_mrca_time                           51.0
parent_mrca_date            2019-12-26 00:00:00

This is following most of the points raised in this issue: jeromekelleher/sc2ts-paper#468
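The per-node pattern visible in the listing above (a node ID plus `_pango`, `_scorpio`, `_time` and `_date` for each node of interest) can be written down compactly. The sketch below only reproduces the column names shown in the listing; the names `NODES`, `SUFFIXES` and `node_columns` are hypothetical and are not taken from the sc2ts code.

```python
# Nodes of interest and per-node field suffixes, as they appear in the
# example listing. The empty suffix is the node ID column itself.
NODES = ["recombinant", "sample", "parent_left", "parent_right", "parent_mrca"]
SUFFIXES = ["", "_pango", "_scorpio", "_time", "_date"]


def node_columns(nodes=NODES, suffixes=SUFFIXES):
    """Return the per-node column names in listing order."""
    return [f"{node}{suffix}" for node in nodes for suffix in suffixes]


columns = node_columns()
```

A list like this is handy for checking that a summary dataframe contains every expected per-node column before downstream analysis.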

Note: I've changed the "_time_tsdate" and "_date_tsdate" fields to just "_time" and "_date", as this is more a function of the tree sequence you run the function on than anything to do with tsdate.
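Downstream code that still refers to the old field names could map them to the new ones with a small helper along these lines. This is a sketch under assumptions: `strip_tsdate_suffix` is a hypothetical name, and the old column names in the example (such as `parent_left_time_tsdate`) are inferred from the suffixes mentioned in the note rather than copied from the previous dataframe.

```python
def strip_tsdate_suffix(name):
    """Map an old '*_time_tsdate' or '*_date_tsdate' name to the new form."""
    for old, new in (("_time_tsdate", "_time"), ("_date_tsdate", "_date")):
        if name.endswith(old):
            return name[: -len(old)] + new
    # Names without a tsdate suffix are returned unchanged.
    return name


# Hypothetical old-style column names, for illustration only.
old_names = ["parent_left_time_tsdate", "sample_date_tsdate", "sample_pango"]
new_names = [strip_tsdate_suffix(n) for n in old_names]
```

With pandas, the same function can be passed directly as `df.rename(columns=strip_tsdate_suffix)`.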

Hopefully this is all the metadata we need for the first pass, and we can add more columns later with results from HMM reruns and pangonet?

@jeromekelleher jeromekelleher merged commit a1fa4b2 into main Jun 22, 2025
3 checks passed
@jeromekelleher jeromekelleher deleted the new-recombinants-summary branch June 22, 2025 21:04