Replies: 11 comments
-
We may ultimately want to allow users to specify a list of skip dates.
-
Basically fix jeromekelleher/sc2ts#227
-
It seems like New Year's Eve 2020 (2020-12-31) is the only egregious collection date, with more than twice as many samples as other collection dates. Note that I am counting only the samples with full-precision collection dates and ignoring the less precise dates (e.g. 2020 and 2022-09), so there are 3,960,704 samples here. Some actual counts.
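For reference, the counting described above could be sketched roughly as follows. This is a minimal illustration, not the code actually used: the function names are hypothetical, the input is assumed to be a plain list of date strings, and comparing each date against twice the median daily count is one assumed reading of "more than twice the number of samples as other collection dates".

```python
import re
from collections import Counter
from statistics import median

# Full-precision dates look like YYYY-MM-DD; "2020" or "2022-09" do not match.
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def count_by_full_date(dates):
    """Count samples per collection date, keeping only full-precision dates."""
    return Counter(d for d in dates if FULL_DATE.fullmatch(d))


def flag_outlier_dates(counts, factor=2.0):
    """Return dates whose count exceeds `factor` times the median daily count.

    The median-based threshold is an assumption made for this sketch.
    """
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(d for d, n in counts.items() if n > factor * typical)


if __name__ == "__main__":
    # Toy input, not real Viridian data.
    toy = ["2020-12-30"] * 3 + ["2020-12-31"] * 10 + ["2021-01-01"] * 4 + ["2020", "2022-09"]
    counts = count_by_full_date(toy)
    print(flag_outlier_dates(counts))
```

On real metadata, the list of date strings would come from whatever column holds the collection date.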
-
Excellent - let's just add a skip-date argument to the CLI and add 2020-12-31 in our run scripts. I guess one thing to follow up on, just to be sure, would be to do some analysis on the pango lineages from 2021-11-29. Do they all look roughly like what should be sampled on that date?
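As a rough idea of what a repeatable skip-date option might look like, here is a small standalone sketch using `argparse`. It is not sc2ts's actual CLI: the `--skip-date` flag name, `parse_args`, and `dates_to_process` are all hypothetical and only illustrate the filtering step.

```python
import argparse
import datetime


def parse_args(argv=None):
    """Parse a hypothetical, repeatable --skip-date option."""
    parser = argparse.ArgumentParser(description="Toy date-skipping example")
    parser.add_argument(
        "--skip-date",
        action="append",
        default=[],
        type=datetime.date.fromisoformat,  # rejects anything that is not YYYY-MM-DD
        help="Collection date (YYYY-MM-DD) to skip; may be given more than once.",
    )
    return parser.parse_args(argv)


def dates_to_process(all_dates, skip_dates):
    """Return the dates in their original order, minus any skipped dates."""
    skip = set(skip_dates)
    return [d for d in all_dates if d not in skip]


if __name__ == "__main__":
    args = parse_args(["--skip-date", "2020-12-31"])
    days = [datetime.date(2020, 12, 30), datetime.date(2020, 12, 31), datetime.date(2021, 1, 1)]
    print(dates_to_process(days, args.skip_date))
```

Making the option repeatable would also cover the earlier suggestion of letting users specify a list of skip dates.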
-
I'm comparing the per-Pangolin label counts of samples collected on 2021-11-29 (the focal date) in the Viridian dataset versus those from the cov-lineages website (recorded here). The cov-lineages website keeps track of the earliest and latest dates of each Pangolin lineage (
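The comment above is cut off, so the exact comparison is not recoverable. One plausible reading is a check that every lineage seen on the focal date falls inside that lineage's recorded earliest and latest dates. The sketch below shows that reading only; the function name, the dictionary layout, and the toy lineage names and dates are assumptions, not data from the Viridian dataset or the cov-lineages website.

```python
import datetime


def lineages_outside_range(lineage_counts, lineage_ranges, focal_date):
    """Return {lineage: count} for lineages that look out of place on focal_date.

    lineage_counts: {lineage: number of samples on the focal date}
    lineage_ranges: {lineage: (earliest_date, latest_date)}
    A lineage is flagged if it has no recorded range or if the focal date
    falls outside its [earliest, latest] interval.
    """
    flagged = {}
    for lineage, n in lineage_counts.items():
        date_range = lineage_ranges.get(lineage)
        if date_range is None:
            flagged[lineage] = n
            continue
        earliest, latest = date_range
        if not (earliest <= focal_date <= latest):
            flagged[lineage] = n
    return flagged


if __name__ == "__main__":
    focal = datetime.date(2021, 11, 29)
    # Placeholder lineage names and dates, for illustration only.
    counts = {"lineage_A": 120, "lineage_B": 3, "lineage_C": 1}
    ranges = {
        "lineage_A": (datetime.date(2021, 4, 1), datetime.date(2022, 2, 1)),
        "lineage_B": (datetime.date(2020, 3, 1), datetime.date(2020, 9, 1)),
    }
    print(lineages_outside_range(counts, ranges, focal))
```

An empty result would suggest the lineages on the focal date are consistent with their recorded date ranges.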
-
The sample counts are pretty well correlated as well.
-
Great - so we're happy that the samples for this date (2021-11-29) look OK, and (by extension) with the conclusion that we only need to filter the samples for 2020-12-31?
-
I just checked 2021-12-29 (the other date with >18k samples) as well, and it looks okay. So, yes, I think we just need to skip 2020-12-31.
-
I guess we can close this now that we know there is really only one egregious collection date to skip.
-
Yep.
-
I moved this here as a conversation because I regularly come back to look at the plot above, and it's good to have the data somewhere easy to find.
-
Before running sc2ts on the entire dataset, we should check whether any dates have an unusually high number of samples, like the one seen on New Year's Eve 2020 (see jeromekelleher/sc2ts#225).