[Draft]: Add blog post about norovirus resources #1287
base: master
2b9c219 to 3958050
| ## Norovirus groups, types, and variants | ||
| Norovirus samples have a dual-typing system based on a polymerase region (RdRp) and a capsid region (VP1) of the genome, between which is a known recombination site. The resolution of norovirus typing has undergone multiple changes ([Zheng et al., 2006](https://doi.org/10.1016/j.virol.2005.11.015); [Eden et al., 2013](https://doi.org/10.1128/jvi.03464-12); [Chhabra et al., 2019](https://doi.org/10.1099/jgv.0.001318); [Tatusov et al., 2020](https://doi.org/10.1016/j.jcv.2020.104718)), but samples are generally split into "genogroup", "genotype", and "variant" classifications for VP1 (and "P-group", "P-type", and "variant" for RdRp). |
Every time I come back to norovirus I get confused by the different levels of genotypes 😅
I can understand how "genogroup" maps to the VP1 group (Nextclade) coloring, "genotype" maps to the VP1 type (Nextclade) coloring, and "variant" maps to the VP1 variant (Nextclade) coloring. However, I'm lost as to how this maps to the default Vp1 Genotype (Nextclade) coloring. Could you explain that additional coloring here?
Also, it'd be helpful to keep terms consistent in the text/dataset/screenshots, either with or without the "geno" prefix, so that it's clear they refer to the same classifications.
> However, I'm lost as to how this maps to the default Vp1 Genotype (Nextclade) coloring.
Long term, I was hoping “Genotype” would eventually contain the most specific (variant, type, group) classification level with sufficient confidence of assignment (I’m not sure how to do this yet). I can imagine cases where a norovirus sample belongs to a group category but does not fit into any of the type or variant categories. Or into a type category, but not into the more specific variant categories.
But perhaps that is too confusing right now. This question also came up in a Slack discussion.
Shall I drop the "VP1 Genotype (Nextclade) coloring" and ditto for "RdRp Genotype (Nextclade) coloring" since the key information should be contained in group, type, and variant colorings?
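A minimal sketch of the "most specific level with sufficient confidence" idea described above, in Python. The level names, the `calls` and `confident` dictionaries, and the function name are all hypothetical; how confidence would actually be determined is left open, as the comment notes.

```python
from typing import Dict, Optional

# Hypothetical classification levels, ordered from most to least specific.
LEVELS = ("variant", "type", "group")


def most_specific_call(
    calls: Dict[str, str], confident: Dict[str, bool]
) -> Optional[str]:
    """Return the label of the most specific level that is both assigned
    and marked as confident, or None if no level qualifies.

    `calls` maps a level name to its assigned label (missing or empty
    means unassigned). `confident` maps a level name to a boolean; how
    that boolean is computed is an open question and not modeled here.
    """
    for level in LEVELS:
        label = calls.get(level)
        if label and confident.get(level, False):
            return label
    return None
```

With this shape, a sample that fits a type but none of the variants would fall back to its type label, and a sample that fits only a group would fall back to the group label.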
> Shall I drop the "VP1 Genotype (Nextclade) coloring" and ditto for "RdRp Genotype (Nextclade) coloring" since the key information should be contained in group, type, and variant colorings?
+1 for this.
I get confused every time I look at all of the different coloring options in norovirus. The "VP1 Genotype (Nextclade) coloring" can be added in the future if we implement the most specific classification assignment.
|  | ||
| **Figure 2. Typing of norovirus samples is based on the VP1 and RdRp regions** and is further split into group, type, and variant resolution. | ||
| We provide two Nextclade datasets (one for VP1 and one for RdRp), each of which provides group, type, and variant levels of resolution for clade coloring. The norovirus Nextclade datasets are preliminary and could benefit from further development, depending on future funding or priorities. Scaffold strains for the norovirus lineage systems are regularly updated at https://calicivirustypingtool.cdc.gov/becerance.cgi, and these Nextclade datasets were built with the version available on September 16, 2025. |
Ah, I was surprised to see the mention of the Nextclade datasets since they are not officially available in nextclade_data yet. What's blocking the publication of the official Nextclade datasets?
The main blocker was that the Norovirus Nextclade dataset had too many false-positive GII.4 results for my comfort (nextstrain/norovirus#27). Therefore, I was presenting these Norovirus Nextclade datasets as preliminary, pending further refinement.
But perhaps I'm being too cautious here? I don't mind opening a PR to Nextclade datasets.
> But perhaps I'm being too cautious here? I don't mind opening a PR to Nextclade datasets
My thoughts here: We are already using these unpublished Nextclade datasets in our ingest workflow and pushing up the results in our production S3 files + builds so we are essentially publishing those results. I'd rather publish the Nextclade dataset and be transparent about the potential false-positives in the dataset README + build descriptions.
Sure, I can see that. From my perspective, when I’m looking at a phylogenetic tree, the coloring makes misclassifications pretty obvious because they stand out as being in the wrong part of the tree. My concern is more about people using the nextclade-cli norovirus dataset in automated workflows, where they may never look at the trees directly. In that context, false positives are more likely to propagate downstream without being noticed.
That said, I can add an explicit warning message in the README and description.md about GII.4 Sydney false positives and note that this Nextclade dataset is a preliminary draft.
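For the automated-workflow concern, one possible guard is to flag GII.4 calls for manual review before they propagate downstream. The following is only a sketch in Python: `seqName` is the sequence-name column in Nextclade TSV output, but the `VP1_type` column name, the `GII.4` label format, and the function name are assumptions, not the dataset's actual schema.

```python
import csv
import io
from typing import List


def flag_for_review(
    tsv_text: str,
    type_column: str = "VP1_type",  # hypothetical column name
    watch_label: str = "GII.4",     # label with known false positives
) -> List[str]:
    """Return the names of sequences whose type call equals `watch_label`.

    `tsv_text` is the content of a tab-separated results table with a
    header row. Rows missing `type_column` are ignored.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["seqName"]
        for row in reader
        if row.get(type_column) == watch_label
    ]


# Example with made-up sequence names.
example = "seqName\tVP1_type\nsampleA\tGII.4\nsampleB\tGII.17\n"
flagged = flag_for_review(example)
```

The flagged names could then be checked against a tree before the calls are trusted.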
That's fair. I'm fine with adding explicit warnings about the false positives in the README and description. In the blog post itself, it would be good to add a brief description of why this Nextclade dataset is still a preliminary draft.
Hah, fix wrong word
Co-authored-by: Kim Andrews
Description of proposed changes
Add a blog post about https://nextstrain.org/norovirus
Preview here
Related issue(s)
Checklist