@j23414 j23414 commented Dec 15, 2025

Description of proposed changes

Add a blog post about https://nextstrain.org/norovirus

Preview here

Related issue(s)

Checklist

@nextstrain-bot nextstrain-bot temporarily deployed to nextstrain-s-j23414-blo-wq3bec December 15, 2025 20:47 Inactive
@j23414 j23414 force-pushed the j23414/blog-norovirus branch from 2b9c219 to 3958050 Compare December 15, 2025 20:48
@nextstrain-bot nextstrain-bot temporarily deployed to nextstrain-s-j23414-blo-wq3bec December 15, 2025 20:48 Inactive

## Norovirus groups, types, and variants

Norovirus samples have a dual-typing system based on a polymerase region (RdRp) and a capsid region (VP1) of the genome, between which is a known recombination site. The resolution of norovirus typing has undergone multiple changes ([Zheng et al., 2006](https://doi.org/10.1016/j.virol.2005.11.015); [Eden et al., 2013](https://doi.org/10.1128/jvi.03464-12); [Chhabra et al., 2019](https://doi.org/10.1099/jgv.0.001318); [Tatusov et al., 2020](https://doi.org/10.1016/j.jcv.2020.104718)), but samples are generally split into a "genogroup", "genotype", and "variant" classification for VP1 (and "P-group", "P-type", and "variant" for RdRp).

Contributor

Every time I come back to norovirus I get confused by the different levels of genotypes 😅

I can understand how "genogroup" maps to the VP1 group (Nextclade) coloring, "genotype" maps to the VP1 type (Nextclade) coloring, and "variant" maps to the VP1 variant (Nextclade) coloring. However, I'm lost as to how this maps to the default Vp1 Genotype (Nextclade) coloring. Could you explain that additional coloring here?

Also, it'd be helpful to keep terms consistent in the text/dataset/screenshots, either with or without the "geno" prefix, so that it's clear they refer to the same classifications.

Contributor Author

> However, I'm lost as to how this maps to the default Vp1 Genotype (Nextclade) coloring.

Long term, I was hoping “Genotype” would eventually contain the most specific (variant, type, group) classification level with sufficient confidence of assignment (I’m not sure how to do this yet). I can imagine cases where a norovirus sample belongs to a group category but does not fit into any of the type or variant categories. Or into a type category, but not into the more specific variant categories.
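As a rough sketch of that fallback idea: the function below returns the most specific level that has a call, falling back from variant to type to group. The level names and the convention that an unassigned level is left empty or `None` are assumptions for illustration, not the dataset's actual columns, and the "sufficient confidence" check is assumed to happen upstream.

```python
from typing import Dict, Optional

# Hypothetical level names, ordered from most to least specific.
LEVELS = ("variant", "type", "group")


def most_specific_call(calls: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the most specific non-empty classification, or None if there is none.

    Assumes that a level without a confident assignment is missing,
    None, or an empty string in `calls`.
    """
    for level in LEVELS:
        value = calls.get(level)
        if value:
            return value
    return None
```

A sample assigned only to a group would get the group label, while a sample with a variant call would get the variant label.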

But perhaps that is too confusing right now...this question also came up in a slack discussion.

Shall I drop the "VP1 Genotype (Nextclade) coloring" and ditto for "RdRp Genotype (Nextclade) coloring" since the key information should be contained in group, type, and variant colorings?

Contributor

> Shall I drop the "VP1 Genotype (Nextclade) coloring" and ditto for "RdRp Genotype (Nextclade) coloring" since the key information should be contained in group, type, and variant colorings?

+1 for this.

I get confused every time I look at all of the different coloring options in norovirus. The "VP1 Genotype (Nextclade) coloring" can be added in the future if we implement the most specific classification assignment.

![Figure 2](/blog/img/norovirus-group-type-variant.png)
**Figure 2. Typing of norovirus samples is based on the VP1 and RdRp regions** and is further split into group, type, and variant resolution.

We provide two Nextclade datasets (one for VP1 and one for RdRp), each of which provides group, type, and variant levels of resolution for clade coloring. The norovirus Nextclade datasets are preliminary and could benefit from further development, depending on future funding or priorities. Scaffold strains for the norovirus lineage systems are regularly updated at https://calicivirustypingtool.cdc.gov/becerance.cgi, and these Nextclade datasets were built with the version available on September 16, 2025.

Contributor

Ah, I was surprised to see the mention of the Nextclade datasets since they are not officially available in nextclade_data yet. What's blocking publication of the official Nextclade datasets?

Contributor Author

The main blocker was that the Norovirus Nextclade dataset had too many false-positive GII.4 results for my comfort (nextstrain/norovirus#27). Therefore, I was presenting these Norovirus Nextclade datasets as preliminary, pending further refinement.

But perhaps I'm being too cautious here? I don't mind opening a PR to Nextclade datasets

Contributor

> But perhaps I'm being too cautious here? I don't mind opening a PR to Nextclade datasets

My thoughts here: We are already using these unpublished Nextclade datasets in our ingest workflow and pushing up the results in our production S3 files + builds so we are essentially publishing those results. I'd rather publish the Nextclade dataset and be transparent about the potential false-positives in the dataset README + build descriptions.

Contributor Author

Sure, I can see that. From my perspective, when I’m looking at a phylogenetic tree, the coloring makes misclassifications pretty obvious because they stand out as being in the wrong part of the tree. My concern is more about people using the nextclade-cli norovirus dataset in automated workflows, where they may never look at the trees directly. In that context, false positives are more likely to propagate downstream without being noticed.

That said, I can add an explicit warning message in the README and description.md about GII.4 Sydney false positives and how this Nextclade dataset is a preliminary draft.
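For the automated-workflow case, a guard along these lines might help until the dataset is refined. This is only a sketch: `seqName` is the standard Nextclade TSV column, but the `VP1_variant` column name and the `GII.4 Sydney` label are placeholders I have not checked against the real output.

```python
import csv
import io
from typing import List

# Hypothetical column name and label; check the real Nextclade output header.
VARIANT_COLUMN = "VP1_variant"
FLAGGED_LABEL = "GII.4 Sydney"


def flag_for_review(tsv_text: str) -> List[str]:
    """Return the names of sequences whose variant call equals the flagged label."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["seqName"]
        for row in reader
        if row.get(VARIANT_COLUMN) == FLAGGED_LABEL
    ]
```

A workflow could write the flagged names to a review list instead of passing those calls downstream unchecked.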

Contributor

That's fair. I'm fine with adding explicit warnings about the false positives in the README and description. In the blog post itself, it would be good to add a brief description of why this Nextclade dataset is still a preliminary draft.

Hah, fix wrong word

Co-authored-by: Kim Andrews <[email protected]>
@nextstrain-bot nextstrain-bot temporarily deployed to nextstrain-s-j23414-blo-wq3bec December 16, 2025 21:41 Inactive
Co-authored-by: Kim Andrews <[email protected]>
@nextstrain-bot nextstrain-bot temporarily deployed to nextstrain-s-j23414-blo-wq3bec December 17, 2025 01:14 Inactive