nextstrain.org/vzv

This is the Nextstrain build for Varicella-Zoster Virus (VZV). The results can be viewed locally with Auspice or deployed to nextstrain.org.

Usage

If you're new to Nextstrain, we recommend starting with the
Running a Pathogen Workflow guide.

Using nextstrain build

Clone or update the repository:

git clone https://github.com/verenne04/vzv.git
cd vzv/phylogenetic
# or, if already cloned
git pull origin main

Run the workflow:

nextstrain build .

View the results:

nextstrain view .

The workflow writes auspice/vzv.json, the dataset file that Auspice loads for visualization.


Configuration

The default configuration lives in config/config.yaml.
Workflow rules are defined in the Snakefile and include:

  • Filtering low-quality sequences
  • Aligning sequences with Nextclade
  • Masking repetitive or problematic regions
  • Calling mutations and assigning clades
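As a rough illustration of the first step in the list above, a quality filter can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names and both thresholds (`min_length`, `max_n_fraction`) are hypothetical and are not taken from config/config.yaml or the Snakefile, which remain the authoritative definition of the workflow.

```python
def n_fraction(seq: str) -> float:
    """Fraction of ambiguous 'N' bases in a sequence (case-insensitive)."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def filter_sequences(records, min_length=100_000, max_n_fraction=0.05):
    """Keep sequences that are long enough and not too ambiguous.

    `records` maps sequence names to nucleotide strings.
    Both thresholds are illustrative assumptions, not the workflow's values.
    """
    kept = {}
    for name, seq in records.items():
        if len(seq) >= min_length and n_fraction(seq) <= max_n_fraction:
            kept[name] = seq
    return kept
```

For example, `filter_sequences({"a": "ACGT" * 10, "b": "N" * 40}, min_length=10)` keeps only `"a"`, because `"b"` consists entirely of ambiguous bases.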

Clade Definitions

VZV clades are defined by fixed SNPs listed in clades.tsv.

Defined clades:

| Clade | Synonyms (CDC/UK/Iowa–Canada nomenclatures) | Geographic association | Notes |
|-------|---------------------------------------------|------------------------|-------|
| 1 | E1/C/A | Europe, North America | Common western lineage |
| 2 | J/J/B | Japan, China | Dominant East Asian lineage; vaccine-origin clade |
| 3 | E2/B/D | Europe, North America | Common western lineage |
| 4 | M2/J/C | Africa, Asia | |
| 5 | M1/A/-- | Africa, Indian subcontinent | |
| 6 | M4/--/-- | Global circulation | Recombinant of clades 1/3 and 4/5 |
| VII | M3/--/-- | -- | Not circulating; only one partial-genome strain |
| VIII | -- | Global circulation | Putative clade; closely related to clade 6 |
| 9 | -- | Europe (UK, Germany), USA, India (newly reported) | Identified in India for the first time |

Note: Clades VIII and 9 are rare and poorly sampled. Their placement and interpretation may change as more genomes become available.

Clade-defining SNPs are stored in clades.tsv and loaded by augur clades during clade assignment.
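To make the idea of clade-defining SNPs concrete, here is a simplified Python sketch. It assumes clades.tsv uses a four-column layout (`clade`, `gene`, `site`, `alt`) with `nuc` marking nucleotide rows; check the actual file before relying on this. The sketch compares a single aligned sequence against the definitions directly, which is not how augur works (augur assigns clades from mutations inferred on the tree). The clade names and positions in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_clade_snps(handle):
    """Read nucleotide rows into {clade: [(site, alt), ...]}."""
    snps = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["gene"] == "nuc":
            snps[row["clade"]].append((int(row["site"]), row["alt"].upper()))
    return dict(snps)


def matching_clades(aligned_seq, clade_snps):
    """Return clades whose defining SNPs are all present (sites are 1-based)."""
    seq = aligned_seq.upper()
    hits = []
    for clade, sites in clade_snps.items():
        if all(site <= len(seq) and seq[site - 1] == alt for site, alt in sites):
            hits.append(clade)
    return hits


# Made-up definitions: clade "X" needs T at site 2 and G at site 5,
# clade "Y" needs A at site 3. These are NOT real VZV SNPs.
EXAMPLE_TSV = "clade\tgene\tsite\talt\nX\tnuc\t2\tT\nX\tnuc\t5\tG\nY\tnuc\t3\tA\n"
clade_snps = load_clade_snps(io.StringIO(EXAMPLE_TSV))
print(matching_clades("ATCCG", clade_snps))  # prints ['X']
```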


Key masked regions

| Region | Coordinates | Feature | Reason |
|--------|-------------|---------|--------|
| R1 | 13459–14696 | ORF11 | Tandem repeats within coding region (mask recommended) |
| R2 | 20593–21143 | ORF14 (glycoprotein C) | Tandem repeats within glycoprotein C |
| R3 | 37914–44413 | ORF22 | Highly variable tandem repeat region (masking essential) |
| R4 | 54555–54558 | Non-coding | Short highly variable site (improves phylogenetic accuracy) |
| R5 | 101650–102573 | ORF60–61 intergenic | Repetitive motif region (optional masking) |
| R6 | 104955–104991 | Intergenic | Unannotated repetitive region |
| R7 | 105009–105018 | Non-coding | Highly variable short stretch, poor alignment quality |
| R8 | 110035–110394 | Non-coding | Repetitive stretch / poor alignment |
| R9 | 117790–124697 | Reversed repeated region | Duplicated ORFs (e.g. ORF62/71, ORF63/70, ORF64/69) |
| R10 | 124881–124884 | Non-coding | Small hypervariable site (improves topology when masked) |
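The regions above can be turned into BED records with a short Python sketch. BED uses 0-based, half-open intervals, so the sketch assumes the coordinates in the table are 1-based and inclusive (the README does not state this explicitly). The chromosome name `reference` is a placeholder; the real mask.bed must use the name of the reference sequence the workflow aligns against.

```python
# Region names and coordinates copied from the table above.
MASKED_REGIONS = [
    ("R1", 13459, 14696),
    ("R2", 20593, 21143),
    ("R3", 37914, 44413),
    ("R4", 54555, 54558),
    ("R5", 101650, 102573),
    ("R6", 104955, 104991),
    ("R7", 105009, 105018),
    ("R8", 110035, 110394),
    ("R9", 117790, 124697),
    ("R10", 124881, 124884),
]


def to_bed_lines(regions, chrom="reference"):
    """Convert 1-based inclusive regions to 0-based half-open BED lines.

    `chrom` is a placeholder name, not the workflow's actual reference ID.
    """
    return [f"{chrom}\t{start - 1}\t{end}\t{name}" for name, start, end in regions]


def mask_sequence(seq, regions, mask_char="N"):
    """Replace every base inside the given 1-based inclusive regions."""
    chars = list(seq)
    for _name, start, end in regions:
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    print("\n".join(to_bed_lines(MASKED_REGIONS)))
```

Under these assumptions, R1 (13459–14696) becomes the BED interval 13458 to 14696, and `mask_sequence("ACGTACGT", [("x", 2, 4)])` returns `"ANNNACGT"`.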

Further reading


Files & Outputs

| File | Description |
|------|-------------|
| clades.tsv | List of clade-defining SNPs |
| mask.bed | Regions to mask before tree construction |
| alignment.fasta | Aligned sequences |
| tree.nwk | Newick tree |
| nt_muts.json | Nucleotide mutation annotations |
| aa_muts.json | Amino acid mutation annotations |
| auspice/vzv.json | Visualization JSON for Auspice |
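For a quick sanity check of the final output outside Auspice, the dataset JSON can be inspected with a small Python sketch. It assumes the file follows the usual Auspice layout, in which a top-level `tree` object holds nested nodes with a `name` and an optional `children` list. The helper names `tip_names` and `load_tips` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def tip_names(node):
    """Collect the names of leaf nodes using an iterative depth-first walk."""
    names = []
    stack = [node]
    while stack:
        current = stack.pop()
        children = current.get("children", [])
        if children:
            # Reverse so that children are visited in their original order.
            stack.extend(reversed(children))
        else:
            names.append(current["name"])
    return names


def load_tips(path="auspice/vzv.json"):
    """Load the dataset and return its tip names (assumes a single tree)."""
    dataset = json.loads(Path(path).read_text())
    return tip_names(dataset["tree"])
```

After a successful build, `len(load_tips())` gives the number of sequences that made it into the tree, which can be compared against the number of input sequences to see how many were filtered out.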

Visualization

View the results with:

nextstrain view .