This is the Nextstrain build for Varicella-Zoster Virus (VZV). The results can be viewed locally with Auspice or deployed to nextstrain.org.
If you're new to Nextstrain, we recommend starting with the
Running a Pathogen Workflow guide.
Clone or update the repository:
```bash
git clone https://github.com/verenne04/vzv.git
cd vzv/phylogenetic
# or, if already cloned
git pull origin main
```

Run the workflow from the `phylogenetic` directory:

```bash
nextstrain build .
```

View the results:

```bash
nextstrain view .
```

The build writes `auspice/vzv.json`, which Auspice loads for visualization.
The default configuration lives in `config/config.yaml`.
Workflow rules are defined in the `Snakefile` and include:
- Filtering low-quality sequences
- Aligning sequences with Nextclade
- Masking repetitive or problematic regions
- Calling mutations and assigning clades
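
The list above does not spell out what counts as a low-quality sequence. As a rough sketch only, assuming quality is judged by the fraction of ambiguous bases (N) in each record, such records could be flagged with `awk`. The helper name `flag_high_n` and its default cutoff of 0.05 are hypothetical and are not taken from `config/config.yaml`.

```bash
# Hypothetical helper (not part of the workflow): print the IDs of FASTA
# records whose fraction of N bases exceeds a cutoff.
# Usage: flag_high_n FILE [MAX_FRACTION]   (default cutoff 0.05 is illustrative)
flag_high_n() {
    awk -v max="${2:-0.05}" '
        # Report the previous record if its N fraction is above the cutoff
        function report() {
            if (id != "" && len > 0 && n / len > max) print id
        }
        /^>/ { report(); id = substr($1, 2); len = 0; n = 0; next }
        {
            seq = toupper($0)
            len += length(seq)
            n += gsub(/N/, "", seq)   # gsub returns how many N characters it removed
        }
        END { report() }              # handle the last record in the file
    ' "$1"
}
```

For example, `flag_high_n sequences.fasta 0.1` would print the IDs of records with more than 10% N; `sequences.fasta` is a placeholder path.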
VZV clades are defined by sets of SNPs that are fixed within each clade; these clade-defining SNPs are listed in `clades.tsv`.
Defined clades:
| Clade name | Synonyms (CDC/UK/Iowa–Canada nomenclatures) | Geographic association | Notes |
|---|---|---|---|
| 1 | E1/C/A | Europe, North America | Common western lineage |
| 2 | J/J/B | Japan, China | Dominant East Asian lineage; includes the Oka vaccine strain |
| 3 | E2/B/D | Europe, North America | Common western lineage |
| 4 | M2/J/C | Africa, Asia | |
| 5 | M1/A/-- | Africa, Indian subcontinent | |
| 6 | M4/--/-- | Global circulation | Recombinant of clades 1/3 and 4/5 |
| VII | M3/--/-- | -- | Not circulating; only one partial-genome strain |
| VIII | -- | Global circulation | Putative clade; closely related to clade 6 |
| 9 | -- | Europe (UK, Germany), USA, India | Recently reported in India for the first time |

Note: Clades VIII and 9 are rare and poorly sampled, so their placement and interpretation may change as more genomes become available.
The workflow loads `clades.tsv` in the `augur clades` step, which assigns clades to nodes of the tree.
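
In the standard augur format, `clades.tsv` is tab-separated with the columns `clade`, `gene`, `site` and `alt`, where `gene` is `nuc` for nucleotide positions and `site` is 1-based. A quick way to check how many nucleotide SNPs define each clade is a short `awk` summary. The `summarize_clades` helper below is a hypothetical sketch, not part of the workflow.

```bash
# Hypothetical helper: count the nucleotide SNPs that define each clade in
# an augur-style clades.tsv (tab-separated: clade, gene, site, alt).
summarize_clades() {
    awk -F '\t' '
        /^#/ { next }                             # skip comment lines
        $1 == "clade" && $2 == "gene" { next }    # skip the header row
        $2 == "nuc" { n[$1]++ }                   # count nucleotide-level SNPs; blank lines are ignored
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$1" | sort
}
```

Running `summarize_clades clades.tsv` prints one `clade<TAB>count` line per clade, which makes it easy to spot a clade with suspiciously few defining sites.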
Regions masked before tree construction:

| Region | Coordinates (nt) | Feature | Reason |
|---|---|---|---|
| R1 | 13459–14696 | ORF11 | Tandem repeats within coding region (mask recommended) |
| R2 | 20593–21143 | ORF14 (glycoprotein C) | Tandem repeats within glycoprotein C |
| R3 | 37914–44413 | ORF22 | Highly variable tandem repeat region (masking essential) |
| R4 | 54555–54558 | Non-coding | Short highly variable site (improves phylogenetic accuracy) |
| R5 | 101650–102573 | ORF60–61 intergenic | Repetitive motif region (optional masking) |
| R6 | 104955–104991 | Intergenic | Unannotated repetitive region |
| R7 | 105009–105018 | Non-coding | Highly variable short stretch, poor alignment quality |
| R8 | 110035–110394 | Non-coding | Repetitive stretch / poor alignment |
| R9 | 117790–124697 | Inverted repeat region | Duplicated ORFs (e.g. ORF62/71, ORF63/70, ORF64/69) |
| R10 | 124881–124884 | Non-coding | Small hypervariable site (improves topology when masked) |
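
The table can be turned into a BED file for masking. The sketch below assumes the coordinates are 1-based and inclusive (BED intervals are 0-based and half-open, so only the start shifts by one) and uses `reference` as a placeholder sequence name; both are assumptions, and the helper `write_mask_bed` is hypothetical.

```bash
# Hypothetical helper: write the masked regions from the table as BED.
# ASSUMPTIONS: table coordinates are 1-based and inclusive; "reference"
# is a placeholder for the actual reference sequence ID.
# Usage: write_mask_bed OUTPUT_FILE [SEQUENCE_NAME]
write_mask_bed() {
    ref="${2:-reference}"
    # Each input line: region name, 1-based start, 1-based inclusive end
    while read -r name start end; do
        # BED start is 0-based; the inclusive 1-based end equals the exclusive BED end
        printf '%s\t%d\t%d\t%s\n' "$ref" "$((start - 1))" "$end" "$name"
    done > "$1" <<'EOF'
R1 13459 14696
R2 20593 21143
R3 37914 44413
R4 54555 54558
R5 101650 102573
R6 104955 104991
R7 105009 105018
R8 110035 110394
R9 117790 124697
R10 124881 124884
EOF
}
```

If masking is done with `augur mask` (an assumption about this workflow), the file would be passed as `augur mask --sequences alignment.fasta --mask mask.bed --output masked.fasta`, where `masked.fasta` is a placeholder output name.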

Main files used or produced by the workflow:

| File | Description |
|---|---|
| `clades.tsv` | List of clade-defining SNPs |
| `mask.bed` | Regions to mask before tree construction |
| `alignment.fasta` | Aligned sequences |
| `tree.nwk` | Newick tree |
| `nt_muts.json` | Nucleotide mutation annotations |
| `aa_muts.json` | Amino acid mutation annotations |
| `auspice/vzv.json` | Visualization JSON for Auspice |
View the results with:

```bash
nextstrain view .
```