
nextstrain.org/vzv

This is the Nextstrain build for Varicella-Zoster Virus (VZV). The results can be viewed locally with Auspice or deployed to nextstrain.org.

Usage

If you're new to Nextstrain, we recommend starting with the
Running a Pathogen Workflow guide.

Using `nextstrain build`

Clone or update the repository:

git clone https://github.com/verenne04/vzv.git
cd vzv/phylogenetic
# or, if already cloned
git pull origin main

Run the workflow:

nextstrain build .

View the results:

nextstrain view .

The workflow writes the auspice/vzv.json file, which `nextstrain view` loads for visualization.

Configuration

The default configuration lives in config/config.yaml.
Workflow rules are defined in the Snakefile and include:

- Filtering low-quality sequences
- Aligning sequences with Nextclade
- Masking repetitive or problematic regions
- Calling mutations and assigning clades

Clade Definitions

VZV clades are defined by fixed SNPs listed in `clades.tsv`.

Defined clades:

Clade name	Synonyms (CDC/UK/Iowa–Canada nomenclatures)	Geographic association	Notes
1	E1/C/A	Europe, North America	Common western lineage
2	J/J/B	Japan, China	Dominant East Asian lineage; vaccine-origin clade
3	E2/B/D	Europe, North America	Common western lineage
4	M2/J/C	Africa, Asia	--
5	M1/A/--	Africa, Indian subcontinent	--
6	M4/--/--	Global circulation	Recombinant of clades 1/3 and 4/5
VII	M3/--/--	--	Not circulating; only one partial-genome strain
VIII	--	Global circulation	Putative clade; closely related to clade 6
9	--	Europe (UK, Germany), USA, India	Newly reported in India

Note: Clades VIII and 9 are rare and poorly sampled. Their placement and interpretation may change as more genomes become available.

The SNPs in `clades.tsv` are loaded by `augur clades` when clades are assigned.
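To illustrate how SNP-based clade definitions work, here is a minimal Python sketch. It assumes `clades.tsv` uses the four-column layout that `augur clades` reads (`clade`, `gene`, `site`, `alt`). The positions and bases in the example are made up and are not the real VZV definitions. Augur itself assigns clades along the tree, so this per-sequence check is only a simplified illustration, and the helper names `load_clade_definitions` and `matching_clades` are hypothetical.

```python
import csv
import io

# Hypothetical definitions in the assumed clades.tsv layout.
# The sites and bases below are invented for illustration only.
EXAMPLE_CLADES_TSV = """clade\tgene\tsite\talt
1\tnuc\t100\tA
1\tnuc\t250\tG
2\tnuc\t100\tT
2\tnuc\t300\tC
"""


def load_clade_definitions(tsv_text):
    """Return {clade: {(gene, site): alt}} parsed from clade-definition TSV text."""
    definitions = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["gene"], int(row["site"]))
        definitions.setdefault(row["clade"], {})[key] = row["alt"]
    return definitions


def matching_clades(alleles, definitions):
    """List clades whose defining alleles are all present in `alleles`.

    `alleles` maps (gene, site) to the base observed at that site.
    """
    return [
        clade
        for clade, required in definitions.items()
        if all(alleles.get(pos) == alt for pos, alt in required.items())
    ]


definitions = load_clade_definitions(EXAMPLE_CLADES_TSV)
sample = {("nuc", 100): "T", ("nuc", 250): "G", ("nuc", 300): "C"}
print(matching_clades(sample, definitions))
```

A sequence matches a clade only when every defining site carries the listed base, which is why poorly sampled clades with few confirmed SNPs are harder to place.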

Key masked regions

Region	Coordinates	Feature	Reason
R1	13459–14696	ORF11	Tandem repeats within coding region (mask recommended)
R2	20593–21143	ORF14 (glycoprotein C)	Tandem repeats within glycoprotein C
R3	37914–44413	ORF22	Highly variable tandem repeat region (masking essential)
R4	54555–54558	Non-coding	Short highly variable site (improves phylogenetic accuracy)
R5	101650–102573	ORF60–61 intergenic	Repetitive motif region (optional masking)
R6	104955–104991	Intergenic	Unannotated repetitive region
R7	105009–105018	Non-coding	Highly variable short stretch, poor alignment quality
R8	110035–110394	Non-coding	Repetitive stretch / poor alignment
R9	117790–124697	Inverted repeat region	Duplicated ORFs (e.g. ORF62/71, ORF63/70, ORF64/69)
R10	124881–124884	Non-coding	Small hypervariable site (improves topology when masked)
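The table above can be turned into a BED-style mask file with a short script. The sketch below assumes the listed coordinates are 1-based and inclusive, and converts them to BED's 0-based, end-exclusive convention. The sequence name `reference` is a placeholder and would need to match the name of the reference actually used by the workflow.

```python
# Coordinates copied from the masked-regions table; assumed 1-based, inclusive.
MASKED_REGIONS = [
    ("R1", 13459, 14696),
    ("R2", 20593, 21143),
    ("R3", 37914, 44413),
    ("R4", 54555, 54558),
    ("R5", 101650, 102573),
    ("R6", 104955, 104991),
    ("R7", 105009, 105018),
    ("R8", 110035, 110394),
    ("R9", 117790, 124697),
    ("R10", 124881, 124884),
]

# Placeholder sequence name (an assumption, not taken from the repository).
CHROM = "reference"


def to_bed_lines(regions, chrom):
    """Convert 1-based inclusive regions to BED lines (0-based start, exclusive end)."""
    return [f"{chrom}\t{start - 1}\t{end}\t{name}" for name, start, end in regions]


def masked_length(regions):
    """Total number of masked sites, assuming the regions do not overlap."""
    return sum(end - start + 1 for _, start, end in regions)


bed_lines = to_bed_lines(MASKED_REGIONS, CHROM)
print("\n".join(bed_lines))
print("masked sites:", masked_length(MASKED_REGIONS))
```

Writing `bed_lines` to a file would give something comparable to `mask.bed`, although the file shipped with the workflow remains the authoritative version.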

Files & Outputs

File	Description
`clades.tsv`	List of clade-defining SNPs
`mask.bed`	Regions to mask before tree construction
`alignment.fasta`	Aligned sequences
`tree.nwk`	Newick tree
`nt_muts.json`	Nucleotide mutation annotations
`aa_muts.json`	Amino acid mutation annotations
`auspice/vzv.json`	Visualization JSON for Auspice
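As a quick sanity check on the final output, the number of tips in `auspice/vzv.json` can be counted with a few lines of Python. This sketch assumes the file follows the Auspice v2 dataset layout, with a single nested tree under the `tree` key and child nodes under `children`. The `example` dataset is made up so the code runs without the real build output.

```python
import json
from pathlib import Path


def count_tips(root):
    """Count leaf nodes in a nested tree, using a stack to avoid deep recursion."""
    tips = 0
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children") or []
        if children:
            stack.extend(children)
        else:
            tips += 1
    return tips


def summarize(dataset):
    """Return the number of tips in a loaded dataset (assumes one tree object)."""
    return count_tips(dataset["tree"])


# Tiny invented dataset for illustration only.
example = {
    "tree": {
        "name": "root",
        "children": [
            {"name": "strain_a"},
            {"name": "internal", "children": [{"name": "strain_b"}, {"name": "strain_c"}]},
        ],
    }
}
print(summarize(example))

# If the workflow has been run, report the tip count of the real output.
path = Path("auspice/vzv.json")
if path.exists():
    print(summarize(json.loads(path.read_text())))
```

If the tip count is far lower than the number of input sequences, the filtering step is a reasonable place to look first.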

Visualization

View the results with:

nextstrain view .
