Commit 9c1fea2

Merge pull request #20 from nextstrain/add-N450-tree
Make tree for 450bp of the N gene ("N450")
2 parents 0855c99 + bf83a42 commit 9c1fea2

13 files changed, +222 -49 lines changed

CHANGES.md

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 # CHANGELOG
+* 1 April 2024: Create a tree using the 450 nucleotides encoding the carboxyl-terminal 150 amino acids of the nucleoprotein (N450), which is highly represented on NCBI for measles. [PR #20](https://github.com/nextstrain/measles/pull/20)
 * 15 March 2024: Connect ingest and phylogenetic workflows to follow the pathogen-repo-guide by uploading ingest output to S3, downloading ingest output from S3 to phylogenetic directory, using "accession" column as the ID column, and using a color scheme that matches the new region name format. [PR #19](https://github.com/nextstrain/measles/pull/19)
 * 1 March 2024: Add phylogenetic directory to follow the pathogen-repo-guide, and update the CI workflow to match the new file structure. [PR #18](https://github.com/nextstrain/measles/pull/18)
 * 14 February 2024: Add ingest directory from pathogen-repo-guide and make measles-specific modifications. [PR #10](https://github.com/nextstrain/measles/pull/10)

phylogenetic/Snakefile

Lines changed: 4 additions & 1 deletion
@@ -1,10 +1,13 @@
+genes = ['N450', 'genome']
+
 configfile: "defaults/config.yaml"
 
 rule all:
     input:
-        auspice_json = "auspice/measles.json",
+        auspice_json = expand("auspice/measles_{gene}.json", gene=genes)
 
 include: "rules/prepare_sequences.smk"
+include: "rules/prepare_sequences_N450.smk"
 include: "rules/construct_phylogeny.smk"
 include: "rules/annotate_phylogeny.smk"
 include: "rules/export.smk"
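
As a rough illustration of what the new `rule all` target resolves to, the sketch below reproduces the filename expansion with a plain list comprehension instead of Snakemake's own `expand`, so it runs without Snakemake installed. The variable name `targets` is an assumption made for this example and does not appear in the commit.

```python
# Gene list as defined at the top of the Snakefile in this commit.
genes = ["N450", "genome"]

# Stand-in for expand("auspice/measles_{gene}.json", gene=genes):
# fill the {gene} placeholder once per entry in `genes`.
# `targets` is a hypothetical name used only in this sketch.
targets = ["auspice/measles_{gene}.json".format(gene=gene) for gene in genes]

for path in targets:
    print(path)
```

Under this reading, the workflow now requests one Auspice JSON per entry in `genes` rather than the single `auspice/measles.json` it requested before.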

phylogenetic/defaults/auspice_config.json

Lines changed: 3 additions & 5 deletions
@@ -16,11 +16,6 @@
       "title": "Date",
       "type": "continuous"
     },
-    {
-      "key": "author",
-      "title": "Author",
-      "type": "categorical"
-    },
     {
       "key": "country",
       "title": "Country",
@@ -43,5 +38,8 @@
     "country",
     "region",
     "author"
+  ],
+  "metadata_columns": [
+    "author"
   ]
 }

phylogenetic/defaults/auspice_config_N450.json

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+{
+  "title": "Real-time tracking of measles virus evolution",
+  "maintainers": [
+    {"name": "Kim Andrews", "url": "https://bedford.io/team/kim-andrews/"},
+    {"name": "the Nextstrain team", "url": "https://nextstrain.org/team"}
+  ],
+  "build_url": "https://github.com/nextstrain/measles",
+  "colorings": [
+    {
+      "key": "gt",
+      "title": "Genotype",
+      "type": "categorical"
+    },
+    {
+      "key": "num_date",
+      "title": "Date",
+      "type": "continuous"
+    },
+    {
+      "key": "country",
+      "title": "Country",
+      "type": "categorical"
+    },
+    {
+      "key": "region",
+      "title": "Region",
+      "type": "categorical"
+    }
+  ],
+  "geo_resolutions": [
+    "country",
+    "region"
+  ],
+  "display_defaults": {
+    "map_triplicate": true
+  },
+  "filters": [
+    "country",
+    "region",
+    "author"
+  ],
+  "metadata_columns": [
+    "author"
+  ]
+}

phylogenetic/defaults/colors.tsv

Lines changed: 0 additions & 20 deletions
@@ -4,23 +4,3 @@ region Africa #8ABB6A
 region Europe #BEBB48
 region South America #E29E39
 region North America #E2562B
-
-country india #511EA8
-country china #4333BE
-country vietnam #3F4ECB
-country south korea #4169CF
-country japan #4682C9
-country australia #4F96BB
-country new zealand #5AA5A8
-country russia #68AF92
-country gambia #78B77D
-country sudan #8BBB6A
-country morocco #9EBE59
-country italy #B3BD4D
-country germany #C5B945
-country france #D5B03F
-country netherlands #E0A23A
-country united kingdom #E68D36
-country brazil #E67231
-country usa #E1502A
-country canada #DC2F24

phylogenetic/defaults/config.yaml

Lines changed: 8 additions & 0 deletions
@@ -2,13 +2,21 @@ strain_id_field: "accession"
 files:
   exclude: "defaults/dropped_strains.txt"
   reference: "defaults/measles_reference.gb"
+  reference_N450: "defaults/measles_reference_N450.gb"
+  reference_N450_fasta: "defaults/measles_reference_N450.fasta"
   colors: "defaults/colors.tsv"
   auspice_config: "defaults/auspice_config.json"
+  auspice_config_N450: "defaults/auspice_config_N450.json"
 filter:
   group_by: "country year month"
   sequences_per_group: 20
   min_date: 1950
   min_length: 5000
+filter_N450:
+  group_by: "country year"
+  subsample_max_sequences: 3000
+  min_date: 1950
+  min_length: 400
 refine:
   coalescent: "opt"
   date_inference: "marginal"
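
To make the two `min_length` thresholds concrete, here is a minimal sketch of a length check applied per gene. It assumes that length is counted over unambiguous A/C/G/T bases only; that counting rule, and the helper names `count_unambiguous` and `passes_min_length`, are assumptions of this example and are not stated anywhere in the diff.

```python
# Thresholds copied from the `filter` and `filter_N450` blocks above.
MIN_LENGTH = {"genome": 5000, "N450": 400}


def count_unambiguous(seq):
    """Count A/C/G/T bases, ignoring case, gaps and ambiguity codes (assumed rule)."""
    return sum(1 for base in seq.upper() if base in "ACGT")


def passes_min_length(seq, gene):
    """Return True if `seq` has at least the configured number of counted bases."""
    return count_unambiguous(seq) >= MIN_LENGTH[gene]


# A 450-character record with 400 unambiguous bases and 50 N's.
example = "ACGT" * 100 + "N" * 50
print(passes_min_length(example, "N450"))    # meets the 400-base threshold
print(passes_min_length(example, "genome"))  # far below the 5000-base threshold
```

A record like this one would be kept for the N450 build but not for the genome build, which is the practical effect of giving each build its own filter block.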

phylogenetic/defaults/measles_reference_N450.fasta

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+>lcl|NC_001498.1_cds_NP_056918.1_1 [gene=N] [locus_tag=MeVgp1] [db_xref=GeneID:1489804] [protein=nucleocapsid protein] [protein_id=NP_056918.1] [location=1233..1682] [gbkey=CDS]
+GTCAGTTCCACATTGGCATCCGAACTCGGTATCACTGCCGAGGATGCAAGGCTTGTTTCAGAGAT
+TGCAATGCATACTACTGAGGACAGGATCAGTAGAGCGGTCGGACCCAGACAAGCCCAAGTGTCATTTCTA
+CACGGTGATCAAAGTGAGAATGAGCTACCAGGATTGGGGGGCAAGGAAGATAGGAGGGTCAAACAGGGTC
+GGGGAGAAGCCAGGGAGAGCTACAGAGAAACCGGGTCCAGCAGAGCAAGTGATGCGAGAGCTGCCCATCC
+TCCAACCAGCATGCCCCTAGACATTGACACTGCATCGGAGTCAGGCCAAGATCCGCAGGACAGTCGAAGG
+TCAGCTGACGCCCTGCTCAGGCTGCAAGCCATGGCAGGAATCTTGGAAGAACAAGGCTCAGACACGGACA
+CCCCTAGGGTATACAATGACAGAGATCTTCTAGAC
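
The FASTA header gives the region as `location=1233..1682` on NC_001498.1. The short sketch below works through the coordinate arithmetic behind the "450 nucleotides, 150 amino acids" description, treating the coordinates as 1-based and inclusive. The helper names `region_length` and `slice_region` are hypothetical and exist only for this illustration.

```python
def region_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def slice_region(sequence, start, end):
    """Extract a 1-based, inclusive range from a Python (0-based) string."""
    return sequence[start - 1:end]


# Coordinates taken from the FASTA header above.
N450_START, N450_END = 1233, 1682

length = region_length(N450_START, N450_END)
codons = length // 3  # three nucleotides per amino acid

print(length, codons)
```

The same helper could be applied to a full-genome string to cut out the region, for example `slice_region(genome, N450_START, N450_END)`, where `genome` would be the complete reference sequence.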

phylogenetic/defaults/measles_reference_N450.gb

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+LOCUS       NC_001498                450 bp    cRNA    linear   VRL 13-AUG-2018
+DEFINITION  Measles virus, complete genome.
+ACCESSION   NC_001498 REGION: 1233..1682
+VERSION     NC_001498.1
+DBLINK      Project: 15025
+            BioProject: PRJNA485481
+KEYWORDS    RefSeq.
+SOURCE      Measles morbillivirus
+  ORGANISM  Measles morbillivirus
+            Viruses; Riboviria; Orthornavirae; Negarnaviricota;
+            Haploviricotina; Monjiviricetes; Mononegavirales; Paramyxoviridae;
+            Orthoparamyxovirinae; Morbillivirus; Morbillivirus hominis.
+REFERENCE   1  (sites)
+  AUTHORS   Rima,B.K. and Duprex,W.P.
+  TITLE     The measles virus replication cycle
+  JOURNAL   Curr. Top. Microbiol. Immunol. 329, 77-102 (2009)
+   PUBMED   19198563
+REFERENCE   2
+  AUTHORS   Takeuchi,K., Miyajima,N., Kobune,F. and Tashiro,M.
+  TITLE     Comparative nucleotide sequence analyses of the entire genomes of
+            B95a cell-isolated and vero cell-isolated measles viruses from the
+            same patient
+  JOURNAL   Virus Genes 20 (3), 253-257 (2000)
+   PUBMED   10949953
+REFERENCE   3  (bases 1 to 450)
+  CONSRTM   NCBI Genome Project
+  TITLE     Direct Submission
+  JOURNAL   Submitted (01-AUG-2000) National Center for Biotechnology
+            Information, NIH, Bethesda, MD 20894, USA
+REFERENCE   4  (bases 1 to 450)
+  AUTHORS   Takeuchi,K., Tanabayashi,K. and Tashiro,M.
+  TITLE     Direct Submission
+  JOURNAL   Submitted (10-JUL-1998) Kaoru Takeuchi, National Institute of
+            Infectious Diseases, Viral Disease and Vaccine Contorol; 4-7-1
+            Gakuen, Musashi-murayama, Tokyo 208-0011, Japan
+            (E-mail:[email protected], Tel:81-42-561-0771(ex.530),
+            Fax:81-42-567-5631)
+COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
+            reference sequence was derived from AB016162.
+            Sequence updated (21-Jul-1998)
+            Sequence updated (11-Dec-1998).
+            COMPLETENESS: full length.
+FEATURES             Location/Qualifiers
+     source          1..450
+                     /organism="Measles morbillivirus"
+                     /mol_type="viral cRNA"
+                     /strain="Ichinose-B95a"
+                     /db_xref="taxon:11234"
+     CDS             <1..>450
+                     /gene="N"
+                     /codon_start=1
+                     /product="nucleocapsid protein"
+                     /protein_id="NP_056918.1"
+                     /db_xref="GeneID:1489804"
+                     /translation="VSSTLASELGITAEDAR
+                     LVSEIAMHTTEDRISRAVGPRQAQVSFLHGDQSENELPGLGGKEDRRVKQGRGEARES
+                     YRETGSSRASDARAAHPPTSMPLDIDTASESGQDPQDSRRSADALLRLQAMAGILEEQ
+                     GSDTDTPRVYNDRDLLD"
+ORIGIN
+        1 gtcagttcca cattggcatc cgaactcggt atcactgccg aggatgcaag gcttgtttca
+       61 gagattgcaa tgcatactac tgaggacagg atcagtagag cggtcggacc cagacaagcc
+      121 caagtgtcat ttctacacgg tgatcaaagt gagaatgagc taccaggatt ggggggcaag
+      181 gaagatagga gggtcaaaca gggtcgggga gaagccaggg agagctacag agaaaccggg
+      241 tccagcagag caagtgatgc gagagctgcc catcctccaa ccagcatgcc cctagacatt
+      301 gacactgcat cggagtcagg ccaagatccg caggacagtc gaaggtcagc tgacgccctg
+      361 ctcaggctgc aagccatggc aggaatcttg gaagaacaag gctcagacac ggacacccct
+      421 agggtataca atgacagaga tcttctagac
+//
+
phylogenetic/rules/annotate_phylogeny.smk

Lines changed: 7 additions & 7 deletions
@@ -8,10 +8,10 @@ See Augur's usage docs for these commands for more details.
 rule ancestral:
     """Reconstructing ancestral sequences and mutations"""
     input:
-        tree = "results/tree.nwk",
-        alignment = "results/aligned.fasta"
+        tree = "results/{gene}/tree.nwk",
+        alignment = "results/{gene}/aligned.fasta"
     output:
-        node_data = "results/nt_muts.json"
+        node_data = "results/{gene}/nt_muts.json"
     params:
         inference = config["ancestral"]["inference"]
     shell:
@@ -26,11 +26,11 @@ rule ancestral:
 rule translate:
     """Translating amino acid sequences"""
     input:
-        tree = "results/tree.nwk",
-        node_data = "results/nt_muts.json",
-        reference = config["files"]["reference"]
+        tree = "results/{gene}/tree.nwk",
+        node_data = "results/{gene}/nt_muts.json",
+        reference = lambda wildcard: "defaults/measles_reference.gb" if wildcard.gene in ["genome"] else "defaults/measles_reference_{gene}.gb"
     output:
-        node_data = "results/aa_muts.json"
+        node_data = "results/{gene}/aa_muts.json"
     shell:
         """
         augur translate \
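
The `reference` input in `rule translate` picks a GenBank file based on the `gene` wildcard. The sketch below restates that selection as a named function that can be run outside Snakemake. The name `reference_for_gene` is hypothetical, and the `{gene}` placeholder is filled in explicitly here, which is a choice made for this standalone example and not a claim about how Snakemake processes the original lambda.

```python
def reference_for_gene(gene):
    """Return the reference GenBank path for a build (hypothetical helper)."""
    if gene == "genome":
        # The whole-genome build keeps the original reference file.
        return "defaults/measles_reference.gb"
    # Any other build is assumed to have a reference named after the gene.
    return f"defaults/measles_reference_{gene}.gb"


for gene in ["N450", "genome"]:
    print(gene, "->", reference_for_gene(gene))
```

For the two builds defined in this commit, the function returns the existing genome reference and the newly added `defaults/measles_reference_N450.gb`.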

phylogenetic/rules/construct_phylogeny.smk

Lines changed: 8 additions & 7 deletions
@@ -7,9 +7,9 @@ See Augur's usage docs for these commands for more details.
 rule tree:
     """Building tree"""
     input:
-        alignment = "results/aligned.fasta"
+        alignment = "results/{gene}/aligned.fasta"
     output:
-        tree = "results/tree_raw.nwk"
+        tree = "results/{gene}/tree_raw.nwk"
     shell:
         """
         augur tree \
@@ -26,12 +26,12 @@ rule refine:
       - filter tips more than {params.clock_filter_iqd} IQDs from clock expectation
     """
     input:
-        tree = "results/tree_raw.nwk",
-        alignment = "results/aligned.fasta",
+        tree = "results/{gene}/tree_raw.nwk",
+        alignment = "results/{gene}/aligned.fasta",
         metadata = "data/metadata.tsv"
     output:
-        tree = "results/tree.nwk",
-        node_data = "results/branch_lengths.json"
+        tree = "results/{gene}/tree.nwk",
+        node_data = "results/{gene}/branch_lengths.json"
     params:
         coalescent = config["refine"]["coalescent"],
         date_inference = config["refine"]["date_inference"],
@@ -50,6 +50,7 @@ rule refine:
             --coalescent {params.coalescent} \
             --date-confidence \
             --date-inference {params.date_inference} \
-            --clock-filter-iqd {params.clock_filter_iqd}
+            --clock-filter-iqd {params.clock_filter_iqd} \
+            --stochastic-resolve
         """
 