Commit c890891

Merge branch 'master' into feat/json-schema
2 parents 6cc7a2c + 0ea1eed commit c890891

2 files changed: +103 -13 lines changed
docs/user/assets/web_aaMotifs.png

55.2 KB

docs/user/input-files/05-pathogen-config.md

Lines changed: 103 additions & 13 deletions
@@ -63,7 +63,6 @@ Example configuration for SARS-CoV-2:
 ```json
 {
   "qc": {
-    "schemaVersion": "1.2.0",
     "privateMutations": {
       "enabled": true,
       "typical": 8,
@@ -131,8 +130,10 @@ Example:
 
 ```json
 {
-  "cli": "3.0.0",
-  "web": "3.0.0"
+  "compatibility": {
+    "cli": "3.0.0",
+    "web": "3.0.0"
+  }
 }
 ```
 
@@ -142,7 +143,7 @@ Optional `str`. The default gene/CDS to be shown in Nextclade web. If not provid
 
 #### `cdsOrderPreference`
 
-Optional `array[str]`. Order in which genes are shown in Nextclade web dropdown. Example value ["S", "ORF1a", "N", "E"]
+Optional `array[str]`. Order in which genes are shown in Nextclade web dropdown. Example value `["S", "ORF1a", "N", "E"]`
 
 #### `generalParams`
 
@@ -158,25 +159,114 @@ Optional `dict`. Parameters for the alignment algorithm. These are identical to
 
 #### `treeBuilderParams`
 
-Optional `dict`. Parameters for the tree building algorithm. These are identical to the corresponding CLI arguments (though here _camelCase_ needs to be used. If not provided, default values are used.
+Optional `dict`. Parameters for the tree building algorithm. These are identical to the corresponding CLI arguments (though here _camelCase_ needs to be used). If not provided, default values are used.
 
 - `withoutGreedyTreeBuilder`: If you don't want to use the greedy tree builder, set this to `true`. Default: `false`.
 - `maskedMutsWeight`: Parsimony weight for masked mutations. Default: `0.05`.
 
-#### `primers`
 
-TODO
+#### Calculate phenotypic scores from mutations (`phenotypeData`)
 
-#### `phenotypeData`
+Nextclade can calculate numerical scores derived from mutations in a query sequence relative to the reference sequence.
+Such scores could for example be used to calculate predicted ACE2 binding for SARS-CoV-2, immune escape estimates, or potential drug resistance. To specify such numerical scores, the field `phenotypeData` needs to be added to the `pathogen.json`.
 
-TODO
+Each such score is based on exactly one CDS and each amino acid mutation can be assigned a specific contribution to the score.
+In addition, a "default" value can be specified for amino acid mutations that are not explicitly listed.
+```json
+{
+  "phenotypeData": [
+    {
+      "aaRange": {
+        "begin": 330,
+        "end": 531
+      },
+      "description": "Estimated ACE2 binding",
+      "cds": "S",
+      "ignore": {
+        "clades": ["outgroup"]
+      },
+      "name": "ace2_binding",
+      "nameFriendly": "ACE2 binding",
+      "data": [
+        {
+          "name": "binding",
+          "weight": 1.0,
+          "locations": {
+            "330": {
+              "default": 0.1,
+              "A": -0.08339,
+              "C": -0.61624,
+              "D": -0.1467,
+              "E": -0.14146,
+              ...
+            },
+            "331": {}
+            ...
+          }
+        }
+      ]
+    }
+  ]
+}
+```
+If the score is only relevant for specific clades, you can specify which clades are to be ignored.
+
+#### Amino acid motif detection (`aaMotifs`)
+
+Nextclade can detect and report specific motifs in translated amino acid sequences. This feature is currently being used to highlight changes in glycosylation or cleavage sites, but the feature itself is generic.
+To use this feature, you need to add a `aaMotifs` field to the `pathogen.json`.
 
-#### `aaMotifs`
+Amino acid motifs can be specified using regular expressions and the parts of the genome in which Nextclade searches for the motifs is specified by listing the CDS and (optional) ranges within these CDSs (e.g.~to restrict to the exposed part of a protein).
+An example of a full configuration (for glycosylation in influenza HA) is shown below.
+```json
+"aaMotifs": [
+  {
+    "name": "glycosylation",
+    "nameShort": "Glyc.",
+    "nameFriendly": "Glycosylation",
+    "description": "N-linked glycosylation motifs (N-X-S/T with X any amino acid other than P)",
+    "includeCdses": [
+      {
+        "cds":"HA1",
+        "ranges":[]
+      },
+      {
+        "cds":"HA2",
+        "ranges":[{"begin":0, "end":186}]
+      }
+    ],
+    "motifs": [
+      "N[^P][ST]"
+    ]
+  }
+]
+```
+In the web interface, motifs are reported as shown in the screenshot below:
+![aaMotifs](../assets/web_aaMotifs.png)
+
+#### Labelling mutations of interest (`mutLabels`)
+
+Nextclade can highlight specific mutations to the user, for example mutations that are indicative of contamination, drug resistance, or otherwise of particular interest.
+To do so, you can specify mutations as "labeled" using the `mutLabels` field in the `pathogenJson`.
+Labeled mutations are only searched among the "private" mutations, i.e. mutations in query sequences that are not found in the part of the reference tree the query sequence attaches to.
+
+The json specification looks as follows
+```json
+{
+  "mutLabels": {
+    "nucMutLabelMap": {
+      "174T": ["20H", ...],
+      "204T": ["20E"],
+      ...
+    }
+  }
+}
+```
+Labeled "private" mutations are shown in the tool-tip of the mutation column when mutations "relative to parent" are shown (private mutations) and exported into the tabular output.
 
-TODO
+TODO: add amino acid mutations once released.
 
-#### `mutLabels`
+> ⚠️ Note that the specification of these mutations breaks with the convention of zero-indexing. Instead, these labeled mutations are one-indexed and directly correspond to the mutations displayed in the UI or in the tables.
 
-TODO
 
 > 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression.md) for more details.
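
Reviewer note on the `aaMotifs` hunk: the pattern `N[^P][ST]` is an ordinary regular expression, so its behaviour can be checked outside Nextclade. The sketch below is not Nextclade's implementation. The function name `find_motifs`, the test peptide, and the reading of `ranges` as zero-based, half-open intervals (with an empty list meaning the whole CDS) are assumptions made purely for illustration.

```python
import re


def find_motifs(aa_seq, motifs, ranges=None):
    """Return sorted (position, matched_text) pairs for every motif hit.

    `ranges` is a list of (begin, end) tuples, ASSUMED here to be zero-based
    and half-open; an empty list or None searches the whole sequence.
    """
    if not ranges:
        ranges = [(0, len(aa_seq))]
    hits = []
    for begin, end in ranges:
        window = aa_seq[begin:end]
        for pattern in motifs:
            # Wrap the pattern in a lookahead so overlapping hits are reported.
            for match in re.finditer(f"(?=({pattern}))", window):
                hits.append((begin + match.start(), match.group(1)))
    return sorted(hits)


# Hypothetical peptide, not taken from any real HA sequence.
peptide = "MNGTANPSNNSS"
hits = find_motifs(peptide, ["N[^P][ST]"])
print(hits)  # [(1, 'NGT'), (8, 'NNS'), (9, 'NSS')]
```

`NPS` at position 5 is skipped because `[^P]` excludes proline in the middle position, which matches the "X any amino acid other than P" wording in the `description` field.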

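
Reviewer note on the `mutLabels` hunk: because the keys of `nucMutLabelMap` are one-indexed, any script that combines them with zero-indexed coordinates has to shift the position by one. The sketch below illustrates that conversion. The helper `parse_label_key` is hypothetical, and the key format (decimal position followed by a single nucleotide character) is inferred only from the two example keys `174T` and `204T`.

```python
import re

# ASSUMED key format: one-based position followed by one nucleotide character.
_LABEL_KEY = re.compile(r"(\d+)([A-Za-z-])")


def parse_label_key(key):
    """Split a label key such as '174T' into (zero_based_position, nucleotide)."""
    match = _LABEL_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"unrecognised mutation label key: {key!r}")
    one_based = int(match.group(1))
    if one_based < 1:
        raise ValueError(f"position must be >= 1 in a one-indexed key: {key!r}")
    return one_based - 1, match.group(2).upper()


label_map = {"174T": ["20H"], "204T": ["20E"]}
parsed = {parse_label_key(key): labels for key, labels in label_map.items()}
print(parsed)  # {(173, 'T'): ['20H'], (203, 'T'): ['20E']}
```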