# Format of this config file:
# - one line per 'options_key', that is, per specific set of v-annotate.pl
#   options to use for a set of models
# - each line has 3 fields: 'options_key' 'model_dir' 'options'
# - the first two fields are whitespace delimited: the first token is
#   'options_key' (no spaces allowed) and the second token is 'model_dir'
#   (no spaces allowed)
# - all remaining text is combined to make field 3 ('options'), which may
#   contain whitespace
#
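# Example, splitting the 'hcv' line below into its 3 fields:
#   field 1 ('options_key'): hcv
#   field 2 ('model_dir'):   $VADRINSTALLDIR/vadr-models-flavi
#   field 3 ('options'):     --split --cpu 1 -r --mkey flavi --group HCV
#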
dengue $VADRINSTALLDIR/vadr-models-flavi --split --cpu 1 --group Dengue --nomisc --noprotid --mkey flavi -r
hcv $VADRINSTALLDIR/vadr-models-flavi --split --cpu 1 -r --mkey flavi --group HCV
flavi $VADRINSTALLDIR/vadr-models-flavi --split --cpu 1 -r --nomisc
norovirus $VADRINSTALLDIR/vadr-models-calici --split --cpu 1 --group Norovirus --nomisc --noprotid --mkey calici -r
calici $VADRINSTALLDIR/vadr-models-calici --split --cpu 1 -r --nomisc
sarscov2 $VADRINSTALLDIR/vadr-models-sarscov2 --split --cpu 4 -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --glsearch
corona $VADRINSTALLDIR/vadr-models-corona --split --cpu 1 -s -r --nomisc --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --glsearch
flu $VADRINSTALLDIR/vadr-models-flu --split --cpu 2 -r --atgonly --alt_fail extrant5,extrant3 --xnocomp --nomisc --forcegene
rsv $VADRINSTALLDIR/vadr-models-rsv --split --cpu 1 -r --xnocomp --nomisc
mpxv $VADRINSTALLDIR/vadr-models-mpxv --split --cpu 1 --glsearch --minimap2 -s -r --nomisc --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin --s_overhang 150
#
# Rule for how v-scan.pl determines the <s> value for the '--mkey <s>' option
# it passes to v-annotate.pl:
# 1. If '--mkey <s>' exists in the 'options' string (field 3), use that <s>
# 2. Otherwise, use the 'options_key' value (field 1)
#
# This means that multiple, different 'options_key' values can use the
# same model directory. This allows us to have separate lines for the
# 'options_key' values 'norovirus' and 'calici', for example, but
# have them both use --mkey calici for the --cls_only classification
# stage inside v-scan.pl. Any sequences that match best to 'norovirus'
# will be rerun using v-annotate.pl with the options string for the
# line starting with 'norovirus'. Any sequences matching to
# non-norovirus models in the 'calici' library will be rerun using
# v-annotate.pl with the options string for the line starting with
# 'calici'.
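#
# Applying the --mkey rule to lines in this file, for example:
#   'dengue':   field 3 contains '--mkey flavi', so <s> is 'flavi'
#   'flavi':    field 3 has no '--mkey', so <s> is 'flavi'
#   'sarscov2': field 3 has no '--mkey', so <s> is 'sarscov2'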
#
# For sequences to 'match best' to norovirus, they must
# match to a model in the 'calici' library that has either a name,
# group or subgroup value that is 'norovirus' (*after removing special
# characters and converting to lowercase*, so a name, group or
# subgroup value of 'NOROvirus', 'norovirus!' or 'Noro-virus' would
# all 'match best' to norovirus). The v-scan.pl program checks
# that at least one model in each library $okey (e.g. 'calici') meets
# this criterion for every $okey2 (e.g. 'norovirus') that uses $okey as
# its model key.
#
# About the --split --cpu <n> options:
# * --split is used to parallelize processing in v-annotate.pl;
#   if you want to run on only 1 CPU, remove `--split --cpu <n>`
#   from each line
# * --cpu <n> controls how many threads will run; you may want
#   to change this depending on how much RAM you have available
# * suggested amount of RAM per CPU for each virus:
#     sarscov2: 2GB
#     flu: 4GB
#     dengue, hcv, flavi, norovirus, calici, corona, mpxv: 8GB
#     rsv: 16GB
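#
# Worked example for the --cpu <n> values used in this file, assuming total
# RAM use scales roughly linearly with <n> (an assumption made here for
# illustration, not a measured figure):
#   sarscov2: --cpu 4 x 2GB per CPU  = about 8GB
#   flu:      --cpu 2 x 4GB per CPU  = about 8GB
#   rsv:      --cpu 1 x 16GB per CPU = about 16GB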