README.md (25 additions & 12 deletions)
@@ -9,20 +9,21 @@ bash install.sh
 bash build-vue-app.sh
 ```
 Only installation on Linux x86_64 is currently supported.
-Tested in the Protected Environment of the Center for High Performance Computing (CHPC) at University of Utah.
+Tested in the Protected Environment computer cluster of the Center for High Performance Computing (CHPC) at University of Utah.
 
 ## Quick Start
 
 Assuming one has access to the protected environment on the CHPC at University of Utah:
 
 ```
-[sbatch | bash] tests/train.sh $PWD
+bash tests/train.sh $PWD
 ```
 
 Once training is complete, do:
 ```
 bash tests/visualize.sh $PWD
 ```
+
 Follow the instructions at the command line to view a web app that visualizes observed mutation counts, and those expected under a null model of sequence-dependent mutation (see `model-definition` folder), as a function of genomic coordinate.
 
 A plot of estimated mutation probabilities of the neutral model can be found here: https://github.com/quinlan-lab/constraint-tools/blob/main/tests/plot_mutation_probabilities.ipynb
@@ -48,38 +49,44 @@ Required arguments for `train` are:
 
 ```
 --genome STR
-Path to the reference fasta.
+Path to a reference fasta.
 A "samtools faidx" index is expected to be present at the same path.
 --mutations STR
 Path to a set of mutations specified in Mutation Annotation Format.
 A "tabix" index is expected to be present at the same path.
 --kmer-size INT
-Size of kmer to use in model.
---output STR
-Path to a directory to store results in.
+Size of kmer of model to be trained.
+--model STR
+Path to a directory to store trained model in.
 ```
 
 By default the `train` subcommand uses a pre-computed set of putatively neutral regions from the GRCH37 reference. Optionally, the user may change this by specifying the `--regions` argument:
 
 ```
---regions STR
-Bed-format file containing a list of genomic intervals on which the model is trained.
+--regions STR
+Bed-format file containing a list of genomic intervals on which the model is to be trained.
 ```
 
 This produces a specification of the sequence-dependent neutral mutation model in json format, viewable using, e.g.,
Path to the neutral model produced by the train sub-command (in json format). This model is used to compute the expected mutation counts in the visualization.
 --port INT
 The port to serve the web-app on
 ```
-
+
+By default the `visualize` subcommand uses a pre-computed model.
+Optionally, the user may change this by specifying the `--model` argument:
+
+```
+--model STR
+Path to a neutral model produced by the train sub-command (in json format). This model is used to compute the expected mutation counts in the visualization.
+```
+
 ## Input Data
 
 Assuming one has access to the protected environment on the CHPC at University of Utah,
@@ -89,6 +96,12 @@ then sorted, block-compressed, and indexed vcf, maf, gtf and fasta files can be
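The `train` arguments documented in this diff expect index files to sit next to their inputs: a "samtools faidx" index for `--genome` and a "tabix" index for `--mutations`. A minimal pre-flight sketch of that check follows. The helper names `has_index` and `check_train_inputs` are hypothetical and not part of the repository, and the index suffixes (`.fai` for samtools faidx, `.tbi` or `.csi` for tabix) are the defaults those tools write, assumed here to be what the README means by "at the same path".

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of constraint-tools):
# confirm that each input file has an index next to it before training.

# has_index FILE SUFFIX...
# Succeeds if FILE exists and FILE<SUFFIX> exists for at least one SUFFIX.
has_index() {
  local file="$1"; shift
  [ -f "$file" ] || { echo "missing input: $file" >&2; return 1; }
  local suffix
  for suffix in "$@"; do
    [ -f "${file}${suffix}" ] && return 0
  done
  echo "missing index for: $file (looked for: $*)" >&2
  return 1
}

# check_train_inputs GENOME_FASTA MUTATIONS_FILE
# Assumes samtools faidx wrote GENOME_FASTA.fai and tabix wrote
# MUTATIONS_FILE.tbi (or .csi); adjust if the indexes are named differently.
check_train_inputs() {
  local genome="$1" mutations="$2"
  has_index "$genome" .fai && has_index "$mutations" .tbi .csi
}
```

After sourcing these functions, `check_train_inputs /path/to/reference.fa /path/to/mutations.maf.gz` (placeholder paths) returns non-zero and prints the missing file to stderr when an input or its index is absent, so it can gate a later call to the `train` subcommand.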