Commit d8a3110

Merge pull request #161 from griffithlab/edit_docs
Edit docs
2 parents 5226219 + af0b36c commit d8a3110

9 files changed, with 130 additions and 95 deletions.

docs/about.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,8 @@
11
# RegTools
2+
23
RegTools is a project by The Griffith Lab at the McDonnell Genome Institute.
34
The source for the project is on [Github.](https://github.com/griffithlab/regtools)
45

5-
##License
6+
## License
67

78
The project is licensed under the MIT license.

docs/commands/cis-ase-identify.md

Lines changed: 10 additions & 5 deletions
@@ -1,10 +1,13 @@
1-
###Synopsis
1+
# Overview of `cis-ase identify` command
2+
23
The `cis-ase identify` command is used to identify allele-specific expression events. This command takes in a list of germline variants and somatic variants in the VCF format. The module also needs RNAseq alignments produced with a splice-aware aligner in the BAM format and an alignment of the DNA reads in the BAM format. The tool then proceeds to identify polymorphisms that show allele specific expression near the somatic variant sites.
34

4-
###Usage
5+
## Usage
6+
57
`regtools cis-ase identify [options] somatic_variants.vcf polymorphism.vcf dna_alignments.bam rna_alignments.bam ref.fa annotations.gtf`
68

7-
###Input
9+
## Input
10+
811
| Input | Description |
912
| ------ | ----------- |
1013
| somatic-variants.vcf | Somatic variant calls in VCF format. The tool looks for allele specific expression at polymorphic loci near the somatic variants|
@@ -16,7 +19,8 @@ The `cis-ase identify` command is used to identify allele-specific expression ev
1619

1720
**Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.
1821

19-
###Options
22+
## Options
23+
2024
| Option | Description |
2125
| ------ | ----------- |
2226
| -o | Output file containing the variants that show evidence for allele specific expression. [STDOUT] |
@@ -25,7 +29,8 @@ The `cis-ase identify` command is used to identify allele-specific expression ev
2529
| -E | Flag to look at all neighboring polymorphisms for ASE, not just the exonic polymorphisms. |
2630
| -B | Flag to use the binomial model and not the default beta model. This feature is under test. |
2731

28-
###Output
32+
## Output
33+
2934
The output is in the VCF format and contains a list of polymorphic sites that show evidence for allele specific expression.
3035

3136
TODO - add details about the model parameters here.
Lines changed: 22 additions & 16 deletions
@@ -1,12 +1,15 @@
11
[csei]: ../images/csei_examples.png
22

3-
###Synopsis
3+
# Overview of `cis-splice-effects associate` command
4+
45
The `cis-splice-effects associate` command is used to identify splicing misregulation events. This command is similar to `cis-splice-effects identify`, but takes the BED output of `junctions extract` in lieu of a BAM file with RNA alignments. The tool then proceeds to associate non-canonical splicing junctions near the variant sites.
56

6-
###Usage
7+
## Usage
8+
79
`regtools cis-splice-effects associate [options] variants.vcf junctions.bed ref.fa annotations.gtf`
810

9-
###Input
11+
## Input
12+
1013
| Input | Description |
1114
| ------ | ----------- |
1215
| variants.vcf | Variant call in VCF format from which to look for cis-splice-effects.|
@@ -16,22 +19,25 @@ The `cis-splice-effects associate` command is used to identify splicing misregul
1619

1720
**Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.
1821

19-
###Options
22+
## Options
23+
2024
| Option | Description |
2125
| ------ | ----------- |
22-
| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT] |
23-
| -v STR | Output file containing variants annotated as splice relevant (VCF format). |
24-
| -j STR | Output file containing the aberrant junctions in BED12 format. |
25-
| -w INT | Window size in b.p to associate splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
26-
| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3] |
27-
| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2] |
28-
| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
29-
| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
30-
| -S | Don't skip single exon transcripts. |
31-
32-
###Output
26+
| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT] |
27+
| -v STR | Output file containing variants annotated as splice relevant (VCF format). |
28+
| -j STR | Output file containing the aberrant junctions in BED12 format. |
29+
| -w INT | Window size in b.p to associate splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
30+
| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3] |
31+
| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2] |
32+
| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
33+
| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
34+
| -S | Don't skip single exon transcripts. |
35+
36+
## Output
37+
3338
For an explanation of the annotated junctions that are identified by this command please refer to the output of the `junctions annotate` command [here](junctions-annotate.md#output)
3439
For an explanation of the annotated variants that are identified by this command when using the -v option, please refer to the output of the `variants annotate` command [here](variants-annotate.md#output)
3540

36-
###Examples
41+
## Examples
42+
3743
![cis-splice-effects identify example][csei]
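
For readers scripting the `cis-splice-effects associate` command documented in this file, the following is a minimal sketch of assembling its argument list in Python. It uses only the positional inputs and the `-o`, `-w`, `-i` and `-I` options listed in the tables above; the function name `build_associate_argv` and its parameter names are hypothetical and not part of RegTools. The check on `-I` and `-i` mirrors the "not to be used with -i" note in the options table.

```python
from typing import List, Optional


def build_associate_argv(
    variants_vcf: str,
    junctions_bed: str,
    ref_fa: str,
    annotations_gtf: str,
    output: Optional[str] = None,              # maps to -o STR
    window: Optional[int] = None,              # maps to -w INT
    intronic_distance: Optional[int] = None,   # maps to -i INT
    all_intronic: bool = False,                # maps to -I
) -> List[str]:
    """Assemble an argv list for `regtools cis-splice-effects associate`.

    Hypothetical helper: it only builds the list and does not run regtools.
    """
    # The docs say -I is not to be used together with -i.
    if all_intronic and intronic_distance is not None:
        raise ValueError("-I is not to be used with -i")

    argv = ["regtools", "cis-splice-effects", "associate"]
    if output is not None:
        argv += ["-o", output]
    if window is not None:
        argv += ["-w", str(window)]
    if intronic_distance is not None:
        argv += ["-i", str(intronic_distance)]
    if all_intronic:
        argv.append("-I")
    # Positional inputs follow the options, in the order given under Usage.
    argv += [variants_vcf, junctions_bed, ref_fa, annotations_gtf]
    return argv
```

The resulting list can be passed to `subprocess.run` on a machine where `regtools` is installed.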
Lines changed: 24 additions & 18 deletions
@@ -1,12 +1,15 @@
11
[csei]: ../images/csei_examples.png
22

3-
###Synopsis
4-
The `cis-splice-effects identify` command is used to identify splicing misregulation events. This command takes in a list of variants in the VCF format and RNAseq alignments produced with a splice-aware aligner in the BAM/CRAM format. The tool then proceeds to identify non-canonical splicing junctions near the variant sites.
53

6-
###Usage
4+
The `cis-splice-effects identify` command is used to identify splicing misregulation events. This command takes in a list of variants in the VCF format and RNAseq alignments produced with a splice-aware aligner in the BAM format. The tool then proceeds to identify non-canonical splicing junctions near the variant sites.
5+
6+
7+
## Usage
8+
79
`regtools cis-splice-effects identify [options] variants.vcf alignments.bam ref.fa annotations.gtf`
810

9-
###Input
11+
## Input
12+
1013
| Input | Description |
1114
| ------ | ----------- |
1215
| variants.vcf | Variant call in VCF format from which to look for cis-splice-effects.|
@@ -16,23 +19,26 @@ The `cis-splice-effects identify` command is used to identify splicing misregula
1619

1720
**Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.
1821

19-
###Options
22+
## Options
23+
2024
| Option | Description |
2125
| ------ | ----------- |
22-
| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT] |
23-
| -v STR | Output file containing variants annotated as splice relevant (VCF format). |
24-
| -j STR | Output file containing the aberrant junctions in BED12 format. |
25-
| -s INT | Strand specificity of RNA library preparation, where 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help. |
26-
| -w INT | Window size in b.p to identify splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
27-
| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3] |
28-
| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2] |
29-
| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
30-
| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
31-
| -S | Don't skip single exon transcripts. |
32-
33-
###Output
26+
| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT] |
27+
| -v STR | Output file containing variants annotated as splice relevant (VCF format). |
28+
| -j STR | Output file containing the aberrant junctions in BED12 format. |
29+
| -s INT | Strand specificity of RNA library preparation, where 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help. |
30+
| -w INT | Window size in b.p to identify splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
31+
| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3] |
32+
| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2] |
33+
| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
34+
| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
35+
| -S | Don't skip single exon transcripts. |
36+
37+
## Output
38+
3439
For an explanation of the annotated junctions that are identified by this command please refer to the output of the `junctions annotate` command [here](junctions-annotate.md#output)
3540
For an explanation of the annotated variants that are identified by this command when using the -v option, please refer to the output of the `variants annotate` command [here](variants-annotate.md#output)
3641

37-
###Examples
42+
## Examples
43+
3844
![cis-splice-effects identify example][csei]
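
The `-w` option in the table above describes a window of `variant.start +/- w` basepairs. As a rough illustration of that arithmetic only, here is a small Python sketch. The function names are hypothetical, and the rule that a junction must lie entirely inside the window is an assumption made for the example; the documentation shown here does not state how RegTools treats partially overlapping junctions or which coordinate convention it uses.

```python
from typing import Tuple


def variant_window(variant_start: int, w: int) -> Tuple[int, int]:
    """Return the (low, high) bounds of variant.start +/- w."""
    return (variant_start - w, variant_start + w)


def junction_in_window(junction_start: int, junction_end: int,
                       variant_start: int, w: int) -> bool:
    """Assumed rule: the junction counts only if it lies fully inside the window."""
    low, high = variant_window(variant_start, w)
    return low <= junction_start and junction_end <= high
```

With `variant_start = 1000` and `w = 100`, the window spans 900 to 1100, so under the assumed rule a junction from 950 to 1050 falls inside it and a junction from 850 to 1050 does not.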

docs/commands/commands.md

Lines changed: 10 additions & 5 deletions
@@ -1,4 +1,5 @@
1-
##Synopsis
1+
# RegTools commands
2+
23
To get a list of all available regtools commands `regtools --help`
34

45
The main regtools commands are
@@ -7,30 +8,34 @@ The main regtools commands are
78
- [junctions](#junctions)
89
- [variants](#variants)
910

10-
##cis-splice-effects
11+
## cis-splice-effects
12+
1113
This set of tools helps identify and work with aberrant splicing events near variants, these could be somatic variants or germline polymorphisms/mutations. These variants are hypothesized to act in cis and affect how the gene is transcribed.
1214

1315
Below are links to detailed explanations of the `cis-splice-effects` sub-commands:
1416

1517
- [identify](cis-splice-effects-identify.md)
1618
- [associate](cis-splice-effects-associate.md)
1719

18-
##cis-ase
20+
## cis-ase
21+
1922
This set of tools helps identify and work with allele-specific-expression near variants, these could be somatic variants or germline polymorphisms/mutations. These variants are hypothesized to act in cis and affect how the gene is transcribed.
2023

2124
Below are links to detailed explanations of the `cis-ase` sub-commands:
2225

2326
- [identify](cis-ase-identify.md)
2427

25-
##junctions
28+
## junctions
29+
2630
The transcriptome structure is often summarized from a RNAseq experiment with a BED file. This BED file contains the exon-exon boundary co-ordinates which are referred to as junctions. For example, TopHat outputs a file called 'junctions.bed' which contains this information. This file is very useful if you are interested in studying which exons/transcripts are expressed, splicing effects etc. On the command line these commands can be accessed using the `regtools junctions` command.
2731

2832
Listed below are links to detailed explanations of the `junctions` sub-commands:
2933

3034
- [extract](junctions-extract.md)
3135
- [annotate](junctions-annotate.md)
3236

33-
##variants
37+
## variants
38+
3439
The variants sub-command contains a list of tools that deal with variants that are potentially regulatory in nature. Variants are generally accepted in the standard VCF format unless specified otherwise.
3540

3641
Below are links to detailed explanations of the `variants` sub-commands:
Lines changed: 19 additions & 15 deletions
@@ -1,30 +1,34 @@
11
[junction_annotation]: ../images/junction_annotation_examples.png
22
[anchor_annotation]: ../images/anchor_examples.png
33

4-
###Synopsis
4+
# Overview of `regtools junctions annotate` command
5+
56
The `regtools junctions annotate` command is a tool to annotate the observed junctions with respect to a known
67
transcript structure. The known transcript structure is in the form of a GTF file obtained from one of the standard
78
Gene Annotation databases such as Ensembl/RefSeq/UCSC etc. The goal of the annotation step is to help identify novel/unusual junctions.
89

9-
###Usage
10+
## Usage
11+
1012
`regtools junctions annotate [options] junctions.bed ref.fa annotations.gtf`
1113

12-
###Input
14+
## Input
15+
1316
| Input | Description |
1417
| ------ | ----------- |
1518
| junctions.bed | The BED file with the junctions that have be annotated. This file has to be in the BED12 format. One recommended way of obtaining this file is by running `regtools junctions extract`. See [here](junctions-extract.md) for more details.|
1619
| ref.fa | The reference FASTA file. The donor and acceptor sequences used in the "splice-site" column are extracted from the FASTA file. |
1720
| annotations.gtf | The GTF file specifies the transcriptome that is used to annotate the junctions. For examples, the Ensembl GTFs for release78 are [here](ftp://ftp.ensembl.org/pub/release-78/gtf/)|
1821

22+
## Options
1923

20-
###Options
2124
| Option | Description |
2225
| ------ | ----------- |
2326
| -S | Do not skip single exon genes. The default is to skip the single exon genes while annotating junctions.|
2427
| -o | File to write output to. STDOUT by default. The output format is described [here](#output)|
2528
| -h | Display help message for this command.|
2629

27-
###Output
30+
## Output
31+
2832
| Column name | Description |
2933
| ----------- | ----------- |
3034
| chrom | Chromosome of the junction.|
@@ -34,18 +38,20 @@ Gene Annotation databases such as Ensembl/RefSeq/UCSC etc. The goal of the annot
3438
| score | The number of reads supporting the junction. [integer]|
3539
| strand | The strand the junction is identified. Same as the input file. [+/-]|
3640
| splice_site | The two basepairs at the donor and acceptor sites separated by a hyphen. [e.g CT-AG]|
37-
| acceptors_skipped | Number of known acceptors skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
38-
| exons_skipped | Number of known exons skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
39-
| donors_skipped | Number of known donors skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
41+
| acceptors_skipped | Number of known acceptors skipped by this junction according to the GTF. See [Notes](#notes) below for explanation. [integer]|
42+
| exons_skipped | Number of known exons skipped by this junction according to the GTF. See [Notes](#notes) below for explanation. [integer]|
43+
| donors_skipped | Number of known donors skipped by this junction according to the GTF. See [Notes](#notes) below for explanation. [integer]|
4044
| anchor | Field that specifies the donor, acceptor configuration. See [Notes](#notes) below for explanation. [D/A/DA/NDA/N]|
4145
| known_donor | Is the junction-donor a known donor in the GTF file? [0/1]|
4246
| known_acceptor | Is junction-donor a known acceptor in the GTF file? [0/1]|
4347
| known_junction | Does the junction have a known donor-acceptor pair according to the GTF file. This is equivalent to "DA" in the "anchor" column.|
4448
| transcripts | The transcripts that overlap the junction according to the input GTF file. |
4549
| genes | The genes that overlap the junction according to the input GTF file. |
4650

47-
###Notes
48-
####Annotating observed junctions with known donor/acceptor/junction information
51+
## Notes
52+
53+
### Annotating observed junctions with known donor/acceptor/junction information
54+
4955
It is useful to annotate the ends of junction with respect to known acceptors,
5056
donors and junctions in the transcriptome. The known acceptor, donor and junction
5157
information is computed from the GTF file and this information is then used to annotate the observed
@@ -56,7 +62,6 @@ The junctions are annotated using the following nomenclature (and as shown in th
5662
1. DA - The ends of this junction are known donor and known acceptor sites according to "annotations.gtf".
5763
This junction is known to the transcriptome.
5864

59-
6065
2. NDA - The ends of this junction are known donor and known acceptor sites, according to "annotations.gtf".
6166
This junction is not known to the transcriptome (novel).
6267

@@ -69,10 +74,10 @@ This junction is not known to the transcriptome (novel).
6974
5. N - The ends of this junction are a novel donor site and a novel acceptor site, according to "annotations.gtf".
7075
This junction is not known to the transcriptome (novel).
7176

72-
7377
![Anchor-annotation example][anchor_annotation]
7478

75-
####Annotating a junction with number of donors/acceptors/exons skipped
79+
### Annotating a junction with number of donors/acceptors/exons skipped
80+
7681
Exon skipping is a form of RNA splicing that can be identified using RNAseq data. It is hence useful
7782
to compute for every observed putative exon-exon junction, the number of exons skipped, the number of
7883
known donor sites skipped and the number of known acceptor sites skipped. The known exons, donors and
@@ -85,5 +90,4 @@ considered to be different the number of exons skipped is 3. We try and provide
8590

8691
![Junction-annotation example][junction_annotation]
8792

88-
If any of the examples are not clear or if you would like more information please feel free to open an issue on GitHub [here](https://github.com/griffithlab/regtools)
89-
or post on the discussion page [here.](https://groups.google.com/d/forum/regtools)
93+
If any of the examples are not clear or if you would like more information please feel free to open an issue on GitHub [here](https://github.com/griffithlab/regtools) or post on the discussion page [here](https://groups.google.com/d/forum/regtools).
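
The anchor nomenclature in the Notes section of this file can be summarized as a small decision function. The sketch below is illustrative and is not taken from the RegTools source. The `DA`, `NDA` and `N` cases follow the definitions visible in the hunks above. The definitions of `D` and `A` fall outside the displayed hunks, so the code assumes they mean "only the donor is known" and "only the acceptor is known" respectively. The function name `anchor_label` is hypothetical.

```python
def anchor_label(known_donor: bool, known_acceptor: bool, known_junction: bool) -> str:
    """Map the known_donor / known_acceptor / known_junction flags to an anchor code."""
    if known_donor and known_acceptor:
        # Both ends are known; the junction itself may still be novel.
        return "DA" if known_junction else "NDA"
    if known_donor:
        return "D"   # assumed: known donor, novel acceptor
    if known_acceptor:
        return "A"   # assumed: novel donor, known acceptor
    return "N"       # both ends novel
```

This matches the statement in the output table that `known_junction` is equivalent to `DA` in the `anchor` column.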
