The `cis-ase identify` command is used to identify allele-specific expression events. This command takes in a list of germline variants and somatic variants in the VCF format. The module also needs RNAseq alignments produced with a splice-aware aligner in the BAM format and an alignment of the DNA reads in the BAM format. The tool then proceeds to identify polymorphisms that show allele specific expression near the somatic variant sites.
| somatic-variants.vcf | Somatic variant calls in VCF format. The tool looks for allele specific expression at polymorphic loci near the somatic variants|
@@ -16,7 +19,8 @@ The `cis-ase identify` command is used to identify allele-specific expression ev
**Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.

-###Options
+## Options
+
| Option | Description |
| ------ | ----------- |
| -o | Output file containing the variants that show evidence for allele specific expression. [STDOUT]|
@@ -25,7 +29,8 @@ The `cis-ase identify` command is used to identify allele-specific expression ev
| -E | Flag to look at all neighboring polymorphisms for ASE, not just the exonic polymorphisms. |
| -B | Flag to use the binomial model and not the default beta model. This feature is under test. |

-###Output
+## Output
+
The output is in the VCF format and contains a list of polymorphic sites that show evidence for allele specific expression.

TODO - add details about the model parameters here.
# Overview of `cis-splice-effects associate` command
+
The `cis-splice-effects associate` command is used to identify splicing misregulation events. This command is similar to `cis-splice-effects identify`, but takes the BED output of `junctions extract` in lieu of a BAM file with RNA alignments. The tool then proceeds to associate non-canonical splicing junctions near the variant sites.
| variants.vcf | Variant call in VCF format from which to look for cis-splice-effects.|
@@ -16,22 +19,25 @@ The `cis-splice-effects associate` command is used to identify splicing misregul
**Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.

-###Options
+## Options
+
| Option | Description |
| ------ | ----------- |
-| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT]|
-| -j STR | Output file containing the aberrant junctions in BED12 format. |
-| -w INT | Window size in b.p to associate splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
-| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3]|
-| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2]|
-| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
-| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
-| -S | Don't skip single exon transcripts. |
-
-###Output
+| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT]|
+| -j STR | Output file containing the aberrant junctions in BED12 format. |
+| -w INT | Window size in b.p to associate splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
+| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3]|
+| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2]|
+| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
+| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
+| -S | Don't skip single exon transcripts. |
+
+## Output
+
For an explanation of the annotated junctions that are identified by this command please refer to the output of the `junctions annotate` command [here](junctions-annotate.md#output)
For an explanation of the annotated variants that are identified by this command when using the -v option, please refer to the output of the `variants annotate` command [here](variants-annotate.md#output)
The `cis-splice-effects identify` command is used to identify splicing misregulation events. This command takes in a list of variants in the VCF format and RNAseq alignments produced with a splice-aware aligner in the BAM/CRAM format. The tool then proceeds to identify non-canonical splicing junctions near the variant sites.
-###Usage
+
The `cis-splice-effects identify` command is used to identify splicing misregulation events. This command takes in a list of variants in the VCF format and RNAseq alignments produced with a splice-aware aligner in the BAM format. The tool then proceeds to identify non-canonical splicing junctions near the variant sites.
| variants.vcf | Variant call in VCF format from which to look for cis-splice-effects.|
@@ -16,23 +19,26 @@ The `cis-splice-effects identify` command is used to identify splicing misregula
**Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.

-###Options
+## Options
+
| Option | Description |
| ------ | ----------- |
-| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT]|
-| -j STR | Output file containing the aberrant junctions in BED12 format. |
-| -s INT | Strand specificity of RNA library preparation, where 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help. |
-| -w INT | Window size in b.p to identify splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
-| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3]|
-| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2]|
-| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
-| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
-| -S | Don't skip single exon transcripts. |
-
-###Output
+| -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT]|
+| -j STR | Output file containing the aberrant junctions in BED12 format. |
+| -s INT | Strand specificity of RNA library preparation, where 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help. |
+| -w INT | Window size in b.p to identify splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
+| -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3]|
+| -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2]|
+| -I | Annotate variants in intronic space within a transcript(not to be used with -i). |
+| -E | Annotate variants in exonic space within a transcript(not to be used with -e). |
+| -S | Don't skip single exon transcripts. |
+
+## Output
+
For an explanation of the annotated junctions that are identified by this command please refer to the output of the `junctions annotate` command [here](junctions-annotate.md#output)
For an explanation of the annotated variants that are identified by this command when using the -v option, please refer to the output of the `variants annotate` command [here](variants-annotate.md#output)
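
The `-w` option in the table above is described as looking at `variant.start +/- w` basepairs. As a rough illustration only, the window arithmetic could be sketched in Python as below. This is not the regtools implementation: the function names are hypothetical, clamping the window start at 0 is an assumption, and treating a junction as "in the window" when it overlaps the window (rather than being fully contained in it) is also an assumption that the documentation does not state.

```python
from typing import Tuple


def variant_window(variant_start: int, w: int) -> Tuple[int, int]:
    """Return (start, end) for the region variant_start +/- w.

    Assumption: the start is clamped at 0 so the window never
    extends before the beginning of the chromosome.
    """
    if w < 0:
        raise ValueError("window size must be non-negative")
    return max(0, variant_start - w), variant_start + w


def junction_in_window(junction_start: int, junction_end: int,
                       window: Tuple[int, int]) -> bool:
    """Return True if the junction overlaps the window.

    Assumption: both intervals are treated as closed, and any
    overlap counts; the real tool may use a different criterion.
    """
    win_start, win_end = window
    return junction_start <= win_end and junction_end >= win_start


if __name__ == "__main__":
    window = variant_window(1000, 100)
    print(window)
    print(junction_in_window(1050, 2000, window))
```

When `-w` is not given, the documented default is the window between the previous and next exons, which requires the GTF annotation and is not covered by this sketch.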
docs/commands/commands.md (10 additions & 5 deletions)
@@ -1,4 +1,5 @@
-##Synopsis
+# RegTools commands
+
To get a list of all available regtools commands, run `regtools --help`.

The main regtools commands are
@@ -7,30 +8,34 @@ The main regtools commands are
- [junctions](#junctions)
- [variants](#variants)

-##cis-splice-effects
+## cis-splice-effects
+
This set of tools helps identify and work with aberrant splicing events near variants; these could be somatic variants or germline polymorphisms/mutations. These variants are hypothesized to act in cis and affect how the gene is transcribed.

Below are links to detailed explanations of the `cis-splice-effects` sub-commands:

- [identify](cis-splice-effects-identify.md)
- [associate](cis-splice-effects-associate.md)

-##cis-ase
+## cis-ase
+
This set of tools helps identify and work with allele-specific expression near variants; these could be somatic variants or germline polymorphisms/mutations. These variants are hypothesized to act in cis and affect how the gene is transcribed.

Below are links to detailed explanations of the `cis-ase` sub-commands:

- [identify](cis-ase-identify.md)

-##junctions
+## junctions
+
The transcriptome structure is often summarized from an RNAseq experiment with a BED file. This BED file contains the exon-exon boundary co-ordinates, which are referred to as junctions. For example, TopHat outputs a file called 'junctions.bed' which contains this information. This file is very useful if you are interested in studying which exons/transcripts are expressed, splicing effects etc. On the command line these commands can be accessed using the `regtools junctions` command.

Listed below are links to detailed explanations of the `junctions` sub-commands:

- [extract](junctions-extract.md)
- [annotate](junctions-annotate.md)

-##variants
+## variants
+
The variants sub-command contains a list of tools that deal with variants that are potentially regulatory in nature. Variants are generally accepted in the standard VCF format unless specified otherwise.

Below are links to detailed explanations of the `variants` sub-commands:
| junctions.bed | The BED file with the junctions that have to be annotated. This file has to be in the BED12 format. One recommended way of obtaining this file is by running `regtools junctions extract`. See [here](junctions-extract.md) for more details.|
| ref.fa | The reference FASTA file. The donor and acceptor sequences used in the "splice-site" column are extracted from the FASTA file. |
| annotations.gtf | The GTF file specifies the transcriptome that is used to annotate the junctions. For example, the Ensembl GTFs for release78 are [here](ftp://ftp.ensembl.org/pub/release-78/gtf/)|

+## Options

-###Options
| Option | Description |
| ------ | ----------- |
| -S | Do not skip single exon genes. The default is to skip the single exon genes while annotating junctions.|
| -o | File to write output to. STDOUT by default. The output format is described [here](#output)|
| -h | Display help message for this command.|

-###Output
+## Output
+
| Column name | Description |
| ----------- | ----------- |
| chrom | Chromosome of the junction.|
@@ -34,18 +38,20 @@ Gene Annotation databases such as Ensembl/RefSeq/UCSC etc. The goal of the annot
| score | The number of reads supporting the junction. [integer]|
| strand | The strand on which the junction is identified. Same as the input file. [+/-]|
| splice_site | The two basepairs at the donor and acceptor sites separated by a hyphen. [e.g. CT-AG]|
-| acceptors_skipped | Number of known acceptors skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
-| exons_skipped | Number of known exons skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
-| donors_skipped | Number of known donors skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
+| acceptors_skipped | Number of known acceptors skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
+| exons_skipped | Number of known exons skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
+| donors_skipped | Number of known donors skipped by this junction according to the GTF. See[Notes](#notes) below for explanation. [integer]|
| anchor | Field that specifies the donor, acceptor configuration. See [Notes](#notes) below for explanation. [D/A/DA/NDA/N]|
| known_donor | Is the junction-donor a known donor in the GTF file? [0/1]|
| known_acceptor | Is the junction-acceptor a known acceptor in the GTF file? [0/1]|
| known_junction | Does the junction have a known donor-acceptor pair according to the GTF file. This is equivalent to "DA" in the "anchor" column.|
| transcripts | The transcripts that overlap the junction according to the input GTF file. |
| genes | The genes that overlap the junction according to the input GTF file. |

-###Notes
-####Annotating observed junctions with known donor/acceptor/junction information
+## Notes
+
+### Annotating observed junctions with known donor/acceptor/junction information
+
It is useful to annotate the ends of a junction with respect to known acceptors,
donors and junctions in the transcriptome. The known acceptor, donor and junction
information is computed from the GTF file and this information is then used to annotate the observed
@@ -56,7 +62,6 @@ The junctions are annotated using the following nomenclature (and as shown in th
1. DA - The ends of this junction are known donor and known acceptor sites according to "annotations.gtf".
This junction is known to the transcriptome.

-
2. NDA - The ends of this junction are known donor and known acceptor sites, according to "annotations.gtf".
This junction is not known to the transcriptome (novel).

@@ -69,10 +74,10 @@ This junction is not known to the transcriptome (novel).
5. N - The ends of this junction are a novel donor site and a novel acceptor site, according to "annotations.gtf".
This junction is not known to the transcriptome (novel).

-
![Anchor-annotation example][anchor_annotation]

-####Annotating a junction with number of donors/acceptors/exons skipped
+### Annotating a junction with number of donors/acceptors/exons skipped
+
Exon skipping is a form of RNA splicing that can be identified using RNAseq data. It is hence useful
to compute for every observed putative exon-exon junction, the number of exons skipped, the number of
known donor sites skipped and the number of known acceptor sites skipped. The known exons, donors and
@@ -85,5 +90,4 @@ considered to be different the number of exons skipped is 3. We try and provide
-If any of the examples are not clear or if you would like more information please feel free to open an issue on GitHub [here](https://github.com/griffithlab/regtools)
-or post on the discussion page [here.](https://groups.google.com/d/forum/regtools)
+If any of the examples are not clear or if you would like more information please feel free to open an issue on GitHub [here](https://github.com/griffithlab/regtools) or post on the discussion page [here](https://groups.google.com/d/forum/regtools).
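
The anchor nomenclature described in the Notes of `junctions annotate` (DA, NDA, D, A, N) can be illustrated with a small Python sketch that derives the `anchor` value from the `known_donor`, `known_acceptor` and `known_junction` columns. This is an illustration, not code from regtools. The function name is hypothetical, and because the definitions of `D` and `A` are not visible in this diff, the sketch assumes `D` means only the donor is known and `A` means only the acceptor is known.

```python
def anchor_label(known_donor: bool, known_acceptor: bool,
                 known_junction: bool) -> str:
    """Derive the anchor value from the three known_* columns.

    DA  - known donor and known acceptor, junction is known
    NDA - known donor and known acceptor, junction is novel
    D   - known donor, novel acceptor (assumed meaning)
    A   - novel donor, known acceptor (assumed meaning)
    N   - novel donor and novel acceptor
    """
    if known_junction:
        # A known junction implies that both of its ends are known.
        if not (known_donor and known_acceptor):
            raise ValueError("known junction requires known donor and acceptor")
        return "DA"
    if known_donor and known_acceptor:
        return "NDA"
    if known_donor:
        return "D"
    if known_acceptor:
        return "A"
    return "N"


if __name__ == "__main__":
    # The 0/1 columns of the annotate output map naturally to booleans.
    row = {"known_donor": "1", "known_acceptor": "1", "known_junction": "0"}
    flags = {key: value == "1" for key, value in row.items()}
    print(anchor_label(**flags))
```

A check like this can be useful when filtering annotated junctions, for example to keep only junctions whose `anchor` is not `DA`.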