Commit 2e3a291

Merge pull request #163 from griffithlab/singlecell
Incorporate single cell changes
2 parents d8a3110 + f0a0cdb commit 2e3a291

19 files changed: 701 additions, 70 deletions

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ include(TestHelper)

 #versioning stuff
 set (regtools_VERSION_MAJOR 0)
-set (regtools_VERSION_MINOR 5)
-set (regtools_VERSION_PATCH 2)
+set (regtools_VERSION_MINOR 0)
+set (regtools_VERSION_PATCH 1)

 configure_file (
 "${PROJECT_SOURCE_DIR}/src/version.h.in"

Dockerfile

Lines changed: 26 additions & 8 deletions
@@ -2,10 +2,12 @@
 ##################### Set Inital Image to work from ############################

 # work from latest LTS ubuntu release
-FROM ubuntu:18.04
+FROM ubuntu:20.04

 # set variables
 ENV r_version 3.6.0
+ENV TZ=US/Chicago
+ENV DEBIAN_FRONTEND noninteractive

 # run update
 RUN apt-get update -y && apt-get install -y \
@@ -25,7 +27,8 @@ RUN apt-get update -y && apt-get install -y \
 git \
 build-essential \
 cmake \
-python3
+python3 \
+python3-pip

 ################################################################################
 ##################### Add Container Labels #####################################
@@ -51,23 +54,38 @@ RUN make install
 # install R packages
 RUN R --vanilla -e 'install.packages(c("data.table", "plyr", "tidyverse"), repos = "http://cran.us.r-project.org")'

+################################################################################
+##################### Install SpliceAI #########################################
+
+RUN pip3 install spliceai
+RUN pip3 install --upgrade tensorflow
+RUN pip3 install keras==2.4.3
+
 ################################################################################
 ##################### Install Regtools #########################################

-# add repo source
+
+# removed this due to docker build pulling the correct branch already and the below command actually overwriting the desired branch to master
+# clone git repository
 ADD . /regtools

-# make a build directory for regtools
+# change to regtools to build it
+
 WORKDIR /regtools

 # compile from source
 RUN mkdir build && cd build && cmake .. && make

 ################################################################################
-###################### set environment path #################################
+################### Make scripts executable ####################################

-# make a build directory for regtools
-WORKDIR /scripts/
+WORKDIR /regtools/scripts
+
+RUN chmod ugo+x *
+
+################################################################################
+###################### set environment path #################################

 # add regtools executable to path
-ENV PATH="/regtools/build:/usr/local/bin/R-${r_version}:${PATH}"
+ENV PATH="/regtools/build:/usr/local/bin:/usr/local/bin/R-${r_version}:${PATH}"
+

README.md

Lines changed: 42 additions & 1 deletion
@@ -2,7 +2,7 @@
 [![Documentation Status](https://readthedocs.org/projects/regtools/badge/?version=latest)](https://readthedocs.org/projects/regtools/?badge=latest)
 [![Coverage Status](https://coveralls.io/repos/griffithlab/regtools/badge.svg?branch=master&service=github)](https://coveralls.io/github/griffithlab/regtools?branch=master)

-# regtools
+# RegTools

 Tools that integrate DNA-seq and RNA-seq data to help interpret mutations
 in a regulatory and splicing context.
@@ -14,6 +14,17 @@ in a regulatory and splicing context.
 - Annotate exon-exon junctions with information from a known transcriptome.
 - Annotate variants with splice-region(the definition of this region is configurable) annotations.

+## Hardware requirements
+RegTools requires only a standard computer with enough RAM to support the in-memory operations.
+
+## Software requirements
+OS Requirements
+This package is supported for macOS and Linux. The package has been tested on the following systems:
+
+macOS: macOS 10.12 (Sierra), macOS 10.13 (High Sierra), macOS 10.14 (Mojave), macOS 10.15 (Catalina), macOS 11 (Big Sur), macOS 12 (Monterey)
+
+Linux: Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04
+
 ## Installation

 Clone and install regtools by running the following:
@@ -26,6 +37,8 @@ Clone and install regtools by running the following:
 make
 ```

+Installation should take 1-5 minutes.
+
 For convienience we also maintain a docker image available at [https://hub.docker.com/r/griffithlab/regtools/](https://hub.docker.com/r/griffithlab/regtools/)

 ## Usage:
@@ -34,6 +47,34 @@ For convienience we also maintain a docker image available at [https://hub.docke
 regtools --help
 ```

+If one wishes to test their installation, we include test data under `test_data`.
+
+Here's an example command using that data along with the example output. This should run in under a minute.
+
+```sh
+regtools cis-splice-effects identify -s RF -e 10 -i 10 test_data/HCC1395_chr22.vcf.gz test_data/HCC1395_tumor.bam test_data/chr22_with_ERCC92.fa test_data/chr22_with_ERCC92.gtf
+
+Variant 22 42129188 42129189 -1
+Variant region is 22:42128784-42130813
+
+chrom start end name score strand splice_site acceptors_skipped exons_skipped donors_skipped anchor known_donor known_acceptor known_junction gene_names gene_ids transcripts variant_info
+position = 22:42125408-42125409
+position = 22:42130565-42130566
+22 42125407 42130567 JUNC00000001 4 + GT-AG 0 0 0 D 1 0 0 NDUFA6-AS1 ENSG00000237037 ENST00000439129 22:42129188-42129189
+position = 22:42128881-42128882
+position = 22:42129670-42129671
+22 42128880 42129672 JUNC00000002 3 + GT-AG 0 0 0 N 0 0 0 NA NA NA 22:42129188-42129189
+position = 22:42128944-42128945
+position = 22:42129031-42129032
+22 42128943 42129033 JUNC00000003 4 - GT-GG 1 0 0 D 1 0 0 CYP2D6 ENSG00000100197 ENST00000360608,ENST00000389970,ENST00000488442 22:42129188-42129189
+position = 22:42129783-42129784
+position = 22:42143453-42143454
+22 42129782 42143455 JUNC00000004 2 + GC-AG 9 8 9 N 0 0 0 NA NA NA 22:42129188-42129189
+position = 22:42130224-42130225
+position = 22:42130565-42130566
+22 42130223 42130567 JUNC00000005 2 + GT-AG 0 0 0 N 0 0 0 NA NA NA 22:42129188-42129189
+```
+
 ## Contribute

 - Issue Tracker: github.com/griffithlab/regtools/issues
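
For readers who want to check the example output added to the README programmatically, here is a minimal Python sketch that picks the annotated junction rows out of the mixed log and table text. The column names are copied from the header line in the example. The helper names `parse_junction_row` and `parse_output` are hypothetical and not part of regtools. The sketch assumes fields are whitespace-separated with no embedded spaces, as they appear in the example.

```python
# Column names copied from the header line of the README example output.
COLUMNS = (
    "chrom start end name score strand splice_site acceptors_skipped "
    "exons_skipped donors_skipped anchor known_donor known_acceptor "
    "known_junction gene_names gene_ids transcripts variant_info"
).split()

# Columns that hold integers in the example rows.
INT_COLUMNS = (
    "start", "end", "score", "acceptors_skipped", "exons_skipped",
    "donors_skipped", "known_donor", "known_acceptor", "known_junction",
)


def parse_junction_row(line):
    """Parse one annotated junction row into a dict (hypothetical helper)."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    row = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    # "NA" marks a junction with no matching transcript in the example.
    transcripts = row["transcripts"]
    row["transcripts"] = [] if transcripts == "NA" else transcripts.split(",")
    return row


def parse_output(text):
    """Yield junction rows, skipping the header and 'position = ...' log lines."""
    for line in text.splitlines():
        fields = line.split()
        # Assumption: junction names start with "JUNC", as in the example.
        if len(fields) == len(COLUMNS) and fields[3].startswith("JUNC"):
            yield parse_junction_row(line)
```

Capturing the command's standard output and passing it to `parse_output` would give one dictionary per junction, which can then be compared against the expected rows.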

build-common/python/integrationtest.py

Lines changed: 2 additions & 1 deletion
@@ -34,7 +34,8 @@ def inputFiles(self, *names):
         return rv

     def execute(self, args):
-        print("args ", args)
+        print(args ), args
+
         cmdline = "%s %s" %(self.exe_path, " ".join(args))
         vglog_file = self.tempFile("valgrind.log")
         return ValgrindWrapper(shlex.split(cmdline), vglog_file).run()
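
The `execute` method above joins the arguments with spaces and then splits the result again with `shlex.split`. The sketch below illustrates one consequence of that round trip: an argument containing a space is split in two unless each part is quoted first. `build_cmdline` is a hypothetical helper, not part of the test harness, and the file name is invented for illustration.

```python
import shlex


def build_cmdline(exe_path, args):
    """Join an executable and its arguments so shlex.split recovers them exactly.

    Hypothetical helper: each part is passed through shlex.quote before joining.
    """
    return " ".join(shlex.quote(part) for part in [exe_path, *args])


if __name__ == "__main__":
    # Invented example arguments; "my sample.bam" contains a space.
    args = ["junctions", "extract", "my sample.bam"]
    naive = "%s %s" % ("regtools", " ".join(args))
    print(shlex.split(naive))                            # space splits the file name
    print(shlex.split(build_cmdline("regtools", args)))  # file name stays intact
```

Test inputs without spaces behave identically under both approaches, so this matters only if a path with whitespace is ever passed in.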

docs/commands/cis-splice-effects-identify.md

Lines changed: 3 additions & 3 deletions
@@ -13,9 +13,9 @@ The `cis-splice-effects identify` command is used to identify splicing misregula
 | Input | Description |
 | ------ | ----------- |
 | variants.vcf | Variant call in VCF format from which to look for cis-splice-effects.|
-| alignments.bam | Aligned RNAseq BAM/CRAM produced with a splice aware aligner, that has been indexed for example with `samtools index`. We have tested this command with alignments from TopHat.|
+| alignments.bam | Aligned RNAseq BAM/CRAM produced with a splice aware aligner, that has been indexed for example with `samtools index`. We have tested this command with alignments from HISAT2, TopHat2, STAR, kallisto, and minimap2.|
 | ref.fa | The reference FASTA file. The donor and acceptor sequences used in the "splice-site" column of the annotated junctions are extracted from the FASTA file. |
-| annotations.gtf | The GTF file specifies the transcriptome that is used to annotate the junctions and variants. For examples, the Ensembl GTFs for release78 are [here](ftp://ftp.ensembl.org/pub/release-78/gtf/).|
+| annotations.gtf | The GTF file specifies the transcriptome that is used to annotate the junctions and variants. For examples, the Ensembl GTFs for release 106 are [here](http://ftp.ensembl.org/pub/release-106/gtf/).|

 **Note** - Please make sure that the version of the annotation GTF that you use corresponds with the version of the assembly build (ref.fa) and that the co-ordinates in the VCF file are also from the same build.

@@ -26,7 +26,7 @@ The `cis-splice-effects identify` command is used to identify splicing misregula
 | -o STR | Output file containing the aberrant splice junctions with annotations. [STDOUT] |
 | -v STR | Output file containing variants annotated as splice relevant (VCF format). |
 | -j STR | Output file containing the aberrant junctions in BED12 format. |
-| -s INT | Strand specificity of RNA library preparation, where 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help. |
+| -s INT | Strand specificity of RNA library preparation, where the options XS, use XS tags provided by aligner; RF, first-strand; FR, second-strand. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help. |
 | -w INT | Window size in b.p to identify splicing events in. The tool identifies events in variant.start +/- w basepairs. Default behaviour is to look at the window between previous and next exons. |
 | -e INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in exonic space, i.e a coding variant. [3] |
 | -i INT | Maximum distance from the start/end of an exon to annotate a variant as relevant to splicing, the variant is in intronic space. [2] |
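
The `-s` documentation change above replaces the numeric strand codes (0, 1, 2) with named options (XS, RF, FR). A wrapper script that still stores the old codes could translate them with something like the sketch below. The mapping is taken from the removed and added lines of this diff. The function `normalize_strand` is a hypothetical helper and not part of regtools.

```python
# Mapping read off the old and new "-s" descriptions in the diff above:
# 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR.
LEGACY_TO_NAMED = {0: "XS", 1: "RF", 2: "FR"}


def normalize_strand(value):
    """Return 'XS', 'RF', or 'FR' for a legacy integer code or a named option.

    Hypothetical helper for wrapper scripts; raises ValueError otherwise.
    """
    text = str(value).strip().upper()
    if text in LEGACY_TO_NAMED.values():
        return text
    if text.isdigit() and int(text) in LEGACY_TO_NAMED:
        return LEGACY_TO_NAMED[int(text)]
    raise ValueError("unrecognized strand setting: %r" % (value,))


if __name__ == "__main__":
    for setting in (0, "1", "fr", "XS"):
        print(setting, "->", normalize_strand(setting))
```

The returned string can then be passed as the value of `-s`, as in the README example command, which uses `-s RF`.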

docs/commands/junctions-annotate.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Gene Annotation databases such as Ensembl/RefSeq/UCSC etc. The goal of the annot
 | ------ | ----------- |
 | junctions.bed | The BED file with the junctions that have be annotated. This file has to be in the BED12 format. One recommended way of obtaining this file is by running `regtools junctions extract`. See [here](junctions-extract.md) for more details.|
 | ref.fa | The reference FASTA file. The donor and acceptor sequences used in the "splice-site" column are extracted from the FASTA file. |
-| annotations.gtf | The GTF file specifies the transcriptome that is used to annotate the junctions. For examples, the Ensembl GTFs for release78 are [here](ftp://ftp.ensembl.org/pub/release-78/gtf/)|
+| annotations.gtf | The GTF file specifies the transcriptome that is used to annotate the junctions. For examples, the Ensembl GTFs for release 106 are [here](http://ftp.ensembl.org/pub/release-106/gtf/).||

 ## Options

docs/commands/junctions-extract.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Overview of `junctions extract` command

-The `junctions extract` command can be used to extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format. We have tested this command with alignments from TopHat and by comparing the exon-exon junctions with the `junctions.bed` file produced from TopHat.
+The `junctions extract` command can be used to extract exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format. We have tested this command with alignments from HISAT2, TopHat2, STAR, kallisto, and minimap2 and by comparing the exon-exon junctions with the `junctions.bed` file produced from TopHat.


 ## Usage
@@ -23,7 +23,7 @@ The `junctions extract` command can be used to extract exon-exon junctions from
 | -o | File to write output to. STDOUT by default.|
 | -r | Region to extract junctions in. This is specified in the format "chr:start-end". If not specified, junctions are extracted from the entire BAM file.|
 | -h | Display help message for this command.|
-| -s | Strand specificity of RNA library preparation, where 0 = unstranded/XS, 1 = first-strand/RF, 2 = second-strand/FR. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help.
+| -s | Strand specificity of RNA library preparation, where the options XS, use XS tags provided by aligner; RF, first-strand; FR, second-strand. This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this [table](https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/) to help.

 ## Output
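
The `-r` option in the table above takes a region in the form "chr:start-end". As a rough illustration of that format only, the sketch below validates a region string before it is handed to the command. `parse_region` is a hypothetical helper. How regtools itself parses the string is not shown in this diff, so the rules here (non-negative integers, start no greater than end) are assumptions.

```python
import re

# "chr:start-end"; the greedy chromosome group tolerates names that contain ':'.
_REGION = re.compile(r"^(?P<chrom>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region):
    """Split a "chr:start-end" string into (chrom, start, end).

    Hypothetical helper; raises ValueError on malformed input.
    """
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError("region must look like chr:start-end, got %r" % (region,))
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        raise ValueError("region start %d is greater than end %d" % (start, end))
    return match.group("chrom"), start, end


if __name__ == "__main__":
    # Region string taken from the README example output in this commit.
    print(parse_region("22:42128784-42130813"))
```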

docs/index.md

Lines changed: 28 additions & 0 deletions
@@ -27,6 +27,34 @@ make
 regtools --help
 ```

+If one wishes to test their installation, we include test data under `test_data`.
+
+Here's an example command using that data along with the example output.
+
+```sh
+regtools cis-splice-effects identify -s RF -e 10 -i 10 test_data/HCC1395_chr22.vcf.gz test_data/HCC1395_tumor.bam test_data/chr22_with_ERCC92.fa test_data/chr22_with_ERCC92.gtf
+
+Variant 22 42129188 42129189 -1
+Variant region is 22:42128784-42130813
+
+chrom start end name score strand splice_site acceptors_skipped exons_skipped donors_skipped anchor known_donor known_acceptor known_junction gene_names gene_ids transcripts variant_info
+position = 22:42125408-42125409
+position = 22:42130565-42130566
+22 42125407 42130567 JUNC00000001 4 + GT-AG 0 0 0 D 1 0 0 NDUFA6-AS1 ENSG00000237037 ENST00000439129 22:42129188-42129189
+position = 22:42128881-42128882
+position = 22:42129670-42129671
+22 42128880 42129672 JUNC00000002 3 + GT-AG 0 0 0 N 0 0 0 NA NA NA 22:42129188-42129189
+position = 22:42128944-42128945
+position = 22:42129031-42129032
+22 42128943 42129033 JUNC00000003 4 - GT-GG 1 0 0 D 1 0 0 CYP2D6 ENSG00000100197 ENST00000360608,ENST00000389970,ENST00000488442 22:42129188-42129189
+position = 22:42129783-42129784
+position = 22:42143453-42143454
+22 42129782 42143455 JUNC00000004 2 + GC-AG 9 8 9 N 0 0 0 NA NA NA 22:42129188-42129189
+position = 22:42130224-42130225
+position = 22:42130565-42130566
+22 42130223 42130567 JUNC00000005 2 + GT-AG 0 0 0 N 0 0 0 NA NA NA 22:42129188-42129189
+```
+
 For information about the individual RegTools commands, please see [the Commands page](commands/commands.md)

 ## Contribute

scripts/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+test
