Commit ffa9a5c

1.4.1 release
1 parent db93d30 commit ffa9a5c

8 files changed (+74, -47 lines changed)

README.md

Lines changed: 21 additions & 19 deletions
@@ -10,12 +10,15 @@
 
 ### Overview
 
-The germline variant annotator (*gvanno*) is a simple software package intended for analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with the [Docker](https://www.docker.com) technology, but it can also be installed through the [Singularity](https://sylabs.io/docs/) framework.
+The germline variant annotator (*gvanno*) is a simple software package intended for analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with the [Docker](https://www.docker.com) technology, and it can also be installed through the [Singularity](https://sylabs.io/docs/) framework.
 
-*gvanno* accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow relies heavily upon [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html), and [vcfanno](https://github.com/brentp/vcfanno). It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations pr. variant record.
+*gvanno* accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow relies heavily upon [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html), and [vcfanno](https://github.com/brentp/vcfanno). It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations pr. variant record. Note that if your input VCF contains data (genotypes) from multiple samples (i.e. a multisample VCF), the output TSV file will contain one line/record __per sample variant__.
 
 ### News
-
+* December 7th 2020 - **1.4.1 release**
+* Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
+* Software update (VEP 102)
+* Skipped DisGenet annotations (Open Targets serve similar purpose)
 * September 29th 2020 - **1.4.0 release**
 * Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
 * Software updates (VEP 101)
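The README note added in this hunk says that a multisample VCF yields one TSV line per sample variant. A minimal sketch of that expansion, assuming a site is represented as a plain dict of site-level fields plus a mapping from sample ID to GT string; `per_sample_rows` and `to_tsv` are hypothetical names, not gvanno functions, and the sketch keeps every sample rather than guessing how gvanno treats non-carrier genotypes.

```python
from typing import Dict, Iterator, List


def per_sample_rows(site: Dict[str, str], genotypes: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield one row per sample for a single VCF site.

    site:      site-level fields shared by all samples (hypothetical keys)
    genotypes: sample ID -> GT string, e.g. {'sampleA': '0/1'}
    """
    for sample_id, gt in genotypes.items():
        row = dict(site)  # copy, so site-level annotations repeat on every row
        row["SAMPLE_ID"] = sample_id
        row["GT"] = gt
        yield row


def to_tsv(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render rows as tab-separated text with a header line."""
    lines = ["\t".join(columns)]
    lines.extend("\t".join(row[col] for col in columns) for row in rows)
    return "\n".join(lines)


site = {"CHROM": "1", "POS": "12345", "REF": "A", "ALT": "G"}
genotypes = {"sampleA": "0/1", "sampleB": "1/1"}
rows = list(per_sample_rows(site, genotypes))
print(to_tsv(rows, ["CHROM", "POS", "REF", "ALT", "SAMPLE_ID", "GT"]))
```

Two samples at one site therefore produce two data lines that share the same site annotations.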
@@ -36,17 +39,16 @@ The germline variant annotator (*gvanno*) is a simple software package intended
 
 ### Annotation resources
 
-* [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor v101 (GENCODE v35/v19 as the gene reference dataset)
+* [VEP](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor v102 (GENCODE v36/v19 as the gene reference dataset)
 * [dBNSFP](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (v4.1, June 2020)
 * [gnomAD](http://gnomad.broadinstitute.org/) - Germline variant frequencies exome-wide (release 2.1, October 2018) - from VEP
 * [dbSNP](http://www.ncbi.nlm.nih.gov/SNP/) - Database of short genetic variants (build 153) - from VEP
 * [1000 Genomes Project - phase3](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) - Germline variant frequencies genome-wide (May 2013) - from VEP
-* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (August 2020)
-* [DisGeNET](http://www.disgenet.org) - Database of gene-disease associations (v7.0, May 2020)
-* [Open Targets Platform](https://targetvalidation.org) - Target-disease and target-drug associations (2020_09, September 2020)
-* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - Resource on protein sequence and functional information (2020_04, August 2020)
+* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (December 2020)
+* [Open Targets Platform](https://targetvalidation.org) - Target-disease and target-drug associations (2020_11, November 2020)
+* [UniProt/SwissProt KnowledgeBase](http://www.uniprot.org) - Resource on protein sequence and functional information (2020_06, December 2020)
 * [Pfam](http://pfam.xfam.org) - Database of protein families and domains (v33.1, May 2020)
-* [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/home) - Catalog of published genome-wide association studies (September 9th 2020)
+* [NHGRI-EBI GWAS Catalog](https://www.ebi.ac.uk/gwas/home) - Catalog of published genome-wide association studies (December 2nd 2020)
 
 
 ### Getting started
@@ -80,15 +82,15 @@ An installation of Python (version _3.6_) is required to run *gvanno*. Check tha
 
 #### STEP 2: Download *gvanno* and data bundle
 
-1. Download and unpack the [latest software release (1.4.0)](https://github.com/sigven/gvanno/releases/tag/v1.4.0)
+1. Download and unpack the [latest software release (1.4.1)](https://github.com/sigven/gvanno/releases/tag/v1.4.1)
 2. Download and unpack the assembly-specific data bundle in the gvanno directory
-* [grch37 data bundle](https://drive.google.com/file/d/1VnABjA3ZCJLlQxhQKcIGaC17MD0kItVd) (approx 16Gb)
-* [grch38 data bundle](https://drive.google.com/file/d/13fbKtAFzcUGDnPfruzgK43PvAKiFc8XL/) (approx 17Gb)
+* [grch37 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch37.20201206.tgz) (approx 16Gb)
+* [grch38 data bundle](http://insilico.hpc.uio.no/pcgr/gvanno/gvanno.databundle.grch38.20201206.tgz) (approx 17Gb)
 * *Unpacking*: `gzip -dc gvanno.databundle.grch37.YYYYMMDD.tgz | tar xvf -`
 
 A _data/_ folder within the _gvanno-X.X_ software folder should now have been produced
-3. Pull the [gvanno Docker image (1.4.0)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 1.9Gb):
-* `docker pull sigven/gvanno:1.4.0` (gvanno annotation engine)
+3. Pull the [gvanno Docker image (1.4.1)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.3Gb):
+* `docker pull sigven/gvanno:1.4.1` (gvanno annotation engine)
 
 #### STEP 3: Input preprocessing
 
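Step 2 unpacks the data bundle with `gzip -dc ... | tar xvf -`. Where a shell pipeline is inconvenient (for example inside a Python setup script), the standard-library `tarfile` module can do the same job. A minimal sketch, assuming the archive unpacks to a top-level `data/` folder as the README describes; `unpack_bundle` and `data_folder_present` are hypothetical helpers, not part of gvanno.

```python
import os
import tarfile
from typing import List


def unpack_bundle(bundle_path: str, gvanno_dir: str) -> List[str]:
    """Extract a .tgz bundle into gvanno_dir and return the member names.

    Equivalent in effect to running `gzip -dc BUNDLE.tgz | tar xvf -`
    from inside gvanno_dir.
    """
    with tarfile.open(bundle_path, mode="r:gz") as tar:
        names = tar.getnames()  # what `tar v` would list
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, '..' components, etc.
            tar.extractall(gvanno_dir, filter="data")
        else:
            tar.extractall(gvanno_dir)
    return names


def data_folder_present(gvanno_dir: str) -> bool:
    """True if the expected data/ folder now exists under gvanno_dir."""
    return os.path.isdir(os.path.join(gvanno_dir, "data"))
```

Typical use would be `unpack_bundle("gvanno.databundle.grch37.20201206.tgz", os.path.expanduser("~/gvanno-1.4.1"))` followed by `data_folder_present(...)` as a sanity check.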
@@ -117,7 +119,7 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt
 --query_vcf QUERY_VCF
 VCF input file with germline query variants (SNVs/InDels).
 --gvanno_dir GVANNO_DIR
-Directory that contains the gvanno data bundle, e.g. ~/gvanno-1.4.0
+Directory that contains the gvanno data bundle, e.g. ~/gvanno-1.4.1
 --output_dir OUTPUT_DIR
 Output directory
 --genome_assembly {grch37,grch38}
@@ -149,10 +151,10 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt
 
 The _examples_ folder contains an example VCF file. Analysis of the example VCF can be performed by the following command:
 
-python ~/gvanno-1.4.0/gvanno.py
---query_vcf ~/gvanno-1.4.0/examples/example.grch37.vcf.gz
---gvanno_dir ~/gvanno-1.4.0
---output_dir ~/gvanno-1.4.0
+python ~/gvanno-1.4.1/gvanno.py
+--query_vcf ~/gvanno-1.4.1/examples/example.grch37.vcf.gz
+--gvanno_dir ~/gvanno-1.4.1
+--output_dir ~/gvanno-1.4.1
 --sample_id example
 --genome_assembly grch37
 --container docker
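The example command above can also be assembled programmatically, for instance from a pipeline script. A minimal sketch that only builds the argument list from the options shown in this README; `gvanno_command` is a hypothetical helper, and validation is limited to the `{grch37,grch38}` choices listed for `--genome_assembly`.

```python
import os
from typing import List


def gvanno_command(gvanno_dir: str, query_vcf: str, output_dir: str,
                   sample_id: str, genome_assembly: str = "grch37",
                   container: str = "docker") -> List[str]:
    """Build the argv for the README's example gvanno.py invocation."""
    if genome_assembly not in ("grch37", "grch38"):
        raise ValueError("genome_assembly must be 'grch37' or 'grch38'")
    return [
        "python", os.path.join(gvanno_dir, "gvanno.py"),
        "--query_vcf", query_vcf,
        "--gvanno_dir", gvanno_dir,
        "--output_dir", output_dir,
        "--sample_id", sample_id,
        "--genome_assembly", genome_assembly,
        "--container", container,
    ]


home = os.path.expanduser("~/gvanno-1.4.1")
cmd = gvanno_command(home, os.path.join(home, "examples", "example.grch37.vcf.gz"),
                     home, "example")
print(" ".join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute it
```

Building a list instead of a single shell string avoids quoting problems when paths contain spaces.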

gvanno.py

Lines changed: 4 additions & 4 deletions
@@ -12,10 +12,10 @@
 import toml
 from argparse import RawTextHelpFormatter
 
-GVANNO_VERSION = '1.4.0'
-DB_VERSION = 'GVANNO_DB_VERSION = 20200928'
-VEP_VERSION = '101'
-GENCODE_VERSION = '35'
+GVANNO_VERSION = '1.4.1'
+DB_VERSION = 'GVANNO_DB_VERSION = 20201206'
+VEP_VERSION = '102'
+GENCODE_VERSION = '36'
 VEP_ASSEMBLY = "GRCh38"
 DOCKER_IMAGE_VERSION = 'sigven/gvanno:' + str(GVANNO_VERSION)
 

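In 1.4.1 the date stamp in `DB_VERSION` (20201206) matches the date in the new data bundle file names in the README, and `DOCKER_IMAGE_VERSION` resolves to `sigven/gvanno:1.4.1`, the tag the README tells users to pull. A hypothetical pre-run check along those lines is sketched below; it is not part of gvanno, and the file-name pattern is inferred from the two bundle URLs in this commit.

```python
import re

# Constants as set in gvanno.py by this commit
GVANNO_VERSION = '1.4.1'
DB_VERSION = 'GVANNO_DB_VERSION = 20201206'
DOCKER_IMAGE_VERSION = 'sigven/gvanno:' + str(GVANNO_VERSION)


def db_date(db_version: str) -> str:
    """Extract the trailing YYYYMMDD stamp from the DB_VERSION string."""
    m = re.search(r"(\d{8})$", db_version.strip())
    if m is None:
        raise ValueError(f"no YYYYMMDD stamp in {db_version!r}")
    return m.group(1)


def bundle_date(filename: str) -> str:
    """Extract YYYYMMDD from a name like 'gvanno.databundle.grch37.20201206.tgz'."""
    m = re.fullmatch(r"gvanno\.databundle\.(grch37|grch38)\.(\d{8})\.tgz", filename)
    if m is None:
        raise ValueError(f"unexpected bundle name {filename!r}")
    return m.group(2)


def check_release(bundle_filename: str) -> None:
    """Raise if the downloaded bundle is not the one this gvanno.py expects."""
    if db_date(DB_VERSION) != bundle_date(bundle_filename):
        raise RuntimeError("data bundle does not match DB_VERSION")


check_release("gvanno.databundle.grch38.20201206.tgz")
print(DOCKER_IMAGE_VERSION)  # sigven/gvanno:1.4.1
```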
src/Dockerfile

Lines changed: 14 additions & 10 deletions
@@ -201,7 +201,7 @@ WORKDIR /
 
 ENV PACKAGE_BIO="libhts2 bedtools"
 ENV PACKAGE_DEV="gfortran gcc-multilib autoconf liblzma-dev libncurses5-dev libblas-dev liblapack-dev libssh2-1-dev libxml2-dev vim libssl-dev libcairo2-dev libbz2-dev libcurl4-openssl-dev"
-ENV PYTHON_MODULES="numpy cython scipy pandas cyvcf2 toml"
+ENV PYTHON_MODULES="numpy==1.19.2 cython==0.29.21 scipy==1.5.3 pandas==1.1.3 cyvcf2==0.20.9 toml==0.10.1"
 RUN apt-get update \
 && apt-get install -y --no-install-recommends \
 nano ed locales vim-tiny fonts-texgyre \
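This hunk pins every module in `PYTHON_MODULES` to an exact version. A small sketch for checking an environment against that pin string, using `importlib.metadata` from the standard library (Python 3.8+); `parse_pins`, `installed_version` and `mismatches` are hypothetical helpers, and the version lookup is injectable so the comparison can be exercised without the packages installed.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

PYTHON_MODULES = "numpy==1.19.2 cython==0.29.21 scipy==1.5.3 pandas==1.1.3 cyvcf2==0.20.9 toml==0.10.1"


def parse_pins(spec: str) -> Dict[str, str]:
    """Turn 'name==version name==version ...' into {name: version}."""
    pins: Dict[str, str] = {}
    for token in spec.split():
        name, sep, version = token.partition("==")
        if not sep or not version:
            raise ValueError(f"module is not pinned exactly: {token!r}")
        pins[name] = version
    return pins


def installed_version(name: str) -> Optional[str]:
    """Installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def mismatches(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version) -> Dict[str, Optional[str]]:
    """Return {module: installed version or None} for every unmet pin."""
    result: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        have = lookup(name)
        if have != wanted:
            result[name] = have
    return result


print(mismatches(parse_pins(PYTHON_MODULES)))
```

Inside an image built from this Dockerfile the printed dictionary would be expected to be empty; on a host without these packages every module is reported with `None`.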
@@ -243,12 +243,16 @@ RUN apt-get update \
 USER root
 WORKDIR /
 
-RUN git clone https://github.com/atks/vt.git
-WORKDIR vt
-RUN make
-RUN make test
-RUN cp vt /usr/local/bin
-RUN export PATH=/usr/local/bin:$PATH
+## vt - variant tool set - use conda version
+## primary use in PCGR/CPSR: decomposition of multiallelic variants in a VCF file
+RUN wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh \
+&& chmod 0755 miniconda.sh
+RUN ["/bin/bash", "-c", "/miniconda.sh -b -p /conda"]
+RUN rm miniconda.sh
+
+# update conda & install vt
+RUN /conda/bin/conda update conda
+RUN /conda/bin/conda install -c bioconda vt
 
 ## Clean Up
 RUN apt-get clean autoclean
@@ -268,9 +272,9 @@ WORKDIR /
 RUN rm -rf $HOME/src/ensembl-vep/t/
 RUN rm -f $HOME/src/v335_base.tar.gz
 RUN rm -f $HOME/src/release-1-6-924.zip
-RUN rm -rf /vt
-RUN rm -rf /samtools-1.9.tar.bz2
+RUN rm -rf /samtools-1.10.tar.bz2
+RUN rm -f /conda/bin/python
 
 ADD gvanno.tgz /
-ENV PATH=$PATH:/gvanno
+ENV PATH=$PATH:/conda/bin:/gvanno
 ENV PYTHONPATH=:/gvanno/lib:${PYTHONPATH}
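This commit replaces the source build of vt with the bioconda package; the Dockerfile comment gives its main use as decomposition of multiallelic variants. To illustrate what that step means, here is a deliberately simplified sketch (not vt's implementation) that splits a VCF data line with several comma-separated ALT alleles into one line per allele. It copies INFO unchanged and drops FORMAT/sample columns; a real decomposition also has to remap genotypes and per-allele fields. `decompose_multiallelic` is a hypothetical name.

```python
from typing import List


def decompose_multiallelic(vcf_line: str) -> List[str]:
    """Split one VCF data line into one line per ALT allele.

    Simplification: only the 8 fixed columns (CHROM..INFO) are kept,
    INFO is copied as-is, and FORMAT/sample columns are dropped.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated VCF columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    # One output record per ALT allele; a biallelic record yields itself
    return ["\t".join([chrom, pos, vid, ref, a, qual, flt, info])
            for a in alt.split(",")]


for rec in decompose_multiallelic("1\t100\t.\tA\tG,T\t50\tPASS\t."):
    print(rec)
```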

src/buildDocker.sh

Lines changed: 1 addition & 1 deletion
@@ -4,5 +4,5 @@ cp /Users/sigven/research/docker/pcgr/src/pcgr/lib/annoutils.py gvanno/lib/
 tar czvfh gvanno.tgz gvanno/
 echo "Build the Docker Image"
 TAG=`date "+%Y%m%d"`
-docker build -t sigven/gvanno:$TAG --rm=true .
+docker build --no-cache -t sigven/gvanno:$TAG --rm=true .
 

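`buildDocker.sh` tags the image with the build date (`date "+%Y%m%d"`), and this commit adds `--no-cache`, so Docker ignores its layer cache and re-executes every build step. That matters for steps such as the unpinned `conda install -c bioconda vt`, whose result can change between builds. A Python sketch reproducing only the tag and argument construction, with `build_command` as a hypothetical name. The resulting tag is date-based (e.g. `sigven/gvanno:20201206`), whereas `gvanno.py` expects `sigven/gvanno:1.4.1`; the script shown here contains no re-tagging step.

```python
import datetime
from typing import List, Optional


def build_command(image: str = "sigven/gvanno",
                  today: Optional[datetime.date] = None,
                  no_cache: bool = True) -> List[str]:
    """Mirror buildDocker.sh: tag the image with the date as YYYYMMDD."""
    tag = (today or datetime.date.today()).strftime("%Y%m%d")
    cmd = ["docker", "build"]
    if no_cache:
        cmd.append("--no-cache")  # ignore the layer cache, re-run every step
    cmd += ["-t", f"{image}:{tag}", "--rm=true", "."]
    return cmd


print(" ".join(build_command(today=datetime.date(2020, 12, 6))))
# docker build --no-cache -t sigven/gvanno:20201206 --rm=true .
```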
src/gvanno.tgz

26 Bytes
Binary file not shown.

src/gvanno/gvanno_summarise.py

Lines changed: 30 additions & 11 deletions
@@ -50,17 +50,36 @@ def extend_vcf_annotations(query_vcf, gvanno_db_directory, lof_prediction = 0):
 w = Writer(out_vcf, vcf)
 current_chrom = None
 num_chromosome_records_processed = 0
-gvanno_xref_map = {'ENSEMBL_TRANSCRIPT_ID':0, 'ENSEMBL_GENE_ID':1, 'ENSEMBL_PROTEIN_ID':2,
-'SYMBOL':3, 'SYMBOL_ENTREZ':4,'ENTREZ_ID':5, 'UNIPROT_ID':6, 'APPRIS':7,
-'UNIPROT_ACC':8,'REFSEQ_MRNA':9, 'CORUM_ID':10,'TUMOR_SUPPRESSOR':11,
-'TUMOR_SUPPRESSOR_EVIDENCE':12, 'ONCOGENE':13, 'ONCOGENE_EVIDENCE':14,'DISGENET_CUI':15,
-'MIM_PHENOTYPE_ID':16, 'OPENTARGETS_DISEASE_ASSOCS':17,
-'OPENTARGETS_TRACTABILITY_COMPOUND':18, 'OPENTARGETS_TRACTABILITY_ANTIBODY':19,
-'PROB_HAPLOINSUFFICIENCY': 20,'PROB_EXAC_LOF_INTOLERANT':21,'PROB_EXAC_LOF_INTOLERANT_HOM':22,
-'PROB_EXAC_LOF_TOLERANT_NULL':23,'PROB_EXAC_NONTCGA_LOF_INTOLERANT':24,
-'PROB_EXAC_NONTCGA_LOF_INTOLERANT_HOM':25, 'PROB_EXAC_NONTCGA_LOF_TOLERANT_NULL': 26,
-'PROB_GNOMAD_LOF_INTOLERANT':27, 'PROB_GNOMAD_LOF_INTOLERANT_HOM': 28, 'PROB_GNOMAD_LOF_TOLERANT_NULL':29,
-'ESSENTIAL_GENE_CRISPR': 30, 'ESSENTIAL_GENE_CRISPR2': 31}
+gvanno_xref_map = {'ENSEMBL_TRANSCRIPT_ID':0,
+'ENSEMBL_GENE_ID':1,
+'ENSEMBL_PROTEIN_ID':2,
+'SYMBOL':3,
+'SYMBOL_ENTREZ':4,
+'ENTREZ_ID':5,
+'UNIPROT_ID':6,
+'UNIPROT_ACC':7,
+'REFSEQ_MRNA':8,
+'CORUM_ID':9,
+'TUMOR_SUPPRESSOR':10,
+'TUMOR_SUPPRESSOR_EVIDENCE':11,
+'ONCOGENE':12,
+'ONCOGENE_EVIDENCE':13,
+'MIM_PHENOTYPE_ID':14,
+'OPENTARGETS_DISEASE_ASSOCS':15,
+'OPENTARGETS_TRACTABILITY_COMPOUND':16,
+'OPENTARGETS_TRACTABILITY_ANTIBODY':17,
+'PROB_HAPLOINSUFFICIENCY': 18,
+'PROB_EXAC_LOF_INTOLERANT':19,
+'PROB_EXAC_LOF_INTOLERANT_HOM':20,
+'PROB_EXAC_LOF_TOLERANT_NULL':21,
+'PROB_EXAC_NONTCGA_LOF_INTOLERANT':22,
+'PROB_EXAC_NONTCGA_LOF_INTOLERANT_HOM':23,
+'PROB_EXAC_NONTCGA_LOF_TOLERANT_NULL': 24,
+'PROB_GNOMAD_LOF_INTOLERANT':25,
+'PROB_GNOMAD_LOF_INTOLERANT_HOM': 26,
+'PROB_GNOMAD_LOF_TOLERANT_NULL':27,
+'ESSENTIAL_GENE_CRISPR': 28,
+'ESSENTIAL_GENE_CRISPR2': 29}
 
 vcf_info_element_types = {}
 for e in vcf.header_iter():
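Dropping `'APPRIS'` and `'DISGENET_CUI'` from `gvanno_xref_map` forces every later index to be renumbered by hand (for example `'MIM_PHENOTYPE_ID'` moves from 16 to 14). One alternative, shown only as a sketch and not as gvanno's code, derives the indices from an ordered column list, so removing a column shifts the rest automatically. It assumes the cross-reference record's columns appear exactly in the order of the 1.4.1 map; `build_index_map` is a hypothetical helper.

```python
from typing import Dict, List

# Column order copied from the 1.4.1 gvanno_xref_map above
GVANNO_XREF_COLUMNS: List[str] = [
    'ENSEMBL_TRANSCRIPT_ID', 'ENSEMBL_GENE_ID', 'ENSEMBL_PROTEIN_ID',
    'SYMBOL', 'SYMBOL_ENTREZ', 'ENTREZ_ID', 'UNIPROT_ID', 'UNIPROT_ACC',
    'REFSEQ_MRNA', 'CORUM_ID', 'TUMOR_SUPPRESSOR', 'TUMOR_SUPPRESSOR_EVIDENCE',
    'ONCOGENE', 'ONCOGENE_EVIDENCE', 'MIM_PHENOTYPE_ID',
    'OPENTARGETS_DISEASE_ASSOCS', 'OPENTARGETS_TRACTABILITY_COMPOUND',
    'OPENTARGETS_TRACTABILITY_ANTIBODY', 'PROB_HAPLOINSUFFICIENCY',
    'PROB_EXAC_LOF_INTOLERANT', 'PROB_EXAC_LOF_INTOLERANT_HOM',
    'PROB_EXAC_LOF_TOLERANT_NULL', 'PROB_EXAC_NONTCGA_LOF_INTOLERANT',
    'PROB_EXAC_NONTCGA_LOF_INTOLERANT_HOM', 'PROB_EXAC_NONTCGA_LOF_TOLERANT_NULL',
    'PROB_GNOMAD_LOF_INTOLERANT', 'PROB_GNOMAD_LOF_INTOLERANT_HOM',
    'PROB_GNOMAD_LOF_TOLERANT_NULL', 'ESSENTIAL_GENE_CRISPR', 'ESSENTIAL_GENE_CRISPR2',
]


def build_index_map(columns: List[str]) -> Dict[str, int]:
    """Map each column name to its 0-based position; reject duplicates."""
    if len(set(columns)) != len(columns):
        raise ValueError("duplicate column name")
    return {name: i for i, name in enumerate(columns)}


gvanno_xref_map = build_index_map(GVANNO_XREF_COLUMNS)
```

With this layout, retiring a source is a single deletion from the list rather than an edit to every subsequent index.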

src/gvanno/lib/annoutils.py

Lines changed: 4 additions & 2 deletions
@@ -372,10 +372,12 @@ def map_variant_effect_predictors(rec, algorithms):
 rec.INFO['PRIMATEAI_DBNSFP'] = str(algo_pred.split(':')[1])
 if algo_pred.startswith('list_s2:'):
 rec.INFO['LIST_S2_DBNSFP'] = str(algo_pred.split(':')[1])
+if algo_pred.startswith('gerp_rs:'):
+rec.INFO['GERP_DBNSFP'] = str(algo_pred.split(':')[1])
 if algo_pred.startswith('bayesdel_addaf:'):
 rec.INFO['BAYESDEL_ADDAF_DBNSFP'] = str(algo_pred.split(':')[1])
-if algo_pred.startswith('clinpred:'):
-rec.INFO['CLINPRED_DBNSFP'] = str(algo_pred.split(':')[1])
+if algo_pred.startswith('aloft:'):
+rec.INFO['ALOFTPRED_DBNSFP'] = str(algo_pred.split(':')[1])
 if algo_pred.startswith('splice_site_rf:'):
 rec.INFO['SPLICE_SITE_RF_DBNSFP'] = str(algo_pred.split(':')[1])
 if algo_pred.startswith('splice_site_ada:'):
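This hunk adds a `gerp_rs` branch and replaces the `clinpred` branch with `aloft`, each as another `if algo_pred.startswith(...)` pair. One possible alternative, sketched here as an illustration and not as gvanno's code, keeps the prefix-to-INFO-key pairs in a table so that adding or retiring a predictor becomes a one-line change. Only pairs visible in this diff are included, a plain dict stands in for the cyvcf2 record's `INFO`, and `map_predictions` is a hypothetical name. Taking element 1 of `split(':')` mirrors the original `algo_pred.split(':')[1]`.

```python
from typing import Dict, Iterable

# Prefix -> INFO key, limited to the pairs visible in this hunk
PREDICTOR_INFO_KEYS: Dict[str, str] = {
    'list_s2': 'LIST_S2_DBNSFP',
    'gerp_rs': 'GERP_DBNSFP',
    'bayesdel_addaf': 'BAYESDEL_ADDAF_DBNSFP',
    'aloft': 'ALOFTPRED_DBNSFP',
    'splice_site_rf': 'SPLICE_SITE_RF_DBNSFP',
}


def map_predictions(algorithms: Iterable[str], info: Dict[str, str]) -> None:
    """Copy 'name:value' predictions into info under their mapped INFO key."""
    for algo_pred in algorithms:
        parts = algo_pred.split(':')
        if len(parts) < 2:
            continue  # no 'name:' prefix, so no startswith() branch would match
        key = PREDICTOR_INFO_KEYS.get(parts[0])
        if key is not None:
            info[key] = str(parts[1])  # same value as algo_pred.split(':')[1]


info: Dict[str, str] = {}
map_predictions(['gerp_rs:5.3', 'aloft:T', 'clinpred:0.9', 'nocolon'], info)
print(info)
```

Predictors without a table entry (here `clinpred`, retired in this commit) are simply skipped.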

src/loftee_1.0.3.tgz

-1.71 KB
Binary file not shown.
