Integrated Mapping of Phenotype-Associated Candidate Targets for SNV/Indel Analysis
This repository contains the IMPACT-SNV pipeline, which processes and prioritizes single nucleotide variants (SNVs) and indels for rare disease analysis using the FAVOR database and phenotype-specific gene-disease associations.
IMPACT-SNV is part of the broader IMPACT framework for phenotype-configurable interpretation of genomic variants. This module specifically handles SNV/Indel processing and produces output files compatible with IMPACT-VIS for interactive visualization and analysis.
The pipeline consists of four modular steps, each available as a DNAnexus applet:
VCF Files → [Step 1: Merge] → [Step 2: VCF2GDS] → [Step 3: FAVOR Annotate] → [Step 4: Prioritize] → *_SNV_IMPACT.gds
| Step | Folder | Description | Input | Output |
|---|---|---|---|---|
| 1 | step1_vcf_merge/ | Merges multiple VCF files into chromosome-separated files | .vcf, .vcf.gz | merged_chr*.vcf.gz |
| 2 | step2_vcf2gds/ | Converts VCF to GDS format for efficient processing | .vcf.gz | merged_chr*.gds |
| 3 | step3_favorannotator-rap/ | Annotates variants using the FAVOR database | .gds | favor_merged_chr*.gds |
| 4 | step4_impact_prioritization/ | Scores and prioritizes variants by pathogenicity | .gds, GeneList.txt | *_SNV_IMPACT.gds |
- R ≥ 4.0 with the following Bioconductor and CRAN packages: SeqArray, SeqVarTools, gdsfmt, stringi, dplyr, tidyr
- Python 3 (for VCF merging scripts)
- BCFtools, bgzip, tabix (for VCF processing)
- Rust and xsv (for FAVOR annotation step)
These tools are designed for deployment on the DNAnexus Research Analysis Platform. Each folder contains a dxapp.json configuration file for building DNAnexus applets.
- Input: One or more VCF or VCF.GZ files
- Output: Chromosome-separated VCF.GZ files (merged_chr1.vcf.gz, merged_chr2.vcf.gz, etc.)
- Input: VCF.GZ file from Step 1
- Output: GDS file (merged_chr*.gds)
- Input: GDS file from Step 2
- Output: Annotated GDS with FAVOR functional annotations
- Input:
  - Annotated GDS files from Step 3 (merged_chr*.gds)
  - Gene-disease association file (GeneList.txt), tab-separated with columns:
    - symbol: Gene symbol (e.g., GJB2, OTOF)
    - globalScore: Open Targets association score (0-1)
- Output: Per-sample GDS files ({sample_id}_SNV_IMPACT.gds)
The final *_SNV_IMPACT.gds files contain:
| Node | Type | Description |
|---|---|---|
| variant.id | integer | Unique variant identifier |
| chromosome | integer/character | Chromosome |
| position | integer | 1-based genomic position |
| sample.id | character | Sample identifiers |
| genotype | integer | Genotype array |
| Node | Type | Description |
|---|---|---|
| annotation/info/impact_score | numeric | Pathogenicity score (0-100) |
| annotation/info/impact_score_calc | character | Tier and calculation formula |
| annotation/info/tier | integer | Priority tier (1-4) |
Boolean indicators under annotation/info/clnsig_flags/:
- pathogenic, likely_pathogenic, uncertain_significance
- likely_benign, benign
- pathogenic_low_penetrance, likely_pathogenic_low_penetrance
- established_risk_allele, likely_risk_allele, uncertain_risk_allele
- affects, association, drug_response, confers_sensitivity
- protective, other, conflicting_interpretations_of_pathogenicity, not_provided
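To illustrate how one ClinVar significance string could map onto these Boolean indicators, here is a minimal Python sketch. The pipeline's actual derivation of the flags is not described here, so the function name `clnsig_flags`, the choice of separators (`/`, `|`, `;`), and the normalization rules are all assumptions made for illustration.

```python
import re

# Flag names as listed under annotation/info/clnsig_flags/.
CLNSIG_FLAGS = (
    "pathogenic", "likely_pathogenic", "uncertain_significance",
    "likely_benign", "benign",
    "pathogenic_low_penetrance", "likely_pathogenic_low_penetrance",
    "established_risk_allele", "likely_risk_allele", "uncertain_risk_allele",
    "affects", "association", "drug_response", "confers_sensitivity",
    "protective", "other", "conflicting_interpretations_of_pathogenicity",
    "not_provided",
)


def clnsig_flags(clnsig):
    """Map a ClinVar significance string to a dict of Boolean flags.

    Hypothetical helper: splits on '/', '|' and ';', lower-cases each term,
    drops commas, and turns spaces into underscores before matching.
    """
    terms = set()
    for part in re.split(r"[/|;]", clnsig or ""):
        term = part.strip().lower().replace(",", "").replace(" ", "_")
        if term:
            terms.add(term)
    return {flag: flag in terms for flag in CLNSIG_FLAGS}


flags = clnsig_flags("Pathogenic/Likely_pathogenic")
print([name for name, is_set in flags.items() if is_set])
```

Terms that do not match any listed flag are silently ignored in this sketch; a production version would probably want to log them.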
| Node | Description |
|---|---|
| annotation/info/FunctionalAnnotation/VarInfo | Functional consequence |
| annotation/info/FunctionalAnnotation/genecode_comprehensive_info | Gene information |
| annotation/info/FunctionalAnnotation/clnsig | ClinVar clinical significance |
| annotation/info/FunctionalAnnotation/clndn | ClinVar disease name |
| annotation/info/FunctionalAnnotation/bravo_af | Bravo allele frequency |
| annotation/info/FunctionalAnnotation/apc_protein_function_v3 | Protein function score |
dx run step1_vcf_merge \
-ivcfs=sample1.vcf.gz \
-ivcfs=sample2.vcf.gz \
-o merged_output

dx run vcf2gds \
-ivcf_file=merged_chr1.vcf.gz \
-igds_filename=merged_chr1.gds

dx run favorannotator \
-igds=merged_chr1.gds \
-ichromosome=1 \
-iuse_compression=TRUE

dx run step4_impact_prioritization \
-igenelist=GeneList.txt \
-igds_files=favor_merged_chr1.gds \
-igds_files=favor_merged_chr2.gds

For Step 4 local execution:

cd step4_impact_prioritization/resources/home/dnanexus/
Rscript IMPACT-prioritization.r --genelist GeneList.txt --outprefix anno_merged_

The output *_SNV_IMPACT.gds files are designed for direct use with IMPACT-VIS, an interactive R Shiny application for variant visualization.
- Place output files in the IMPACT-VIS app/data/ directory
- Follow the naming convention: {sample_id}_SNV_IMPACT.gds
- See the IMPACT-VIS Data Preparation Guide for details
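Before copying files into app/data/, it can help to check that each name follows the {sample_id}_SNV_IMPACT.gds convention. The following Python sketch does that with a regular expression; the helper name `sample_id_from_filename` and the sample ID `NA12878` are hypothetical and used only for illustration.

```python
import re
from pathlib import Path

# Matches "<sample_id>_SNV_IMPACT.gds" and captures the sample ID.
IMPACT_NAME = re.compile(r"^(?P<sample_id>.+)_SNV_IMPACT\.gds$")


def sample_id_from_filename(path):
    """Return the sample ID if the file name follows the convention, else None."""
    match = IMPACT_NAME.match(Path(path).name)
    return match.group("sample_id") if match else None


for name in ["app/data/NA12878_SNV_IMPACT.gds", "favor_merged_chr1.gds"]:
    print(name, "->", sample_id_from_filename(name))
```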
- Interactive visualization of prioritized variants
- Filtering by tier, ClinVar significance, allele frequency
- Gene-based and genomic region filtering
- Publication-ready plots with Plotly
- Persistent annotation states for sample curation
Variants are assigned to tiers based on evidence strength:
| Tier | Criteria | Score Formula |
|---|---|---|
| Tier 1 | Pathogenic/Likely Pathogenic in ClinVar + gene match | 80 + 20 × globalScore |
| Tier 2 | Frameshift, stopgain mutations | 60 + 40 × globalScore |
| Tier 3 | Nonsynonymous, nonframeshift, stoploss | 20 + 80 × globalScore |
| Tier 4 | APC protein function evidence | 100 × (0.5 × APC + 0.5 × globalScore) |
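The score formulas in the table can be written out as a short Python sketch. This is an illustration of the arithmetic only, not the pipeline's R implementation. The function name `impact_score` is hypothetical, and the sketch assumes that the APC value has already been rescaled to the 0-1 range, which the table does not state.

```python
def impact_score(tier, global_score, apc=0.0):
    """Compute the 0-100 score for a variant from its tier.

    global_score: Open Targets association score in [0, 1].
    apc: APC protein function value, ASSUMED here to be rescaled to [0, 1];
         it is only used for Tier 4.
    """
    if not 0.0 <= global_score <= 1.0:
        raise ValueError("global_score must be between 0 and 1")
    if tier == 1:
        return 80 + 20 * global_score
    if tier == 2:
        return 60 + 40 * global_score
    if tier == 3:
        return 20 + 80 * global_score
    if tier == 4:
        return 100 * (0.5 * apc + 0.5 * global_score)
    raise ValueError("tier must be 1, 2, 3, or 4")


# A Tier 1 variant in GJB2 (globalScore 0.858985237 from the example gene list).
print(f"{impact_score(1, 0.858985237):.2f}")  # prints 97.18
```

Under these formulas the score ranges are 80-100 for Tier 1, 60-100 for Tier 2, 20-100 for Tier 3, and 0-100 for Tier 4, so a strong gene association can lift a lower-tier variant above a higher-tier variant in a weakly associated gene.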
The GeneList.txt file should contain phenotype-specific gene associations from Open Targets:
symbol globalScore
GJB2 0.858985237
OTOF 0.850795742
MYO6 0.845566359
...
Generate this file by:
- Querying Open Targets for your phenotype of interest
- Exporting gene associations with global scores
- Formatting as tab-separated with header row
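Once the file is formatted, a quick sanity check can catch a wrong delimiter or an out-of-range score before running Step 4. The Python sketch below parses tab-separated text with the symbol and globalScore columns; the function name `parse_gene_list` is hypothetical, and the checks reflect only the format described above, not any validation the pipeline itself performs.

```python
import csv
import io


def parse_gene_list(text):
    """Parse GeneList.txt content into a {symbol: globalScore} dict.

    Raises ValueError if the header is not exactly 'symbol<TAB>globalScore'
    or if a score falls outside the 0-1 range.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != ["symbol", "globalScore"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    scores = {}
    for row in reader:
        score = float(row["globalScore"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {row['symbol']}: {score}")
        scores[row["symbol"]] = score
    return scores


example = "symbol\tglobalScore\nGJB2\t0.858985237\nOTOF\t0.850795742\nMYO6\t0.845566359\n"
print(parse_gene_list(example))
```

To check a file on disk, read it first, for example `parse_gene_list(open("GeneList.txt").read())`.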
- SeqArray - Efficient storage of sequence data
- FAVOR - Functional Annotation of Variants Online Resource
- Open Targets - Gene-disease association platform
- DNAnexus - Cloud-based genomics platform
- IMPACT-VIS - Interactive visualization
- IMPACT-SV - Structural variant processing
- IMPACT-CNV - Copy number variant processing
If you use this pipeline, please cite:
@article{impact-TBA,
title={Integrated Mapping of Phenotype-Associated Candidate Targets for
interpretation and prioritization of genomic variants},
author={Boehler, N. and Cheng, H. Y. M.},
journal={TBA},
year={TBA}
}

Zhou H., et al. (2023). FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Research, 51(D1), D1300-D1311. DOI: 10.1093/nar/gkac966
This project is licensed under the terms specified in the LICENSE file.
- Issues: Report bugs or request features on the GitHub Issues page
- Documentation: See individual step READMEs for detailed usage
- IMPACT-VIS Help: Visit the IMPACT-VIS documentation