IMPACT-SNV

Integrated Mapping of Phenotype-Associated Candidate Targets for SNV/Indel Analysis

This repository contains the IMPACT-SNV pipeline, which processes and prioritizes single nucleotide variants (SNVs) and indels for rare disease analysis using the FAVOR database and phenotype-specific gene-disease associations.

Overview

IMPACT-SNV is part of the broader IMPACT framework for phenotype-configurable interpretation of genomic variants. This module specifically handles SNV/Indel processing and produces output files compatible with IMPACT-VIS for interactive visualization and analysis.

Pipeline Architecture

The pipeline consists of four modular steps, each available as a DNAnexus applet:

VCF Files → [Step 1: Merge] → [Step 2: VCF2GDS] → [Step 3: FAVOR Annotate] → [Step 4: Prioritize] → *_SNV_IMPACT.gds

| Step | Folder | Description | Input | Output |
|------|--------|-------------|-------|--------|
| 1 | `step1_vcf_merge/` | Merges multiple VCF files into chromosome-separated files | `.vcf`, `.vcf.gz` | `merged_chr*.vcf.gz` |
| 2 | `step2_vcf2gds/` | Converts VCF to GDS format for efficient processing | `.vcf.gz` | `merged_chr*.gds` |
| 3 | `Step3_favorannotator-rap/` | Annotates variants using the FAVOR database | `.gds` | `favor_merged_chr*.gds` |
| 4 | `step4_impact_prioritization/` | Scores and prioritizes variants by pathogenicity | `.gds`, `GeneList.txt` | `*_SNV_IMPACT.gds` |
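
As a rough illustration of how file names flow through Steps 1 to 3, the sketch below expands the `*` in the patterns above for a single chromosome. The function name `expected_files` and the dictionary keys are hypothetical and not part of the pipeline; only the file name patterns come from the table.

```python
from typing import Dict


def expected_files(chromosome: str) -> Dict[str, str]:
    """Return the per-chromosome file name produced by Steps 1-3.

    Step 4 is left out because its output is named per sample,
    not per chromosome.
    """
    base = f"merged_chr{chromosome}"
    return {
        "step1": f"{base}.vcf.gz",      # merged, chromosome-separated VCF
        "step2": f"{base}.gds",         # GDS conversion of the merged VCF
        "step3": f"favor_{base}.gds",   # FAVOR-annotated GDS
    }


for step, name in expected_files("1").items():
    print(f"{step}: {name}")
```

A listing like this can help when checking that every chromosome has a complete set of intermediate files before starting Step 4.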

Requirements

Software Dependencies

- R ≥ 4.0 with the following packages:
  - Bioconductor: SeqArray, SeqVarTools, gdsfmt
  - CRAN: stringi, dplyr, tidyr
- Python 3 (for VCF merging scripts)
- BCFtools, bgzip, tabix (for VCF processing)
- Rust and xsv (for FAVOR annotation step)

Platform

These tools are designed for deployment on the DNAnexus Research Analysis Platform. Each folder contains a dxapp.json configuration file for building DNAnexus applets.

Input Requirements

Step 1: VCF Merge

Input: One or more VCF or VCF.GZ files
Output: Chromosome-separated VCF.GZ files (merged_chr1.vcf.gz, merged_chr2.vcf.gz, etc.)

Step 2: VCF to GDS

Input: VCF.GZ file from Step 1
Output: GDS file (merged_chr*.gds)

Step 3: FAVOR Annotation

Input: GDS file from Step 2 (merged_chr*.gds)
Output: Annotated GDS with FAVOR functional annotations (favor_merged_chr*.gds)

Step 4: IMPACT Prioritization

Input:
- Annotated GDS files from Step 3 (favor_merged_chr*.gds)
- Gene-disease association file (GeneList.txt) - tab-separated with columns:
  - symbol: Gene symbol (e.g., GJB2, OTOF)
  - globalScore: Open Targets association score (0-1)
Output: Per-sample GDS files ({sample_id}_SNV_IMPACT.gds)
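
Before submitting a Step 4 job, it can be useful to check that the gene list has the two expected columns and that every score lies in the 0-1 range. The following is a minimal sketch of such a check; `load_gene_scores` is a hypothetical helper and is not part of the pipeline, which reads the file in R.

```python
import csv
import io


def load_gene_scores(handle):
    """Parse a tab-separated gene list with 'symbol' and 'globalScore' columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"symbol", "globalScore"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"GeneList is missing columns: {sorted(missing)}")
    scores = {}
    for row in reader:
        score = float(row["globalScore"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"globalScore out of range for {row['symbol']}: {score}")
        scores[row["symbol"]] = score
    return scores


# In-memory stand-in for GeneList.txt so the example runs without a file.
example = io.StringIO("symbol\tglobalScore\nGJB2\t0.858985237\nOTOF\t0.850795742\n")
gene_scores = load_gene_scores(example)
print(gene_scores)
```

For a real file, pass `open("GeneList.txt", newline="")` instead of the in-memory buffer.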

Output File Format

The final *_SNV_IMPACT.gds files contain:

Core SeqArray Nodes

| Node | Type | Description |
|------|------|-------------|
| `variant.id` | integer | Unique variant identifier |
| `chromosome` | integer/character | Chromosome |
| `position` | integer | 1-based genomic position |
| `sample.id` | character | Sample identifiers |
| `genotype` | integer | Genotype array |

IMPACT Score Annotations

| Node | Type | Description |
|------|------|-------------|
| `annotation/info/impact_score` | numeric | Pathogenicity score (0-100) |
| `annotation/info/impact_score_calc` | character | Tier and calculation formula |
| `annotation/info/tier` | integer | Priority tier (1-4) |

ClinVar Significance Flags

Boolean indicators under annotation/info/clnsig_flags/:

pathogenic, likely_pathogenic, uncertain_significance
likely_benign, benign
pathogenic_low_penetrance, likely_pathogenic_low_penetrance
established_risk_allele, likely_risk_allele, uncertain_risk_allele
affects, association, drug_response, confers_sensitivity
protective, other, conflicting_interpretations_of_pathogenicity, not_provided
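
To show what these flags represent, here is a minimal sketch that turns a ClinVar significance string into one boolean per flag name. It is not the pipeline's own parsing code: the function name `clnsig_to_flags` is hypothetical, and the sketch assumes that terms are separated by `/`, `|`, or `,` and matched case-insensitively as whole terms.

```python
import re

# Flag names as listed above.
FLAG_NAMES = [
    "pathogenic", "likely_pathogenic", "uncertain_significance",
    "likely_benign", "benign",
    "pathogenic_low_penetrance", "likely_pathogenic_low_penetrance",
    "established_risk_allele", "likely_risk_allele", "uncertain_risk_allele",
    "affects", "association", "drug_response", "confers_sensitivity",
    "protective", "other", "conflicting_interpretations_of_pathogenicity",
    "not_provided",
]


def clnsig_to_flags(clnsig):
    """Map a ClinVar significance string to a dict of boolean flags."""
    # Assumption: '/', '|' and ',' separate terms; whole-term matching keeps
    # 'pathogenic' from being set by 'likely_pathogenic'.
    terms = {t.strip().lower() for t in re.split(r"[/|,]", clnsig or "") if t.strip()}
    return {name: name in terms for name in FLAG_NAMES}


flags = clnsig_to_flags("Pathogenic/Likely_pathogenic")
print([name for name, is_set in flags.items() if is_set])
# prints ['pathogenic', 'likely_pathogenic']
```

Storing one boolean per term makes it possible to filter on a single significance category without re-parsing the combined string.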

FAVOR Functional Annotations

| Node | Description |
|------|-------------|
| `annotation/info/FunctionalAnnotation/VarInfo` | Functional consequence |
| `annotation/info/FunctionalAnnotation/genecode_comprehensive_info` | Gene information |
| `annotation/info/FunctionalAnnotation/clnsig` | ClinVar clinical significance |
| `annotation/info/FunctionalAnnotation/clndn` | ClinVar disease name |
| `annotation/info/FunctionalAnnotation/bravo_af` | Bravo allele frequency |
| `annotation/info/FunctionalAnnotation/apc_protein_function_v3` | Protein function score |

Usage

DNAnexus Platform

Step 1: VCF Merge

dx run step1_vcf_merge \
  -ivcfs=sample1.vcf.gz \
  -ivcfs=sample2.vcf.gz \
  -o merged_output

Step 2: VCF to GDS Conversion

dx run vcf2gds \
  -ivcf_file=merged_chr1.vcf.gz \
  -igds_filename=merged_chr1.gds

Step 3: Functional Annotation

dx run favorannotator \
  -igds=merged_chr1.gds \
  -ichromosome=1 \
  -iuse_compression=TRUE

Step 4: Variant Prioritization

dx run step4_impact_prioritization \
  -igenelist=GeneList.txt \
  -igds_files=favor_merged_chr1.gds \
  -igds_files=favor_merged_chr2.gds
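
Steps 2 and 3 run once per chromosome, so the commands above are often repeated with only the chromosome label changed. The sketch below builds those argument lists in Python and prints them; it does not submit anything. The helper names are hypothetical, and the applet names and input flags are copied from the examples above, so adjust them if your applets were built under different names.

```python
import shlex


def step2_command(chromosome):
    """Argument list for the VCF-to-GDS conversion of one chromosome."""
    return ["dx", "run", "vcf2gds",
            f"-ivcf_file=merged_chr{chromosome}.vcf.gz",
            f"-igds_filename=merged_chr{chromosome}.gds"]


def step3_command(chromosome):
    """Argument list for FAVOR annotation of one chromosome."""
    return ["dx", "run", "favorannotator",
            f"-igds=merged_chr{chromosome}.gds",
            f"-ichromosome={chromosome}",
            "-iuse_compression=TRUE"]


def step4_command(chromosomes, genelist="GeneList.txt"):
    """Argument list for prioritization across all annotated chromosomes."""
    args = ["dx", "run", "step4_impact_prioritization", f"-igenelist={genelist}"]
    args += [f"-igds_files=favor_merged_chr{c}.gds" for c in chromosomes]
    return args


chromosomes = ["1", "2"]
for chrom in chromosomes:
    print(shlex.join(step2_command(chrom)))
    print(shlex.join(step3_command(chrom)))
print(shlex.join(step4_command(chromosomes)))
```

Printing rather than executing keeps the sketch safe to run anywhere. Each step still needs the previous step's output to exist before it is launched, which this example does not handle.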

Local Execution

For Step 4 local execution:

cd step4_impact_prioritization/resources/home/dnanexus/
Rscript IMPACT-prioritization.r --genelist GeneList.txt --outprefix anno_merged_

Integration with IMPACT-VIS

The output *_SNV_IMPACT.gds files are designed for direct use with IMPACT-VIS, an interactive R Shiny application for variant visualization.

Preparing Data for IMPACT-VIS

- Place output files in the IMPACT-VIS app/data/ directory
- Follow the naming convention: {sample_id}_SNV_IMPACT.gds
- See the IMPACT-VIS Data Preparation Guide for details
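
A quick way to confirm that files follow the naming convention is to match them against a pattern and recover the sample ID. This is a minimal sketch; the helper `sample_id_from_filename` and the sample name `sampleA` are made up for illustration.

```python
import re

# {sample_id}_SNV_IMPACT.gds, where sample_id is any non-empty prefix.
IMPACT_GDS_PATTERN = re.compile(r"^(?P<sample_id>.+)_SNV_IMPACT\.gds$")


def sample_id_from_filename(filename):
    """Return the sample ID if the name follows the convention, else None."""
    match = IMPACT_GDS_PATTERN.match(filename)
    return match.group("sample_id") if match else None


print(sample_id_from_filename("sampleA_SNV_IMPACT.gds"))   # prints sampleA
print(sample_id_from_filename("favor_merged_chr1.gds"))    # prints None
```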

IMPACT-VIS Capabilities

- Interactive visualization of prioritized variants
- Filtering by tier, ClinVar significance, allele frequency
- Gene-based and genomic region filtering
- Publication-ready plots with Plotly
- Persistent annotation states for sample curation

Tiering System

Variants are assigned to tiers based on evidence strength:

| Tier | Criteria | Score Formula |
|------|----------|---------------|
| Tier 1 | Pathogenic/Likely Pathogenic in ClinVar + gene match | 80 + 20 × globalScore |
| Tier 2 | Frameshift, stopgain mutations | 60 + 40 × globalScore |
| Tier 3 | Nonsynonymous, nonframeshift, stoploss | 20 + 80 × globalScore |
| Tier 4 | APC protein function evidence | 100 × (0.5 × APC + 0.5 × globalScore) |
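
The formulas in the table can be written out as a small function. This is a sketch of the arithmetic only, not the pipeline's R implementation: the name `impact_score` is hypothetical, and it assumes that the APC value has already been rescaled to the 0-1 range so that the Tier 4 result stays within 0-100. How variants are assigned to a tier is not covered here.

```python
def impact_score(tier, global_score, apc=None):
    """Compute a 0-100 score from the tier formulas in the table above."""
    if not 0.0 <= global_score <= 1.0:
        raise ValueError("global_score must be in [0, 1]")
    if tier == 1:
        return 80 + 20 * global_score
    if tier == 2:
        return 60 + 40 * global_score
    if tier == 3:
        return 20 + 80 * global_score
    if tier == 4:
        # Assumption: APC is on a 0-1 scale; the source does not state
        # how the raw FAVOR score is normalized.
        if apc is None or not 0.0 <= apc <= 1.0:
            raise ValueError("tier 4 needs an apc value in [0, 1]")
        return 100 * (0.5 * apc + 0.5 * global_score)
    raise ValueError(f"unknown tier: {tier}")


# Tier 1 variant in GJB2 (globalScore 0.858985237 from the example gene list).
print(round(impact_score(1, 0.858985237), 2))   # prints 97.18
print(impact_score(4, 0.5, apc=0.5))            # prints 50.0
```

Because each of Tiers 1 to 3 has a fixed floor (80, 60, 20), a higher tier can never score below the floor of a lower one, although a lower tier with a strong gene association can overlap the range of the tier above it.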

Gene-Disease Association File

The GeneList.txt file should contain phenotype-specific gene associations from Open Targets:

symbol	globalScore
GJB2	0.858985237
OTOF	0.850795742
MYO6	0.845566359
...

Generate this file by:

- Querying Open Targets for your phenotype of interest
- Exporting gene associations with global scores
- Formatting as tab-separated with header row
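
The formatting step could look like the sketch below, which writes gene symbols and scores as tab-separated text with the expected header. It assumes the associations have already been exported from Open Targets and loaded into a dictionary; the function name `write_gene_list` is hypothetical, and sorting by descending score is a readability choice rather than a documented requirement.

```python
import csv
import io


def write_gene_list(associations, handle):
    """Write {symbol: globalScore} as tab-separated rows with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["symbol", "globalScore"])
    # Highest-scoring genes first, matching the example layout above.
    for symbol, score in sorted(associations.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([symbol, score])


# In-memory buffer so the example runs without touching the file system.
buffer = io.StringIO()
write_gene_list({"OTOF": 0.850795742, "GJB2": 0.858985237, "MYO6": 0.845566359}, buffer)
print(buffer.getvalue())
```

To produce the actual file, pass `open("GeneList.txt", "w", newline="")` in place of the buffer.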

References

Tools and Databases

- SeqArray - Efficient storage of sequence data
- FAVOR - Functional Annotation of Variants Online Resource
- Open Targets - Gene-disease association platform
- DNAnexus - Cloud-based genomics platform

Related IMPACT Modules

- IMPACT-VIS - Interactive visualization
- IMPACT-SV - Structural variant processing
- IMPACT-CNV - Copy number variant processing

Citation

If you use this pipeline, please cite:

@article{impact-TBA,
  title={Integrated Mapping of Phenotype-Associated Candidate Targets for
         interpretation and prioritization of genomic variants},
  author={Boehler, N. and Cheng, H. Y. M.},
  journal={TBA},
  year={TBA}
}

FAVOR Citation

Zhou H., et al. (2023). FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Research, 51(D1), D1300-D1311. DOI: 10.1093/nar/gkac966

License

This project is licensed under the terms specified in the LICENSE file.

Support

- Issues: Report bugs or request features on the GitHub Issues page
- Documentation: See individual step READMEs for detailed usage
- IMPACT-VIS Help: Visit the IMPACT-VIS documentation
