
mdabrarfaiyaj/dengue-variant-tracker-prototype


🦟 Dengue Variant Tracker – Prototype (Reference Strains)

Prototype version – Demonstrates the full pipeline using global reference dengue genomes.

Real Bangladeshi variant version (recommended for portfolio showcase):
→ https://github.com/mdabrarfaiyaj/bangladesh-dengue-variant-tracker

Current dataset: Reference strains from NCBI RefSeq

  • DENV-1: NC_001477.1
  • DENV-2: NC_001474.2
  • Used for testing, learning, and debugging the pipeline

🎯 Objectives

  • Fetch and process public dengue virus sequences
  • Perform quality control on genomic data
  • Identify mutation patterns and motifs in viral genomes
  • Visualize variant patterns through an interactive dashboard
  • Demonstrate end-to-end bioinformatics workflow

🔬 Real-World Impact

This dashboard is intended to support practical public health needs:

  • Monitoring vaccine escape mutations
  • Tracking dengue serotype evolution
  • Supporting local health surveillance efforts in Dhaka and across Bangladesh

📊 Data Source

  • Source: NCBI Virus Database (public, open-access data)
  • Virus: Dengue virus (DENV) - all serotypes
  • Data Type: Nucleotide sequences in FASTA format
  • Dataset Size: ~10-50 sequences (optimized for low RAM environments)
  • Citation: NCBI Virus Database. https://www.ncbi.nlm.nih.gov/labs/virus/
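In R, FASTA files are usually loaded with `Biostrings::readDNAStringSet()`. If Bioconductor is not installed, the minimal base-R sketch below shows what FASTA parsing involves. `read_fasta_base()` is a hypothetical helper and is not part of this repository. It assumes plain, uncompressed, multi-record FASTA.

```r
# Minimal base-R FASTA reader (hypothetical helper, not from this repo).
# Returns a named character vector: names are the header lines without ">",
# and values are the concatenated, upper-cased sequence lines.
read_fasta_base <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]              # drop blank lines
  is_header <- startsWith(lines, ">")
  if (!any(is_header)) stop("no FASTA headers found in ", path)
  record_id <- cumsum(is_header)             # record index for every line
  headers <- sub("^>", "", lines[is_header])
  seqs <- vapply(seq_along(headers), function(i) {
    body <- lines[record_id == i & !is_header]
    toupper(paste(body, collapse = ""))
  }, character(1))
  names(seqs) <- headers
  seqs
}

# Usage with a small temporary file
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">seq1 test", "ACGT", "acgn", ">seq2", "GGCC"), tmp)
fa <- read_fasta_base(tmp)
print(fa)
```

Any text before the first header is ignored. Lower-case bases are upper-cased so later QC and motif steps can use simple `ACGT` patterns.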

🔧 Technical Stack

  • Language: R (4.0+)
  • Core Packages:
    • Bioconductor: Biostrings, ShortRead, BSgenome
    • Data Processing: dplyr, tidyr
    • Visualization: ggplot2, plotly
    • Dashboard: shiny, shinydashboard
  • Automation: Bash shell scripting
  • Version Control: Git/GitHub
  • Deployment: shinyapps.io (free tier)

🚀 Quick Start

Prerequisites

# Install R packages
install.packages(c("shiny", "shinydashboard", "ggplot2", "dplyr", "tidyr", "plotly", "DT"))

# Install Bioconductor
# requireNamespace() checks for the package without attaching it
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Biostrings", "ShortRead", "BSgenome"))

Setup & Run

# 1. Clone repository
git clone <your-repo-url>
cd viral_tracker_dashboard

# 2. Download data (requires internet)
chmod +x download_data.sh
./download_data.sh

# 3. Run quality control and analysis
Rscript qc_analysis.R

# 4. Launch dashboard
Rscript -e "shiny::runApp('app.R')"

πŸ“ Project Structure

viral_tracker_dashboard/
├── README.md                    # Project documentation
├── download_data.sh             # Data acquisition script
├── qc_analysis.R                # Quality control and motif analysis
├── app.R                        # Shiny dashboard application
├── data/                        # Data directory (gitignored)
│   ├── raw/                     # Raw downloaded sequences
│   └── processed/               # Processed analysis results
├── plots/                       # Generated visualizations
├── utils/                       # Helper functions
│   └── analysis_helpers.R       # Reusable analysis functions
└── docs/                        # Additional documentation
    └── methodology.md           # Detailed methods

🧬 Key Features

1. Data Processing Pipeline

  • Automated sequence downloading from NCBI
  • Quality filtering (sequence length, ambiguous bases)
  • Sequence alignment and motif detection
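After alignment, one simple variant statistic is the per-position mutation frequency against a reference. This base-R sketch assumes the sequences are already aligned to the reference: every sequence has the same length, bases are upper case, and gaps are written as `-`. `mutation_frequency()` is a hypothetical name, and the repository's own alignment and variant code may work differently.

```r
# Per-position mutation frequency against a reference (hypothetical helper).
# Gaps ("-") and ambiguous "N" calls are not counted as mutations and are
# excluded from that position's denominator.
mutation_frequency <- function(aligned, ref) {
  len <- nchar(ref)
  if (any(nchar(aligned) != len)) stop("sequences must be aligned to the reference length")
  ref_bases <- strsplit(ref, "")[[1]]
  mat <- do.call(rbind, strsplit(aligned, ""))   # rows = sequences, cols = positions
  informative <- mat != "-" & mat != "N"
  mismatch <- informative & sweep(mat, 2, ref_bases, FUN = "!=")
  n_mut <- colSums(mismatch)
  n_inf <- colSums(informative)
  data.frame(
    position  = seq_len(len),
    ref_base  = ref_bases,
    n_mutated = n_mut,
    n_called  = n_inf,
    frequency = ifelse(n_inf > 0, n_mut / n_inf, NA_real_)
  )
}

# Usage on a toy alignment (made-up sequences for illustration)
ref <- "ACGTAC"
aligned <- c(s1 = "ACGTAC", s2 = "ACTTAC", s3 = "AC-TAG")
mf <- mutation_frequency(aligned, ref)
print(mf)
```

Excluding gaps from the denominator means a position covered by only a few sequences shows a frequency based on those sequences alone. The `n_called` column makes that visible.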

2. Analysis Capabilities

  • Motif Scanning: Identifies known dengue mutation hotspots
  • Variant Statistics: Calculates mutation frequencies
  • Sequence Composition: GC content, codon usage
  • Phylogenetic Markers: Serotype-specific signatures
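The sequence-composition metrics can be sketched in base R. GC content is the fraction of G and C among unambiguous bases. Codon usage is a count of non-overlapping triplets in one reading frame. Both helpers are hypothetical. The codon sketch also assumes the caller already knows the correct frame (for example, the start of the dengue polyprotein ORF) and does not search for it.

```r
# GC content ignoring N and other ambiguity codes (hypothetical helper).
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  acgt <- bases[bases %in% c("A", "C", "G", "T")]
  if (length(acgt) == 0) return(NA_real_)
  mean(acgt %in% c("G", "C"))
}

# Codon counts in a given frame (hypothetical helper).
codon_usage <- function(seq, frame = 1) {
  seq <- toupper(seq)
  if (nchar(seq) - frame < 2) return(table(character(0)))  # no complete codon
  starts <- seq(frame, nchar(seq) - 2, by = 3)             # non-overlapping codons
  codons <- substring(seq, starts, starts + 2)
  codons <- codons[!grepl("[^ACGT]", codons)]              # drop codons with ambiguous bases
  sort(table(codons), decreasing = TRUE)
}

# Usage
gc <- gc_content("ATGGCNC")
cu <- codon_usage("ATGGCCATGTAA")
print(gc)
print(cu)
```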

3. Interactive Dashboard

  • Overview Tab: Dataset summary statistics
  • Motif Explorer: Search custom or predefined motifs
  • Variant Visualization: Interactive plots of mutation patterns
  • Data Table: Browse and filter sequence information

🧪 Analysis Workflow

Step 1: Data Acquisition

./download_data.sh
# Downloads dengue sequences from NCBI (limited to 50 for low RAM)
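The contents of `download_data.sh` are not shown here. One way to fetch the two RefSeq genomes listed above is NCBI's E-utilities `efetch` endpoint. Whether the script actually uses it is an assumption. The R sketch below only builds the request URL, so it runs offline. The commented `download.file()` call performs the real download, and `efetch_fasta_url()` is a hypothetical helper.

```r
# Build an NCBI E-utilities efetch URL for nucleotide FASTA records.
# Hypothetical helper; download_data.sh may fetch data differently.
efetch_fasta_url <- function(accessions) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  query <- paste0(
    "db=nuccore",
    "&id=", paste(accessions, collapse = ","),
    "&rettype=fasta&retmode=text"
  )
  paste0(base, "?", query)
}

url <- efetch_fasta_url(c("NC_001477.1", "NC_001474.2"))
print(url)

# Uncomment to download (requires internet). NCBI rate-limits E-utilities
# clients that do not use an API key, so keep requests modest.
# dir.create("data/raw", recursive = TRUE, showWarnings = FALSE)
# download.file(url, destfile = "data/raw/dengue_refseq.fasta", mode = "wb")
```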

Step 2: Quality Control

source("qc_analysis.R")
# - Loads sequences
# - Filters by length (>500 bp)
# - Removes sequences with >5% ambiguous bases
# - Generates QC report
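The two QC rules above (length greater than 500 bp, and no more than 5% ambiguous bases) can be expressed as a small base-R filter. This is a sketch, not the code in `qc_analysis.R`. `qc_filter()` is hypothetical, and it treats every character other than A, C, G, and T as ambiguous.

```r
# Apply the README's QC thresholds to a named character vector of sequences.
qc_filter <- function(seqs, min_length = 500, max_ambiguous = 0.05) {
  seqs <- toupper(seqs)
  len <- nchar(seqs)
  n_ambig <- nchar(gsub("[ACGT]", "", seqs))       # count non-ACGT characters
  frac_ambig <- ifelse(len > 0, n_ambig / len, 1)
  pass <- len > min_length & frac_ambig <= max_ambiguous
  list(
    kept = seqs[pass],
    report = data.frame(
      id = names(seqs),
      length = len,
      ambiguous_fraction = round(frac_ambig, 4),
      passed = pass,
      row.names = NULL
    )
  )
}

# Usage with synthetic sequences
seqs <- c(
  good      = strrep("ACGT", 200),                                # 800 bp, no Ns
  short     = "ACGT",                                             # fails length
  ambiguous = paste0(strrep("ACGT", 150), strrep("N", 100))       # ~14% N
)
res <- qc_filter(seqs)
print(res$report)
```

The `report` data frame has the same shape as a QC summary table, so it could be written out with `write.csv()`.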

Step 3: Motif Analysis

# Searches for known dengue motifs:
# - ATG: Start codons
# - GAC: Common mutation site in E protein
# - AATAAA: Poly-A signal regions
# - Custom user-defined patterns
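A motif scan like this can be sketched in base R with `gregexpr()`. A Perl zero-width lookahead reports every start position, including overlapping hits. This is an illustration only: `qc_analysis.R` may use Biostrings (for example `matchPattern()`) instead, and `scan_motifs()` is a hypothetical name. It scans only the forward strand and assumes motifs are plain `ACGT` strings with no regex metacharacters.

```r
# Report every (overlapping) start position of each motif in each sequence.
# Expects a named character vector of sequences.
scan_motifs <- function(seqs, motifs = c("ATG", "GAC", "AATAAA")) {
  rows <- list()
  for (id in names(seqs)) {
    s <- toupper(seqs[[id]])
    for (m in motifs) {
      # "(?=MOTIF)" matches zero characters, so overlapping hits are all found
      hits <- gregexpr(paste0("(?=", m, ")"), s, perl = TRUE)[[1]]
      if (hits[1] == -1) next                      # no match for this motif
      rows[[length(rows) + 1]] <- data.frame(seq_id = id, motif = m,
                                             start = as.integer(hits))
    }
  }
  if (length(rows) == 0) {
    return(data.frame(seq_id = character(0), motif = character(0), start = integer(0)))
  }
  do.call(rbind, rows)
}

# Usage on toy sequences
hits <- scan_motifs(c(s1 = "ATGACAATAAATG", s2 = "CCCC"))
print(hits)
```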

Step 4: Dashboard Launch

shiny::runApp("app.R")
# Opens interactive dashboard in browser

🔒 Ethical Considerations

  • Public Data Only: Uses exclusively open-access, anonymized viral sequences
  • No Personal Health Information: Complies with data privacy regulations
  • Proper Attribution: All data sources are cited
  • Reproducibility: Complete code and methodology shared openly

💾 Low-RAM Optimization

This project is optimized for systems with 4GB RAM:

  • Processes sequences in small batches
  • Limits dataset to 10-50 sequences
  • Uses efficient Bioconductor data structures
  • Includes memory cleanup (gc()) after intensive operations
  • Alternative: Use Google Colab for R sessions (free cloud computing)
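The batching and `gc()` points can be sketched as a generic helper. It splits a vector into fixed-size chunks, applies a function to each chunk, and triggers a garbage collection between chunks. `process_in_batches()` and its default batch size of 10 are illustrative assumptions, not values taken from the project.

```r
# Apply `fun` to `items` in chunks of `batch_size`, collecting garbage between
# chunks (hypothetical helper).
process_in_batches <- function(items, fun, batch_size = 10) {
  n <- length(items)
  if (n == 0) return(list())
  batch_id <- ceiling(seq_len(n) / batch_size)   # 1,1,...,2,2,... chunk labels
  results <- vector("list", max(batch_id))
  for (b in seq_along(results)) {
    results[[b]] <- fun(items[batch_id == b])
    gc(verbose = FALSE)                          # trigger collection before the next chunk
  }
  do.call(c, results)
}

# Usage: sequence lengths computed 10 sequences at a time
seqs <- setNames(strrep("ACGT", 1:25), paste0("seq", 1:25))
lens <- process_in_batches(seqs, nchar, batch_size = 10)
print(head(lens))
```

R also collects garbage automatically. The main saving comes from keeping only one chunk's intermediate objects alive at a time.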

📈 Results & Outputs

  • motif_matches.csv: Detected motif positions and frequencies
  • qc_summary.csv: Quality control metrics
  • variant_plots.png: Static visualizations for reports
  • Interactive Dashboard: Real-time exploration via Shiny
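As a sketch of how motif positions and frequencies could share one CSV, the hypothetical helper below adds a per-sequence count column to a hit table and writes it with base R's `write.csv()`. The column names, the `data/processed` default, and the toy input are assumptions for illustration. The real `motif_matches.csv` layout may differ.

```r
# Write motif hits plus a per-sequence, per-motif count column
# (hypothetical helper and column layout).
write_motif_matches <- function(hits, out_dir = "data/processed") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # number of hits of the same motif in the same sequence, repeated on each row
  hits$count_in_seq <- ave(hits$start, hits$seq_id, hits$motif, FUN = length)
  path <- file.path(out_dir, "motif_matches.csv")
  write.csv(hits, path, row.names = FALSE)
  invisible(path)
}

# Usage with a made-up hit table, written to a temporary directory
hits <- data.frame(
  seq_id = c("s1", "s1", "s1", "s2"),
  motif  = c("ATG", "ATG", "GAC", "ATG"),
  start  = c(1L, 11L, 3L, 5L)
)
out <- write_motif_matches(hits, out_dir = tempdir())
print(read.csv(out))
```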

🌐 Deployment

# Deploy to shinyapps.io (requires free account)
library(rsconnect)
rsconnect::setAccountInfo(name="<ACCOUNT>", token="<TOKEN>", secret="<SECRET>")
rsconnect::deployApp()

🤝 Contributing

Suggestions and improvements are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request with a clear description

📚 References & Resources

πŸ“ License

MIT License - Free to use with attribution

👤 Author

πŸ† Skills Demonstrated

  • Bioinformatics data analysis (Bioconductor/R)
  • Shell scripting automation
  • Interactive data visualization (Shiny)
  • Version control (Git)
  • Public health data interpretation
  • Low-resource computing optimization

This project was developed as part of a bioinformatics portfolio showcasing real-world genomic data analysis skills applicable to tropical disease surveillance.


Note: This prototype uses reference sequences. For Bangladeshi outbreak strains see the dedicated repository.

About

Prototype version: General dengue motif & variant tracker using reference strains (NC_001477.1, Dengue virus 1, complete genome; NC_001474.2, Dengue virus 2, complete genome)
