Prototype version: demonstrates the full pipeline using global reference dengue genomes.
Real Bangladeshi variant version (recommended for portfolio showcase):
https://github.com/mdabrarfaiyaj/bangladesh-dengue-variant-tracker
Current dataset: Reference strains from NCBI RefSeq
- DENV-1: NC_001477.1
- DENV-2: NC_001474.2
- Used for testing, learning, and debugging the pipeline
- Fetch and process public dengue virus sequences
- Perform quality control on genomic data
- Identify mutation patterns and motifs in viral genomes
- Visualize variant patterns through an interactive dashboard
- Demonstrate end-to-end bioinformatics workflow
This dashboard addresses practical public health needs:
- Monitoring vaccine escape mutations
- Tracking dengue serotype evolution
- Supporting local health surveillance efforts in Dhaka and Bangladesh
- Source: NCBI Virus Database (public, open-access data)
- Virus: Dengue virus (DENV) - all serotypes
- Data Type: Nucleotide sequences in FASTA format
- Dataset Size: ~10-50 sequences (optimized for low RAM environments)
- Citation: NCBI Virus Database. https://www.ncbi.nlm.nih.gov/labs/virus/
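As a rough illustration of working with FASTA data from NCBI, the sketch below builds an E-utilities `efetch` URL for an accession and parses FASTA text in base R. The helper names `efetch_url` and `parse_fasta` are hypothetical, and nothing here is taken from `download_data.sh`, which may fetch data differently.

```r
# Hypothetical helper: build an NCBI E-utilities efetch URL that returns
# a nucleotide record as plain-text FASTA.
efetch_url <- function(accession) {
  paste0(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
    "?db=nuccore&id=", accession, "&rettype=fasta&retmode=text"
  )
}

# Hypothetical helper: parse FASTA-formatted lines into a named character
# vector (names = header text, values = upper-case sequences).
parse_fasta <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)               # record index for each line
  headers <- sub("^>", "", lines[is_header])
  keep <- !is_header & record_id > 0           # sequence lines after a header
  seqs <- tapply(lines[keep], record_id[keep], paste, collapse = "")
  setNames(toupper(as.vector(seqs)), headers[as.integer(names(seqs))])
}

# Offline demo with toy records (not real dengue data)
demo_lines <- c(">seq1 demo", "ATGC", "ggta", ">seq2 demo", "NNACGT")
demo_seqs <- parse_fasta(demo_lines)
print(nchar(demo_seqs))

# With internet access, a reference genome could be read like this:
# denv1 <- parse_fasta(readLines(efetch_url("NC_001477.1")))
```

Keeping the parser separate from the download step means the rest of the analysis can be tried offline on a small local file.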
- Language: R (4.0+)
- Core Packages:
  - Bioconductor: Biostrings, ShortRead, BSgenome
  - Data Processing: dplyr, tidyr
  - Visualization: ggplot2, plotly
  - Dashboard: shiny, shinydashboard
- Automation: Bash shell scripting
- Version Control: Git/GitHub
- Deployment: shinyapps.io (free tier)
# Install R packages
install.packages(c("shiny", "shinydashboard", "ggplot2", "dplyr", "tidyr", "plotly", "DT"))
# Install Bioconductor (BiocManager is installed only if it is missing)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biostrings", "ShortRead", "BSgenome"))

# 1. Clone repository
git clone <your-repo-url>
cd viral_tracker_dashboard
# 2. Download data (requires internet)
chmod +x download_data.sh
./download_data.sh
# 3. Run quality control and analysis
Rscript qc_analysis.R
# 4. Launch dashboard
Rscript -e "shiny::runApp('app.R')"

viral_tracker_dashboard/
├── README.md                  # Project documentation
├── download_data.sh           # Data acquisition script
├── qc_analysis.R              # Quality control and motif analysis
├── app.R                      # Shiny dashboard application
├── data/                      # Data directory (gitignored)
│   ├── raw/                   # Raw downloaded sequences
│   └── processed/             # Processed analysis results
├── plots/                     # Generated visualizations
├── utils/                     # Helper functions
│   └── analysis_helpers.R     # Reusable analysis functions
└── docs/                      # Additional documentation
    └── methodology.md         # Detailed methods
- Automated sequence downloading from NCBI
- Quality filtering (sequence length, ambiguous bases)
- Sequence alignment and motif detection
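The quality filtering step can be sketched in base R as follows. This is a minimal illustration, not the code in `qc_analysis.R`: the function names are hypothetical, and the defaults mirror the thresholds given in this README (length above 500 bp, at most 5% ambiguous bases).

```r
# Fraction of bases that are not an unambiguous A, C, G or T.
ambiguous_fraction <- function(dna) {
  chars <- strsplit(toupper(dna), "")[[1]]
  if (length(chars) == 0) return(NA_real_)
  mean(!chars %in% c("A", "C", "G", "T"))
}

# Hypothetical QC filter: keeps sequences longer than `min_length` whose
# ambiguous-base fraction does not exceed `max_ambiguous`.
qc_filter <- function(seqs, min_length = 500, max_ambiguous = 0.05) {
  len <- nchar(seqs)
  amb <- vapply(seqs, ambiguous_fraction, numeric(1))
  keep <- unname(len > min_length & !is.na(amb) & amb <= max_ambiguous)
  list(
    passed = seqs[keep],
    summary = data.frame(
      id = names(seqs), length = unname(len),
      ambiguous_fraction = unname(amb), passed = keep,
      row.names = NULL, stringsAsFactors = FALSE
    )
  )
}

# Toy input (not real dengue data): one clean, one N-rich, one too short
toy <- c(
  clean = paste(rep("ACGT", 150), collapse = ""),
  dirty = paste0(paste(rep("ACGT", 125), collapse = ""), strrep("N", 100)),
  short = "ACGT"
)
qc <- qc_filter(toy)
print(qc$summary)
```

Returning the per-sequence summary alongside the filtered set makes it straightforward to report why a sequence was dropped.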
- Motif Scanning: Identifies known dengue mutation hotspots
- Variant Statistics: Calculates mutation frequencies
- Sequence Composition: GC content, codon usage
- Phylogenetic Markers: Serotype-specific signatures
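A small base-R sketch of the motif scanning and GC content ideas above. The function names are hypothetical and the project itself may rely on Biostrings for this; the point is only to show what "motif positions" and "GC content" mean in practice.

```r
# Start positions of every occurrence of `motif` in `dna`, including
# overlapping ones, using a simple sliding-window comparison.
find_motif <- function(dna, motif) {
  dna <- toupper(dna)
  motif <- toupper(motif)
  n <- nchar(dna)
  k <- nchar(motif)
  if (k == 0 || n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  starts[substring(dna, starts, starts + k - 1) == motif]
}

# GC fraction computed over unambiguous bases only.
gc_content <- function(dna) {
  chars <- strsplit(toupper(dna), "")[[1]]
  acgt <- chars[chars %in% c("A", "C", "G", "T")]
  if (length(acgt) == 0) return(NA_real_)
  mean(acgt %in% c("G", "C"))
}

# One row per (sequence, motif) pair with the number of matches.
scan_motifs <- function(seqs, motifs) {
  rows <- lapply(names(seqs), function(id) {
    data.frame(
      sequence_id = id,
      motif = motifs,
      count = vapply(motifs, function(m) length(find_motif(seqs[[id]], m)),
                     integer(1)),
      row.names = NULL, stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy sequence (not real dengue data)
toy <- c(demo1 = "ATGGACATGAATAAAGAC")
print(scan_motifs(toy, c("ATG", "GAC", "AATAAA")))
print(gc_content(toy[["demo1"]]))
```

The sliding-window comparison treats the motif as a literal string, so user-supplied patterns cannot be misread as regular expressions.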
- Overview Tab: Dataset summary statistics
- Motif Explorer: Search custom or predefined motifs
- Variant Visualization: Interactive plots of mutation patterns
- Data Table: Browse and filter sequence information
./download_data.sh
# Downloads dengue sequences from NCBI (limited to 50 for low RAM)

source("qc_analysis.R")
# - Loads sequences
# - Filters by length (>500 bp)
# - Removes sequences with >5% ambiguous bases
# - Generates QC report

# Searches for known dengue motifs:
# - ATG: Start codons
# - GAC: Common mutation site in E protein
# - AATAAA: Canonical poly-A signal motif (dengue genomes lack a poly-A tail, so hits are incidental)
# - Custom user-defined patterns

shiny::runApp("app.R")
# Opens interactive dashboard in browser

- Public Data Only: Uses exclusively open-access, anonymized viral sequences
- No Personal Health Information: Complies with data privacy regulations
- Proper Attribution: All data sources are cited
- Reproducibility: Complete code and methodology shared openly
This project is optimized for systems with 4GB RAM:
- Processes sequences in small batches
- Limits dataset to 10-50 sequences
- Uses efficient Bioconductor data structures
- Includes memory cleanup (gc()) after intensive operations
- Alternative: Use Google Colab for R sessions (free cloud computing)
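The batching and cleanup pattern described above might look roughly like this. `process_in_batches` is a hypothetical helper, not a function from this repository, and the batch size is an arbitrary choice for illustration.

```r
# Hypothetical helper: apply `fun` to `items` in chunks of `batch_size`,
# calling gc() after each chunk so temporaries can be released early.
process_in_batches <- function(items, batch_size, fun) {
  stopifnot(batch_size >= 1)
  batches <- split(items, ceiling(seq_along(items) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fun(batches[[i]])
    invisible(gc())  # memory cleanup after each batch
  }
  results
}

# Toy usage: sequence lengths computed two sequences at a time
toy <- c(a = "ACGT", b = "GGCC", c = "AT")
lens <- unlist(process_in_batches(toy, batch_size = 2, fun = nchar))
print(lens)
```

Only one batch of intermediate results is alive at a time, which is what keeps peak memory low on a 4GB machine.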
- motif_matches.csv: Detected motif positions and frequencies
- qc_summary.csv: Quality control metrics
- variant_plots.png: Static visualizations for reports
- Interactive Dashboard: Real-time exploration via Shiny
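The exact column layout of these files is not specified here. As a sketch only, a motif table could be written and read back like this; the column names and toy rows are assumptions, and a temporary directory stands in for `data/processed/`.

```r
# Toy motif table with hypothetical columns (not real results)
motif_matches <- data.frame(
  sequence_id = c("demo1", "demo1"),
  motif = c("ATG", "ATG"),
  position = c(1L, 7L),
  stringsAsFactors = FALSE
)

# Temporary directory used as a stand-in for data/processed/
out_dir <- file.path(tempdir(), "processed")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "motif_matches.csv")

# Write without row names so the file reloads with the same columns
write.csv(motif_matches, out_file, row.names = FALSE)
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
print(reloaded)
```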
# Deploy to shinyapps.io (requires free account)
library(rsconnect)
rsconnect::setAccountInfo(name="<ACCOUNT>", token="<TOKEN>", secret="<SECRET>")
rsconnect::deployApp()

Suggestions and improvements welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request with a clear description
- NCBI Virus Database: https://www.ncbi.nlm.nih.gov/labs/virus/
- Bioconductor: https://bioconductor.org/
- Dengue WHO Fact Sheet: https://www.who.int/news-room/fact-sheets/detail/dengue-and-severe-dengue
- R Shiny: https://shiny.posit.co/
MIT License - Free to use with attribution
- LinkedIn: https://www.linkedin.com/in/md-abrar-faiyaj-559246381/
- GitHub: https://github.com/mdabrarfaiyaj
- Email: faiyaj.mdabrar@gmail.com
- Bioinformatics data analysis (Bioconductor/R)
- Shell scripting automation
- Interactive data visualization (Shiny)
- Version control (Git)
- Public health data interpretation
- Low-resource computing optimization
This project was developed as part of a bioinformatics portfolio showcasing real-world genomic data analysis skills applicable to tropical disease surveillance.
Note: This prototype uses reference sequences. For Bangladeshi outbreak strains see the dedicated repository.