Despite glycosylation's critical role in health and disease, the expression patterns of glycoenzymes and their impact on biological pathways are not well-understood and remain underexplored. Existing resources do not effectively link differential glycoenzyme expression to pathway dysregulation across disease states, making it difficult for researchers to uncover disease mechanisms or identify therapeutic targets related to glycosylation.
- Glycosylation is involved in key cellular processes, yet is often overlooked in genomic and pathway-level analyses.
- Glycoenzymes are potential biomarkers and drug targets, but are not widely studied due to lack of integrated resources.
- Current databases are fragmented, lacking a centralized, open-source platform that connects expression data to functional impact.
- Solving this will empower researchers and clinicians to uncover glycosylation-related disease mechanisms, aiding precision medicine efforts.
To build an open-source, interactive atlas that integrates transcriptomic data, glycoenzyme gene sets, and pathway/network analysis — enabling researchers to explore glycosylation-related dysregulation in diseases, starting with muscular dystrophy.
The GlycoEnzyme Expression Atlas project was conducted in four key phases, each building toward identifying and visualizing glycoenzyme-driven pathway dysregulation in disease states:
Identified 906 disease-associated glycosylation-related genes from databases like GlyGen. Data used:
- GLY_000623 – Human Glycogenes
- GLY_000004 – Human Glycosyltransferases
- GLY_000025 – Human Glycoside Hydrolases
Through systematic analysis of glycogene-disease associations and glycosylation sites, we identified muscular dystrophy as the most promising disease model for RNA-seq analysis. This selection was based on:
- GLY_000225 – Human Protein Glycosylation Diseases
- GLY_000230 – Human Protein Glycosylation Disease Annotations
- GLY_000308 – Human Glycogenes with Disease Associations
Disease-Glycosylation Association Analysis Selected two RNA-seq datasets related to muscular dystrophy from public repositories:
- GSE140261 – RNA-seq data for Facioscapulohumeral Muscular Dystrophy (FSHD)
- PMCID: PMC8756543 – Molecular Features of FSHD through Transcriptomic Analysis
-
Data Collection and Preprocessing
- RNA-seq count matrix preparation
- Quality control and normalization
- Metadata organization and validation
-
Differential Expression Analysis
- DESeq2/Edge R implementation for RNA-seq analysis
- Statistical testing and multiple testing correction
- Fold change and p-value calculations
- These were filtered for glycoenzymes to focus on glycosylation-related dysregulation
Validated the common genes using CFDE portal Mapped filtered glycoenzymes to biological pathways using KEGG. KEGG enrichment analysis using clusterProfiler to identify significantly enriched pathways involving glycoenzymes Cytoscape to visualize and analyze the interaction networks of glycoenzymes, helping identify key hubs and functional modules involved in disease
To make our findings accessible and actionable, we developed an R Shiny application that allows researchers to:
- Input a gene of interest to explore its role in muscular dystrophy.
- View expression patterns and differential expression status.
- Visualize KEGG pathway associations and heatmaps across patients.
- Explore the gene's network context via STRING-based interactions. This tool bridges the gap between static analysis and interactive exploration, enabling users to dive deeper into glycosylation-related dysregulation in a user-friendly interface.
- Extracted 906 glycosylation-related genes using curated datasets from GlyGen.
- Enrichment analysis of these genes revealed muscular dystrophy as a glycosylation-impacted disease.
- After RNA-seq analysis of two muscular dystrophy datasets: GSE140261 and GSE162108. We found: 152 common (between both datasets)upregulated genes AND 54 common (between both datasets) downregulated genes
- Pathway Enrichment (KEGG) analysis revealed: 56 enriched pathways in one dataset AND 39 enriched pathways in the other
- Cytoscape analysis revealed: Strong Glycoenzyme Involvement in GAG Metabolism Alteration in Chondroitin, Heparan, and Keratan Sulfate Pathways This suggests a widespread disruption in GAG metabolism, which is known to be critical for muscle structure, signaling, and regeneration. Alterations in these glycosylation-related pathways have been linked to muscle degeneration, inflammation, and impaired extracellular matrix remodeling — all hallmarks of muscular dystrophy.
Therefore, our findings strengthen the case that glycosylation, especially GAG-related pathways, plays a key role in muscular dystrophy pathogenesis.
This project demonstrates a reusable pipeline to explore glycoenzyme expression and pathway dysregulation in disease. By integrating gene expression, pathway enrichment, and interaction networks: We validated muscular dystrophy as a disease strongly linked to glycosylation dysfunction. Identified core glycoenzymes and pathways that may serve as biomarkers or therapeutic targets. Created a scalable framework that can be applied to other glycosylation-impacted diseases. Developed an interactive R Shiny tool to make the results accessible to researchers, supporting gene-centric exploration and hypothesis generation. This work highlights the importance of glycosylation in disease mechanisms and provides an open-source, FAIR-compliant approach to facilitate glyco-bioinformatics research.
Our integrated approach not only identified differentially expressed glycoenzymes, but also clearly revealed key biochemical pathways—such as GAG biosynthesis and degradation—that are significantly disrupted in muscular dystrophy. This pipeline provides a framework for validating glycosylation pathway dysregulation in any disease. By simply plugging in disease-specific expression data, the workflow can be applied to:
- Identify glycoenzyme-related expression changes
- Perform pathway enrichment
- Map functional networks
- And ultimately highlight disease-relevant glycosylation mechanisms
This demonstrates that our tool is not limited to muscular dystrophy — it can be extended to any glycosylation-impacted condition, such as:
- Cancer
- Inflammatory bowel disease
- Congenital disorders of glycosylation
- Neurodegenerative diseases
This positions the GlycoEnzyme Expression Atlas as a versatile platform for exploring glycosylation biology in precision medicine and driving future therapeutic research.
- Python 3.x
- R version ≥ 4.0.0
- Bioconductor 3.12 or higher
- Clone this repository:
# Clone the repository to your local machine
git clone https://github.com/yourusername/GlycoEnzyme-Atlas.git
cd GlycoEnzyme-Atlas- Install required dependencies:
pip install pandas numpy.
├── data/ # Input data files
├── results/ # Analysis results
│ ├── pathway/ # Pathway analysis results
│ └── www/ # Visualization assets
├── scripts/ # Analysis scripts
└── shiny/ # R Shiny application
└── app.R # Main Shiny application file
- Important: Ensure you have Python 3.x and R ≥ 4.2.0 installed
- Install required R packages:
install.packages(c("shiny", "shinydashboard", "DT", "plotly", "dplyr", "ggplot2", "pheatmap", "tidyr", "markdown", "RColorBrewer"))
- Run analysis scripts from the project root directory
- To run the Shiny app:
# From R console shiny::runApp("shiny/app.R") # Or from terminal Rscript -e "shiny::runApp('shiny/app.R')"
The interactive R Shiny application (scripts/shiny/app.R) provides several key features:
-
Overview Dashboard
- Technical architecture and workflow visualization
- Summary statistics and key findings
- Project methodology and implementation details
-
Muscular Dystrophy Analysis
- Disease-glycosylation association visualization
- FSHD-specific expression patterns
- Clinical and sample information
-
Glycogene Expression Analysis
- Interactive volcano plots
- Expression heatmaps
- Detailed gene-level statistics
-
Pathway Analysis
- KEGG pathway enrichment visualization
- Network analysis results
- Pathway-gene associations
The app is designed to be user-friendly and provides interactive visualizations of our analysis results, making it easy for researchers to explore glycosylation-related dysregulation in muscular dystrophy.
- Vlado Dancik, PhD (Team Lead) - Computational Chemical Biologist, Broad Institute
- Isha Parikh - Team Member: parikh.i@northeastern.edu
- Aymen Maqsood - Team Member: aymenmaqsood.2000@gmail.com
- Chandani Shrestha - Team Member: shrestha.chandani08@gmail.com
https://info.cfde.cloud/ https://cytoscape.org/ https://info.cfde.cloud/ https://www.cazy.org/ https://www.glygen.org/
- CFDE/GlyGen for project support
- GlyCosmos and CAZy databases
- Bioconductor project and all package maintainers
- Broad Institute for computational resources
