DE-LIMP: Differential Expression & Limpa Proteomics

Find which proteins are significantly different between your experimental conditions -- upload a DIA-NN output file and get interactive volcano plots, heatmaps, pathway enrichment, and AI-powered interpretation, all without writing code.

Built on R Shiny with the limpa pipeline for normalization and protein quantification, and limma for statistical testing with FDR correction. See USER_GUIDE.md for methodology details.

Input: DIA-NN report.parquet | Not for: DDA data, TMT/iTRAQ, Spectronaut/MaxQuant output

Not sure if your data is DIA? If your core facility used DIA-NN to process your samples, you have DIA data. Look for a report.parquet file in your results folder. If your data was processed with MaxQuant, Spectronaut, or Proteome Discoverer, or if you used isobaric labels (TMT, iTRAQ), DE-LIMP is not the right tool.


Try it now: huggingface.co/spaces/brettsp/de-limp-proteomics -- no installation required

Project Website: bsphinney.github.io/DE-LIMP | Docs: USER_GUIDE.md | CLAUDE.md


What's New in v3.7.0

NCBI Proteome Download -- Search and download RefSeq protein FASTA databases from NCBI Datasets, with automatic gene symbol mapping via E-utilities. Supports all organisms with NCBI reference proteomes, complementing the existing UniProt download for non-model organisms.

Contaminant Analysis -- New subtab in Data Overview with summary cards (contaminant count, % of total, median intensity ratio, keratin count), per-sample stacked bar chart, top contaminants table with keratin flagging, and contaminant heatmap. Signal Distribution and Expression Grid also highlight contaminants.

Data Explorer -- Quartile-based abundance profiles and sample-sample scatter plots for exploring data without requiring DE analysis. Variable proteins that shift 2+ quartiles across samples are flagged. Works with no-replicates mode.
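As an illustration of the quartile-shift idea, here is a minimal Python sketch. It assumes one plausible reading of the rule: proteins are ranked into quartiles within each sample, and a protein is flagged when its quartile differs by 2 or more between any two samples. The function names, the example values, and this exact definition are assumptions for illustration, not DE-LIMP's R implementation.

```python
def quartile_ranks(sample):
    """Map protein -> quartile (1 = lowest 25%, 4 = highest) within one sample."""
    ordered = sorted(sample, key=sample.get)
    n = len(ordered)
    return {protein: 1 + (4 * idx) // n for idx, protein in enumerate(ordered)}

def flag_variable(samples, min_shift=2):
    """Return proteins whose quartile changes by at least min_shift across samples."""
    per_sample = [quartile_ranks(s) for s in samples.values()]
    shared = set.intersection(*(set(q) for q in per_sample))
    flagged = []
    for protein in sorted(shared):
        quartiles = [q[protein] for q in per_sample]
        if max(quartiles) - min(quartiles) >= min_shift:
            flagged.append(protein)
    return flagged

# Hypothetical abundances: protein A moves from the lowest to the third quartile.
samples = {
    "S1": {"A": 10, "B": 20, "C": 30, "D": 40},
    "S2": {"A": 35, "B": 20, "C": 30, "D": 40},
}
print(flag_variable(samples))  # ['A']
```

Because the ranking is done within each sample, the check needs no replicates, which is consistent with the feature working in no-replicates mode.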

SSH File Browser -- Visual directory browser for remote HPC navigation. Clickable breadcrumbs, color-coded entries, file type filtering. Replaces manual path entry for raw data and FASTA directories.

Load from HPC -- One-click button to download and analyze completed search results from the cluster via the SSH file browser.

Docker Launcher for Windows -- One-click batch file (Launch_DE-LIMP_Docker.bat) handles SSH key detection, shared PC accounts, container startup, and browser launch. Docker + SSH to HPC is now the recommended Windows deployment.

No-Replicates Mode -- Quantification completes normally with n=1 per group (normalization, protein aggregation, PCA, Expression Grid). DE analysis is gracefully skipped with an informational message.

SSH Auto-Connect & Environment Badge -- Auto-connects to HPC on startup when an SSH key is detected. Colored navbar badge shows deployment mode (Docker/HPC/Local/HF).

Previous highlights: v3.5 (Run Comparator, Search & Analysis History, Chromatography QC, smart HPC partitions); v3.1 (UI overhaul, Core Facility Mode); v3.0 (MOFA2, Docker search, phosphoproteomics, GSEA).

See CHANGELOG.md for full release history.


Key Features

Analysis & Visualization

  • Volcano Plots -- Interactive (Plotly), click or box-select proteins to highlight across all views; all pairwise contrasts available
  • Heatmaps -- Z-score heatmaps of selected or significant proteins (ComplexHeatmap)
  • Contaminant Analysis -- Summary cards, per-sample stacked bar chart, top contaminants table with keratin flagging, and contaminant heatmap; Signal Distribution and Expression Grid also highlight contaminants
  • Data Explorer -- Quartile-based abundance profiles and sample-sample scatter plots for exploring data without DE analysis
  • QC Sample Metrics -- Faceted trend plot (Precursors, Proteins, MS1 Signal, Data Completeness) with LOESS smoother for drift detection and group average lines
  • MDS & DPC Plots -- Sample clustering and normalization diagnostics
  • Covariates -- Include batch, sex, diet, or custom covariates in the linear model
  • XIC Chromatogram Viewer -- Fragment-level chromatogram validation, MS2 intensity alignment (Spectronaut-style), ion mobility/mobilogram support for timsTOF, DIA-NN v1/v2 formats (local/HPC only)
  • CV Analysis (Robust Changes) -- Identify highly reproducible DE proteins via coefficient of variation analysis across replicates
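The CV analysis in the list above rests on the coefficient of variation (standard deviation divided by mean) computed per protein across replicates. A minimal Python sketch follows. It assumes linear-scale (not log2) intensities and a 20% cutoff chosen purely for illustration; the function name, the values, and the threshold are not taken from DE-LIMP.

```python
from statistics import mean, stdev

def cv_percent(intensities):
    """Coefficient of variation (%) of linear-scale replicate intensities."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicates")
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero")
    return 100.0 * stdev(intensities) / m

# Hypothetical replicate intensities for two proteins within one group.
replicates = {
    "P1": [1000.0, 1050.0, 980.0],   # tight replicates
    "P2": [1000.0, 1800.0, 400.0],   # noisy replicates
}
CV_CUTOFF = 20.0  # assumed threshold, for illustration only
stable = [p for p, values in replicates.items() if cv_percent(values) <= CV_CUTOFF]
print(stable)  # ['P1']
```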

Phosphoproteomics

  • Auto-detection of phospho-enriched data on upload (scans for UniMod:21 in Modified.Sequence)
  • Phosphosite-level DE via limma (independent from protein-level analysis); supports DIA-NN site_matrix_*.parquet or parsed from report.parquet
  • KSEA (Kinase-Substrate Enrichment Analysis) -- infer upstream kinase activity from phosphosite fold-changes using PhosphoSitePlus + NetworKIN databases
  • Motif analysis -- sequence logos (ggseqlogo) of flanking residues around regulated phosphosites
  • Abundance correction -- subtract protein-level logFC from site logFC to isolate phosphorylation stoichiometry changes
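The abundance correction described above is a per-site subtraction on the log scale. A minimal Python sketch, assuming each phosphosite maps to exactly one parent protein and that both logFC values come from the same contrast; the identifiers and numbers are hypothetical:

```python
def correct_site_logfc(site_logfc, protein_logfc, site_to_protein):
    """Subtract the parent protein's logFC from each phosphosite's logFC.

    Sites whose parent protein has no protein-level estimate are skipped.
    """
    corrected = {}
    for site, lfc in site_logfc.items():
        protein = site_to_protein[site]
        if protein in protein_logfc:
            corrected[site] = lfc - protein_logfc[protein]
    return corrected

# Hypothetical values for illustration.
site_logfc = {"PROT1_S10": 1.5, "PROT2_T20": 2.0, "PROT3_Y30": 0.75}
protein_logfc = {"PROT1": 1.0, "PROT2": 2.0}
site_to_protein = {"PROT1_S10": "PROT1", "PROT2_T20": "PROT2", "PROT3_Y30": "PROT3"}

print(correct_site_logfc(site_logfc, protein_logfc, site_to_protein))
# {'PROT1_S10': 0.5, 'PROT2_T20': 0.0}
```

In this example PROT2_T20 drops to 0.0: its apparent regulation is fully explained by the change in protein abundance.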

Gene Set Enrichment & Multi-Omics

  • GSEA -- GO (BP/MF/CC) and KEGG pathways via clusterProfiler; per-ontology caching; automatic organism detection (12 species via UniProt REST API or protein ID suffix)
  • MOFA2 (Multi-Omics Factor Analysis) -- unsupervised integration of 2-6 data views (e.g., proteomics + phosphoproteomics + transcriptomics). Import from RDS, CSV, TSV, or Parquet. Variance explained heatmap, factor weights, sample scores, Factor-DE correlation. Built-in example datasets (Mouse Brain, TCGA Breast Cancer)

AI-Powered Analysis (Google Gemini)

Requires a free Gemini API key. Get one at Google AI Studio and paste it into the DE-LIMP sidebar.

  • AI Summary -- Analyzes all contrasts simultaneously, identifying top DE proteins per comparison, cross-comparison biomarkers, and CV-based stability metrics. AI Summary sends only summary statistics (protein names, logFC, adj.P.Val); Data Chat sends per-sample expression data for top DE proteins to enable interactive Q&A
  • Export for Claude -- Download your complete analysis as a .zip optimized for deep analysis with Claude, ChatGPT, or other AI assistants (includes DE results, expression matrix, QC metrics, GSEA, methods text, and more)
  • AI Summary HTML Export -- Styled standalone HTML report with gradient header and markdown formatting, suitable for sharing with collaborators
  • Interactive Data Chat -- Conversational interface with Google Gemini, auto-injecting QC stats and 100-800 top DE proteins as context. Phospho context (top 20 sites + KSEA kinase results) auto-included when phospho analysis is active
  • Interactive AI + plot connection -- Select proteins in volcano/table to set AI context; AI can highlight proteins in plots via [[SELECT: protein1; protein2]] syntax
  • Auto-Analyze button for one-click dataset analysis; Save Chat to download conversation as plain text
  • Auto-generated methodology text for methods sections
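The [[SELECT: protein1; protein2]] syntax mentioned above can be parsed with a single regular expression. The following Python sketch is only an illustration of the format; the function name and the example reply are hypothetical, and DE-LIMP's own parser (in R) may differ.

```python
import re

# Matches [[SELECT: ...]] and captures the semicolon-separated payload.
SELECT_PATTERN = re.compile(r"\[\[SELECT:\s*(.*?)\]\]")

def extract_selected_proteins(reply):
    """Return the unique protein names from every SELECT directive, in order."""
    proteins = []
    for match in SELECT_PATTERN.finditer(reply):
        for name in match.group(1).split(";"):
            name = name.strip()
            if name and name not in proteins:
                proteins.append(name)
    return proteins

reply = "TP53 and MDM2 stand out in this contrast. [[SELECT: TP53; MDM2]]"
print(extract_selected_proteins(reply))  # ['TP53', 'MDM2']
```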

Run Comparator

  • Cross-tool comparison -- Compare your DE-LIMP analysis against a second DE-LIMP run, Spectronaut export, or FragPipe output to understand how tool choice affects your results
  • 4 diagnostic layers -- Settings Diff (parameter-by-parameter comparison), Protein Universe (overlap analysis), Quantification (log2 intensity correlation, per-sample concordance, systematic bias detection), DE Concordance (3x3 Up/Down/NS matrix, volcano overlay, discordant protein table)
  • 7-rule hypothesis engine -- For each discordant protein, assigns a tool-aware hypothesis explaining why the tools disagree (direction reversal, normalization offset, variance estimation, missing values, peptide count, FC magnitude, or borderline significance)
  • Optional DIA-NN log upload -- Enrich Mode A comparisons with search-derived parameters (pg-level quantification, proteoforms, library precursor counts, pipeline step)
  • Optional MOFA2 decomposition -- Treats the two runs as views and decomposes joint variance to find hidden patterns among discordant proteins
  • AI integration -- Tool-aware Gemini prompt and Claude ZIP export for deeper analysis
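The 3x3 Up/Down/NS matrix in the DE Concordance layer can be pictured as a cross-tabulation of per-protein calls from the two runs. The Python sketch below assumes a call is made from adj.P.Val < 0.05 and the sign of logFC only; the cutoff, the function names, and the data are assumptions for illustration.

```python
from collections import Counter

LABELS = ("Up", "Down", "NS")

def classify(logfc, adj_p, alpha=0.05):
    """Label one protein as Up, Down, or NS (not significant)."""
    if adj_p >= alpha:
        return "NS"
    return "Up" if logfc > 0 else "Down"

def concordance_matrix(run_a, run_b, alpha=0.05):
    """Rows are run A calls, columns are run B calls, over shared proteins only."""
    counts = Counter()
    for protein in run_a.keys() & run_b.keys():
        call_a = classify(*run_a[protein], alpha)
        call_b = classify(*run_b[protein], alpha)
        counts[(call_a, call_b)] += 1
    return [[counts[(row, col)] for col in LABELS] for row in LABELS]

# Hypothetical (logFC, adj.P.Val) pairs per protein.
run_a = {"P1": (1.2, 0.01), "P2": (-0.8, 0.03), "P3": (0.1, 0.60), "P4": (0.9, 0.02)}
run_b = {"P1": (1.0, 0.02), "P2": (0.7, 0.04), "P3": (0.2, 0.50), "P5": (1.1, 0.01)}

for label, row in zip(LABELS, concordance_matrix(run_a, run_b)):
    print(label, row)
```

Off-diagonal cells hold the discordant proteins. Here P2 is Down in run A but Up in run B, which is the direction-reversal case.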

Chromatography QC

  • Pre-search quality check -- Extract TIC traces from timsTOF .d files before committing to hours-long DIA-NN searches
  • Three views -- Faceted panels (per-run with median overlay), Overlay (all runs normalized 0-1 on one axis), Metrics (AUC bar chart + diagnostics table)
  • Automated diagnostics -- Shape deviation (Pearson r vs median trace), RT shift, loading anomaly (AUC outlier), file size outlier, late elution, elevated baseline, narrow gradient
  • SSH support -- SCP downloads analysis.tdf from remote .d directories, extracts locally

DIA-NN Search Integration

  • Three backends -- Local, Docker, and HPC (SSH/SLURM)
  • Parallel 5-step SLURM pipeline -- Optimized search with dependency chaining and array jobs for maximum HPC throughput
  • SSH file browser -- Visual directory browser for navigating remote HPC filesystems with clickable breadcrumbs, color-coded entries, and file type filtering
  • SSH auto-connect -- Automatically connects to HPC on startup when an SSH key is detected; environment badge shows deployment mode
  • UniProt FASTA download -- Search and download proteome databases directly; 6 bundled contaminant libraries
  • NCBI proteome download -- Download RefSeq protein FASTA from NCBI Datasets with automatic gene symbol mapping for non-model organisms
  • Load from HPC -- One-click button to browse, download, and analyze completed search results from the cluster
  • Spectral library caching -- Reuse predicted libraries across searches to save compute time
  • Custom FASTA sequences -- Add custom protein sequences inline when submitting searches
  • Smart partition selection -- Detects per-user SLURM CPU limits, auto-switches to public queue when at capacity
  • FASTA database library -- Shared catalog with auto-upload to HPC, fragment m/z range tracking, path validation
  • Cluster resource indicator -- Real-time HPC CPU usage monitoring with traffic-light display (green/yellow/red)
  • Windows Docker launcher -- One-click .bat file runs DE-LIMP + DIA-NN with zero R installation, shared PC support (see WINDOWS_DOCKER_INSTALL.md)
  • Non-blocking job queue -- Submit multiple searches, results auto-load on completion
  • Phospho mode -- Auto-configures DIA-NN for phospho analysis (STY modification, --phospho-output)
  • Organized search logs -- SLURM .out/.err and local .log files written to {output_dir}/logs/
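Dependency chaining with array jobs, as mentioned for the SLURM pipeline above, relies on standard sbatch options: --parsable (print only the job ID), --array, and --dependency=afterok:<jobid>. The Python sketch below only builds the shell command lines and submits nothing. The three script names are hypothetical placeholders and do not correspond to DE-LIMP's actual five steps, which are not listed here.

```python
def build_chain(steps):
    """Return shell lines in which each sbatch call waits for the previous job.

    steps: list of (script, array_size); array_size 0 means a single job.
    """
    commands = []
    previous = None
    for i, (script, array_size) in enumerate(steps, start=1):
        parts = ["sbatch", "--parsable"]
        if array_size:
            parts.append(f"--array=0-{array_size - 1}")
        if previous:
            # afterok: start only if the earlier job (all array tasks) succeeded
            parts.append(f"--dependency=afterok:${previous}")
        parts.append(script)
        variable = f"JOB{i}"
        commands.append(f"{variable}=$({' '.join(parts)})")
        previous = variable
    return commands

# Hypothetical three-step chain with a four-task array job in the middle.
for line in build_chain([("prep.sh", 0), ("search.sh", 4), ("merge.sh", 0)]):
    print(line)
```

The array step fans out per-file work, and the final step starts only after every array task has finished successfully.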

DIA-NN License: DIA-NN is developed by Vadim Demichev and is free for academic/non-commercial use. It is not open source and cannot be redistributed. DE-LIMP does not bundle DIA-NN. See the DIA-NN license.

Core Facility Mode (Optional)

  • Staff YAML profiles auto-fill SSH, SLURM, and instrument settings
  • SQLite job tracking with searchable history (6 filters), one-click result loading and report generation
  • Instrument QC dashboard with protein/precursor/TIC trends and control lines
  • Quarto HTML reports with QC bracket, volcanos, DE stats, and top proteins

Activated by setting DELIMP_CORE_DIR. Not visible on standard installations.

Session Management & History

  • Unified activity log -- Single audit trail for all DIA-NN searches and pipeline runs, with remote activity log via SSH for multi-user visibility
  • Search History -- Full audit trail for every DIA-NN search (26 parameters). Import Settings to reuse parameters; Import Results to load completed search output directly. View Log shows search metadata. Cross-reference links to Analysis History.
  • Analysis History & Projects -- Track every pipeline run with expandable detail rows. Assign analyses to projects for organized grouping with summary cards.
  • About tab -- Community stats dashboard with GitHub stars, forks, visitors, and clones (14-day trend sparklines), GitHub Discussions feed, version info, and project links
  • No-replicates mode -- Quantification without DE for n=1 experiments; PCA, Expression Grid, and Data Explorer still available
  • Save/load full analysis state as .rds; export reproducibility R code log
  • One-click example data (Affinisep vs Evosep comparison)
  • Group assignment templates (CSV export/import)
  • Embedded proteomics resources, UC Davis Proteomics videos, short course links

Which Installation Should I Use?

Platform | Method | DIA-NN Search? | Guide
Any (just exploring) | Web browser | No | Hugging Face
Windows | Docker + SSH to HPC | Yes (via HPC) | WINDOWS_DOCKER_INSTALL.md
Mac / Linux | R/RStudio (native) | Via HPC or Docker | See Installation below
HPC cluster | Apptainer/Singularity | Via SLURM | HPC_DEPLOYMENT.md

Installation

Requirements: R 4.5+ (for limpa), Bioconductor 3.22+ (auto-configured with R 4.5+)

git clone https://github.com/bsphinney/DE-LIMP.git
cd DE-LIMP
Rscript -e "shiny::runApp('.', port = 3838, launch.browser = TRUE)"

All dependencies install automatically on first run:

# Core: shiny, bslib, plotly, DT, rhandsontable, shinyjs
# Data: dplyr, tidyr, stringr, readr, arrow
# Stats: limpa, limma, ComplexHeatmap, clusterProfiler
#        org.Hs.eg.db, org.Mm.eg.db, AnnotationDbi
#        KSEAapp, ggseqlogo, MOFA2, basilisk, callr
# Viz:  ggplot2, ggrepel, ggridges, enrichplot
# AI:   httr2, curl

Usage

  1. Load Data -- Upload a DIA-NN report.parquet output file, or click "Load Example Data" for a demo HeLa dataset
  2. Assign Groups & Run -- Auto-guess groups from filenames or manually assign; optionally add covariates (batch, etc.); click "Run Pipeline" to execute DPC-CN normalization, DPC-Quant protein quantification, and limma DE
  3. Explore Results -- Data Overview, QC, DE Dashboard (Volcano/Table/PCA/CV Analysis), Phospho, GSEA, MOFA2, AI Analysis, XIC Viewer (local/HPC)
  4. Export -- Download reproducibility log (.R), save session (.rds), export tables and plots

Methodology

Step | Method
Normalization | DPC-CN via limpa::dpcCN() (DPC = Detection Probability Curve, as in DPC-Quant)
Quantification | DPC-Quant (Detection Probability Curve Quantification): precursor-to-protein rollup with probabilistic missing-value modelling, via limpa::dpcQuant()
DE model | Linear model fit via limpa::dpcDE() + limma::contrasts.fit()
Moderation | Empirical Bayes moderated t-statistics via limma::eBayes()
FDR | Benjamini-Hochberg adjusted p-values
Phospho DE | Same limma pipeline at the phosphosite level (independent from protein-level)
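For readers unfamiliar with the FDR step, the Benjamini-Hochberg adjustment can be written out directly. The Python sketch below is a plain re-implementation for illustration, not DE-LIMP's code: each p-value is scaled by n/rank, then a running minimum from the largest p-value down keeps the adjusted values monotone. The p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum can be applied.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank 1 = smallest p-value
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four proteins.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in bh_adjust(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

A protein is then called significant when its adjusted p-value falls below the chosen FDR threshold.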

License

This project is open source. See repository for license details.

Contributing

Issues, pull requests, and Discussions welcome! See CLAUDE.md for development documentation.

Developer: Brett Phinney, UC Davis Proteomics Core Facility | Contact: GitHub Issues

Example Data

Demo dataset: Affinisep vs Evosep SPE column comparison using 50 ng Thermo HeLa protein digest standard (DIA, Orbitrap). Available at github.com/bsphinney/DE-LIMP/releases.
