Sage: Decoy-Free Edition (EXPERIMENTAL)

Note: This is a fork of the original Sage search engine by Michael Lazear. This version includes a new feature for decoy-free analysis.

Decoy-Free Search Mode

This fork implements a complete workflow for False Discovery Rate (FDR) estimation that does not require a traditional decoy database. Instead, it fits a statistical model (a Gumbel distribution) to lower-scoring peptide-spectrum matches (PSMs) and uses that fit as the null model.
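
As a rough illustration of the idea, the sketch below fits a Gumbel distribution to a pool of lower-scoring PSM scores by the method of moments and converts a candidate score into a tail probability under that fit. It is not the fork's implementation: the function names, the choice of moment matching, and the example scores are all assumptions made for illustration, and the methods cited under References are more elaborate than this.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Returns (mu, beta): location and scale. Hypothetical helper,
    not part of Sage or of this fork.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to fit a distribution")
    mean = statistics.fmean(scores)
    stdev = statistics.stdev(scores)
    if stdev == 0:
        raise ValueError("scores have zero variance")
    beta = stdev * math.sqrt(6) / math.pi   # variance = (pi^2 / 6) * beta^2
    mu = mean - EULER_GAMMA * beta          # mean = mu + gamma * beta
    return mu, beta


def gumbel_sf(x, mu, beta):
    """Survival function P(X >= x) = 1 - exp(-exp(-(x - mu) / beta))."""
    return -math.expm1(-math.exp(-(x - mu) / beta))


if __name__ == "__main__":
    # Made-up lower-scoring PSM scores, used only to exercise the functions.
    null_scores = [8.1, 9.4, 7.7, 10.2, 8.8, 9.9, 7.2, 8.5, 9.1, 10.6]
    mu, beta = fit_gumbel_moments(null_scores)
    # Tail probability of a hypothetical top-ranked score under the fitted null.
    print(gumbel_sf(15.0, mu, beta))
```

A small tail probability suggests the score is unlikely to come from the null. Turning such probabilities into an FDR estimate needs further assumptions (for example, about the fraction of incorrect top hits) that this sketch does not make.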

The primary benefit of this mode is increased sensitivity and statistical power for ultra-low-input proteomics experiments, such as single-cell analysis, where every identified peptide is critical.

Implementation Details:

Decoy-free mode is activated automatically when the parameters set "generate_decoys": false and leave the decoy tag empty ("decoy_tag": "").
When the mode is active, "report_psms" is raised to at least 10. This ensures that each spectrum reports enough lower-scoring PSMs to build a stable null distribution for the statistical model.
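
The two rules above can be sketched as follows. This is a minimal illustration, not the fork's code: the helper names are hypothetical, the placement of "generate_decoys" and "decoy_tag" under a "database" section follows the upstream Sage configuration layout and is assumed to carry over, and treating a missing key as "mode not activated" is also an assumption.

```python
import copy

MIN_REPORT_PSMS = 10  # lower bound stated in this README


def is_decoy_free(params):
    """Return True when both activation conditions hold.

    Assumption: the keys live under "database", and a missing key
    does not activate the mode.
    """
    database = params.get("database", {})
    return (
        database.get("generate_decoys") is False
        and database.get("decoy_tag") == ""
    )


def enforce_report_psms(params):
    """Return a copy of params with report_psms raised to the minimum if needed."""
    adjusted = copy.deepcopy(params)
    if is_decoy_free(adjusted):
        adjusted["report_psms"] = max(adjusted.get("report_psms", 1), MIN_REPORT_PSMS)
    return adjusted


if __name__ == "__main__":
    # Hypothetical parameter set; the FASTA path is a placeholder.
    params = {
        "database": {
            "fasta": "proteome.fasta",
            "generate_decoys": False,
            "decoy_tag": "",
        },
        "report_psms": 1,
    }
    print(enforce_report_psms(params)["report_psms"])  # prints 10
```

A configuration that already requests more than 10 PSMs per spectrum is left unchanged, as is any configuration with a non-empty decoy tag.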

References

Modeling Lower-Order Statistics to Enable Decoy-Free FDR Estimation in Proteomics. Dominik Madej and Henry Lam. Journal of Proteome Research 2023, 22 (4), 1159-1171. https://doi.org/10.1021/acs.jproteome.2c00604 https://pubs.acs.org/doi/10.1021/acs.jproteome.2c00604 https://github.com/dommad/pylord

New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics. Yisu Peng, Shantanu Jain, Yong Fuga Li, Michal Greguš, Alexander R. Ivanov, Olga Vitek, and Predrag Radivojac. Bioinformatics 2020, 36 (Supplement_2), i745–i753. https://doi.org/10.1093/bioinformatics/btaa807 https://academic.oup.com/bioinformatics/article/36/Supplement_2/i745/6055912 https://github.com/shawn-peng/DecoyFree-MSFDR

A Decoy-Free Approach to the Identification of Peptides. Giulia Gonnelli, Michiel Stock, Jan Verwaeren, Davy Maddelein, Bernard De Baets, Lennart Martens, and Sven Degroeve. Journal of Proteome Research 2015, 14 (4), 1792-1798. https://doi.org/10.1021/pr501164r https://pubs.acs.org/doi/10.1021/pr501164r https://bio.tools/nokoi

Decoy-free protein-level false discovery rate estimation. Ben Teng, Ting Huang, and Zengyou He. Bioinformatics 2014, 30 (5), 675–681. https://doi.org/10.1093/bioinformatics/btt431 https://academic.oup.com/bioinformatics/article/30/5/675/244620

Sage: proteomics searching so fast it seems like magic

For more information please read the online documentation!

Introduction

Sage is, at its core, a proteomics database search engine - a tool that transforms raw mass spectra from proteomics experiments into peptide identifications via database searching & spectral matching.

However, Sage includes a variety of advanced features that make it a one-stop shop: retention time prediction, quantification (both isobaric & LFQ), peptide-spectrum match rescoring, and FDR control. You can directly use results from Sage without needing to use other tools for these tasks.

Additionally, Sage was designed with cloud computing in mind - massively parallel processing and the ability to directly stream compressed mass spectrometry data to/from AWS S3 enables unprecedented search speeds with minimal cost.

Sage also runs just as well reading local files from your Mac/PC/Linux device!

Why use Sage instead of other tools?

Sage is simple to configure, powerful and flexible. It also happens to be well-tested, mind-bogglingly fast, open-source (MIT-licensed) and free.

Citation

If you use Sage in a scientific publication, please cite the following paper:

Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale

Features

Incredible performance out of the box
Effortlessly cross-platform (Linux/MacOS/Windows), effortlessly parallel (uses all of your CPU cores)
Fragment indexing strategy allows for blazing fast narrow and open searches (> 500 Da precursor tolerance)
Isobaric quantification (MS2/MS3-TMT, or custom reporter ions)
Label-free quantification: consider all charge states & isotopologues a la FlashLFQ
Capable of searching for chimeric/co-fragmenting spectra
Wide-window (dynamic precursor tolerance) search mode - enables WWA/PRM/DIA searches
Retention time prediction models fit to each LC/MS run
PSM rescoring using built-in linear discriminant analysis (LDA)
PEP calculation using a non-parametric model (KDE)
FDR calculation using target-decoy competition and picked-peptide & picked-protein approaches
Percolator/Mokapot compatible output
Configuration by JSON file
Built-in support for reading gzipped-mzML files
Support for reading/writing directly from AWS S3
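
For readers unfamiliar with the target-decoy competition entry in the list above, here is a minimal sketch of how q-values are commonly derived from a ranked list of target and decoy PSMs. It uses the simple decoys/targets estimator and is not taken from Sage's source; the function name and the example scores are assumptions for illustration only.

```python
def tdc_q_values(psms):
    """Compute q-values from (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Hypothetical helper, not Sage's implementation.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Running FDR estimate at each score threshold: decoys / targets.
    targets = decoys = 0
    fdrs = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_values)]


if __name__ == "__main__":
    # Made-up scores: True marks a decoy hit.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    for row in tdc_q_values(example):
        print(row)
```

The decoy-free mode described at the top of this README replaces the decoy counts in this kind of calculation with a fitted null distribution.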

Interoperability

Sage is well-integrated into the open-source proteomics ecosystem. The following projects support analyzing results from Sage (typically in addition to other tools), or redistribute Sage binaries for use in their pipelines.

SearchGUI: a graphical user interface for running searches
PeptideShaker: visualize peptide-spectrum matches
MS2Rescore: AI-assisted rescoring of results
Picked group FDR: scalable protein group FDR for large-scale experiments
sagepy: Python bindings to the sage-core library
quantms: nextflow pipeline for running searches with Sage
OpenMS: Sage is included as a "TOPP" tool in OpenMS
sager: R package for analyzing results from Sage searches
Sage results to mzIdentML: Bash script to convert results.sage.tsv files to mzIdentML
i2MassChroQ: a graphical user interface for proteomics analysis
annotator: a graphical user interface for visualizing peptide-spectrum matches
rustyms: a Rust library (with Python bindings) to handle peptides and identified peptide files

If your project supports Sage and it's not listed, please open a pull request! If you need help integrating or interfacing with Sage in some way, please reach out.

Check out the (now outdated) blog post introducing the first version of Sage for more information and full benchmarks!
