All the scripts and notebooks used to produce the data written in the manuscript titled "ProteoForge: An Imputation-Aware Framework for Differential Proteoform Discovery in Bottom-Up Proteomics".

LangeLab/ProteoForge_Analysis

ProteoForge Manuscript Analysis

This repository contains the code, analyses, and rendered figures supporting the ProteoForge manuscript. It includes real-data benchmarks, simulation studies, and an application to a hypoxia study. This repo is the analysis snapshot.

The ProteoForge scripts used here are not packaged; they are a collection of functions. A more complete Python package is available at LangeLab/ProteoForge. The two differ because the analysis and manuscript were developed in parallel with the package, and some features, especially plotting and printing functions, were added ad hoc for the manuscript. Refer to the package repository for package-specific documentation, installation instructions, and citation information.

Repository Layout

Top-level folders and their purpose:

  • ProteoForge/ — core Python scripts used in analyses (parsers, processing, modelling, clustering, classifiers).
  • Benchmark/ — scripts and notebooks for benchmark analyses (R and Python).
  • NSCLC/ — notebooks, data and figures for the hypoxia/NSCLC application.
  • Simulation/ — simulation scripts, notebooks and utilities used to evaluate methods.
  • src/ — auxiliary Python library used by some scripts (utilities, plotting helpers, tests).
  • requirements.txt, setup_project.sh, setup_project.ps1, setup_env.R — environment and setup helpers.

The setup utilities create the Python virtual environment (.venv) and the R renv library with the required dependencies. They set up the environment for both R and Python analyses to facilitate reproducibility across operating systems.

Notes on data and outputs:

  • Raw and derived data/figures are not committed. Place raw inputs under the appropriate */data/input/ folders; scripts/notebooks will write to */data/ and */figures/ (see folder READMEs).
  • A snapshot of the repository, including the input data and HTML renders of all notebooks, is available on Zenodo: 10.5281/zenodo.17795845.
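Because raw inputs are not committed, it can help to confirm that the expected input folders exist before running anything. The following is a minimal sketch, assuming each of the Benchmark, NSCLC, and Simulation folders uses a data/input/ subfolder as described above; the function name missing_input_dirs is hypothetical and not part of this repository.

```python
from pathlib import Path

# Analysis folders named in the layout above; each is assumed to hold data/input/.
ANALYSIS_FOLDERS = ("Benchmark", "NSCLC", "Simulation")


def missing_input_dirs(repo_root):
    """Return the expected */data/input/ directories that do not exist yet.

    Hypothetical helper for illustration; it only reports, it never creates.
    """
    root = Path(repo_root)
    expected = [root / name / "data" / "input" for name in ANALYSIS_FOLDERS]
    return [path for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for path in missing_input_dirs("."):
        print(f"missing input folder: {path}")
```

The check is read-only on purpose, so it is safe to run before or after downloading the Zenodo snapshot.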

Environment Setup (Cross-Platform)

Use the provided setup scripts to configure both Python (venv) and R (renv + pak). R 4.5.0 or newer is required for the R environment.

Linux / macOS (bash):

git clone https://github.com/LangeLab/ProteoForge_Analysis.git
cd ProteoForge_Analysis
bash setup_project.sh

Windows (PowerShell):

git clone https://github.com/LangeLab/ProteoForge_Analysis.git
cd ProteoForge_Analysis
./setup_project.ps1

If R is not on PATH, install it from CRAN and rerun the setup command, or run Rscript setup_env.R after installation. To activate the Python environment later, use source .venv/bin/activate (Linux/macOS) or ./.venv/Scripts/Activate.ps1 (PowerShell).
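The prerequisites above can be checked with a short script before rerunning the setup. This is a sketch under stated assumptions: the helper names are hypothetical, and the version text is assumed to contain a phrase of the form "version X.Y.Z", which should be verified against the output of the local R installation.

```python
import re
import shutil
import sys

MIN_R_VERSION = (4, 5, 0)  # minimum R version stated in this README


def parse_r_version(text):
    """Extract (major, minor, patch) from text such as 'R version 4.5.0'.

    Assumes the text contains 'version X.Y.Z'; returns None otherwise.
    """
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def r_version_ok(text, minimum=MIN_R_VERSION):
    """True when the parsed version is at least the required minimum."""
    version = parse_r_version(text)
    return version is not None and version >= minimum


def in_virtualenv():
    """True when Python runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Rscript on PATH:", shutil.which("Rscript") is not None)
    print("Python venv active:", in_virtualenv())
    print("R 4.5.1 acceptable:", r_version_ok("R version 4.5.1"))
```

Versions are compared as integer tuples rather than strings, so a release such as 4.10.0 is correctly treated as newer than 4.5.0.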

Run Steps

Entry points for reproducing analyses and figures:

  • Notebooks: Benchmark/*.ipynb, Simulation/*.ipynb, NSCLC/*.ipynb.
  • Scripts (Python): Benchmark/04-runProteoForge.py, Simulation/04-runProteoForge.py.
  • Scripts (R): Benchmark/01-DataProcessing.R, Benchmark/02-runCOPF.R, Benchmark/03-runPeCorA.R, plus analogous scripts in Simulation/.

Each notebook/script documents its required inputs and outputs. Place raw inputs under the corresponding */data/input/ directory before running. Outputs will be written under */data/ and */figures/.
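The script entry points listed above can be turned into commands programmatically. The sketch below assumes that the numeric prefixes indicate the intended run order and that R scripts are launched with Rscript; neither is confirmed by this README, and build_command is a hypothetical helper.

```python
import sys
from pathlib import Path

# Benchmark entry points listed above; running them in numeric order is an assumption.
BENCHMARK_SCRIPTS = [
    "Benchmark/01-DataProcessing.R",
    "Benchmark/02-runCOPF.R",
    "Benchmark/03-runPeCorA.R",
    "Benchmark/04-runProteoForge.py",
]


def build_command(script):
    """Return the argument list that would run one script.

    R scripts go through Rscript; Python scripts use the current interpreter,
    so an activated .venv is picked up automatically.
    """
    suffix = Path(script).suffix.lower()
    if suffix == ".r":
        return ["Rscript", script]
    if suffix == ".py":
        return [sys.executable, script]
    raise ValueError(f"unsupported script type: {script}")


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for script in BENCHMARK_SCRIPTS:
        print(" ".join(build_command(script)))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the inputs are in place.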

Reproducibility Notes

  • R environment: managed with renv; run via setup_project.sh/setup_project.ps1 or Rscript setup_env.R. Required R version: >= 4.5.0.
  • Python environment: requirements.txt lists dependencies; the setup scripts create .venv and install the requirements.
  • Data locations: inputs are expected under */data/input/; outputs are written to */data/ and */figures/. Large files are not tracked in git.
  • Software vs analysis: this repository is the analysis snapshot. The ProteoForge package is developed separately at LangeLab/ProteoForge (see its README and CITATION for package-specific details).

Citations

Please cite both the analysis snapshot (this repository) and the ProteoForge software package when applicable.

  • Analysis snapshot (this repository): use the Zenodo record and select the version matching the git tag you used.
    • “Snapshot of Benchmarking and Showcasing ProteoForge for Proteoform Deconvolution from Peptide Level Data. Version 1. Zenodo. 10.5281/zenodo.17795845.”
  • Software package (ProteoForge): cite the package separately.
  • Manuscript: cite the manuscript when referencing results or figures derived from this analysis.
    • [INSERT FULL MANUSCRIPT REFERENCE/DOI WHEN AVAILABLE]

License

This repository is licensed under CC BY-NC 4.0: see license.
