This repository contains the code, analyses, and rendered figures supporting the ProteoForge manuscript. It includes real-data benchmarks, simulation studies, and an application to a hypoxia study. This repo is the analysis snapshot.
The scripts used here (ProteoForge/) are not packaged; they are a collection of functions. A more complete Python package is available at LangeLab/ProteoForge. The two differ because the analysis and manuscript were developed in parallel with the package, and some features, especially plotting and printing functions, were added ad hoc for the manuscript. Refer to the package repository for package-specific documentation, installation instructions, and citation information.
Top-level folders and their purpose:
- ProteoForge/: core Python scripts used in analyses (parsers, processing, modelling, clustering, classifiers).
- Benchmark/: scripts and notebooks for benchmark analyses (R and Python).
- NSCLC/: notebooks, data and figures for the hypoxia/NSCLC application.
- Simulation/: simulation scripts, notebooks and utilities used to evaluate methods.
- src/: auxiliary Python library used by some scripts (utilities, plotting helpers, tests).
- requirements.txt, setup_project.sh, setup_project.ps1, setup_env.R: environment and setup helpers.
The setup utilities create the venv and renv folders and install the required dependencies. They set up the environments for both the R and Python analyses so that results can be reproduced across operating systems.
Notes on data and outputs:
- Raw and derived data/figures are not committed. Place raw inputs under the appropriate */data/input/ folders; scripts/notebooks will write to */data/ and */figures/ (see folder READMEs).
- A snapshot of the repository with input data, and the HTML renders of all notebooks, is available at Zenodo: 10.5281/zenodo.17795845.
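Because the raw inputs are not committed, it can help to confirm the expected layout before running anything. The following is a minimal sketch, not part of the repository: the helper name `check_data_layout` is hypothetical, and it assumes that each of the Benchmark/, Simulation/ and NSCLC/ folders follows the `*/data/input/` convention described above.

```python
from pathlib import Path

# Top-level analysis folders named in this README (assumed to share one layout).
ANALYSIS_DIRS = ("Benchmark", "Simulation", "NSCLC")


def check_data_layout(repo_root, create_outputs=False):
    """Report whether each analysis folder has a non-empty data/input/ directory.

    Returns a dict mapping folder name -> True when data/input/ exists and
    contains at least one entry. Hypothetical helper, for illustration only.
    """
    root = Path(repo_root)
    status = {}
    for name in ANALYSIS_DIRS:
        input_dir = root / name / "data" / "input"
        status[name] = input_dir.is_dir() and any(input_dir.iterdir())
        if create_outputs:
            # Output locations described above: */data/ and */figures/.
            (root / name / "data").mkdir(parents=True, exist_ok=True)
            (root / name / "figures").mkdir(parents=True, exist_ok=True)
    return status


if __name__ == "__main__":
    for folder, ready in check_data_layout(".").items():
        state = "inputs found" if ready else "no inputs under data/input/"
        print(f"{folder}: {state}")
```

Run from the repository root, this prints one line per analysis folder; passing `create_outputs=True` also creates the output directories if they are missing.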
Use the provided setup scripts to configure both Python (venv) and R (renv + pak). R 4.5.0 or newer is required for the R environment.
Linux / macOS (bash):

    git clone https://github.com/LangeLab/ProteoForge_Analysis.git
    cd ProteoForge_Analysis
    bash setup_project.sh

Windows (PowerShell):

    git clone https://github.com/LangeLab/ProteoForge_Analysis.git
    cd ProteoForge_Analysis
    ./setup_project.ps1

If R is not on PATH, install it from CRAN and rerun the setup command, or run Rscript setup_env.R after installation. To activate the Python environment later, use source .venv/bin/activate (Linux/macOS) or ./.venv/Scripts/Activate.ps1 (PowerShell).
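The R version requirement can be checked before running the setup scripts. The sketch below is an illustration under stated assumptions rather than something shipped with the repository: the function names are hypothetical, and it assumes `Rscript --version` reports a banner containing `version X.Y.Z`, which may be written to stdout or stderr depending on the R release.

```python
import re
import shutil
import subprocess

MIN_R_VERSION = (4, 5, 0)  # minimum R version stated in this README


def parse_r_version(text):
    """Extract (major, minor, patch) from text such as 'R version 4.5.0'."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def r_is_recent_enough(minimum=MIN_R_VERSION):
    """Return True if Rscript is on PATH and reports a version >= minimum."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False  # R is not installed or not on PATH
    result = subprocess.run([rscript, "--version"], capture_output=True, text=True)
    # Search both streams, since the banner location varies between R releases.
    version = parse_r_version(result.stdout + result.stderr)
    return version is not None and version >= minimum
```

If `r_is_recent_enough()` returns `False`, install or upgrade R from CRAN and rerun the setup command as described above.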
Entry points for reproducing analyses and figures:
- Notebooks: Benchmark/*.ipynb, Simulation/*.ipynb, NSCLC/*.ipynb.
- Scripts (Python): Benchmark/04-runProteoForge.py, Simulation/04-runProteoForge.py.
- Scripts (R): Benchmark/01-DataProcessing.R, Benchmark/02-runCOPF.R, Benchmark/03-runPeCorA.R, plus analogous scripts in Simulation/.
Each notebook/script documents its required inputs and outputs. Place raw inputs under the corresponding */data/input/ directory before running. Outputs will be written under */data/ and */figures/.
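The numeric prefixes on the script names (01-, 02-, 03-, 04-) suggest a run order, although this README does not state one explicitly, so treat that as an assumption. The sketch below lists the numbered scripts in a folder in prefix order together with the interpreter each would need; it only builds the commands and does not execute them. The helper name `ordered_commands` is hypothetical.

```python
import re
import sys
from pathlib import Path

# Interpreter per file extension; "Rscript" is assumed to be on PATH.
INTERPRETERS = {".R": "Rscript", ".py": sys.executable}


def ordered_commands(folder):
    """Return [interpreter, script] pairs for numbered scripts, sorted by prefix."""
    numbered = []
    for path in Path(folder).iterdir():
        match = re.match(r"(\d+)-", path.name)
        if match and path.suffix in INTERPRETERS:
            numbered.append((int(match.group(1)), path))
    # Sort by numeric prefix, then by name to keep ties deterministic.
    numbered.sort(key=lambda item: (item[0], item[1].name))
    return [[INTERPRETERS[path.suffix], str(path)] for _, path in numbered]


if __name__ == "__main__":
    benchmark = Path("Benchmark")
    if benchmark.is_dir():
        for command in ordered_commands(benchmark):
            print(" ".join(command))
```

Each returned pair can be passed to `subprocess.run` once the inputs are in place. Check the header of each script first, since it documents the inputs that script expects.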
- R environment: managed with renv; run via setup_project.sh / setup_project.ps1 or Rscript setup_env.R. Required R version: >= 4.5.0.
- Python environment: requirements.txt lists dependencies; the setup scripts create .venv and install the requirements.
- Data locations: inputs are expected under */data/input/; outputs are written to */data/ and */figures/. Large files are not tracked in git.
- Software vs analysis: this repository is the analysis snapshot. The ProteoForge package is developed separately at LangeLab/ProteoForge (see its README and CITATION for package-specific details).
Please cite both the analysis snapshot (this repository) and the ProteoForge software package when applicable.
- Analysis snapshot (this repository): use the Zenodo record and select the version matching the git tag you used.
  - “Snapshot of Benchmarking and Showcasing ProteoForge for Proteoform Deconvolution from Peptide Level Data. Version 1. Zenodo. 10.5281/zenodo.17795845.”
- Software package (ProteoForge): cite the package separately.
  - Repository: LangeLab/ProteoForge
- Manuscript: cite the manuscript when referencing results or figures derived from this analysis.
  - [INSERT FULL MANUSCRIPT REFERENCE/DOI WHEN AVAILABLE]
This repository is licensed under CC BY-NC 4.0: see license.
