ProteoMaker is a platform for generating in-silico bottom-up proteomics data sets with a ground truth at the proteoform level.
All parameters used to generate the data are described in man/Parameters.qmd (render to HTML if you prefer). The script RunSims.R runs the entire pipeline; alternatively, you can use the vignette. Simulations can be set up and run in batches over multiple parameter settings. ProteoMaker also compares the results with the ground truth using benchmarking metrics and provides visual comparison between simulated data sets.
The pipeline is depicted in the figure ProteoMakerLayout.svg and consists of the following stages:

- 00_BatchRunFuncs.R - General functions to run the simulations
- 01_GenerateGroundTruth.R - Generation of ground truth data at the proteoform level
- 02_Digestion.R - Digestion of the proteoforms from the ground truth
- 03_MSRun.R - In silico MS run
- 04_DataAnalysis.R - Functions for data analysis from the peptide to the protein level
- 05_Statistics.R - Statistical testing
- 06_Benchmarks.R - Benchmarking
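The staged design above can be sketched as a chain of transformations, each stage consuming the previous stage's output. The stage functions below are illustrative stand-ins, not ProteoMaker internals:

```r
# Illustrative sketch of the staged pipeline: each stage transforms
# the output of the previous one. Function bodies are placeholders;
# the real stages live in the scripts listed above.
stages <- list(
  ground_truth = function(x) paste(x, "-> proteoforms"),   # 01
  digestion    = function(x) paste(x, "-> peptides"),      # 02
  ms_run       = function(x) paste(x, "-> intensities"),   # 03
  analysis     = function(x) paste(x, "-> proteins"),      # 04
  statistics   = function(x) paste(x, "-> test results"),  # 05
  benchmarks   = function(x) paste(x, "-> metrics")        # 06
)
result <- Reduce(function(acc, stage) stage(acc), stages, init = "parameters")
result  # "parameters -> proteoforms -> peptides -> intensities -> proteins -> test results -> metrics"
```

Because each stage depends only on its predecessor, a cached intermediate result lets downstream stages be re-run without regenerating the upstream data.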
Install the package from GitHub with devtools:

```r
devtools::install_github("computproteomics/ProteoMaker")
```

Load the package in R:

```r
library(ProteoMaker)
```

Run the default simulation (adjust the output folder as needed):

```r
Param <- def_param()
Config <- set_proteomaker(resultFilePath = "results")
run_sims(Param, Config)
```

Intermediate output `DataAnalysis_<hash>.RData` files and benchmark tables are written to `results/`. Explore the outputs, for example:

```r
visualize_benchmarks(Config$resultFilePath)
```

or open the vignette for a full walkthrough.
| Path | Description |
|---|---|
| R/ | Core simulation, analysis, and benchmarking functions |
| inst/config/parameters.yaml | Default parameter definitions consumed by def_param() |
| inst/cmd/RunSims.R | Convenience script to run the full simulation pipeline |
| vignettes/ | Walkthroughs and usage examples |
| inst/img/ | Diagrams and figures (e.g. pipeline layout) |
| tests/ | Automated tests for key functionality |
| inst/shiny/ | Shiny interface for interactive configuration (if used) |
Running simulations via the vignette or RunSims.R allows full batches to be executed without re-running data sets that were already built with the same set of parameters. In addition, the pipeline runs hierarchically to avoid repeated execution of identical downstream analyses. This is achieved by hashing the parameter configurations and writing intermediate and final results into the respective tables.

Re-running the full batch with a different assessment of the benchmarking metrics avoids re-running data set generation and analysis, and should therefore be very fast.
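The hash-based caching can be illustrated in plain R. This is a minimal sketch of the idea, not ProteoMaker's actual implementation: identical parameter sets hash to the same file name, so a finished run is reused instead of recomputed.

```r
# Hash a parameter set by serializing it and fingerprinting the bytes.
param_hash <- function(params) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  saveRDS(params, tmp)          # serialize the parameter set
  unname(tools::md5sum(tmp))    # hash the serialized bytes
}

# Run `compute` only when no cached result exists for these parameters.
cached_run <- function(params, compute, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  f <- file.path(cache_dir, paste0("result_", param_hash(params), ".rds"))
  if (file.exists(f)) return(readRDS(f))  # cache hit: skip computation
  res <- compute(params)
  saveRDS(res, f)                         # cache miss: compute and store
  res
}
```

A second call with the same parameters then returns the stored result without invoking `compute` again, which is why unchanged parameter settings are never simulated twice.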
Important remarks:
- Always define all parameters at the beginning of the script.
- Be aware that changing parameters of early pipeline stages (ground truth, digestion) can immensely increase the number of possible parameter combinations.
- Always keep the result files in the respective folder (`resultFilePath`) as long as you have not changed anything in the pipeline, such as any of the methods in the sourced files. This allows running the full batch without re-running data set generation and analysis.
- Benchmarking results are skipped for data sets with fewer than 100 quantified proteins.
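The combinatorial growth warned about above is easy to see with a plain parameter grid. The parameter names below are invented for illustration and are not ProteoMaker parameters:

```r
# Every combination of parameter values becomes one simulation run,
# so adding values to early-stage parameters multiplies the batch
# size. Parameter names here are illustrative only.
grid <- expand.grid(
  NumReplicates = c(3, 5),
  NoiseSD       = c(0.1, 0.3, 0.5),
  FracModified  = c(0.0, 0.2)
)
nrow(grid)  # 2 * 3 * 2 = 12 runs
```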