Changes from all commits (26 commits)
- `0189b60`: remove old docs (yaseminbridges, Feb 10, 2026)
- `b152a82`: remove files (yaseminbridges, Feb 10, 2026)
- `cb92da7`: add workflow figure (yaseminbridges, Feb 10, 2026)
- `a835f79`: refactor nav structure and theme configuration (yaseminbridges, Feb 11, 2026)
- `21bc964`: update docs generation paths to align with new folder structure (yaseminbridges, Feb 11, 2026)
- `a0d6948`: rewrite README for clarity and completeness (yaseminbridges, Feb 11, 2026)
- `42520d3`: rewrite for clarity and detail (yaseminbridges, Feb 11, 2026)
- `e99f241`: add pheval logo to documentation assets (yaseminbridges, Feb 11, 2026)
- `24c7737`: add "Getting Started" guide to documentation (yaseminbridges, Feb 11, 2026)
- `4a62f25`: add installation guide to documentation (yaseminbridges, Feb 11, 2026)
- `0032398`: add utilities section to documentation (yaseminbridges, Feb 11, 2026)
- `547cfec`: add documentation for phenotype scrambling utilities (yaseminbridges, Feb 11, 2026)
- `1717d8d`: add resource updates guide to documentation (yaseminbridges, Feb 11, 2026)
- `18c2fad`: add documentation for data preparation utilities (yaseminbridges, Feb 11, 2026)
- `82bf5c2`: add documentation for plugins and runners (yaseminbridges, Feb 11, 2026)
- `c2c7430`: add plugins index to documentation (yaseminbridges, Feb 11, 2026)
- `697b0ef`: add benchmarking section to documentation (yaseminbridges, Feb 11, 2026)
- `b24a776`: add "executing a benchmark" guide to documentation (yaseminbridges, Feb 11, 2026)
- `2302067`: add documentation for variant utilities (yaseminbridges, Feb 11, 2026)
- `6429eef`: add API reference section to documentation (yaseminbridges, Feb 11, 2026)
- `993a534`: add "developing a PhEval plugin" guide to documentation (yaseminbridges, Feb 11, 2026)
- `0c9b661`: add "contributions guide" to documentation (yaseminbridges, Feb 11, 2026)
- `fa15fb6`: update pheval to version 0.7.8 (yaseminbridges, Feb 11, 2026)
- `f3adedd`: remove repetition (yaseminbridges, Feb 26, 2026)
- `ee63ea7`: remove code of conduct (yaseminbridges, Feb 26, 2026)
- `8fbaeb3`: update plugins index table formatting and add example plugin (yaseminbridges, Feb 26, 2026)
131 changes: 86 additions & 45 deletions README.md
````diff
@@ -6,75 +6,116 @@
 ![Python Version](https://img.shields.io/badge/python-3.10%2B-blue)
 ![Issues](https://img.shields.io/github/issues/monarch-initiative/pheval)

 ## Overview
+PhEval (Phenotypic Inference Evaluation Framework) is a **modular, reproducible benchmarking framework** for evaluating **phenotype-driven prioritisation tools**, such as gene, variant, and disease prioritisation algorithms.

-The absence of standardised benchmarks and data standardisation for Variant and Gene Prioritisation Algorithms (VGPAs) presents a significant challenge in the field of genomic research. To address this, we developed PhEval, a novel framework designed to streamline the evaluation of VGPAs that incorporate phenotypic data. PhEval offers several key benefits:
+It is designed to support **fair comparison across tools, tool versions, datasets, and knowledge updates**, addressing a long-standing gap in standardised evaluation for phenotype-based methods.

-- Automated Processes: Reduces manual effort by automating various evaluation tasks, thus enhancing efficiency.
-- Standardisation: Ensures consistency and comparability in evaluation methodologies, leading to more reliable and standardised assessments.
-- Reproducibility: Facilitates reproducibility in research by providing a standardised platform, allowing for consistent validation of algorithms.
-- Comprehensive Benchmarking: Enables thorough benchmarking of algorithms, providing well-founded comparisons and deeper insights into their performance.
+📖 **Full documentation:** https://monarch-initiative.github.io/pheval/
+---

-PhEval is a valuable tool for researchers looking to improve the accuracy and reliability of VGPA evaluations through a structured and standardised approach.
+## Why PhEval?

-For more information please see the full [documentation](https://monarch-initiative.github.io/pheval/).
+Evaluating phenotype-driven prioritisation tools is challenging because performance depends on many moving parts, including:

-## Download and Installation
+- Phenotype representations and noise
+- Ontology structure and versioning
+- Gene and disease mappings
+- Tool-specific scoring and ranking strategies
+- Input cohorts and simulation approaches

+PhEval provides a framework that makes these factors **explicit, controlled, and comparable**.
+
+Key features:
+
+- **Standardised outputs** across tools
+- **Reproducible benchmarking** with recorded metadata
+- **Plugin-based architecture** for extensibility
+- **Separation of execution and evaluation**
+- Support for **gene, variant, and disease prioritisation**
+
+---
+
+## Installation
+
+PhEval requires **Python 3.10 or later**.
+
+Install from PyPI:
+
-1. Ensure you have Python 3.10 or greater installed.
-2. Install with `pip`:
 ```bash
 pip install pheval
 ```
-3. See list of all PhEval utility commands:

+This installs:
+
+* The core pheval CLI (for running tools via plugins)
+* `pheval-utils` (for data preparation, benchmarking, and analysis)
+
+Verify installation:
+
+```bash
+pheval --help
+pheval-utils --help
+```

-## Usage
+## How PhEval is used

-The PhEval CLI offers a variety of commands categorised into two main types: **Runner Implementations** and **Utility Commands**. Below is an overview of each category, detailing how they can be utilised to perform various tasks within PhEval.
+PhEval workflows typically consist of three phases:

-### Runner Implementations
+1. Prepare data
+   Prepare and manipulate phenopackets and related inputs (e.g. VCFs).
+2. Run tools
+   Execute phenotype-driven prioritisation tools via plugin-provided runners using:
+   ```bash
+   pheval run --runner <runner_name> ...
+   ```
+3. Benchmark and analyse
+   Compare results across runs using standardised metrics and plots.

-The primary command used within PhEval is `pheval run`. This command is responsible for executing concrete VGPA runner implementations, that we sometimes term as plugins. By using pheval run, users can leverage these runner implementations to: execute the VGPA on a set of test corpora, produce tool-specific result outputs, and post-process tool-specific outputs to PhEval standardised TSV outputs.
+Each phase is documented in detail in the user documentation.

-Some concrete PhEval runner implementations include the [Exomiser runner](https://github.com/monarch-initiative/pheval.exomiser) and the [Phen2Gene runner](https://github.com/monarch-initiative/pheval.phen2gene). The full list of currently implemented runners can be found [here](https://monarch-initiative.github.io/pheval/plugins/)
+## Plugins and runners

-Please read the [documentation](https://monarch-initiative.github.io/pheval/developing_a_pheval_plugin/) for a step-by-step for creating your own PhEval plugin.
+PhEval itself is tool-agnostic.

-### Utility Commands
+Support for specific tools is provided via plugins, which implement runners responsible for:

-In addition to the main `run` command, PhEval provides a set of utility commands designed to enhance the overall functionality of the CLI. These commands can be used to set up and configure experiments, streamline data preparation, and benchmark the performance of various VGPA runner implementations. By utilising these utilities, users can optimise their experimental workflows, ensure reproducibility, and compare the efficiency and accuracy of different approaches. The utility commands offer a range of options that facilitate the customisation and fine-tuning to suit diverse research objectives.
+* Preparing tool inputs
+* Executing the tool
+* Converting raw outputs into PhEval standardised results

-#### Example Usage
+A list of available plugins is maintained in the documentation:

-To add noise to an existing corpus of phenopackets, this could be used to assess the robustness of VGPAs when less relevant or unreliable phenotype data is introduced:
-```bash
-pheval-utils scramble-phenopackets --phenopacket-dir /phenopackets --scramble-factor 0.5 --output-dir /scrambled_phenopackets_0.5
-```
+Plugins: https://monarch-initiative.github.io/pheval/plugins/

-To update the gene symbols and identifiers to a specific namespace:
-```bash
-pheval-utils update-phenopackets --phenopacket-dir /phenopackets --output-dir /updated_phenopackets --gene-identifier ensembl_id
-```
+Each plugin repository contains tool-specific installation instructions and examples.

-To prepare VCF files for a corpus of phenopackets, spiking in the known causative variants:
-```bash
-pheval-utils create-spiked-vcfs --phenopacket-dir /phenopackets --hg19-template-vcf /template_hg19.vcf --hg38-template-vcf /template_hg38.vcf --output-dir /vcf
-```
+## Documentation

-Alternatively, you can wrap all corpus preparatory commands into a single step. Specifying `--variant-analysis`/`--gene-analysis`/`--disease-analysis` will check the phenopackets for complete records documenting the known entities. If template vcf(s) are provided this will spike VCFs with the known variant for the corpus. If a `--gene-identifier` is specified then the corpus of phenopackets is updated.
-```bash
-pheval-utils prepare-corpus \
-  --phenopacket-dir /phenopackets \
-  --variant-analysis \
-  --gene-analysis \
-  --gene-identifier ensembl_id \
-  --hg19-template-vcf /template_hg19.vcf \
-  --hg38-template-vcf /template_hg38.vcf \
-  --output-dir /vcf
-```
+The PhEval documentation is organised by audience and task:
+* Getting started: installation and first steps
+* Using PhEval: running tools, plugins, and workflows
+* Utilities: data preparation, phenopacket manipulation, simulations
+* Benchmarking: executing benchmarks, metrics, and plots
+* Developer documentation: plugin development and API reference
+
+Start here: https://monarch-initiative.github.io/pheval/
+
+## Contributions
+
+Contributions are welcome across:
+
+* Code
+* Documentation
+* Testing
+* Plugins and integrations
+
+## Citation
+
+If you use **PhEval** in your research, please cite the following publication:
+
+> **Bridges, Y., Souza, V. d., Cortes, K. G., et al.**
+> *Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval – Phenotypic Inference Evaluation Framework.*
+> **BMC Bioinformatics** 26, 87 (2025).
+> https://doi.org/10.1186/s12859-025-06105-4

-See the [documentation](https://monarch-initiative.github.io/pheval/executing_a_benchmark/) for instructions on benchmarking and evaluating the performance of various VGPAs.
````
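The plugin-and-runner split described in the README can be sketched in a few lines. This is a toy illustration of the pattern only, not the actual PhEval API: the base class and the `prepare`/`run`/`post_process` method names mirror the three runner responsibilities listed above but are assumptions here.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Runner(ABC):
    """Toy stand-in for a PhEval-style runner: one class per tool plugin."""

    def __init__(self, input_dir: Path, output_dir: Path) -> None:
        self.input_dir = input_dir
        self.output_dir = output_dir

    @abstractmethod
    def prepare(self) -> None:
        """Convert phenopackets/VCFs into the tool's expected input format."""

    @abstractmethod
    def run(self) -> None:
        """Execute the prioritisation tool on the prepared inputs."""

    @abstractmethod
    def post_process(self) -> None:
        """Convert raw tool outputs into standardised result files."""


class EchoRunner(Runner):
    """Minimal concrete runner used only to show the call sequence."""

    def __init__(self, input_dir: Path, output_dir: Path) -> None:
        super().__init__(input_dir, output_dir)
        self.log: list[str] = []

    def prepare(self) -> None:
        self.log.append("prepare")

    def run(self) -> None:
        self.log.append("run")

    def post_process(self) -> None:
        self.log.append("post_process")


def execute(runner: Runner) -> None:
    # The framework, not the plugin, drives the three phases in order.
    runner.prepare()
    runner.run()
    runner.post_process()


runner = EchoRunner(Path("corpus"), Path("results"))
execute(runner)
print(runner.log)
```

Because the framework owns the orchestration, a plugin only has to fill in the three hooks for its tool; this is what makes runs comparable across tools.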

46 changes: 0 additions & 46 deletions docs/CODE_OF_CONDUCT.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/about.md

This file was deleted.

138 changes: 138 additions & 0 deletions docs/benchmarking/executing_a_benchmark.md
@@ -0,0 +1,138 @@
# Executing a Benchmark

This page describes how to execute a benchmark, configure benchmarking parameters, and interpret the resulting outputs.

It assumes that one or more PhEval runs have already been completed using plugin-provided runners.

---

## After runner execution

After executing a run, an output directory structure similar to the following is produced:

```tree
.
├── pheval_disease_results
│   ├── patient_1-disease_result.parquet
├── pheval_gene_results
│   ├── patient_1-gene_result.parquet
├── pheval_variant_results
│   ├── patient_1-variant_result.parquet
├── raw_results
│   ├── patient_1.json
├── results.yml
└── tool_input_commands
    └── tool_input_commands.txt
```

Which result directories are present depends on the configuration used during runner execution.

The contents of the `pheval_*_results` directories are consumed during benchmarking.
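The standardised layout can also be located programmatically before benchmarking. A stdlib-only sketch; the directory and file names follow the example tree above (an assumption, since exact filenames depend on the runner and corpus):

```python
from pathlib import Path
import tempfile

# Recreate the example layout in a temporary directory.
root = Path(tempfile.mkdtemp())
for kind in ("gene", "variant", "disease"):
    d = root / f"pheval_{kind}_results"
    d.mkdir()
    (d / f"patient_1-{kind}_result.parquet").touch()

# Benchmarking consumes exactly these pheval_*_results directories.
result_files = sorted(root.glob("pheval_*_results/*.parquet"))
print([p.name for p in result_files])
# prints ['patient_1-disease_result.parquet', 'patient_1-gene_result.parquet', 'patient_1-variant_result.parquet']
```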

---

## Benchmarking configuration file

Benchmarking is configured using a YAML file supplied to the CLI.

### Example configuration

```yaml
benchmark_name: tool_version_update_benchmark
runs:
  - run_identifier: run_identifier_1
    results_dir: /path/to/results_dir_1
    phenopacket_dir: /path/to/phenopacket_dir
    gene_analysis: true
    variant_analysis: false
    disease_analysis: true
    threshold:
    score_order: descending
  - run_identifier: run_identifier_2
    results_dir: /path/to/results_dir_2
    phenopacket_dir: /path/to/phenopacket_dir
    gene_analysis: true
    variant_analysis: true
    disease_analysis: true
    threshold:
    score_order: descending
plot_customisation:
  gene_plots:
    plot_type: bar_cumulative
    rank_plot_title:
    roc_curve_title:
    precision_recall_title:
  disease_plots:
    plot_type: bar_cumulative
    rank_plot_title:
    roc_curve_title:
    precision_recall_title:
  variant_plots:
    plot_type: bar_cumulative
    rank_plot_title:
    roc_curve_title:
    precision_recall_title:
```

The `benchmark_name` is used to name the DuckDB database that stores benchmarking statistics.
It should not contain whitespace or special characters.
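This naming rule can be checked up front when loading a configuration. A sketch only; the exact character set PhEval accepts is an assumption here (letters, digits, underscores, and hyphens):

```python
import re


def check_benchmark_name(name: str) -> str:
    """Reject names that would produce awkward DuckDB database filenames."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(
            f"benchmark_name {name!r} contains whitespace or special characters"
        )
    return name


print(check_benchmark_name("tool_version_update_benchmark"))  # accepted as-is
```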

---

## Runs section

Each entry in the `runs` list specifies a completed run to include in the benchmark.

Required fields:

- `run_identifier` → A human-readable identifier used in tables and plots.
- `results_dir` → Path to the directory containing `pheval_gene_results`, `pheval_variant_results`, and/or `pheval_disease_results`.
- `phenopacket_dir` → Path to the phenopacket directory used during runner execution.
- `gene_analysis`, `variant_analysis`, `disease_analysis` → Boolean flags indicating which analyses to include.

Optional fields:

- `threshold` → Score threshold for result inclusion.
- `score_order` → Ranking order (`ascending` or `descending`).

---

## Plot customisation

The `plot_customisation` section allows optional control over plot appearance.

Available options:

- `plot_type` → One of `bar_cumulative`, `bar_non_cumulative`, or `bar_stacked`.
- `rank_plot_title` → Custom title for ranking summary plots.
- `roc_curve_title` → Custom title for ROC plots.
- `precision_recall_title` → Custom title for precision–recall plots.

If left unspecified, default titles and plot types are used.

---

## Executing the benchmark

Once the configuration file is prepared, benchmarking can be executed with:

```bash
pheval-utils benchmark --run-yaml benchmarking_config.yaml
```

!!! note "Command note"

    As of `pheval` version **0.5.0** onwards, the command is `benchmark`.
    In earlier versions, the equivalent command was `generate-benchmark-stats`.
    See the [v0.5.1 release notes](https://github.com/monarch-initiative/pheval/releases/tag/0.5.1) for more details.


---

## Outputs and interpretation

Benchmarking produces:

- A DuckDB database containing computed statistics and comparisons between runs
- Rank-based and binary classification plots

These outputs can be used to compare tools, configurations, and experimental conditions in a reproducible manner.
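The database can be inspected directly with SQL. The snippet below shows the querying pattern using `sqlite3` from the standard library as a stand-in so it runs anywhere; with the real output you would use `duckdb.connect("<benchmark_name>.db")` instead. The table and column names here are hypothetical, not PhEval's actual schema, so list the tables in your database first.

```python
import sqlite3

# Stand-in for the benchmark database; the SQL pattern is identical in DuckDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_rank_stats "
    "(run_identifier TEXT, top1 INTEGER, top3 INTEGER, top10 INTEGER)"
)
conn.executemany(
    "INSERT INTO gene_rank_stats VALUES (?, ?, ?, ?)",
    [("run_identifier_1", 40, 55, 70), ("run_identifier_2", 44, 60, 75)],
)

# Compare runs side by side, as the benchmarking output is meant to be used.
rows = conn.execute(
    "SELECT run_identifier, top1, top10 FROM gene_rank_stats ORDER BY top1 DESC"
).fetchall()
print(rows)
# prints [('run_identifier_2', 44, 75), ('run_identifier_1', 40, 70)]
```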