Proposal: Formalized categorization of QC metrics #430

@bittremieux

QC metrics in the PSI-MS CV currently use the general relationship has_metric_category to encode multiple different kinds of information (e.g. workflow stage, measurement scope, dependency type). However, these categories are mixed together and applied inconsistently, making it difficult for both humans and software to interpret QC metric semantics in a structured way.

We propose to convert this into a formalized, ontology-driven taxonomy of QC metrics, using seven distinct classification dimensions. Each dimension captures one orthogonal aspect of a metric's semantic meaning (e.g. which workflow stage it refers to, what kind of property it measures, or how its values should be interpreted).

By replacing the current catch-all relation has_metric_category with specific relationship types (defined as Typedefs at the top of the CV), we can explicitly encode this semantic information in a consistent, machine-readable, and reasoning-friendly way.

Current situation

At present, QC metrics are categorized with:

relationship: has_metric_category MS:4000009 ! ID free metric
relationship: has_metric_category MS:4000012 ! single run based metric
relationship: has_metric_category MS:4000018 ! XIC metric

All categories are encoded through the same has_metric_category property, even though they represent different kinds of information:

  • one about the data dependency (ID free metric),
  • one about the measurement scope (single run based metric),
  • and one about the workflow stage (XIC metric).

This approach:

  • makes it unclear which category types are (or aren't) specified for a given metric,
  • is hard to maintain consistently,
  • and prevents downstream tools or reasoning engines from automatically inferring relationships (e.g. "which metrics assess chromatography performance across runs?").

Proposed change

We propose to introduce seven distinct relationship types, one for each classification dimension, to replace has_metric_category.

Each relationship type will be defined as a Typedef in the ontology header, e.g.:

[Typedef]
id: has_measurement_scope
name: has measurement scope
def: "Links a QC metric to the level of data aggregation it summarizes (e.g. run level, batch level, study level)."
is_a: has_metric_category

This makes the ontology structure more expressive and explicitly states what each relationship means.

The seven classification dimensions

Each dimension encodes a different facet of a QC metric's meaning:

| Classification dimension | Definition | Example subclasses |
| --- | --- | --- |
| Workflow stage | The experimental or computational stage to which a QC metric applies. | Sample preparation, chromatography, ionization, MS1 acquisition, quantification, alignment. |
| Analytical dimension | The fundamental property being measured. | Mass accuracy, chromatographic performance, intensity stability, fragmentation efficiency, contamination, etc. |
| Information dependency type | The type of data or information required to compute the metric. | Raw acquisition data, identification results, quantification results, hybrid, reference data. |
| Measurement scope | The data aggregation level a metric summarizes. | Spectrum level, run level, batch level, study level. |
| Acquisition strategy | The acquisition mode or instrument configuration relevant to the metric. | DDA, DIA, targeted, imaging, ion mobility-coupled, acquisition-mode independent. |
| Quality interpretation type | The directionality or qualitative meaning of the metric's values with respect to quality. | Higher is better, lower is better, target range, categorical, trend. |
| Metric value type | The structural nature of the metric's output values. | Single value, tuple, table, matrix. |

These seven axes are orthogonal; every QC metric can be classified along each of them, and each relation type conveys a unique piece of semantic information.
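Because every metric is expected to carry one annotation per axis, completeness becomes something a script can check. The following is a minimal sketch of such a check, not part of the proposal itself. It assumes the relation names used in the example stanza in this issue (they are still up for discussion), and `missing_dimensions` is a hypothetical helper name.

```python
# Relation names as used in the example stanza of this proposal.
# They are assumptions here and may change during discussion.
DIMENSION_RELATIONS = (
    "part_of_workflow_stage",
    "measures_property",
    "depends_on_data_type",
    "has_measurement_scope",
    "applies_to_acquisition_mode",
    "has_quality_directionality",
    "has_value_type",
)


def missing_dimensions(relationships):
    """Return the dimension relations that a metric does not yet specify.

    `relationships` is an iterable of (relation, target) pairs.
    """
    present = {relation for relation, _ in relationships}
    return [rel for rel in DIMENSION_RELATIONS if rel not in present]


# A partially annotated metric: only two of the seven axes are filled in.
partial = [
    ("part_of_workflow_stage", "chromatography stage"),
    ("has_measurement_scope", "run level"),
]
print(missing_dimensions(partial))
```

With the single `has_metric_category` relation, a comparable check is not possible without a separate lookup of which category belongs to which axis.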

Example: Updated metric definition

Below is an example showing how a QC metric (MS:4000051 XIC-FWHM quantiles) would look under the proposed system.

Old definition (current style)

is_a: MS:4000004 ! n-tuple
relationship: has_metric_category MS:4000009 ! ID free metric
relationship: has_metric_category MS:4000012 ! single run based metric
relationship: has_metric_category MS:4000018 ! XIC metric

New definition (proposed style)

is_a: MS:4000001 ! QC metric

relationship: part_of_workflow_stage MS:XXXXXXX ! chromatography stage
relationship: measures_property MS:XXXXXXX ! chromatographic performance metric
relationship: depends_on_data_type MS:XXXXXXX ! raw acquisition data
relationship: has_measurement_scope MS:XXXXXXX ! run level
relationship: applies_to_acquisition_mode MS:XXXXXXX ! acquisition mode independent
relationship: has_quality_directionality MS:XXXXXXX ! lower is better
relationship: has_value_type MS:XXXXXXX ! tuple

Each line now encodes a specific semantic relationship that tools can reason over, index, or validate.
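To illustrate what this means for a consumer of the CV, here is a small sketch that groups the `relationship:` lines of a stanza by relation name. It uses plain string handling rather than a real OBO parser, and `group_relationships` is a hypothetical helper; the `MS:XXXXXXX` placeholders are kept exactly as in the example above.

```python
from collections import defaultdict

OLD_STANZA = """\
is_a: MS:4000004 ! n-tuple
relationship: has_metric_category MS:4000009 ! ID free metric
relationship: has_metric_category MS:4000012 ! single run based metric
relationship: has_metric_category MS:4000018 ! XIC metric
"""

NEW_STANZA = """\
is_a: MS:4000001 ! QC metric
relationship: part_of_workflow_stage MS:XXXXXXX ! chromatography stage
relationship: measures_property MS:XXXXXXX ! chromatographic performance metric
relationship: depends_on_data_type MS:XXXXXXX ! raw acquisition data
relationship: has_measurement_scope MS:XXXXXXX ! run level
relationship: applies_to_acquisition_mode MS:XXXXXXX ! acquisition mode independent
relationship: has_quality_directionality MS:XXXXXXX ! lower is better
relationship: has_value_type MS:XXXXXXX ! tuple
"""


def group_relationships(stanza):
    """Map each relation name to its list of (target id, label) pairs."""
    grouped = defaultdict(list)
    for line in stanza.splitlines():
        if not line.startswith("relationship:"):
            continue
        # Split "relation target ! label" into its three parts.
        body, _, label = line[len("relationship:"):].partition("!")
        relation, target = body.split()[:2]
        grouped[relation].append((target, label.strip()))
    return dict(grouped)


old = group_relationships(OLD_STANZA)
new = group_relationships(NEW_STANZA)
print(sorted(old))  # a single key holding three unrelated categories
print(sorted(new))  # one key per classification dimension
```

In the old style all three categories land under one key and the consumer has to know what each term means; in the proposed style the key itself names the dimension.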

Benefits

  • Clarity and consistency: Each relationship clearly states what kind of information it represents. Ontology editors and contributors can immediately see whether all relevant facets are defined for a given metric.
  • Machine-readable semantics: Downstream QC tools (e.g. mzQC validators, dashboards, or repositories) can query or filter metrics by category:
    • "Show all chromatography-related metrics."
    • "Which metrics are computed at the run level?"
    • "Which metrics are ID-based?"
  • Backward compatibility: Terms and identifiers remain unchanged, so existing tools and mzQC files continue to work.
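The example queries above could be answered with a simple filter once each metric carries one value per dimension. The sketch below is illustrative only: the catalog is a hand-written dictionary, the second entry is a made-up placeholder metric, and `select` is a hypothetical helper rather than an existing mzQC or PSI-MS API.

```python
# Toy catalog: metric name -> {relation: label}. The first entry follows the
# XIC-FWHM quantiles example in this proposal; the second is a placeholder
# invented purely for illustration.
CATALOG = {
    "XIC-FWHM quantiles": {
        "part_of_workflow_stage": "chromatography stage",
        "has_measurement_scope": "run level",
        "depends_on_data_type": "raw acquisition data",
    },
    "hypothetical metric A": {
        "part_of_workflow_stage": "MS1 acquisition",
        "has_measurement_scope": "batch level",
        "depends_on_data_type": "identification results",
    },
}


def select(catalog, **criteria):
    """Return the names of metrics matching every relation=label criterion."""
    return sorted(
        name
        for name, dims in catalog.items()
        if all(dims.get(rel) == label for rel, label in criteria.items())
    )


# "Which metrics are computed at the run level?"
print(select(CATALOG, has_measurement_scope="run level"))
# "Which metrics are ID-based?"
print(select(CATALOG, depends_on_data_type="identification results"))
```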

Summary

This proposal does not change any existing QC metric terms or their identifiers; it only restructures how relationships are expressed in the CV to make semantic information more explicit and machine-readable.

In other words:

  • The terms themselves remain the same, so there is no impact on downstream tools or data formats (e.g. mzQC files).
  • Only the relationship layer (previously expressed through has_metric_category) is being refined into multiple, more specific relationships.
  • The ontology will remain fully backward compatible: existing metrics and software will continue to function as before.

Implementing this will require some curation work to reannotate QC terms using the new relationship types. This will be done incrementally through a series of pull requests, starting with the addition of the new Typedef entries at the top of the CV. Once those are in place, individual metrics can gradually adopt the new structure.
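Part of the reannotation could be assisted by a script. The sketch below rewrites individual `has_metric_category` lines using a lookup table. The three mappings are inferred from the XIC-FWHM quantiles example in this issue and are assumptions, not an agreed migration table; target identifiers stay as the `MS:XXXXXXX` placeholder because the new terms do not exist yet, and `migrate_line` is a hypothetical helper.

```python
# Legacy category label -> (new relation, new target label).
# Inferred from the example in this proposal; to be confirmed by curators.
LEGACY_TO_NEW = {
    "ID free metric": ("depends_on_data_type", "raw acquisition data"),
    "single run based metric": ("has_measurement_scope", "run level"),
    "XIC metric": ("part_of_workflow_stage", "chromatography stage"),
}

LEGACY_PREFIX = "relationship: has_metric_category "


def migrate_line(line):
    """Rewrite one legacy relationship line; leave anything else untouched."""
    if not line.startswith(LEGACY_PREFIX):
        return line
    _, _, label = line.partition("!")
    mapped = LEGACY_TO_NEW.get(label.strip())
    if mapped is None:
        # Unknown category: keep the line for manual curation.
        return line
    relation, new_label = mapped
    # Placeholder ID until the new terms are actually minted.
    return f"relationship: {relation} MS:XXXXXXX ! {new_label}"


print(migrate_line(
    "relationship: has_metric_category MS:4000012 ! single run based metric"
))
```

Dimensions that have no legacy counterpart (for example quality directionality) cannot be derived this way and would still need manual curation.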

With this issue we want to discuss this change first, to ensure full compatibility with existing conventions and downstream usage, before beginning implementation.
