Description
QC metrics in the PSI-MS CV currently use the general relationship has_metric_category to encode multiple different kinds of information (e.g. workflow stage, measurement scope, dependency type). However, these categories are mixed together and applied inconsistently, making it difficult for both humans and software to interpret QC metric semantics in a structured way.
We propose to convert this into a formalized, ontology-driven taxonomy of QC metrics, using seven distinct classification dimensions. Each dimension captures one orthogonal aspect of a metric's semantic meaning (e.g. which workflow stage it refers to, what kind of property it measures, or how its values should be interpreted).
By replacing the current catch-all relation has_metric_category with specific relationship types (defined as Typedefs at the top of the CV), we can explicitly encode this semantic information in a consistent, machine-readable, and reasoning-friendly way.
Current situation
At present, QC metrics are categorized with:
relationship: has_metric_category MS:4000009 ! ID free metric
relationship: has_metric_category MS:4000012 ! single run based metric
relationship: has_metric_category MS:4000018 ! XIC metric
All categories are encoded through the same has_metric_category property, even though they represent different kinds of information:
- one about the data dependency (ID free metric),
- one about the measurement scope (single run based metric),
- and one about the workflow stage (XIC metric).
This approach:
- makes it unclear which category types are (or aren't) specified for a given metric,
- is hard to maintain consistently,
- and prevents downstream tools or reasoning engines from automatically inferring relationships (e.g. "which metrics assess chromatography performance across runs?").
Proposed change
We propose to introduce seven distinct relationship types, one for each classification dimension, to replace has_metric_category.
Each relationship type will be defined as a Typedef in the ontology header, e.g.:
[Typedef]
id: has_measurement_scope
name: has measurement scope
def: "Links a QC metric to the level of data aggregation it summarizes (e.g. run level, batch level, study level)."
is_a: has_metric_category
This makes the ontology structure more expressive and explicitly states what each relationship means.
The seven classification dimensions
Each dimension encodes a different facet of a QC metric's meaning:
| Classification Dimension | Definition | Example Subclasses |
|---|---|---|
| Workflow stage | The experimental or computational stage to which a QC metric applies. | Sample preparation, chromatography, ionization, MS1 acquisition, quantification, alignment. |
| Analytical dimension | The fundamental property being measured. | Mass accuracy, chromatographic performance, intensity stability, fragmentation efficiency, contamination, etc. |
| Information dependency type | The type of data or information required to compute the metric. | Raw acquisition data, identification results, quantification results, hybrid, reference data. |
| Measurement scope | The data aggregation level a metric summarizes. | Spectrum level, run level, batch level, study level. |
| Acquisition strategy | The acquisition mode or instrument configuration relevant to the metric. | DDA, DIA, targeted, imaging, ion mobility-coupled, acquisition-mode independent. |
| Quality interpretation type | The directionality or qualitative meaning of the metric's values with respect to quality. | Higher is better, lower is better, target range, categorical, trend. |
| Metric value type | The structural nature of the metric's output values. | Single value, tuple, table, matrix. |
These seven axes are orthogonal; every QC metric can be classified along each of them, and each relation type conveys a unique piece of semantic information.
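As a rough illustration of how the seven Typedef entries could be generated consistently, the sketch below pairs each dimension with a relationship id and renders a minimal stanza for it. The relationship ids are the ones used in the example metric in this issue; their pairing with the table rows, the helper name `render_typedef`, and the choice to derive `name` from `id` are assumptions made for illustration. Definition text is deliberately omitted and left to curation.

```python
# Pairing of classification dimensions with relationship ids.
# The ids come from the example metric in this proposal; the pairing with
# each dimension is an assumption, not an agreed naming scheme.
DIMENSION_TO_RELATION = {
    "Workflow stage": "part_of_workflow_stage",
    "Analytical dimension": "measures_property",
    "Information dependency type": "depends_on_data_type",
    "Measurement scope": "has_measurement_scope",
    "Acquisition strategy": "applies_to_acquisition_mode",
    "Quality interpretation type": "has_quality_directionality",
    "Metric value type": "has_value_type",
}


def render_typedef(relation_id: str, parent: str = "has_metric_category") -> str:
    """Render a minimal OBO Typedef stanza (hypothetical helper).

    The name is derived from the id by replacing underscores with spaces;
    the `def:` line is left out on purpose and would be written by curators.
    """
    name = relation_id.replace("_", " ")
    return "\n".join(
        [
            "[Typedef]",
            f"id: {relation_id}",
            f"name: {name}",
            f"is_a: {parent}",
        ]
    )


if __name__ == "__main__":
    # Print one stanza per dimension, separated by blank lines.
    print("\n\n".join(render_typedef(r) for r in DIMENSION_TO_RELATION.values()))
```

Keeping `is_a: has_metric_category` on every generated stanza mirrors the Typedef example above, so tools that only know the general relation could still treat the new relations as specializations of it.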
Example: Updated metric definition
Below is an example showing how a QC metric (MS:4000051 XIC-FWHM quantiles) would look under the proposed system.
Old definition (current style)
is_a: MS:4000004 ! n-tuple
relationship: has_metric_category MS:4000009 ! ID free metric
relationship: has_metric_category MS:4000012 ! single run based metric
relationship: has_metric_category MS:4000018 ! XIC metric
New definition (proposed style)
is_a: MS:4000001 ! QC metric
relationship: part_of_workflow_stage MS:XXXXXXX ! chromatography stage
relationship: measures_property MS:XXXXXXX ! chromatographic performance metric
relationship: depends_on_data_type MS:XXXXXXX ! raw acquisition data
relationship: has_measurement_scope MS:XXXXXXX ! run level
relationship: applies_to_acquisition_mode MS:XXXXXXX ! acquisition mode independent
relationship: has_quality_directionality MS:XXXXXXX ! lower is better
relationship: has_value_type MS:XXXXXXX ! tuple
Each line now encodes a specific semantic relationship that tools can reason over, index, or validate.
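To make the validation use case concrete, here is a minimal sketch of a completeness check that reports which of the seven relationships a term stanza is missing. It uses a simple regular expression instead of a real OBO parser, and the names `REQUIRED_RELATIONS`, `parse_relationships`, and `missing_dimensions` are hypothetical.

```python
import re

# The seven relationship ids used in the proposed-style example.
REQUIRED_RELATIONS = {
    "part_of_workflow_stage",
    "measures_property",
    "depends_on_data_type",
    "has_measurement_scope",
    "applies_to_acquisition_mode",
    "has_quality_directionality",
    "has_value_type",
}

# Matches lines such as "relationship: has_value_type MS:XXXXXXX ! tuple".
RELATIONSHIP_RE = re.compile(r"^relationship:\s+(\S+)\s+(\S+)(?:\s+!\s*(.*))?$")


def parse_relationships(stanza: str) -> dict:
    """Collect relationship targets per relation id from stanza text."""
    found = {}
    for line in stanza.splitlines():
        match = RELATIONSHIP_RE.match(line.strip())
        if match:
            relation, target, _label = match.groups()
            found.setdefault(relation, []).append(target)
    return found


def missing_dimensions(stanza: str) -> list:
    """Return the required relation ids that the stanza does not use, sorted."""
    return sorted(REQUIRED_RELATIONS - set(parse_relationships(stanza)))
```

Applied to the old-style definition, `missing_dimensions` would list all seven relations, because that stanza only uses has_metric_category; applied to the proposed-style definition it would return an empty list.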
Benefits
- Clarity and consistency: Each relationship clearly states what kind of information it represents. Ontology editors and contributors can immediately see whether all relevant facets are defined for a given metric.
- Machine-readable semantics: Downstream QC tools (e.g. mzQC validators, dashboards, or repositories) can query or filter metrics by category:
- "Show all chromatography-related metrics."
- "Which metrics are computed at the run level?"
- "Which metrics are ID-based?"
- Backward compatibility: Terms and identifiers remain unchanged, so existing tools and mzQC files continue to work.
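The kind of query listed above could look roughly like the following sketch, which filters a small in-memory index by one or more facets. The first entry reuses the XIC-FWHM quantiles example from this issue; the second entry (`EX:0000001`) is a purely hypothetical metric added only so the filter has something to exclude. Matching is done on labels because the target term ids are still placeholders, and `find_metrics` is a hypothetical helper.

```python
# Toy annotation index: metric id -> {relation id: target label}.
METRICS = {
    # Taken from the XIC-FWHM quantiles example in this proposal.
    "MS:4000051": {
        "part_of_workflow_stage": "chromatography stage",
        "measures_property": "chromatographic performance metric",
        "depends_on_data_type": "raw acquisition data",
        "has_measurement_scope": "run level",
        "applies_to_acquisition_mode": "acquisition mode independent",
        "has_quality_directionality": "lower is better",
        "has_value_type": "tuple",
    },
    # Hypothetical metric, invented only to make the filtering visible.
    "EX:0000001": {
        "depends_on_data_type": "identification results",
        "has_measurement_scope": "batch level",
    },
}


def find_metrics(index: dict, **facets: str) -> list:
    """Return sorted metric ids whose annotations match every given facet."""
    return sorted(
        metric_id
        for metric_id, relations in index.items()
        if all(relations.get(rel) == label for rel, label in facets.items())
    )


if __name__ == "__main__":
    # "Which metrics are computed at the run level?"
    print(find_metrics(METRICS, has_measurement_scope="run level"))
    # "Which metrics are ID-based?"
    print(find_metrics(METRICS, depends_on_data_type="identification results"))
```

With a single has_metric_category relation, the same query would need a separate, hand-maintained table saying which category terms belong to which dimension; with dedicated relations the dimension is part of the annotation itself.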
Summary
This proposal does not change any existing QC metric terms or their identifiers; it only restructures how relationships are expressed in the CV to make semantic information more explicit and machine-readable.
In other words:
- The terms themselves remain the same, so there is no impact on downstream tools or data formats (e.g. mzQC files).
- Only the relationship layer (previously expressed through has_metric_category) is being refined into multiple, more specific relationships.
- The ontology will remain fully backward compatible; existing metrics and software will continue to function as before.
Implementing this will require some curation work to reannotate QC terms using the new relationship types. This will be done incrementally through a series of pull requests, starting with the addition of the new Typedef entries at the top of the CV. Once those are in place, individual metrics can gradually adopt the new structure.
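Part of the reannotation could be assisted by a small script, sketched below under several assumptions: the mapping from old category terms to new relationships is inferred from the XIC-FWHM quantiles example in this issue and would need curator review, the target ids are the same MS:XXXXXXX placeholders, and `CATEGORY_MIGRATION` and `migrate_line` are hypothetical names. Lines that have no curated mapping are returned unchanged so they can be handled manually.

```python
# Curated mapping: old category id -> (new relation, target id, target label).
# Inferred from the example in this proposal; target ids are placeholders.
CATEGORY_MIGRATION = {
    "MS:4000009": ("depends_on_data_type", "MS:XXXXXXX", "raw acquisition data"),
    "MS:4000012": ("has_measurement_scope", "MS:XXXXXXX", "run level"),
    "MS:4000018": ("part_of_workflow_stage", "MS:XXXXXXX", "chromatography stage"),
}

OLD_PREFIX = "relationship: has_metric_category "


def migrate_line(line: str) -> str:
    """Rewrite one has_metric_category line; return other lines unchanged."""
    stripped = line.strip()
    if not stripped.startswith(OLD_PREFIX):
        return line
    old_id = stripped[len(OLD_PREFIX):].split()[0]
    if old_id not in CATEGORY_MIGRATION:
        return line  # no curated mapping yet: leave for manual curation
    relation, target, label = CATEGORY_MIGRATION[old_id]
    return f"relationship: {relation} {target} ! {label}"


def migrate_stanza(stanza: str) -> str:
    """Apply migrate_line to every line of a term stanza."""
    return "\n".join(migrate_line(line) for line in stanza.splitlines())
```

A script like this only covers the dimensions that the old categories already encoded; the remaining relationships (for example value type or quality directionality) would still have to be added by hand.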
With this issue we want to discuss this change first, to ensure full compatibility with existing conventions and downstream usage, before beginning implementation.