From f0b126cedc14472d7d21fa5a4a91b44124a6632a Mon Sep 17 00:00:00 2001 From: Wout Bittremieux Date: Wed, 5 Nov 2025 12:26:27 +0100 Subject: [PATCH 1/4] Add documentation on new metrics classification --- docs/pages/cv/classification_reference.md | 223 +++++++++++++++ docs/pages/cv/howto_create_cv_terms.md | 314 +++++++++------------- docs/pages/cv/howto_use_cv_terms.md | 240 ++++++++++++----- 3 files changed, 520 insertions(+), 257 deletions(-) create mode 100644 docs/pages/cv/classification_reference.md diff --git a/docs/pages/cv/classification_reference.md b/docs/pages/cv/classification_reference.md new file mode 100644 index 0000000..ae1f9a2 --- /dev/null +++ b/docs/pages/cv/classification_reference.md @@ -0,0 +1,223 @@ +--- +layout: page +title: "Metrics – Classification Reference" +permalink: /metrics/classification +--- + +# QC Metric Classification Reference + +*Standardized semantic categories for PSI-MS quality control metrics* + +## Overview + +Each QC metric in the PSI-MS Controlled Vocabulary (CV) is annotated using **seven independent classification dimensions**. +Together, these describe *what a metric measures*, *where it applies in the workflow*, and *how it should be interpreted*. + +Every metric must define **exactly one relationship** from each dimension. +This ensures complete, machine-interpretable semantics across all QC terms used in mzQC and related standards. 
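As a rough illustration of the "exactly one relationship per dimension" rule, here is a minimal Python sketch that checks a metric's relationships against the seven relationship names listed in the "Putting It All Together" table of this reference. The function name `check_classification` and the representation of relationships as `(relationship, target)` pairs are hypothetical. The sketch also assumes that `has_value_type` targets starting with `xsd:` declare a primitive data type rather than the structural value type, and it ignores them when counting.

```python
from collections import Counter

# One relationship per classification dimension, as listed in the
# "Putting It All Together" table of this reference.
DIMENSION_RELATIONSHIPS = {
    "Workflow stage": "part_of_workflow_stage",
    "Analytical dimension": "measures_property",
    "Information dependency type": "depends_on_data_type",
    "Measurement scope": "has_measurement_scope",
    "Acquisition strategy": "applies_to_acquisition_mode",
    "Quality interpretation type": "has_quality_directionality",
    "Metric value type": "has_value_type",
}


def check_classification(relationships: list[tuple[str, str]]) -> list[str]:
    """Return problems found; an empty list means each dimension is set exactly once."""
    counts = Counter(
        rel
        for rel, target in relationships
        # Assumption: has_value_type targets such as 'xsd:float' declare a
        # primitive data type, not the structural value type, so skip them.
        if not (rel == "has_value_type" and target.startswith("xsd:"))
    )
    problems = []
    for dimension, rel in DIMENSION_RELATIONSHIPS.items():
        if counts[rel] == 0:
            problems.append(f"missing {rel} ({dimension})")
        elif counts[rel] > 1:
            problems.append(f"{rel} ({dimension}) used {counts[rel]} times")
    return problems


# Example values taken from the "Putting It All Together" table.
metric = [
    ("part_of_workflow_stage", "chromatography stage"),
    ("measures_property", "chromatographic performance metric"),
    ("depends_on_data_type", "raw acquisition data"),
    ("has_measurement_scope", "run level"),
    ("applies_to_acquisition_mode", "acquisition mode independent"),
    ("has_quality_directionality", "lower is better"),
    ("has_value_type", "tuple"),
    ("has_value_type", "xsd:float"),
]
print(check_classification(metric))  # []
```

Returning a list of problems instead of raising on the first error lets a curator see every missing or duplicated dimension of a proposed term at once.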
+ +| Dimension | Describes | +| ------------------------------- | ---------------------------------------------------------------------- | +| **Workflow stage** | Where in the experimental or computational pipeline the metric applies | +| **Analytical dimension** | What fundamental property the metric measures | +| **Information dependency type** | What type of data the metric depends on | +| **Measurement scope** | At what aggregation level the metric summarizes data | +| **Acquisition strategy** | Which acquisition or instrument configuration it applies to | +| **Quality interpretation type** | How metric values relate to data quality | +| **Metric value type** | How metric values are structurally represented (single, tuple, etc.) | + +## Workflow Stage + +**Definition:** +The experimental or computational stage of the workflow to which a QC metric applies. + +This tells *where* in the process the metric is relevant — from sample preparation through acquisition and analysis. + +### Subclasses + +#### Experimental workflow stage + +Metrics describing quality at the laboratory or instrument level. + +* **Sample preparation stage:** metrics describing sample handling, labeling, digestion, or storage quality. + *Example:* peptide recovery yield. +* **Chromatography stage:** metrics about LC separation performance. + *Example:* retention-time reproducibility, peak width. +* **Ionization stage:** metrics about ion generation and charge distribution. + *Example:* precursor charge-state fractions. +* **Ion mobility separation stage:** metrics describing the performance of gas-phase separation devices. + *Example:* ion-mobility resolution, CCS reproducibility. +* **Mass spectrometry acquisition stage:** metrics referring to scanning, detection, or data acquisition processes. + *Examples:* number of MS1 scans, duty-cycle stability. + + * **MS1 acquisition stage:** metrics that use or summarize MS1 data. + * **MS2 acquisition stage:** metrics that use or summarize MS2 data. 
+ * **MSn acquisition stage:** metrics for higher-order fragmentation (MS³, etc.). +* **Instrument performance monitoring stage:** general metrics of instrument health and stability. + *Example:* mass-accuracy drift, spray stability. +* **Instrument calibration stage:** metrics derived from calibration routines or control samples. + +#### Data analysis workflow stage + +Metrics evaluating computational processing and interpretation steps. + +* **Data preprocessing stage:** metrics about baseline correction, noise removal, or peak picking. +* **Identification stage:** metrics assessing identification quality. + *Example:* PSM-level FDR, peptide identification rate. +* **Quantification stage:** metrics describing quantitative accuracy or precision. + *Example:* CV of peptide intensities, ratio reproducibility. +* **Integration stage:** metrics related to alignment, normalization, or data integration across runs. + +#### Environmental condition monitoring + +Metrics about environmental conditions that can indirectly affect results. +*Example:* laboratory temperature, humidity, power fluctuations. + +## Analytical Dimension + +**Definition:** +The fundamental property or aspect of data quality that the metric quantifies. + +Think of this as *what kind of problem the metric detects or describes*. + +### Subclasses + +* **Acquisition coverage metric:** evaluates how comprehensively data were collected (e.g., scan counts, sampling density). +* **Mass accuracy metric:** measures deviation between observed and theoretical m/z values. +* **Intensity stability metric:** assesses signal intensity variation over time. +* **Chromatographic performance metric:** evaluates separation performance such as peak width, symmetry, or retention reproducibility. +* **Ionization quality metric:** evaluates properties of the ion population generated during ionization, such as charge-state distribution or adduct prevalence. 
+* **Ion mobility metric:** describes resolution, drift-time accuracy, or reproducibility in ion-mobility separations. +* **Spectral quality metric:** quantifies quality of individual spectra (e.g., peak density, signal-to-noise, completeness). +* **Fragmentation efficiency metric:** measures how efficiently precursor ions fragment to produce interpretable spectra. +* **Isolation purity metric:** evaluates precursor isolation selectivity or co-isolation of interfering species. +* **Identification confidence metric:** quantifies reliability of peptide or compound identifications (e.g., FDR, number of identified analytes). +* **Quantification precision metric:** measures reproducibility or variability of quantitative results. +* **Contamination metric:** detects unwanted signal from contaminants, carryover, or background. +* **Instrument operational performance metric:** general indicators of instrument health (e.g., vacuum level, temperature, detector voltage). +* **Missingness/completeness metric:** measures data absence or completeness across features, runs, or studies. + +## Information Dependency Type + +**Definition:** +Specifies which type of data input the metric requires to be computed. + +### Subclasses + +* **Raw acquisition data:** metrics that can be calculated directly from the raw MS data, without identifications. + *Example:* total ion current stability, scan count. +* **Identification results:** metrics that depend on identified peptides, compounds, or spectra. + *Example:* PSM-level FDR, peptide coverage. +* **Quantification results:** metrics derived from quantitative data matrices. + *Example:* CV of peptide intensities. +* **Hybrid:** metrics combining multiple data types (e.g., identification and quantification). +* **Reference data:** metrics requiring comparison to external standards or reference files. + *Example:* RT deviation vs. 
iRT peptides, calibration QC. + +## Measurement Scope + +**Definition:** +Indicates the level of data aggregation the metric summarizes. + +### Subclasses + +* **Spectrum level:** per-spectrum metrics (e.g., number of peaks, S/N ratio). +* **Pixel/voxel level:** per-pixel or per-voxel metrics in imaging or spatial omics. +* **Feature level:** per-feature metrics (e.g., for a peptide, compound, or chromatographic peak). +* **Run level:** aggregated per LC–MS run. +* **Batch level:** aggregated across multiple related runs. +* **Study level:** aggregated across an entire experiment or project. + +## Acquisition Strategy + +**Definition:** +Specifies which acquisition mode or instrument configuration the metric is relevant for. + +### Subclasses + +#### Acquisition mode + +* **Acquisition mode independent:** metrics valid for any acquisition method. +* **Data-dependent acquisition (DDA):** metrics specific to stochastic precursor selection workflows. + *Example:* number of MS2 spectra per precursor. +* **Data-independent acquisition (DIA):** metrics for window-based fragmentation strategies. + *Example:* precursor window purity. +* **Targeted acquisition:** metrics for SRM, PRM, or other targeted workflows. + *Example:* transition reproducibility. +* **Ion-mobility-coupled metric:** metrics derived from acquisition methods that include gas-phase ion mobility separation. + *Example:* TIMS mobility resolution (Δ1/K₀) per run. +* **Imaging acquisition:** metrics for spatially resolved mass spectrometry experiments such as MALDI, DESI, or SIMS. + *Example:* pixel-to-pixel intensity variation across a tissue section. +* **Other specialized mode:** metrics for advanced or hybrid acquisition modes such as BoxCar, MSⁿ, or multiplexed scanning. + *Example:* BoxCar intensity uniformity across boxes. + +#### Instrument platform specificity + +* **Orbitrap-specific:** metrics only applicable to Orbitrap instruments. + *Example:* Orbitrap transient length stability.
+* **TOF-specific:** metrics relevant to time-of-flight instruments. + *Example:* TOF detector voltage stability. +* **Ion-trap-specific:** metrics specific to trap-based systems. + *Example:* Ion trap fill time distribution. +* **Other platform-specific:** for quadrupoles, FT-ICR, or hybrid systems. + +## Quality Interpretation Type + +**Definition:** +Describes how a metric's numeric value relates to overall quality. +This enables automatic reasoning about whether "higher," "lower," or "targeted" values represent better data. + +### Subclasses + +* **Higher is better:** increasing values indicate improved quality. + *Example:* identification rate, mass accuracy score. +* **Lower is better:** decreasing values indicate improved quality. + *Example:* FDR, mass error. +* **Context dependent:** interpretation varies depending on method or range. + *Example:* precursor charge-state fractions, peak density. +* **Target range:** optimal quality corresponds to values within a defined interval. + *Example:* temperature, pressure, retention-time drift. +* **Categorical:** quality expressed as discrete categories (e.g., pass/fail, OK/warning/error). +* **Trend:** metrics intended for temporal monitoring rather than direct ranking (e.g., instrument drift over time). + +## Metric Value Type + +**Definition:** +Specifies the structural format of the metric's reported value(s). +This defines how the metric must be represented in mzQC. + +### Subclasses + +| Type | Structure | Description | Example | +| ---------------- | ----------------- | ------------------------------------------------------------- | ----------------------------- | +| **Single value** | Scalar | A single numeric or categorical value. | Number of MS1 spectra | +| **Tuple** | Ordered list | Several ordered values of the same kind (e.g., quantiles). | XIC-FWHM quantiles | +| **Table** | Named columns | Parallel lists of equal length; each column has its own unit. 
| MS2 charge fractions | +| **Matrix** | Rectangular array | 2D array of homogeneous numeric values. | Ion-mobility intensity matrix | + +See the [CV Term Usage Guide](TODO:link) for details on how each type is encoded in mzQC. + +--- + +## Putting It All Together + +Each QC metric term in the PSI-MS CV will therefore include seven semantic relationships: + +| Relationship | Refers to | Example value | +| ----------------------------- | ---------------------- | ---------------------------------- | +| `part_of_workflow_stage` | Workflow stage | chromatography stage | +| `measures_property` | Analytical dimension | chromatographic performance metric | +| `depends_on_data_type` | Information dependency | raw acquisition data | +| `has_measurement_scope` | Aggregation level | run level | +| `applies_to_acquisition_mode` | Acquisition strategy | acquisition mode independent | +| `has_quality_directionality` | Interpretation | lower is better | +| `has_value_type` | Value structure | tuple | + +These relationships together provide a complete, machine-readable semantic description of any QC metric. + +In conclusion: + +* Each dimension describes a different facet of a QC metric. +* Together they make the CV complete, consistent, and queryable. +* Contributors defining new metrics should select one subclass from each dimension. +* Developers can use these relationships to automatically filter, group, and interpret QC results. diff --git a/docs/pages/cv/howto_create_cv_terms.md b/docs/pages/cv/howto_create_cv_terms.md index 1f72189..465c3b2 100644 --- a/docs/pages/cv/howto_create_cv_terms.md +++ b/docs/pages/cv/howto_create_cv_terms.md @@ -1,240 +1,182 @@ --- layout: page -title: "Metrics - create" +title: "Metrics – Create" permalink: /metrics/create --- -# CV Term Creation Guide -New CV terms have to be requested via the [mzQC GitHub issue tracker](https://github.com/HUPO-PSI/mzQC/issues). -Upon creating a new issue, you should select the "Request for new CV term" option. 
-This will produce a template that will guide you in providing the necessary information to request your new CV term, as detailed below. -If additional information or clarifications beyond the initial request are needed, the mzQC working group will work with you to finalize your CV term request. -When all the necessary information has been provided, a new CV term will be created based on the request and added to the QC CV. +# PSI-MS QC Metrics Term Creation Guide -## Required Information +*How to define and request new QC metrics for the PSI-MS Controlled Vocabulary.* -Each metric (and CV entry request) MUST include the following information +## What this guide is for -- Name: A (short) string describing your metric. -- Definition: A longer description. This MUST include information about how the metric should be represented in an mzQC file. -- Comment: OPTIONAL details on how the metric should be interpreted (e.g. is a higher value better, can it only be interpreted relative to...). -- Value type: Is the metric type a single value, an n-tuple, a table, or a matrix? -- Unit: OPTIONAL unit of the value, specified using an existing CV term. -- Categorization: A categorization can OPTIONALLY be supplied. Examples are whether the metric depends on spectrum, peptide, protein, or metabolite identifications; or to describe the metric context. +This document explains how to **request new QC metric terms** or **update existing ones** in the [PSI-MS Controlled Vocabulary](https://github.com/HUPO-PSI/psi-ms-CV). +It shows: -## Restrictions +* what information to include, +* how to write clear definitions, and +* how to classify metrics in line with the QC metric ontology used in mzQC. -The text in `Name`, `Definition`, and `Comment` MUST NOT contain escaped characters, such as `\"`, or special characters, such as backticks (`` ` ``). -If you need to quote words or sentences, use single quotes, e.g. 
`def: "A QC metric describes the basis for the metric calculation like 'one MS run' or 'one spectrum'." [PSI:QC]`. -Further restrictions to some term elements may apply; please see details in the [Term Element Details](#term-element-details) section. +These guidelines ensure that all QC metrics: -## Example CV term +* are **semantically consistent** and **machine-readable**, +* fit naturally into mzQC and related PSI formats, +* and remain **traceable** to their scientific or software origin. -``` -[Term] -id: QC:4000059 -name: Number of MS1 spectra -def: "The number of MS1 events in the run." [PSI:QC] -is_a: QC:4000003 ! single value -is_a: QC:4000010 ! ID free -is_a: QC:4000023 ! MS1 metric -comment: A lower number of MS1 spectra acquired during one sample run compared to similar runs can indicate mismatched instrument settings or issues with the instrumentation or issues with sample amounts. -relationship: has_relation MS:1000579 ! MS1 spectrum -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_units UO:0000189 -synonym: "MS1-Count" EXACT [] -``` +This guide applies to QC metrics from **proteomics**, **metabolomics**, and related mass spectrometry workflows. -## Term Element Details +## How to request a new QC metric -### ID +All new terms are proposed through GitHub. -``` -id: QC:4000059 -``` +1. Go to the [PSI-MS-CV repository](https://github.com/HUPO-PSI/psi-ms-CV). +2. Create a new issue using the **"New QC Term"** template. +3. Fill in the required fields (see below). +4. Discuss the proposal with maintainers in the issue comments. +5. Once approved, curators assign an accession number (`MS:4000XXX`) and add the term to the next CV release. -Each term MUST have a unique ID, specified as `QC:XXXXXXX`. -Metric IDs are immutable and not reusable (e.g. for redefinition), and will be assigned upon inclusion or redefinition. +If you're refining or updating an existing term, just open an issue referencing its ID. 
-### Name +> [!NOTE] +> Expect some discussion — the maintainers help ensure consistency and alignment with existing terms. -``` -name: Number of MS1 spectra -``` +## Before you start -Each CV term MUST have a human-readable name. -The name SHOULD be informative, SHOULD consist of maximum 100 characters, and SHOULD only consist of alphanumeric 7-bit ASCII characters, spaces, and punctuation marks ([\-_,\.]). +Check first: -### Definition +* Search the CV (for example in [OLS](https://www.ebi.ac.uk/ols/ontologies/ms)) to ensure that your metric doesn't already exist. +* Verify that your metric is not just a variant or combination of an existing one. +* Collect supporting references (papers, software documentation, mzQC files). -``` -def: "The number of MS1 events in the run." [PSI:QC] -``` +If you find something close but not identical, note that in your request — it helps curators decide whether to extend or merge existing terms. -The definition SHOULD consist of a short explanation of the term and how it should be stored in the mzQC file. -The description SHOULD also provide aid in interpreting the values. -The definition section SHOULD NOT contain calculation or interpretation details, but rather it should explain the purpose, requirements, and scope of the metric. +## What information you'll need -### Comment +Each new QC metric request must contain: -``` -comment: A lower number of MS1 spectra acquired during one sample run compared -to similar runs can indicate mismatched instrument settings or issues with the -instrumentation or issues with sample amounts. -``` +| **Element** | **What to provide** | +| --- | --- | +| **Name** | Short, descriptive title for the metric. Example: `XIC-FWHM quantiles`. | +| **Definition** | One or two sentences explaining what the metric measures, how it is summarized, and what its values mean. | +| **Comment** | *(Optional)* Additional details about computation, conventions, or interpretation. 
| +| **Units** | Physical or statistical unit (e.g., `UO:0000010 ! second`, `UO:0000187 ! percent`). | +| **Value type** | Structural type of the metric value: single value, tuple, table, or matrix. | +| **Semantic classification** | Seven relationships that describe what kind of metric this is (see below). | +| **Provenance** | Software or publication the metric originates from, e.g. `xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671]`. | -The comment section SHOULD contain calculation and interpretation details, like whether smaller or larger values are desirable. -It is also RECOMMENDED to give a short explanation about how the metric works. -If the metric calculation is not obvious, the calculation is RECOMMENDED to be briefly described in common terms. -For published metrics, it is also RECOMMENDED to refer to the corresponding code. +> [!TIP] +> Keep names short and specific. Avoid tool names in the title — use `xref` for that. -### Value Type and Unit +## How to structure your metric definition -``` -is_a: QC:4000003 ! single value -property_value: has_units UO:0000189 +Here's what a complete metric definition looks like: + +```obo +[Term] +id: MS:4000051 +name: XIC-FWHM quantiles +def: "Summarizes the distribution of chromatographic peak widths, expressed as the full width at half maximum (FWHM) of extracted ion chromatograms (XICs). Reports an ordered tuple of the first through (n-1)-th quantiles (Q1, ..., Qn-1) of the FWHM distribution within a single run. Lower values indicate narrower peaks and therefore better chromatographic performance." [PSI:MS] +comment: Values are reported as an (n-1)-element tuple of floating-point numbers in seconds, representing the first to (n-1)-th quantiles of the FWHM distribution. The final quantile (100th percentile) is omitted because it corresponds to the maximum observed peak width, which is a boundary value that does not convey additional information about distribution shape or variability and is sensitive to outliers.
The tuple length implicitly specifies how many quantiles are reported and thus the resolution of the summary. +is_a: MS:4000001 ! QC metric +relationship: part_of_workflow_stage MS:XXXXXXX ! chromatography stage +relationship: measures_property MS:XXXXXXX ! chromatographic performance metric +relationship: depends_on_data_type MS:XXXXXXX ! raw acquisition data +relationship: has_measurement_scope MS:XXXXXXX ! run level +relationship: applies_to_acquisition_mode MS:XXXXXXX ! acquisition mode independent +relationship: has_quality_directionality MS:XXXXXXX ! lower is better +relationship: has_value_type MS:XXXXXXX ! tuple +relationship: has_value_concept MS:1000086 ! full width at half-maximum +relationship: has_value_concept STATO:0000291 ! quantile +relationship: has_units UO:0000010 ! second +relationship: has_value_type xsd:float +xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] +xref: QuaMeter:XIC-FWHM-Q2 [PMID:24494671] +xref: QuaMeter:XIC-FWHM-Q3 [PMID:24494671] ``` -A single value metric with a count as unit (`UO:0000189`). +## Metric classification -``` -is_a: QC:4000003 ! single value -property_value: has_units UO:0000221 -property_value: has_type STATO:0000237 -``` +QC metrics can be categorized according to several classification dimensions. +Together these describe *what the metric measures, where it applies, and how it behaves.* -A single value metric with as unit the standard deviation (`STATO:0000237`) in Dalton (`UO:0000221`), for example, the standard deviation of the distribution of precursor mass errors of identified spectra. +| **Dimension** | **Relationship** | **Example** | **Meaning** | +| --- | --- | --- | --- | +| **Workflow stage** | `part_of_workflow_stage` | `chromatography stage` | Where in the experimental or computational workflow the metric applies. | +| **Analytical dimension** | `measures_property` | `chromatographic performance metric` | What underlying property is measured.
| +| **Information dependency type** | `depends_on_data_type` | `raw acquisition data` | What kind of data the metric requires (raw, ID, quant, hybrid). | +| **Measurement scope** | `has_measurement_scope` | `run level` | At what aggregation level it summarizes data (spectrum, run, batch, study). | +| **Acquisition strategy** | `applies_to_acquisition_mode` | `acquisition mode independent` | Which acquisition or instrument mode it applies to. | +| **Quality interpretation type** | `has_quality_directionality` | `lower is better` | How to interpret values in terms of quality. | +| **Metric value type** | `has_value_type` | `tuple` | Structural type of the output. | -Each term that reports a value MUST indicate the corresponding value type using an `is_a` relation. -Different value types are possible: single value, n-tuple, table, or matrix. -A value must be associated with a unit, see below. -Depending on the value type, different additional categorization is REQUIRED. +These relationships make each metric's meaning explicit and enable better machine reasoning. -- **single value:** Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -- **n-tuple:** An ordered list/array of length 'n'. Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -Units and types (optional) MUST be uniform for all values. -An n-tuple is represented by a JSON array, which implicitly defines its length 'n'. -- **table:** A table MUST have one or more columns defined using `has_column` and MAY have optional columns defined using `has_optional_column`. -A table is represented using a JSON key–value object where key(s) represent the column term names/accessions and the value(s) are JSON arrays of uniform value type and -length. -- **table column type definitions:** Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. 
-The term name will be used as the column's header. -- **matrix:** Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -Units and types (optional) MUST be uniform for all values. -A matrix is represented by a JSON array of JSON arrays where the inner arrays MUST be of uniform length, which implicitly defines the matrix dimensions. +The [QC Metric Classification Reference](TODO:link) page provides full details of the available subclasses for each dimension (e.g. all workflow stages, analytical dimensions, acquisition modes, etc.), including definitions, examples, and how they map to existing PSI-MS CV terms. -Units SHOULD be sourced from the [Units of Measurement Ontology (UO)](https://www.ebi.ac.uk/ols/ontologies/uo), if available, otherwise from the -[Statistical Methods Ontology (STATO)](http://stato-ontology.org/) or others as necessary. -Protein modifications SHOULD be sourced from [Unimod](http://www.unimod.org/) or [PSI-MOD](https://github.com/HUPO-PSI/psi-mod-CV) where possible. +Use that reference when selecting the appropriate classification terms for your new metric. -### Metric Categorization +## Quantitative details: what the numbers mean -``` -is_a: QC:4000010 ! ID free -is_a: QC:4000023 ! MS1 metric -relationship: has_relation MS:1000579 ! MS1 spectrum -relationship: has_relation QC:4000013 ! QC metric relation: single run -``` +To make your metric's values interpretable and comparable: -Different types of categorization can be assigned to CV terms. -First, it is RECOMMENDED to specify whether a metric requires identification information to be computed (ID based) or not (ID free). -Second, additional categories to describe the metric context (from which data the metric is derived, to which element of the instrumental setup the metric pertains, etc.) can be specified as well. -It is RECOMMEND to align the categorization of novel metrics to existing terms to facilitate consumption of related metrics. 
+* `has_value_concept` → what the values represent. + Example: `STATO:0000291 ! quantile`, `MS:1000086 ! full width at half-maximum`. +* `has_units` → the unit of measurement (preferably from the [Units of Measurement Ontology (UO)](https://www.ebi.ac.uk/ols/ontologies/uo)). + Example: `UO:0000010 ! second`, `UO:0000187 ! percent`. +* `has_value_type` → the data type of the individual values (e.g. `xsd:float`). -``` -property_value: has_units UO:0000010 -property_value: has_column QC:4000117 -``` +These fields help mzQC readers and validation tools understand how to process the data. -If the metric term has an associated value, its unit MUST be defined using the `property_value` tag. -"Single", "n-tuple", and "matrix" type values MUST be assigned a single, uniform unit type with `has_units`. -For "table" type values, one or more `has_column_type`/`has_optional_column_type` specifications MUST be associated with the table. -These implicitly define the column units through the `has_units` attributes of the corresponding column definitions. +## Writing clear definitions and comments -``` -property_value: has_type STATO:0000237 -``` +**Good definitions:** -For full semantic integration, it is RECOMMENDED to specify the value type for automatic processing and interpretation of the value. -It is RECOMMENDED to source value types from [STATO](http://stato-ontology.org/). +* Start with what the metric summarizes — **don't** begin with "A QC metric that..." +* Mention the data type or entity, and the summary statistic. +* End with an interpretation if relevant ("Lower values indicate better performance"). -### Additional Information +**Comments:** -``` -synonym: "MS1-Count" EXACT [] -``` +Use `comment:` only to clarify: -In case of reimplementing, renaming, or redefining a metric, it is RECOMMENDED to also add synonym attributes with either the name or ID of the initial metric.
-It is not required for the initial metric to be included in any controlled vocabulary, but the name SHOULD be unambiguous and recognizable (e.g. from the source publication). -Synonyms can be "RELATED" (the defined metric is similar, but not the same as what is connected with the synonym name), "NARROW" (the metric's values can be identically interpreted as in the meaning of the synonym metric, however, definition and calculation may somewhat differ), "EXACT" (the defined metric is basically a result of renaming). +* Implementation details (e.g., number of values, normalization). +* Context or rationale (e.g., why a value is omitted). -## More CV Term Examples +Avoid repeating the definition. -**Single value:** +## Provenance and references -``` -[Term] -id: QC:4000050 -name: XIC-WideFrac -def: "The fraction of precursor ions accounting for the top half of all peak widths" [PSI:QC] -is_a: QC:4000003 ! single value -is_a: QC:4000010 ! ID free -is_a: QC:4000020 ! XIC metric -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_units UO:0000191 ! fraction +Always cite the origin of the metric: + +```obo +xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] ``` -**n-tuple:** +* Use PMIDs or DOIs when available. +* If multiple related metrics exist (e.g. Q1, Q2, Q3), include multiple `xref:` lines. -``` -[Term] -id: QC:4000051 -name: XIC-FWHM quantiles -def: "The first to n-th quantile of peak widths for the wide XICs." [PSI:QC] -is_a: QC:4000004 ! n-tuple -is_a: QC:4000010 ! ID free -is_a: QC:4000020 ! XIC metric -relationship: has_relation MS:1000086 ! full width at half-maximum -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_units UO:0000010 ! 
second -synonym: "XIC-FWHM-Q1" RELATED [] -synonym: "XIC-FWHM-Q2" RELATED [] -synonym: "XIC-FWHM-Q3" RELATED [] -``` +## Updating or extending metrics -**Table:** +If you need to improve an existing term (e.g., clearer definition, missing relationships): -``` -[Term] -id: QC:4000063 -name: MS2 known precursor charges fractions -def: "The fraction of MS/MS precursors of the corresponding charge. The fractions [0,1] are given in the 'Fraction' column, corresponding charges in the 'Charge state' column. The highest charge state is to be interpreted as that charge state or higher. " [PSI:QC] -is_a: QC:4000006 ! table -is_a: QC:4000010 ! ID free -is_a: QC:4000024 ! MS2 metric -is_a: QC:4000025 ! ion source metric -relationship: has_relation MS:1000041 ! charge state -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_column: QC:4000238 ! Charge state -property_value: has_column: QC:4000239 ! Fraction -synonym: "MS2-PrecZ-1" RELATED [] -synonym: "MS2-PrecZ-2" RELATED [] -synonym: "MS2-PrecZ-3" RELATED [] -synonym: "MS2-PrecZ-4" RELATED [] -synonym: "MS2-PrecZ-5" RELATED [] -synonym: "MS2-PrecZ-more" RELATED [] +* Open an issue referencing its ID. +* Describe what should change and why. +* Curators may update the term or merge it with others if appropriate. -[Term] -id: QC:4000238 -name: Charge state -def: "The column contains charge states." [PSI:QC] -is_a: QC:4000107 ! Column type -property_value: has_units MS:1000041 ! charge state +Deprecated metrics are marked with `is_obsolete: true` and replaced by a new one via `replaced_by:`. -[Term] -id: QC:4000239 -name: Fraction -def: "The column contains fraction values as decimals." [PSI:QC] -is_a: QC:4000107 ! Column type -property_value: has_units UO:0000191 ! fraction -``` +## Quick reference + +**When writing new metrics:** + +* ✅ Keep names and definitions short and specific. +* ✅ Use one relationship per classification dimension. 
+* ✅ Include `has_value_concept`, `has_units`, and `has_value_type`. +* ✅ Provide provenance (`xref:`). +* ✅ Test for uniqueness before submitting. + +**Avoid:** + +* ❌ Tool names in the metric name. +* ❌ Definitions that describe algorithms instead of meaning. +* ❌ Redundant comments. diff --git a/docs/pages/cv/howto_use_cv_terms.md b/docs/pages/cv/howto_use_cv_terms.md index 73b847a..ccde68e 100644 --- a/docs/pages/cv/howto_use_cv_terms.md +++ b/docs/pages/cv/howto_use_cv_terms.md @@ -1,34 +1,96 @@ --- layout: page -title: "Metrics - use" +title: "Metrics – Use" permalink: /metrics/use --- -# CV Term Usage Guide -The translation from CV terms to elements in an mzQC file depends on the term's value type and is pretty straightforward. -Following, the different value types uses are exemplified . +# PSI-MS QC Metrics Usage Guide -## Single Value +*How to use QC CV terms correctly in mzQC files* -To report the number of MS1 scans in a peak file: +## Introduction +This guide explains **how to use QC metric CV terms** from the [PSI-MS Controlled Vocabulary](https://github.com/HUPO-PSI/psi-ms-CV) when creating or reading **mzQC files**. +You don't need to be an ontology expert — just follow these examples to ensure your QC data is: + +* **Standardized** (compatible across tools), +* **Machine-readable** (interpretable by validators), and +* **Traceable** (linked to known metrics in the PSI-MS CV). + +## How CV metrics map to mzQC + +Each QC metric in mzQC corresponds to one entry in the PSI-MS CV (`MS:4000XXX`). +That entry defines: + +* The **metric name** and **definition**, +* Its **units** and **value type**, +* And semantic information (e.g. whether it's run-level, ID-based, or LC-related). + +When you reference a CV term in your mzQC file, you're telling mzQC-compatible software **exactly what kind of data this metric represents**. 
+ +Example (simplified): + +```json +{ + "accession": "MS:4000059", + "name": "number of MS1 spectra", + "value": 8259, + "unit": { + "accession": "UO:0000189", + "name": "count unit" + } +} ``` + +## Metric value types + +Each QC metric defines **how its values are structured**, through the `has_value_type` relationship. +mzQC supports four value structures: + +| Value type | Structure | Example use | +| ---------------- | -------------------------------- | ------------------------------------- | +| **Single value** | One numeric or categorical value | number of MS1 spectra | +| **Tuple** | Ordered list of values | quantiles, summary statistics | +| **Table** | Named columns with multiple rows | precursor charge fractions | +| **Matrix** | Rectangular numerical array | image-like data, correlation matrices | + +Your mzQC file must follow the value structure declared in the CV. + +## Single-value metrics + +**Definition:** +A metric represented by one scalar value (numeric or categorical). + +**mzQC encoding:** + +* Directly report the single value in the `"value"` field. +* Include the `"unit"` object if defined in the CV. +* The data type (integer, float, string) must match the CV's declared `xsd:` type. + +**Example:** + +CV definition: + +```obo [Term] id: MS:4000059 name: number of MS1 spectra -def: "The number of MS1 events in the run." [PSI:MS] -synonym: "MS1-Count" EXACT [PMID:24494671] -is_a: MS:4000003 ! single value -relationship: has_metric_category MS:4000009 ! ID free metric -relationship: has_metric_category MS:4000012 ! single run based metric -relationship: has_metric_category MS:4000021 ! MS1 metric -relationship: has_value_type xsd:int ! The allowed value-type for this CV term +def: "Counts the number of MS1 scans within a single run." +relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage +relationship: measures_property MS:4000XXX ! acquisition coverage metric +relationship: depends_on_data_type MS:4000XXX ! 
raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! higher is better +relationship: has_value_type MS:4000XXX ! single value relationship: has_units UO:0000189 ! count unit +relationship: has_value_type xsd:int +xref: QuaMeter:MS1-Count [PMID:24494671] ``` -A corresponding `qualityMetric` object in an mzQC file: +mzQC representation: -``` +```json { "accession": "MS:4000059", "name": "number of MS1 spectra", @@ -40,30 +102,46 @@ A corresponding `qualityMetric` object in an mzQC file: } ``` -## n-tuple +## Tuple metrics -To report the number of MS2 scans per quantile: +**Definition:** +A metric consisting of an ordered list of scalar values (e.g. quantiles, min/median/max triplets). +All values share the same semantic meaning and unit. -``` +**mzQC encoding:** + +* The `"value"` field is a JSON array of numbers. +* Include a single `"unit"` object applying to all elements. +* The CV term defines the interpretation (e.g., "first to (n−1)-th quantiles"). + +**Example:** + +CV definition: + +```obo [Term] id: MS:4000062 name: MS2 density quantiles -def: "The first to n-th quantile of MS2 peak density (scan peak counts). A value triplet represents the original QuaMeter metrics, the quartiles of MS2 density. The number of values in the tuple implies the quantile mode." [PSI:MS] -synonym: "MS2-Density-Q1" RELATED [PMID:24494671] -synonym: "MS2-Density-Q2" RELATED [PMID:24494671] -synonym: "MS2-Density-Q3" RELATED [PMID:24494671] -is_a: MS:4000004 ! n-tuple -relationship: has_metric_category MS:4000009 ! ID free metric -relationship: has_metric_category MS:4000012 ! single run based metric -relationship: has_metric_category MS:4000022 ! MS2 metric -relationship: has_value_type xsd:int ! The allowed value-type for this CV term -relationship: has_value_concept NCIT:C45781 ! 
Density +def: "Summarizes the distribution of spectral peak density in MS2 scans as quantiles of the number of fragment peaks per spectrum within a single run. The metric reports an ordered tuple of the first through (n−1)-th quantiles (Q1, ..., Qn−1), characterizing the overall fragmentation complexity and consistency across spectra." +comment: "Values are reported as an (n−1)-element tuple of counts, representing the first to (n−1)-th quantiles of the distribution of fragment peak counts per MS2 spectrum. The final quantile (100th percentile) is omitted because it corresponds to the maximum observed peak count, which is a boundary value that does not convey additional information about distribution shape or variability and is sensitive to outliers. The tuple length implicitly specifies how many quantiles are reported and thus the resolution of the summary. Lower quantiles correspond to sparsely fragmented spectra; higher quantiles indicate spectra with more peaks. Interpretation depends on the acquisition and fragmentation settings and should be treated as context dependent rather than strictly higher- or lower-is-better." +relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage +relationship: measures_property MS:4000XXX ! spectral quality metric +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! context dependent +relationship: has_value_type MS:4000XXX ! tuple +relationship: has_value_concept STATO:0000291 ! quantile relationship: has_units UO:0000189 ! 
count unit +relationship: has_value_type xsd:int +xref: QuaMeter:MS2-Density-Q1 [PMID:24494671] +xref: QuaMeter:MS2-Density-Q2 [PMID:24494671] +xref: QuaMeter:MS2-Density-Q3 [PMID:24494671] ``` -A corresponding `qualityMetric` object in an mzQC file: +mzQC representation: -``` +```json { "accession": "MS:4000062", "name": "MS2 density quantiles", @@ -72,60 +150,51 @@ A corresponding `qualityMetric` object in an mzQC file: "accession": "UO:0000189", "name": "count unit" } -}, +} ``` -## Table +## Table metrics -To report the MS/MS precursor charge states: +**Definition:** +A metric represented as columns of equal-length lists, each describing one variable. +Essentially a named column table with one row per observation. -``` +**mzQC encoding:** + +* `"value"` is an object where each key is a column ID and its value is a list. +* Each column has an optional unit. +* All columns must have identical list lengths — each index corresponds to one row. +* Units are provided as an array under `"unit"` and as part of the column definition. + +**Example:** + +CV definition: + +```obo [Term] id: MS:4000063 -name: MS2 known precursor charges fractions -def: "The fraction of MS/MS precursors of the corresponding charge. The fractions [0,1] are given in the 'Fraction' column, corresponding charges in the 'Charge state' column. The highest charge state is to be interpreted as that charge state or higher." 
[PSI:MS] -synonym: "MS2-PrecZ-1" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-2" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-3" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-4" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-5" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-more" NARROW [PMID:24494671] -synonym: "IS-3A" RELATED [PMID:19837981] -synonym: "IS-3B" RELATED [PMID:19837981] -synonym: "IS-3C" RELATED [PMID:19837981] -comment: the MS2-PrecZ metrics can be directly read from the table respective table rows, the ratios of IS-3 metrics must be derived from the respective table rows, IS-3A as ratio of +1 over +2, IS-3B as ratio of +3 over +2, IS-3C as +4 over +2. -is_a: MS:4000005 ! table -relationship: has_metric_category MS:4000009 ! ID free metric -relationship: has_metric_category MS:4000012 ! single run based metric -relationship: has_metric_category MS:4000020 ! ion source metric -relationship: has_metric_category MS:4000022 ! MS2 metric +name: MS2 known precursor charge fractions +def: "Fraction of MS/MS precursors for each charge state observed within a run. Each entry lists a precursor charge (z) and its corresponding fraction of all observed MS2 precursors." +comment: "Values are reported as a table with two columns: 'Charge state' and 'Fraction'. The final charge state bin should be interpreted as 'that charge state or higher' to include all unlisted higher charges." +relationship: part_of_workflow_stage MS:4000XXX ! ionization stage +relationship: measures_property MS:4000XXX ! ionization quality metric +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: has_value_type MS:4000XXX ! table relationship: has_column MS:1000041 ! charge state relationship: has_column UO:0000191 ! fraction - -[Term] -id: MS:1000041 -name: charge state -def: "Number of net charges, positive or negative, on an ion." [PSI:MS] -synonym: "z" EXACT [] -is_a: MS:1000455 ! 
ion selection attribute -is_a: MS:1000507 ! ion property -relationship: has_value_type xsd:int ! The allowed value-type for this CV term - -[RDF extract] -id: UO:0000191 -name: fraction -def: "A dimensionless ratio unit which relates the part (the numerator) to the whole (the denominator). [Wikipedia:Wikipedia]" +relationship: has_value_type xsd:float ``` -A corresponding `qualityMetric` object in an mzQC file: +mzQC representation: -``` +```json { "accession": "MS:4000063", - "name": "MS2 known precursor charges fractions", + "name": "MS2 known precursor charge fractions", "value": { - "Charge state": ["1","2","3","4","5","6"], - "Fraction": [0.000,0.683,0.305,0.008,0.002,0.002] + "MS:1000041": [1, 2, 3, 4, 5, 6], + "UO:0000191": [0.000, 0.683, 0.305, 0.008, 0.002, 0.002] }, "unit": [ { @@ -139,4 +208,33 @@ A corresponding `qualityMetric` object in an mzQC file: ] } ``` -The units of a table instance are implicitly assumed through their respective columns' definition and if available as unit terms, documented in the unit array of the instance for clarity. \ No newline at end of file + +## Matrix metrics + +**Definition:** +A metric that stores a rectangular grid of numeric values of the same type and unit. + +**mzQC encoding:** + +* `"value"` is a rectangular list of lists of numbers (each inner list = a matrix row). +* A single `"unit"` applies to all entries. +* Only homogeneous numeric types are allowed (no mixed datatypes). + +## Understanding the relationships + +Each metric term in the CV includes semantic relationships that describe *how* and *where* it applies. +These don't appear directly in mzQC files, but they're important for consistency and validation. 
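Of these relationships, `has_value_type` is the one that directly constrains what a reader should accept in the `"value"` field. As a minimal sketch, assuming the four value types described above and plain Python `dict`/`list` values as produced by `json.load`, a structural check could look like this. The function name `check_value_structure` is hypothetical and not part of any mzQC library; it checks shape only, not the `xsd:` datatype or units.

```python
"""Sketch: structural checks for mzQC metric values, keyed by value type.

Assumptions: the value type names mirror the CV classes described in this
guide; check_value_structure is a hypothetical helper, not an mzQC API.
"""
from numbers import Real
from typing import Any


def _is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, Real) and not isinstance(x, bool)


def check_value_structure(value_type: str, value: Any) -> bool:
    """Return True if `value` has the JSON shape required by `value_type`."""
    if value_type == "single value":
        # One scalar: numeric or categorical (string).
        return _is_number(value) or isinstance(value, str)
    if value_type == "tuple":
        # Non-empty ordered list of numbers sharing one unit.
        return isinstance(value, list) and len(value) > 0 and all(_is_number(v) for v in value)
    if value_type == "table":
        # Column ID -> list; every column must have the same number of rows.
        if not isinstance(value, dict) or not value:
            return False
        columns = list(value.values())
        if not all(isinstance(col, list) for col in columns):
            return False
        return len({len(col) for col in columns}) == 1
    if value_type == "matrix":
        # Rectangular list of lists, all entries numeric.
        if not isinstance(value, list) or not value:
            return False
        if not all(isinstance(row, list) for row in value):
            return False
        if len({len(row) for row in value}) != 1:
            return False
        return all(_is_number(x) for row in value for x in row)
    raise ValueError(f"unknown value type: {value_type!r}")
```

For example, the charge-state table above passes, while a table whose columns differ in length, or a ragged matrix, is rejected.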
+ +| Relationship | Describes | Example | +| ----------------------------- | ----------------------------------- | ------------------------------------ | +| `part_of_workflow_stage` | Experimental or computational stage | `chromatography stage` | +| `measures_property` | Quality dimension measured | `chromatographic performance metric` | +| `depends_on_data_type` | Type of data used | `raw acquisition data` | +| `has_measurement_scope` | Level of aggregation | `run level` | +| `applies_to_acquisition_mode` | Acquisition mode | `DIA-specific metric` | +| `has_quality_directionality` | Interpretation of values | `lower is better` | +| `has_value_type` | Structure of the value | `tuple` | + +These relationships are how the CV ensures every metric is comparable, searchable, and logically complete. + +For full details and all available subclasses (e.g., all workflow stages or acquisition modes), see the [QC Metric Classification Reference](TODO:link). From 6c82f889be1daf6b15ef4552e2e73b0f07c574a6 Mon Sep 17 00:00:00 2001 From: Wout Bittremieux Date: Thu, 6 Nov 2025 10:41:43 +0100 Subject: [PATCH 2/4] Update CV documentation --- docs/pages/cv/classification_reference.md | 222 ++++++++++++++-------- docs/pages/cv/howto_create_cv_terms.md | 123 +++++++----- docs/pages/cv/howto_use_cv_terms.md | 48 ++--- docs/pages/metrics.md | 31 ++- 4 files changed, 273 insertions(+), 151 deletions(-) diff --git a/docs/pages/cv/classification_reference.md b/docs/pages/cv/classification_reference.md index ae1f9a2..e0e0b16 100644 --- a/docs/pages/cv/classification_reference.md +++ b/docs/pages/cv/classification_reference.md @@ -4,38 +4,76 @@ title: "Metrics – Classification Reference" permalink: /metrics/classification --- -# QC Metric Classification Reference - -*Standardized semantic categories for PSI-MS quality control metrics* +*Standardized semantic categories for PSI-MS quality control metrics.* ## Overview -Each QC metric in the PSI-MS Controlled Vocabulary (CV) is annotated using 
**seven independent classification dimensions**. -Together, these describe *what a metric measures*, *where it applies in the workflow*, and *how it should be interpreted*. +Each QC metric in the [PSI-MS Controlled Vocabulary (CV)](https://github.com/HUPO-PSI/psi-ms-CV) is annotated using **seven independent classification dimensions**. +First, the **analytical dimension** defines *what kind of metric it is* and is encoded as **inheritance** using `is_a`. +The other six are **typed relationships** that describe *where the metric applies, what it depends on, how to interpret it,* and *how to serialize it in mzQC*. + +**At a glance** + +| Dimension | Encoded as | Purpose | +| ------------------------------- | -------------- | ------------------------------------------------------------------- | +| **Analytical dimension** | `is_a` | Defines the metric subtype (what kind of QC metric it *is*) | +| **Workflow stage** | relationship | Where in the experimental/computational pipeline the metric applies | +| **Information dependency type** | relationship | What type of input data the metric needs | +| **Measurement scope** | relationship | At what aggregation level the metric summarizes data | +| **Acquisition strategy** | relationship | Which acquisition/mode or platform it applies to | +| **Quality interpretation type** | relationship | How to interpret higher/lower/targeted values | +| **Metric value type** | relationship | How the values are structurally represented (single, tuple, etc.) | + +**Rule of thumb:** +Every QC metric has exactly one `is_a` (analytical dimension) and one value from each of the other six relationship dimensions. + +## Part 1 — Inheritance: Analytical dimension + +**What it is:** +The analytical dimension defines the *type of QC metric*. +This is the only dimension expressed via **inheritance** (`is_a`) because it establishes the metric's place in the taxonomy. 
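The rule of thumb is simple enough to check mechanically before submitting a term request. A minimal sketch, assuming plain OBO text and the six predicate names used in this reference; `classification_problems` is a hypothetical helper, not part of any PSI tool. It counts only `MS:`-prefixed targets of `has_value_type` as the structural value type, because the term examples in the usage and creation guides also carry an `xsd:` datatype on the same predicate.

```python
"""Sketch: check an OBO [Term] stanza against the classification rule of thumb.

Assumptions: plain OBO text, predicate names as listed in this reference,
and only MS-prefixed has_value_type targets count as the value structure.
"""
from collections import Counter

FACETS = (
    "part_of_workflow_stage",
    "depends_on_data_type",
    "has_measurement_scope",
    "applies_to_acquisition_mode",
    "has_quality_directionality",
    "has_value_type",
)


def classification_problems(stanza: str) -> list[str]:
    """Return human-readable problems; an empty list means the rule holds."""
    counts = Counter()
    for raw in stanza.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! label" comments
        if line.startswith("is_a:"):
            counts["is_a"] += 1
        elif line.startswith("relationship:"):
            parts = line.split()
            if len(parts) < 3:
                continue
            predicate, target = parts[1], parts[2]
            if predicate == "has_value_type" and not target.startswith("MS:"):
                continue  # e.g. xsd:float is the datatype, not the structure
            counts[predicate] += 1

    problems = []
    for key in ("is_a",) + FACETS:
        if counts[key] != 1:
            problems.append(f"{key}: expected exactly 1, found {counts[key]}")
    return problems
```

Run on the XIC-FWHM quantiles stanza from the worked examples, it reports no problems; dropping one relationship line or adding a second `is_a` produces a corresponding message.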
+ +**How to use it:** +Choose exactly one of the following as the metric's `is_a` parent: -Every metric must define **exactly one relationship** from each dimension. -This ensures complete, machine-interpretable semantics across all QC terms used in mzQC and related standards. +#### Subclasses -| Dimension | Describes | -| ------------------------------- | ---------------------------------------------------------------------- | -| **Workflow stage** | Where in the experimental or computational pipeline the metric applies | -| **Analytical dimension** | What fundamental property the metric measures | -| **Information dependency type** | What type of data the metric depends on | -| **Measurement scope** | At what aggregation level the metric summarizes data | -| **Acquisition strategy** | Which acquisition or instrument configuration it applies to | -| **Quality interpretation type** | How metric values relate to data quality | -| **Metric value type** | How metric values are structurally represented (single, tuple, etc.) | +- **Acquisition coverage metric:** how comprehensively data were collected (e.g., scan counts, sampling density). +- **Mass accuracy metric:** deviation between observed and theoretical _m_/_z_. +- **Intensity stability metric:** variation of signal intensity over time. +- **Chromatographic performance metric:** separation performance (e.g., peak width, symmetry, RT reproducibility). +- **Ionization quality metric:** properties of the precursor ion population (e.g., charge-state distribution, adduct prevalence). +- **Ion mobility metric:** IMS resolution, drift-time/CCS accuracy and reproducibility. +- **Spectral quality metric:** quality of individual spectra (e.g., peak density, S/N, completeness). +- **Fragmentation efficiency metric:** effectiveness of precursor ion fragmentation to produce interpretable spectra. +- **Isolation purity metric:** precursor isolation selectivity or co-isolation of interfering species.
+- **Identification confidence metric:** reliability of identifications (e.g., FDR, ID rate). +- **Quantification precision metric:** reproducibility or variability of quantitative results. +- **Contamination metric:** unwanted signal from contaminants, carryover, or background. +- **Instrument operational performance metric:** general indicators of instrument health (e.g., vacuum, detector voltage, temperature). +- **Missingness/completeness metric:** data absence or completeness across features, runs, or studies. -## Workflow Stage +#### CV example (inheritance only) + +```obo +is_a: MS:4000XXX ! chromatographic performance metric +``` + +## Part 2 — Typed relationships + +The remaining six dimensions are not types; they are **properties** of a metric. +They must be encoded using the specified **relationship** predicates (one value per dimension). + +### 1. Workflow stage — `part_of_workflow_stage` **Definition:** The experimental or computational stage of the workflow to which a QC metric applies. This tells *where* in the process the metric is relevant — from sample preparation through acquisition and analysis. -### Subclasses +#### Subclasses -#### Experimental workflow stage +**Experimental workflow stage** Metrics describing quality at the laboratory or instrument level. @@ -57,7 +95,7 @@ Metrics describing quality at the laboratory or instrument level. *Example:* mass-accuracy drift, spray stability. * **Instrument calibration stage:** metrics derived from calibration routines or control samples. -#### Data analysis workflow stage +**Data analysis workflow stage** Metrics evaluating computational processing and interpretation steps. @@ -68,41 +106,23 @@ Metrics evaluating computational processing and interpretation steps. *Example:* CV of peptide intensities, ratio reproducibility. * **Integration stage:** metrics related to alignment, normalization, or data integration across runs. 
-#### Environmental condition monitoring +**Environmental condition monitoring** Metrics about environmental conditions that can indirectly affect results. *Example:* laboratory temperature, humidity, power fluctuations. -## Analytical Dimension - -**Definition:** -The fundamental property or aspect of data quality that the metric quantifies. - -Think of this as *what kind of problem the metric detects or describes*. - -### Subclasses +#### CV example -* **Acquisition coverage metric:** evaluates how comprehensively data were collected (e.g., scan counts, sampling density). -* **Mass accuracy metric:** measures deviation between observed and theoretical m/z values. -* **Intensity stability metric:** assesses signal intensity variation over time. -* **Chromatographic performance metric:** evaluates separation performance such as peak width, symmetry, or retention reproducibility. -* **Ionization quality metric:** evaluates properties of the ion population generated during ionization, such as charge-state distribution or adduct prevalence. -* **Ion mobility metric:** describes resolution, drift-time accuracy, or reproducibility in ion-mobility separations. -* **Spectral quality metric:** quantifies quality of individual spectra (e.g., peak density, signal-to-noise, completeness). -* **Fragmentation efficiency metric:** measures how efficiently precursor ions fragment to produce interpretable spectra. -* **Isolation purity metric:** evaluates precursor isolation selectivity or co-isolation of interfering species. -* **Identification confidence metric:** quantifies reliability of peptide or compound identifications (e.g., FDR, number of identified analytes). -* **Quantification precision metric:** measures reproducibility or variability of quantitative results. -* **Contamination metric:** detects unwanted signal from contaminants, carryover, or background. 
-* **Instrument operational performance metric:** general indicators of instrument health (e.g., vacuum level, temperature, detector voltage). -* **Missingness/completeness metric:** measures data absence or completeness across features, runs, or studies. +```obo +relationship: part_of_workflow_stage MS:4000XXX ! chromatography stage +``` -## Information Dependency Type +### 2. Information dependency type — `depends_on_data_type` **Definition:** Specifies which type of data input the metric requires to be computed. -### Subclasses +#### Subclasses * **Raw acquisition data:** metrics that can be calculated directly from the raw MS data, without identifications. *Example:* total ion current stability, scan count. @@ -114,12 +134,18 @@ Specifies which type of data input the metric requires to be computed. * **Reference data:** metrics requiring comparison to external standards or reference files. *Example:* RT deviation vs. iRT peptides, calibration QC.* -## Measurement Scope +#### CV example + +```obo +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +``` + +### 3. Measurement scope — `has_measurement_scope` **Definition:** Indicates the level of data aggregation the metric summarizes. -### Subclasses +#### Subclasses * **Spectrum level:** per-spectrum metrics (e.g., number of peaks, S/N ratio). * **Pixel/voxel level:** per-pixel metrics in imaging or spatial omics. @@ -128,14 +154,20 @@ Indicates the level of data aggregation the metric summarizes. * **Batch level:** aggregated across multiple related runs. * **Study level:** aggregated across an entire experiment or project. -## Acquisition Strategy +#### CV example + +```obo +relationship: has_measurement_scope MS:4000XXX ! run level +``` + +### 4. Acquisition strategy — `applies_to_acquisition_mode` **Definition:** Specifies which acquisition mode or instrument configuration the metric is relevant for. 
-### Subclasses +#### Subclasses -#### Acquisition mode +**Acquisition mode** * **Acquisition mode independent:** metrics valid for any acquisition method. * **Data-dependent acquisition (DDA):** metrics specific to stochastic precursor selection workflows. @@ -151,7 +183,7 @@ Specifies which acquisition mode or instrument configuration the metric is relev * **Other specialized mode:** metrics for advanced or hybrid acquisition modes such as BoxCar, MSⁿ, or multiplexed scanning. *Example:* BoxCar intensity uniformity across boxes. -#### Instrument platform specificity +**Instrument platform specificity** * **Orbitrap-specific:** metrics only applicable to Orbitrap instruments. *Example:* Orbitrap transient length stability. @@ -161,13 +193,19 @@ Specifies which acquisition mode or instrument configuration the metric is relev *Example:* Ion trap fill time distribution. * **Other platform-specific:** for quadrupoles, FT-ICR, or hybrid systems. -## Quality Interpretation Type +#### CV example + +```obo +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +``` + +### 5. Quality interpretation type — `has_quality_directionality` **Definition:** Describes how a metric's numeric value relates to overall quality. This enables automatic reasoning about whether "higher," "lower," or "targeted" values represent better data. -### Subclasses +#### Subclasses * **Higher is better:** increasing values indicate improved quality. *Example:* identification rate, mass accuracy score. @@ -180,13 +218,19 @@ This enables automatic reasoning about whether "higher," "lower," or "targeted" * **Categorical:** quality expressed as discrete categories (e.g., pass/fail, OK/warning/error). * **Trend:** metrics intended for temporal monitoring rather than direct ranking (e.g., instrument drift over time). -## Metric Value Type +#### CV example + +```obo +relationship: has_quality_directionality MS:4000XXX ! lower is better +``` + +### 6. 
Metric value type — `has_value_type` **Definition:** Specifies the structural format of the metric's reported value(s). This defines how the metric must be represented in mzQC. -### Subclasses +#### Subclasses | Type | Structure | Description | Example | | ---------------- | ----------------- | ------------------------------------------------------------- | ----------------------------- | @@ -195,29 +239,57 @@ This defines how the metric must be represented in mzQC. | **Table** | Named columns | Parallel lists of equal length; each column has its own unit. | MS2 charge fractions | | **Matrix** | Rectangular array | 2D array of homogeneous numeric values. | Ion-mobility intensity matrix | -See the [CV Term Usage Guide](TODO:link) for details on how each type is encoded in mzQC. +See the [CV Term Usage Guide](/metrics/use) for details on how each type is encoded in mzQC. --- -## Putting It All Together - -Each QC metric term in the PSI-MS CV will therefore include seven semantic relationships: - -| Relationship | Refers to | Example value | -| ----------------------------- | ---------------------- | ---------------------------------- | -| `part_of_workflow_stage` | Workflow stage | chromatography stage | -| `measures_property` | Analytical dimension | chromatographic performance metric | -| `depends_on_data_type` | Information dependency | raw acquisition data | -| `has_measurement_scope` | Aggregation level | run level | -| `applies_to_acquisition_mode` | Acquisition strategy | acquisition mode independent | -| `has_quality_directionality` | Interpretation | lower is better | -| `has_value_type` | Value structure | tuple | +## Worked examples + +### XIC-FWHM quantiles (tuple) + +* **Analytical dimension (`is_a`)**: *chromatographic performance metric* +* **Workflow**: *chromatography stage* +* **Data type**: *raw acquisition data* +* **Scope**: *run level* +* **Acquisition**: *mode independent* +* **Directionality**: *lower is better* +* **Value type**: *tuple* + 
+```obo +is_a: MS:4000XXX ! chromatographic performance metric +relationship: part_of_workflow_stage MS:4000XXX ! chromatography stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! lower is better +relationship: has_value_type MS:4000XXX ! tuple +``` + +### MS2 known precursor charge fractions (table) + +* **Analytical dimension (`is_a`)**: *ionization quality metric* +* **Workflow**: *ionization stage* +* **Data type**: *raw acquisition data* +* **Scope**: *run level* +* **Acquisition**: *mode independent* +* **Directionality**: *context dependent* +* **Value type**: *table* + +```obo +is_a: MS:4000XXX ! ionization quality metric +relationship: part_of_workflow_stage MS:4000XXX ! ionization stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! context dependent +relationship: has_value_type MS:4000XXX ! table +``` + +## Summary + +* Use **`is_a`** for the **analytical dimension** (the metric's type). +* Use **typed relationships** (one each) for the six **orthogonal facets**: workflow stage, data dependency, scope, acquisition strategy, quality interpretation, and value type. These relationships together provide a complete, machine-readable semantic description of any QC metric. -In conclusion: - -* Each dimension describes a different facet of a QC metric. -* Together they make the CV complete, consistent, and queryable. -* Contributors defining new metrics should select one subclass from each dimension. -* Developers can use these relationships to automatically filter, group, and interpret QC results. 
+For how to serialize each **metric value type** in mzQC (single, tuple, table, matrix), see the **[CV Term Usage Guide](/metrics/use)**. diff --git a/docs/pages/cv/howto_create_cv_terms.md b/docs/pages/cv/howto_create_cv_terms.md index 465c3b2..5c955cd 100644 --- a/docs/pages/cv/howto_create_cv_terms.md +++ b/docs/pages/cv/howto_create_cv_terms.md @@ -1,11 +1,9 @@ --- layout: page -title: "Metrics – Create" +title: "Metrics – Term Creation Guide" permalink: /metrics/create --- -# PSI-MS QC Metrics Term Creation Guide - *How to define and request new QC metrics for the PSI-MS Controlled Vocabulary.* ## What this guide is for @@ -20,14 +18,14 @@ It shows: These guidelines ensure that all QC metrics: * are **semantically consistent** and **machine-readable**, -* fit naturally into mzQC and related PSI formats, -* and remain **traceable** to their scientific or software origin. +* fit naturally into mzQC and related PSI formats, and +* remain **traceable** to their scientific or software origin. This guide applies to QC metrics from **proteomics**, **metabolomics**, and related mass spectrometry workflows. ## How to request a new QC metric -All new terms are proposed through GitHub. +All new terms are proposed through GitHub: 1. Go to the [PSI-MS-CV repository](https://github.com/HUPO-PSI/psi-ms-CV). 2. Create a new issue using the **"New QC Term"** template. @@ -44,9 +42,9 @@ If you're refining or updating an existing term, just open an issue referencing Check first: -* Search the CV (for example in [OLS](https://www.ebi.ac.uk/ols/ontologies/ms)) to ensure that your metric doesn't already exist. +* Search the CV (e.g., via [OLS](https://www.ebi.ac.uk/ols/ontologies/ms)) to make sure that your metric doesn't already exist. * Verify that your metric is not just a variant or combination of an existing one. -* Collect supporting references (papers, software documentation, mzQC files). +* Gather supporting documentation (publications, software references, mzQC examples). 
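The first check can be partly automated against a local copy of the CV. A minimal sketch, assuming you have downloaded the PSI-MS OBO file and read it into a string; `find_existing_terms` is a hypothetical helper that only catches exact name or synonym matches (after case and whitespace normalization), so it complements rather than replaces a search in OLS.

```python
"""Sketch: find existing CV terms whose name or synonym matches a proposed name.

Assumptions: obo_text holds a locally downloaded PSI-MS OBO file; matching is
exact after case-folding and whitespace normalization, so near-duplicates
still need a manual review.
"""
import re

_SYNONYM = re.compile(r'^synonym:\s*"([^"]*)"')


def _norm(text: str) -> str:
    # Case-insensitive comparison with collapsed whitespace.
    return " ".join(text.casefold().split())


def find_existing_terms(obo_text: str, proposed_name: str) -> list[str]:
    """Return ids of [Term] stanzas whose name or synonym matches proposed_name."""
    target = _norm(proposed_name)
    hits = []
    for stanza in obo_text.split("[Term]")[1:]:  # skip the header block
        term_id, labels = None, []
        for line in stanza.splitlines():
            line = line.strip()
            if line.startswith("["):
                break  # start of another stanza type, e.g. [Typedef]
            if line.startswith("id:"):
                term_id = line[3:].strip()
            elif line.startswith("name:"):
                labels.append(line[5:].strip())
            else:
                match = _SYNONYM.match(line)
                if match:
                    labels.append(match.group(1))
        if term_id and any(_norm(label) == target for label in labels):
            hits.append(term_id)
    return hits
```

An empty result only means there is no exact match; related metrics with different names still have to be found by reading definitions.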
If you find something close but not identical, note that in your request — it helps curators decide whether to extend or merge existing terms. @@ -62,12 +60,13 @@ Each new QC metric request must contain: | **Units** | Physical or statistical unit (e.g., `UO:0000010 ! second`, `UO:0000187 ! percent`). | | **Value type** | Structural type of the metric value: single value, tuple, table, or matrix. | | **Semantic classification** | Seven relationships that describe what kind of metric this is (see below). | -| **Provenance** | Software or publication the metric originates from, e.g. `xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671]`. | +| **Provenance** | The origin of the metric (software or publication), e.g. `xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671]`. | > [!TIP] -> Keep names short and specific. Avoid tool names in the title — use `xref` for that. +> Keep names short and specific. +> Avoid tool names in the title — use `xref` for that. -## How to structure your metric definition +## Example of a complete metric definition Here's what a complete metric definition looks like: @@ -77,53 +76,73 @@ id: MS:4000051 name: XIC-FWHM quantiles def: "Summarizes the distribution of chromatographic peak widths, expressed as the full width at half maximum (FWHM) of extracted ion chromatograms (XICs). Reports an ordered tuple of the first through (n-1)-th quantiles (Q1, ..., Qn-1) of the FWHM distribution within a single run. Lower values indicate narrower peaks and therefore better chromatographic performance." comment: "Values are reported as an (n-1)-element tuple of floating-point numbers in seconds, representing the first to (n-1)-th quantiles of the FWHM distribution. The final quantile (100th percentile) is omitted because it corresponds to the maximum observed peak width, which is a boundary value that does not convey additional information about distribution shape or variability and is sensitive to outliers. 
The tuple length implicitly specifies how many quantiles are reported and thus the resolution of the summary." -is_a: MS:4000001 ! QC metric + +! --- Ontology classification --- +is_a: MS:4000XXX ! chromatographic performance metric relationship: part_of_workflow_stage MS:XXXXXXX ! chromatography stage -relationship: measures_property MS:XXXXXXX ! chromatographic performance metric relationship: depends_on_data_type MS:XXXXXXX ! raw acquisition data relationship: has_measurement_scope MS:XXXXXXX ! run level relationship: applies_to_acquisition_mode MS:XXXXXXX ! acquisition mode independent relationship: has_quality_directionality MS:XXXXXXX ! lower is better relationship: has_value_type MS:XXXXXXX ! tuple + +! --- Quantitative semantics --- relationship: has_value_concept MS:1000086 ! full width at half-maximum relationship: has_value_concept STATO:0000291 ! quantile relationship: has_units UO:0000010 ! second relationship: has_value_type xsd:float + +! --- Provenance --- xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] xref: QuaMeter:XIC-FWHM-Q2 [PMID:24494671] xref: QuaMeter:XIC-FWHM-Q3 [PMID:24494671] ``` -## Metric classification +## How QC metrics are classified QC metrics can be categorized according to several classification dimensions. -Together these describe *what the metric measures, where it applies, and how it behaves.* +These specify *what kind of metric it is*, *where it belongs in the workflow*, *what data it uses*, and *how its values behave*. + +### Analytical dimension: inheritance via `is_a` -| **Dimension** | **Relationship** | **Example** | **Meaning** | -| --- | --- | --- | --- | -| **Workflow stage** | `part_of_workflow_stage` | `chromatography stage` | Where in the experimental or computational workflow the metric applies. | -| **Analytical dimension** | `measures_property` | `chromatographic performance metric` | What underlying property is measured. 
| -| **Information dependency type** | `depends_on_data_type` | `raw acquisition data` | What kind of data the metric requires (raw, ID, quant, hybrid). | -| **Measurement scope** | `has_measurement_scope` | `run level` | At what aggregation level it summarizes data (spectrum, run, batch, study). | -| **Acquisition strategy** | `applies_to_acquisition_mode` | `acquisition mode independent` | Which acquisition or instrument mode it applies to. | -| **Quality interpretation type** | `has_quality_directionality` | `lower is better` | How to interpret values in terms of quality. | -| **Metric value type** | `has_value_type` | `tuple` | Structural type of the output. | +This dimension defines **what kind of metric** you are creating — it represents the metric's *type*. + +**Syntax:** + +```obo +is_a: MS:XXXXXXX ! chromatographic performance metric +``` -These relationships make each metric's meaning explicit and enable better machine reasoning. +**Examples:** +`chromatographic performance metric`, `mass accuracy metric`, `spectral quality metric`, `ionization quality metric`, `quantification precision metric`, etc. -The [QC Metric Classification Reference](TODO:link) page provides full details of the available subclasses for each dimension (e.g. all workflow stages, analytical dimensions, acquisition modes, etc.), including definitions, examples, and how they map to existing PSI-MS CV terms. +This is the **only dimension** that uses `is_a`. +It places your metric correctly within the ontology's QC metric hierarchy. -Use that reference when selecting the appropriate classification terms for your new metric. +### Typed relationships: contextual and structural properties + +All other dimensions use a `relationship:` field. 
+ +| **Dimension** | **Relationship** | **Example value** | **Meaning** | +| ------------------------------- | ----------------------------- | ------------------------------ | -------------------------------------------------------------------- | +| **Workflow stage** | `part_of_workflow_stage` | `chromatography stage` | Where in the experimental/computational workflow the metric applies. | +| **Information dependency type** | `depends_on_data_type` | `raw acquisition data` | What kind of data the metric depends on (raw, ID, quant, hybrid). | +| **Measurement scope** | `has_measurement_scope` | `run level` | Level of aggregation (spectrum, run, batch, study). | +| **Acquisition strategy** | `applies_to_acquisition_mode` | `acquisition mode independent` | Which acquisition mode/instrument the metric applies to. | +| **Quality interpretation type** | `has_quality_directionality` | `lower is better` | How values relate to data quality. | +| **Metric value type** | `has_value_type` | `tuple` | Structure of the metric value (single value, tuple, table, matrix). | + +Each dimension has predefined subclasses described in the [QC Metric Classification Reference](/metrics/classification). Use that reference when selecting the appropriate classification terms for your new metric. ## Quantitative details: what the numbers mean -To make your metric's values interpretable and comparable: +Add relationships describing what the metric's numeric values represent: -* `has_value_concept` → what the values represent. - Example: `STATO:0000291 ! quantile`, `MS:1000086 ! full width at half-maximum`. -* `has_units` → the unit of measurement (preferrably from the [Units of Measurement Ontology (UO)](https://www.ebi.ac.uk/ols/ontologies/uo)). - Example: `UO:0000010 ! second`, `UO:0000187 ! percent`. -* `has_value_type` → data type used. 
+| **Relationship** | **Purpose** | **Example** | +| ------------------- | ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| `has_value_concept` | Defines what the values represent | `MS:1000086 ! full width at half-maximum`, `STATO:0000291 ! quantile` | +| `has_units` | Defines the physical/statistical unit (preferably from [UO](https://www.ebi.ac.uk/ols/ontologies/uo)) | `UO:0000010 ! second`, `UO:0000187 ! percent` | +| `has_value_type` | Defines the data type (XML schema literal) | `xsd:float`, `xsd:int` | These fields help mzQC readers and validation tools understand how to process the data. @@ -137,7 +156,7 @@ These fields help mzQC readers and validation tools understand how to process th **Comments:** -Use the `comment:` only to clarify: +Use `comment:` only to clarify: * Implementation details (e.g., number of values, normalization). * Context or rationale (e.g., why a value is omitted). @@ -146,7 +165,7 @@ Avoid repeating the definition. ## Provenance and references -Always cite the origin of the metric: +Always record where the metric originates: ```obo xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] @@ -157,26 +176,36 @@ xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] ## Updating or extending metrics -If you need to improve an existing term (e.g., clearer definition, missing relationships): +If you want to improve an existing term (e.g., clearer definition, missing relationships): + +* Open an issue referencing the metric ID. +* Explain what should change and why. +* Curators will review and update or merge as appropriate. -* Open an issue referencing its ID. -* Describe what should change and why. -* Curators may update the term or merge it with others if appropriate. +Deprecated metrics are marked with: -Deprecated metrics are marked with `is_obsolete: true` and replaced by a new one via `replaced_by:`. 
+```obo +is_obsolete: true +replaced_by: MS:XXXXXXX +``` ## Quick reference -**When writing new metrics:** +**When defining new metrics:** -* ✅ Keep names and definitions short and specific. -* ✅ Use one relationship per classification dimension. -* ✅ Include `has_value_concept`, `has_units`, and `has_value_type`. -* ✅ Provide provenance (`xref:`). -* ✅ Test for uniqueness before submitting. +* ✅ Use `is_a` for the analytical dimension (metric type). +* ✅ Add one `relationship:` for each of the six other classification dimensions. +* ✅ Include quantitative metadata (`has_value_concept`, `has_units`, `has_value_type`). +* ✅ Add provenance (`xref:`). +* ✅ Ensure that the metric name and definition are unique. **Avoid:** * ❌ Tool names in the metric name. * ❌ Definitions that describe algorithms instead of meaning. -* ❌ Redundant comments. +* ❌ Redundant comments or duplicated phrasing. + +### See also + +* [**QC Metric Classification Reference:**](/metrics/classification) full list of subclasses, definitions, and examples. +* [**QC Metric Usage Guide:**](/metrics/use) how each value type (single, tuple, table, matrix) is encoded in mzQC. diff --git a/docs/pages/cv/howto_use_cv_terms.md b/docs/pages/cv/howto_use_cv_terms.md index ccde68e..9eb8479 100644 --- a/docs/pages/cv/howto_use_cv_terms.md +++ b/docs/pages/cv/howto_use_cv_terms.md @@ -1,12 +1,10 @@ --- layout: page -title: "Metrics – Use" +title: "Metrics – Usage Guide" permalink: /metrics/use --- -# PSI-MS QC Metrics Usage Guide - -*How to use QC CV terms correctly in mzQC files* +*How to use QC CV terms correctly in mzQC files.* ## Introduction @@ -24,7 +22,7 @@ That entry defines: * The **metric name** and **definition**, * Its **units** and **value type**, -* And semantic information (e.g. whether it's run-level, ID-based, or LC-related). +* Its **semantic classification**, describing where it applies and what it measures. 
When you reference a CV term in your mzQC file, you're telling mzQC-compatible software **exactly what kind of data this metric represents**. @@ -44,7 +42,7 @@ Example (simplified): ## Metric value types -Each QC metric defines **how its values are structured**, through the `has_value_type` relationship. +Each QC metric defines **how its values are structured**, using the `has_value_type` relationship. mzQC supports four value structures: | Value type | Structure | Example use | @@ -76,8 +74,8 @@ CV definition: id: MS:4000059 name: number of MS1 spectra def: "Counts the number of MS1 scans within a single run." +is_a: MS:4000XXX ! acquisition coverage metric relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage -relationship: measures_property MS:4000XXX ! acquisition coverage metric relationship: depends_on_data_type MS:4000XXX ! raw acquisition data relationship: has_measurement_scope MS:4000XXX ! run level relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent @@ -124,8 +122,8 @@ id: MS:4000062 name: MS2 density quantiles def: "Summarizes the distribution of spectral peak density in MS2 scans as quantiles of the number of fragment peaks per spectrum within a single run. The metric reports an ordered tuple of the first through (n−1)-th quantiles (Q1, ..., Qn−1), characterizing the overall fragmentation complexity and consistency across spectra." comment: "Values are reported as an (n−1)-element tuple of counts, representing the first to (n−1)-th quantiles of the distribution of fragment peak counts per MS2 spectrum. The final quantile (100th percentile) is omitted because it corresponds to the maximum observed peak count, which is a boundary value that does not convey additional information about distribution shape or variability and is sensitive to outliers. The tuple length implicitly specifies how many quantiles are reported and thus the resolution of the summary. 
Lower quantiles correspond to sparsely fragmented spectra; higher quantiles indicate spectra with more peaks. Interpretation depends on the acquisition and fragmentation settings and should be treated as context dependent rather than strictly higher- or lower-is-better." +is_a: MS:4000XXX ! spectral quality metric relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage -relationship: measures_property MS:4000XXX ! spectral quality metric relationship: depends_on_data_type MS:4000XXX ! raw acquisition data relationship: has_measurement_scope MS:4000XXX ! run level relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent @@ -161,7 +159,7 @@ Essentially a named column table with one row per observation. **mzQC encoding:** -* `"value"` is an object where each key is a column ID and its value is a list. +* `"value"` is an object where each key is a column identifier and its value is a list. * Each column has an optional unit. * All columns must have identical list lengths — each index corresponds to one row. * Units are provided as an array under `"unit"` and as part of the column definition. @@ -176,8 +174,8 @@ id: MS:4000063 name: MS2 known precursor charge fractions def: "Fraction of MS/MS precursors for each charge state observed within a run. Each entry lists a precursor charge (z) and its corresponding fraction of all observed MS2 precursors." comment: "Values are reported as a table with two columns: 'Charge state' and 'Fraction'. The final charge state bin should be interpreted as 'that charge state or higher' to include all unlisted higher charges." +is_a: MS:4000XXX ! ionization quality metric relationship: part_of_workflow_stage MS:4000XXX ! ionization stage -relationship: measures_property MS:4000XXX ! ionization quality metric relationship: depends_on_data_type MS:4000XXX ! raw acquisition data relationship: has_measurement_scope MS:4000XXX !
run level relationship: has_value_type MS:4000XXX ! table @@ -220,21 +218,23 @@ A metric that stores a rectangular grid of numeric values of the same type and u * A single `"unit"` applies to all entries. * Only homogeneous numeric types are allowed (no mixed datatypes). -## Understanding the relationships +## Understanding hierarchy and relationships + +Each QC metric term in the CV encodes its semantics in two ways: -Each metric term in the CV includes semantic relationships that describe *how* and *where* it applies. -These don't appear directly in mzQC files, but they're important for consistency and validation. +* The `is_a` hierarchy specifies *what kind of metric* it is (the analytical dimension). +* The typed `relationship`s describe *where and how* it applies. -| Relationship | Describes | Example | -| ----------------------------- | ----------------------------------- | ------------------------------------ | -| `part_of_workflow_stage` | Experimental or computational stage | `chromatography stage` | -| `measures_property` | Quality dimension measured | `chromatographic performance metric` | -| `depends_on_data_type` | Type of data used | `raw acquisition data` | -| `has_measurement_scope` | Level of aggregation | `run level` | -| `applies_to_acquisition_mode` | Acquisition mode | `DIA-specific metric` | -| `has_quality_directionality` | Interpretation of values | `lower is better` | -| `has_value_type` | Structure of the value | `tuple` | +| Ontology element | Describes | Example | +| ----------------------------- | ------------------------------------- | ------------------------------------ | +| `is_a` | Type of metric (analytical dimension) | `chromatographic performance metric` | +| `part_of_workflow_stage` | Experimental or computational stage | `chromatography stage` | +| `depends_on_data_type` | Type of data used | `raw acquisition data` | +| `has_measurement_scope` | Level of aggregation | `run level` | +| `applies_to_acquisition_mode` | 
Acquisition mode | `DIA-specific metric` | +| `has_quality_directionality` | Interpretation of values | `lower is better` | +| `has_value_type` | Structure of the value | `tuple` | -These relationships are how the CV ensures every metric is comparable, searchable, and logically complete. +These relationships make each metric comparable, searchable, and logically complete while maintaining a clean metric taxonomy. -For full details and all available subclasses (e.g., all workflow stages or acquisition modes), see the [QC Metric Classification Reference](TODO:link). +For full details and all available subclasses (e.g., analytical metric types, workflow stages, or acquisition modes), see the [QC Metric Classification Reference](/metrics/classification). diff --git a/docs/pages/metrics.md b/docs/pages/metrics.md index d43b60b..dc41c52 100644 --- a/docs/pages/metrics.md +++ b/docs/pages/metrics.md @@ -1,12 +1,33 @@ --- layout: page -title: Metrics +title: QC Metrics permalink: /metrics/ --- -The mzQC format owes much of it's _simplicity_ **and** _flexibility_ to the use of controlled vocabulary (CV) terms to define and instantiate quality metric records. -You can find out more on how to use and define your own CV terms below. +The mzQC format achieves both _simplicity_ and _flexibility_ by using **Controlled Vocabulary (CV) terms** to describe quality metrics in a precise and machine-readable way. +These terms are defined within the [**PSI-MS Controlled Vocabulary (CV)**](https://github.com/HUPO-PSI/psi-ms-CV) and specify: -{% include_relative cv/howto_use_cv_terms.md %} +* what each metric measures, +* how it is computed and represented, and +* how it relates to specific workflow stages or data types. -{% include_relative cv/howto_create_cv_terms.md %} +This ensures that QC results are interoperable across software tools, consistent across datasets, and unambiguously interpretable by both humans and machines. 
+ +## Learn more about QC metric CV terms + +Whether you're using, creating, or browsing metrics, the following pages explain everything you need to know: + +### [Metric Classification Reference](/metrics/classification) + +A taxonomy of QC metric categories and relationships. +Defines the seven classification dimensions used in mzQC and how they describe each metric's meaning, context, and structure. + +### [Using QC Metrics](/metrics/use) + +A hands-on guide for developers and tool integrators. +Learn how to reference, interpret, and serialize CV terms in mzQC files — including examples for single-value, tuple, table, and matrix metrics. + +### [Creating New QC Metrics](/metrics/create) + +Step-by-step instructions for proposing or updating QC metric terms in the PSI-MS CV. +Explains how to write clear definitions, select correct classifications, and provide provenance and quantitative details. From a60f23c48056fda1b5c82949c736d8ccb57715a6 Mon Sep 17 00:00:00 2001 From: Wout Bittremieux Date: Thu, 6 Nov 2025 12:00:05 +0100 Subject: [PATCH 3/4] Fix relative links --- docs/pages/cv/classification_reference.md | 6 +++--- docs/pages/cv/howto_create_cv_terms.md | 8 ++++---- docs/pages/cv/howto_use_cv_terms.md | 4 ++-- docs/pages/metrics.md | 6 +++--- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/pages/cv/classification_reference.md b/docs/pages/cv/classification_reference.md index e0e0b16..9c14be9 100644 --- a/docs/pages/cv/classification_reference.md +++ b/docs/pages/cv/classification_reference.md @@ -1,7 +1,7 @@ --- layout: page title: "Metrics – Classification Reference" -permalink: /metrics/classification +permalink: /metrics/classification/ --- *Standardized semantic categories for PSI-MS quality control metrics.* @@ -239,7 +239,7 @@ This defines how the metric must be represented in mzQC. | **Table** | Named columns | Parallel lists of equal length; each column has its own unit. 
| MS2 charge fractions | | **Matrix** | Rectangular array | 2D array of homogeneous numeric values. | Ion-mobility intensity matrix | -See the [CV Term Usage Guide](/metrics/use) for details on how each type is encoded in mzQC. +See the [CV Term Usage Guide](../use/) for details on how each type is encoded in mzQC. --- @@ -292,4 +292,4 @@ relationship: has_value_type MS:4000XXX ! table These relationships together provide a complete, machine-readable semantic description of any QC metric. -For how to serialize each **metric value type** in mzQC (single, tuple, table, matrix), see the **[CV Term Usage Guide](/metrics/use)**. +For how to serialize each **metric value type** in mzQC (single, tuple, table, matrix), see the **[CV Term Usage Guide](../use/)**. diff --git a/docs/pages/cv/howto_create_cv_terms.md b/docs/pages/cv/howto_create_cv_terms.md index 5c955cd..3f971f7 100644 --- a/docs/pages/cv/howto_create_cv_terms.md +++ b/docs/pages/cv/howto_create_cv_terms.md @@ -1,7 +1,7 @@ --- layout: page title: "Metrics – Term Creation Guide" -permalink: /metrics/create +permalink: /metrics/create/ --- *How to define and request new QC metrics for the PSI-MS Controlled Vocabulary.* @@ -132,7 +132,7 @@ All other dimensions use a `relationship:` field. | **Quality interpretation type** | `has_quality_directionality` | `lower is better` | How values relate to data quality. | | **Metric value type** | `has_value_type` | `tuple` | Structure of the metric value (single value, tuple, table, matrix). | -Each dimension has predefined subclasses described in the [QC Metric Classification Reference](/metrics/classification). Use that reference when selecting the appropriate classification terms for your new metric. +Each dimension has predefined subclasses described in the [QC Metric Classification Reference](../classification/). Use that reference when selecting the appropriate classification terms for your new metric.
## Quantitative details: what the numbers mean @@ -207,5 +207,5 @@ replaced_by: MS:XXXXXXX ### See also -* [**QC Metric Classification Reference:**](/metrics/classification) full list of subclasses, definitions, and examples. -* [**QC Metric Usage Guide:**](/metrics/use) how each value type (single, tuple, table, matrix) is encoded in mzQC. +* [**QC Metric Classification Reference:**](../classification/) full list of subclasses, definitions, and examples. +* [**QC Metric Usage Guide:**](../use/) how each value type (single, tuple, table, matrix) is encoded in mzQC. diff --git a/docs/pages/cv/howto_use_cv_terms.md b/docs/pages/cv/howto_use_cv_terms.md index 9eb8479..25b8e4b 100644 --- a/docs/pages/cv/howto_use_cv_terms.md +++ b/docs/pages/cv/howto_use_cv_terms.md @@ -1,7 +1,7 @@ --- layout: page title: "Metrics – Usage Guide" -permalink: /metrics/use +permalink: /metrics/use/ --- *How to use QC CV terms correctly in mzQC files.* @@ -237,4 +237,4 @@ Each QC metric term in the CV encodes its semantics in two ways: These relationships make each metric comparable, searchable, and logically complete while maintaining a clean metric taxonomy. -For full details and all available subclasses (e.g., analytical metric types, workflow stages, or acquisition modes), see the [QC Metric Classification Reference](/metrics/classification). +For full details and all available subclasses (e.g., analytical metric types, workflow stages, or acquisition modes), see the [QC Metric Classification Reference](../classification/).
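The tuple-valued quantile metrics discussed in the usage guide (for example XIC-FWHM quantiles and MS2 density quantiles) report the first through (n-1)-th quantiles and omit the maximum. The Python sketch below shows one way a tool might compute and serialize such a value. It assumes an mzQC quality metric entry with `accession`, `name`, `value`, and `unit` fields and a single unit object; verify both against the mzQC schema before relying on them. The helpers `quantile_tuple` and `xic_fwhm_metric` are hypothetical, and the exclusive interpolation used by Python's `statistics.quantiles` is our choice, not something the CV prescribes.

```python
import json
import statistics


def quantile_tuple(values, n=4):
    """Return the first through (n-1)-th quantiles of `values`.

    statistics.quantiles returns exactly n - 1 cut points, so the maximum
    (the n-th quantile) is never included, matching the CV comments.
    The default 'exclusive' interpolation method is an assumption here.
    """
    return statistics.quantiles(values, n=n)


def xic_fwhm_metric(fwhm_seconds, n=4):
    """Build a hypothetical mzQC quality metric entry for XIC-FWHM quantiles."""
    return {
        "accession": "MS:4000051",
        "name": "XIC-FWHM quantiles",
        # Tuple-valued metric: a plain JSON array ordered Q1 .. Q(n-1).
        "value": quantile_tuple(fwhm_seconds, n=n),
        # Assumed single-unit encoding; seconds, per the CV definition.
        "unit": {"accession": "UO:0000010", "name": "second"},
    }


if __name__ == "__main__":
    # Illustrative peak widths in seconds, not real data.
    widths = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    print(json.dumps(xic_fwhm_metric(widths), indent=2))
```

With `n=4` the array holds three values (Q1 to Q3). Because the tuple length implicitly encodes the number of quantiles, a reader can recover the resolution of the summary without a separate parameter.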
diff --git a/docs/pages/metrics.md b/docs/pages/metrics.md index dc41c52..17a298d 100644 --- a/docs/pages/metrics.md +++ b/docs/pages/metrics.md @@ -17,17 +17,17 @@ This ensures that QC results are interoperable across software tools, consistent Whether you're using, creating, or browsing metrics, the following pages explain everything you need to know: -### [Metric Classification Reference](/metrics/classification) +### [Metric Classification Reference](classification/) A taxonomy of QC metric categories and relationships. Defines the seven classification dimensions used in mzQC and how they describe each metric's meaning, context, and structure. -### [Using QC Metrics](/metrics/use) +### [Using QC Metrics](use/) A hands-on guide for developers and tool integrators. Learn how to reference, interpret, and serialize CV terms in mzQC files — including examples for single-value, tuple, table, and matrix metrics. -### [Creating New QC Metrics](/metrics/create) +### [Creating New QC Metrics](create/) Step-by-step instructions for proposing or updating QC metric terms in the PSI-MS CV. Explains how to write clear definitions, select correct classifications, and provide provenance and quantitative details. From 4f49c3db62daf50f38cec9f272f54ff669726e0f Mon Sep 17 00:00:00 2001 From: Wout Bittremieux Date: Thu, 6 Nov 2025 18:19:04 +0100 Subject: [PATCH 4/4] Add deconvoluted data --- docs/pages/cv/classification_reference.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/pages/cv/classification_reference.md b/docs/pages/cv/classification_reference.md index 9c14be9..c9d8c0e 100644 --- a/docs/pages/cv/classification_reference.md +++ b/docs/pages/cv/classification_reference.md @@ -126,6 +126,8 @@ Specifies which type of data input the metric requires to be computed. * **Raw acquisition data:** metrics that can be calculated directly from the raw MS data, without identifications. *Example:* total ion current stability, scan count. 
+* **Deconvoluted data:** metrics based on processed spectra or peak lists obtained after signal deconvolution, centroiding, or deisotoping, but prior to identification. + *Example:* peak density in deconvoluted spectra, precursor mass range coverage. * **Identification results:** metrics that depend on identified peptides, compounds, or spectra. *Example:* PSM-level FDR, peptide coverage. * **Quantification results:** metrics derived from quantitative data matrices.
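Because every metric must carry exactly one `is_a` parent and exactly one typed relationship for each remaining dimension, a term request can be checked mechanically before it is submitted. The Python sketch below is a hypothetical pre-submission check, not an official PSI-MS tool. It assumes stanzas are written like the examples in these guides: `has_value_type` targets with an `xsd:` prefix are treated as the literal datatype rather than the structural value type, and only the data-dependency labels listed above are recognized (the set would need extending with any further subclasses, such as hybrid).

```python
from collections import Counter

# The six dimensions expressed as typed relationships; the analytical
# dimension is carried by is_a instead.
TYPED_DIMENSIONS = (
    "part_of_workflow_stage",
    "depends_on_data_type",
    "has_measurement_scope",
    "applies_to_acquisition_mode",
    "has_quality_directionality",
    "has_value_type",
)

# Information dependency labels listed in the reference (assumed partial).
KNOWN_DATA_TYPES = {
    "raw acquisition data",
    "deconvoluted data",
    "identification results",
    "quantification results",
}


def parse_stanza(text):
    """Split an OBO stanza into is_a token lists and (relation, target, label) triples."""
    is_a, relations = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and section comments
        tag, _, value = line.partition(":")
        body, _, label = value.partition("!")
        tokens = body.split()
        if tag == "is_a":
            is_a.append(tokens)
        elif tag == "relationship" and len(tokens) >= 2:
            relations.append((tokens[0], tokens[1], label.strip()))
    return is_a, relations


def check_classification(text):
    """Return a list of problems; an empty list means the stanza passes."""
    is_a, relations = parse_stanza(text)
    problems = []
    if len(is_a) != 1:
        problems.append(f"expected exactly one is_a line, found {len(is_a)}")
    for tokens in is_a:
        if len(tokens) != 1:
            problems.append(f"is_a must name a single accession, got {' '.join(tokens)!r}")
    # Count structural relationships; xsd: literals are datatypes, not value types.
    counts = Counter(
        rel
        for rel, target, _ in relations
        if rel in TYPED_DIMENSIONS
        and not (rel == "has_value_type" and target.startswith("xsd:"))
    )
    for dim in TYPED_DIMENSIONS:
        if counts[dim] != 1:
            problems.append(f"expected one {dim} relationship, found {counts[dim]}")
    for rel, _, label in relations:
        if rel == "depends_on_data_type" and label not in KNOWN_DATA_TYPES:
            problems.append(f"unrecognized data dependency label {label!r}")
    return problems


if __name__ == "__main__":
    # A deliberately incomplete, malformed stanza for illustration.
    stanza = """
is_a: measures_property MS:4000XXX ! spectral quality metric
relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage
relationship: depends_on_data_type MS:4000XXX ! deconvoluted data
"""
    for problem in check_classification(stanza):
        print(problem)
```

Keeping the parser line-based mirrors how these guides present term stanzas; a production validator would more likely load the full ontology with a dedicated OBO library.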