diff --git a/.gitignore b/.gitignore index a9c5404c3..3fbb933b6 100644 --- a/.gitignore +++ b/.gitignore @@ -78,6 +78,7 @@ site/sdrf-data.json # Claude Code # ############### .claude/ +CLAUDE.md .codacy/ # Local files for developers are in folder in local-info diff --git a/README.md b/README.md index 5c6b6f0cb..5ab85ae7b 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ # Proteomics Sample Metadata Format [![Version](https://flat.badgen.net/static/sdrf-proteomics/1.0.1/orange)](CHANGELOG.md) -[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE) -[![Open Issues](https://flat.badgen.net/github/open-issues/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/issues) -[![Open PRs](https://flat.badgen.net/github/open-prs/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/pulls) -![Contributors](https://flat.badgen.net/github/contributors/bigbio/proteomics-metadata-standard) -![Watchers](https://flat.badgen.net/github/watchers/bigbio/proteomics-metadata-standard) -![Stars](https://flat.badgen.net/github/stars/bigbio/proteomics-metadata-standard) +[![License](https://flat.badgen.net/github/license/bigbio/proteomics-sample-metadata)](https://github.com/bigbio/proteomics-sample-metadata/blob/master/LICENSE) +[![Open Issues](https://flat.badgen.net/github/open-issues/bigbio/proteomics-sample-metadata)](https://github.com/bigbio/proteomics-sample-metadata/issues) +[![Open PRs](https://flat.badgen.net/github/open-prs/bigbio/proteomics-sample-metadata)](https://github.com/bigbio/proteomics-sample-metadata/pulls) +![Contributors](https://flat.badgen.net/github/contributors/bigbio/proteomics-sample-metadata) +![Watchers](https://flat.badgen.net/github/watchers/bigbio/proteomics-sample-metadata) +![Stars](https://flat.badgen.net/github/stars/bigbio/proteomics-sample-metadata) 
+[![llms.txt](https://flat.badgen.net/static/llms.txt/available/blue)](llms.txt) ## Improving metadata annotation of Proteomics datasets @@ -44,7 +45,7 @@ In the [annotated projects](https://github.com/bigbio/proteomics-metadata-standa Annotate a dataset in 5 steps: - Read the [SDRF-Proteomics specification](https://github.com/bigbio/proteomics-metadata-standard/tree/master/sdrf-proteomics). -- Depending on the type of dataset, choose the appropriate [sample template](https://github.com/bigbio/proteomics-metadata-standard/tree/master/sdrf-proteomics#sdrf-templates). +- Depending on the type of dataset, choose the appropriate [sample template](https://github.com/bigbio/proteomics-sample-metadata/tree/master/sdrf-proteomics#sdrf-templates). - Annotate the corresponding ProteomeXchange PXD dataset following the guidelines. - Validate your SDRF file: diff --git a/llms.txt b/llms.txt new file mode 100644 index 000000000..d7b6d72ad --- /dev/null +++ b/llms.txt @@ -0,0 +1,121 @@ +# SDRF-Proteomics + +> SDRF-Proteomics is a HUPO-PSI community standard defining a tab-delimited file format for capturing sample-to-data-file relationships in proteomics experiments. It standardizes sample metadata (organism, disease, tissue), technical metadata (instrument, labels, enzymes), and experimental design (factor values) to enable automated reprocessing and reuse of public proteomics datasets. Compatible with MAGE-TAB SDRF from transcriptomics. 
+ +## Specification + +- sdrf-proteomics/README.adoc - Core specification: format rules, column headers, cell values, templates, factor values, ontologies + - sdrf-proteomics/quickstart.adoc - Quick Start Tutorial (10-15 min) + - sdrf-proteomics/metadata-guidelines/sample-metadata.adoc - Sample Metadata Guidelines: age, sex, disease, organism part, cell type + - sdrf-proteomics/metadata-guidelines/template-definitions.adoc - Template Definitions Guide (for developers) + - sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv - SDRF Terms Reference: all column terms with ontology mappings + +- sdrf-proteomics/VERSIONING.adoc - Versioning and Deprecation Policy: version tracks, template compatibility, deprecation lifecycle, transition timelines +- sdrf-proteomics/open-issues.adoc - Open Issues and Future Decisions: community discussions for post-v1.1.0 changes +- psi-document/v1.0.0/SDRF_Proteomics_Specification_v1.0.0.pdf - Official HUPO-PSI specification (PDF, v1.0.0) +- psi-document/v1.1.0-dev/sdrf-proteomics-specification-v1.1.0-dev.pdf - Development specification (PDF, v1.1.0-dev) + +## Templates + +- sdrf-proteomics/templates/ms-proteomics/README.adoc - MS-Proteomics: labels, instruments, modifications, cleavage agents +- sdrf-proteomics/templates/affinity-proteomics/README.adoc - Affinity Proteomics: Olink and SomaScan +- sdrf-proteomics/templates/human/README.adoc - Human: disease, age, sex, ancestry, disease staging +- sdrf-proteomics/templates/vertebrates/README.adoc - Vertebrates: mouse, rat, zebrafish +- sdrf-proteomics/templates/invertebrates/README.adoc - Invertebrates: Drosophila, C. 
elegans +- sdrf-proteomics/templates/plants/README.adoc - Plants: Arabidopsis, crops +- sdrf-proteomics/templates/cell-lines/README.adoc - Cell Lines: Cellosaurus integration +- sdrf-proteomics/templates/dda-acquisition/README.adoc - DDA Acquisition: dissociation method, collision energy +- sdrf-proteomics/templates/dia-acquisition/README.adoc - DIA Acquisition: scan windows, isolation width +- sdrf-proteomics/templates/single-cell/README.adoc - Single-Cell Proteomics: cell isolation, carrier proteome +- sdrf-proteomics/templates/immunopeptidomics/README.adoc - Immunopeptidomics: MHC class, HLA typing +- sdrf-proteomics/templates/crosslinking/README.adoc - Crosslinking MS: crosslinker reagents +- sdrf-proteomics/templates/metaproteomics/README.adoc - Metaproteomics: environmental and microbiome samples +- sdrf-proteomics/templates/olink/README.adoc - Olink: proximity extension assays +- sdrf-proteomics/templates/somascan/README.adoc - SomaScan: aptamer-based proteomics + +## Template YAML Schemas (sdrf-templates submodule) + +Machine-readable YAML definitions used by sdrf-pipelines for validation. Each template has a `.yaml` schema and an optional `.sdrf.tsv` example file. Templates follow a layered hierarchy: base → technology → sample/experiment. 
+ +- sdrf-proteomics/sdrf-templates/templates.yaml - Template manifest: all templates with latest versions, inheritance, and layer metadata +- sdrf-proteomics/sdrf-templates/base/1.1.0/base.yaml - Base template (internal, not user-facing): shared columns inherited by all templates + - sdrf-proteomics/sdrf-templates/base/1.1.0/base.sdrf.tsv - Base example +- sdrf-proteomics/sdrf-templates/ms-proteomics/1.1.0/ms-proteomics.yaml - MS-Proteomics (technology layer): minimum valid template for any MS experiment + - sdrf-proteomics/sdrf-templates/ms-proteomics/1.1.0/ms-proteomics.sdrf.tsv - MS-Proteomics example +- sdrf-proteomics/sdrf-templates/affinity-proteomics/1.1.0/affinity-proteomics.yaml - Affinity Proteomics (technology layer): Olink, SomaScan base + - sdrf-proteomics/sdrf-templates/affinity-proteomics/1.1.0/affinity-proteomics.sdrf.tsv - Affinity Proteomics example +- sdrf-proteomics/sdrf-templates/human/1.1.0/human.yaml - Human (sample layer): disease, age, sex, ancestry + - sdrf-proteomics/sdrf-templates/human/1.1.0/human.sdrf.tsv - Human example +- sdrf-proteomics/sdrf-templates/vertebrates/1.1.0/vertebrates.yaml - Vertebrates (sample layer): mouse, rat, zebrafish, etc. + - sdrf-proteomics/sdrf-templates/vertebrates/1.1.0/vertebrates.sdrf.tsv - Vertebrates example +- sdrf-proteomics/sdrf-templates/invertebrates/1.1.0/invertebrates.yaml - Invertebrates (sample layer): Drosophila, C. 
elegans + - sdrf-proteomics/sdrf-templates/invertebrates/1.1.0/invertebrates.sdrf.tsv - Invertebrates example +- sdrf-proteomics/sdrf-templates/plants/1.1.0/plants.yaml - Plants (sample layer): Arabidopsis, crops + - sdrf-proteomics/sdrf-templates/plants/1.1.0/plants.sdrf.tsv - Plants example +- sdrf-proteomics/sdrf-templates/cell-lines/1.1.0/cell-lines.yaml - Cell Lines (experiment layer): Cellosaurus integration + - sdrf-proteomics/sdrf-templates/cell-lines/1.1.0/cell-lines.sdrf.tsv - Cell Lines example +- sdrf-proteomics/sdrf-templates/dda-acquisition/1.1.0/dda-acquisition.yaml - DDA Acquisition (experiment layer): dissociation method, collision energy + - sdrf-proteomics/sdrf-templates/dda-acquisition/1.1.0/dda-acquisition.sdrf.tsv - DDA example +- sdrf-proteomics/sdrf-templates/dia-acquisition/1.1.0/dia-acquisition.yaml - DIA Acquisition (experiment layer): scan windows, isolation width + - sdrf-proteomics/sdrf-templates/dia-acquisition/1.1.0/dia-acquisition.sdrf.tsv - DIA example +- sdrf-proteomics/sdrf-templates/crosslinking/1.1.0/crosslinking.yaml - Crosslinking MS (experiment layer): crosslinker reagents + - sdrf-proteomics/sdrf-templates/crosslinking/1.1.0/crosslinking.sdrf.tsv - Crosslinking example +- sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.yaml - Single-Cell (experiment layer): cell isolation, carrier proteome + - sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.sdrf.tsv - Single-Cell example +- sdrf-proteomics/sdrf-templates/immunopeptidomics/1.0.0-dev/immunopeptidomics.yaml - Immunopeptidomics (experiment layer): MHC class, HLA typing +- sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.yaml - Metaproteomics (experiment layer): environmental and microbiome samples + - sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.sdrf.tsv - Metaproteomics example +- sdrf-proteomics/sdrf-templates/olink/1.0.0/olink.yaml - Olink (experiment layer): proximity extension assays + - 
sdrf-proteomics/sdrf-templates/olink/1.0.0/olink.sdrf.tsv - Olink example +- sdrf-proteomics/sdrf-templates/somascan/1.0.0/somascan.yaml - SomaScan (experiment layer): aptamer-based proteomics + - sdrf-proteomics/sdrf-templates/somascan/1.0.0/somascan.sdrf.tsv - SomaScan example + +## Tools + +- sdrf-proteomics/tool-support.adoc - Tool Support Overview: annotators, validators, analysis tools + - https://github.com/bigbio/sdrf-pipelines - sdrf-pipelines: official Python CLI/library for SDRF validation + - https://lessdrf.streamlit.app/ - lesSDRF: web-based SDRF creation tool + - https://cupcake-vanilla-demo.proteo.nexus/ - CupCAKE: web annotation platform with ontology integration + - https://quantms.org/ - quantms: Nextflow pipeline for quantitative proteomics + - https://www.maxquant.org/ - MaxQuant: desktop proteomics software with SDRF export + - https://github.com/wombat-p - Wombat-P: benchmarking platform for proteomics workflows + +## Examples + +- examples/core/PXD002137/PXD002137.sdrf.tsv - Core example: label-free +- examples/core/PXD004684/PXD004684.sdrf.tsv - Core example: TMT labeled +- examples/core/PXD006482/PXD006482.sdrf.tsv - Core example: SILAC +- examples/core/PXD008934/PXD008934.sdrf.tsv - Core example: human proteome +- examples/core/PDC000126/PDC000126.sdrf.tsv - Core example: PDC dataset +- examples/use-cases/crosslinking.sdrf.tsv - Use case: crosslinking MS +- examples/use-cases/immunopeptidomics.sdrf.tsv - Use case: immunopeptidomics +- examples/use-cases/single-cell.sdrf.tsv - Use case: single-cell proteomics + +## Annotated Projects + +- annotated-projects/ - 250+ public proteomics datasets annotated in SDRF format + - annotated-projects/PXD008934/PXD008934.sdrf.tsv - Label-free quantification + - annotated-projects/PXD017710/PXD017710.sdrf.tsv - TMT-labeled quantitative proteomics + - annotated-projects/PXD000612/PXD000612.sdrf.tsv - SILAC-based quantification + - annotated-projects/PXD018830/PXD018830-DIA.sdrf.tsv - Data-independent 
acquisition + - annotated-projects/PXD000759/PXD000759.sdrf.tsv - Phosphoproteomics + - annotated-projects/PXD001819/PXD001819.sdrf.tsv - Cell line proteomics + +## Publications + +- https://www.nature.com/articles/s41467-021-26111-3 - Dai et al. (2021) Nat Commun: A proteomics sample metadata representation for multiomics integration +- https://pubs.acs.org/doi/abs/10.1021/acs.jproteome.0c00376 - Perez-Riverol et al. (2020) J Proteome Res: Towards a sample metadata standard in public proteomics repositories + +## Project + +- README.md - Project overview and contributor list +- CHANGELOG.md - Version history and changes +- CITATION.cff - Citation metadata +- LICENSE - GNU General Public License +- DEVELOPMENT.md - Building the documentation website locally + +## Optional + +- https://github.com/bigbio/proteomics-metadata-standard/wiki - 30-Minute Guide to SDRF-Proteomics +- https://www.youtube.com/watch?v=TMDu_yTzYQM - Introduction to SDRF-Proteomics (video) +- https://www.psidev.info/sdrf-sample-data-relationship-format - HUPO-PSI official page diff --git a/psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf b/psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf index e32e163b7..efe9fb818 100644 Binary files a/psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf and b/psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf differ diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index d0b2505a7..f59fdc2c8 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -104,6 +104,8 @@ The file is organized into three column sections: The SDRF-Proteomics specification uses https://semver.org/[Semantic Versioning] (MAJOR.MINOR.PATCH). Version numbers are prefixed with "v" (e.g., v1.1.0). Changes are proposed via GitHub pull requests to the dev branch. 
+For the complete versioning strategy — including template versioning, ontology updates, the deprecation policy, transition timelines, and migration tooling — see link:VERSIONING.adoc[Versioning and Deprecation Policy]. + [[sdrf-file-rules]] === Format rules diff --git a/sdrf-proteomics/VERSIONING.adoc b/sdrf-proteomics/VERSIONING.adoc new file mode 100644 index 000000000..9d947ffdf --- /dev/null +++ b/sdrf-proteomics/VERSIONING.adoc @@ -0,0 +1,509 @@ += SDRF-Proteomics Versioning and Deprecation Policy +:sectnums: +:toc: left +:doctype: book +:xrefstyle: short + +ifdef::env-github[] +:tip-caption: :bulb: +:note-caption: :information_source: +:important-caption: :heavy_exclamation_mark: +:caution-caption: :fire: +:warning-caption: :warning: +endif::[] + +== Introduction + +This document defines the versioning strategy and deprecation policy for the SDRF-Proteomics ecosystem. It covers the specification itself, YAML templates, ontology validators, and how changes propagate to previously validated SDRF files. The goal is to provide a clear, predictable schedule for all breaking changes so that data submitters, repository curators, tool developers, and analysis pipeline maintainers can plan upgrades accordingly. + +This policy applies starting with SDRF-Proteomics v1.1.0. + +== Version Tracks + +The SDRF-Proteomics ecosystem has three independently versioned components. Each follows https://semver.org/[Semantic Versioning] (MAJOR.MINOR.PATCH) but they are released on separate schedules. 
+ +[cols="2,2,3,2", options="header"] +|=== +| Component | Repository | What it defines | Example version + +| **Specification** +| https://github.com/bigbio/proteomics-sample-metadata[proteomics-sample-metadata] +| Format rules, column semantics, ontology requirements, and the normative text +| v1.1.0 + +| **Templates** +| https://github.com/bigbio/sdrf-templates[sdrf-templates] (submodule) +| YAML schemas defining required/recommended/optional columns per experiment type +| human v1.1.0 + +| **Validator** +| https://github.com/bigbio/sdrf-pipelines[sdrf-pipelines] +| CLI/library that checks SDRF files against templates and the specification +| sdrf-pipelines 0.1.0 +|=== + +=== Semantic Versioning Definitions + +Each component uses MAJOR.MINOR.PATCH with the following meanings: + +[cols="1,4,4", options="header"] +|=== +| Level | Meaning | Examples + +| **MAJOR** +| Breaking changes that make previously valid SDRF files invalid, remove required columns, or change core format rules. +| Removing a required column; changing the TSV structure; renaming reserved words. + +| **MINOR** +| Backward-compatible additions. Previously valid files remain valid. New optional/recommended columns, new templates, or relaxed requirements. +| Adding a new template; making a required column optional; adding a new recommended column. + +| **PATCH** +| Bug fixes, documentation clarifications, ontology term corrections. No schema or validation changes. +| Fixing a typo in a column description; correcting an ontology accession; updating examples. +|=== + +=== Version Compatibility Matrix + +Each specification version declares which template versions and validator versions are compatible. 
This is tracked in a compatibility matrix maintained in the repository: + +[cols="2,2,2", options="header"] +|=== +| Specification | Templates | Validator (minimum) + +| v1.0.0 +| 1.0.x +| sdrf-pipelines 0.0.x + +| v1.1.0 +| 1.0.x, 1.1.x +| sdrf-pipelines 0.1.x +|=== + +NOTE: A template version `X.Y.Z` is compatible with any specification version that shares the same MAJOR version and has a MINOR version >= the template's MINOR version. + +== Template Versioning + +=== Independent Template Versions + +Templates are versioned independently of the specification and of each other. A template at version `1.1.0` can coexist with another template at `1.0.0`. The `templates.yaml` manifest tracks the latest version and all available versions for each template. + +=== SDRF File Version Declaration + +SDRF files SHOULD declare the versions they were validated against using the file-level metadata columns: + +[source,tsv] +---- +comment[sdrf version] comment[sdrf template] comment[sdrf template] +v1.1.0 human v1.1.0 ms-proteomics v1.1.0 +---- + +This enables parsers and validators to apply the correct rules for the declared version, even if newer versions exist. + +NOTE: If the `comment[sdrf version]` or `comment[sdrf template]` columns are absent, the validator MUST assume the **latest available specification and template versions** and validate accordingly. A warning SHOULD be emitted recommending that submitters declare explicit versions to ensure reproducible validation over time. + +=== Template Inheritance and Updates + +Templates follow a layered hierarchy where child templates inherit from parent templates (see link:metadata-guidelines/template-definitions.adoc[Template Definitions Guide]). When a parent template is updated, the following rules apply: + +[cols="2,4,2", options="header"] +|=== +| Scenario | What happens | Action required + +| Parent adds an OPTIONAL column +| Child automatically inherits the new column. Previously valid files remain valid. +| None. 
+ +| Parent adds a RECOMMENDED column +| Child inherits the column as recommended. Validators emit a warning for files missing the column. Previously valid files remain valid (warning only). +| None (but annotation encouraged). + +| Parent adds a REQUIRED column +| **Breaking change.** Child must release a new MAJOR version. A deprecation period applies (see <<deprecation-policy>>). +| Update files during the transition period. + +| Parent removes a column +| **Breaking change.** Column is first deprecated in a MINOR release, then removed in the next MAJOR release. +| Migrate files during the transition period. + +| Parent changes a column's requirement level +| `optional` → `recommended` is non-breaking (MINOR). `optional` → `required` or `recommended` → `required` is breaking (MAJOR with deprecation period). +| Follow the deprecation timeline. +|=== + +=== Community Templates + +Community-contributed templates MUST be submitted via pull request to the official https://github.com/bigbio/sdrf-templates[sdrf-templates] repository. Templates that exist only in external repositories or local copies are **not recognized** by the specification and will not be supported by the validator. This ensures that every template receives community review, follows consistent quality standards, and remains maintained over time. + +Community-contributed templates MUST: + +1. Be submitted as a pull request to the https://github.com/bigbio/sdrf-templates[sdrf-templates] repository. The PR must include the YAML schema, an example `.sdrf.tsv` file, and documentation (a `README.adoc` in the corresponding `templates/` directory). +2. Extend an official template using the `extends` field. +3. Follow the same YAML schema structure defined in the link:metadata-guidelines/template-definitions.adoc[Template Definitions Guide]. +4. Use the `-dev` suffix for pre-release versions (e.g., `1.0.0-dev`) until the template has been reviewed and accepted by the core team. +5. 
Declare their `spec_compatibility` range (see <>). + +Once merged, community templates become part of the official template catalog and appear in the `templates.yaml` manifest. The original contributors are expected to help maintain the template, but the core team assumes responsibility for ensuring compatibility with future specification releases. + +IMPORTANT: The validator only recognizes templates present in the official `sdrf-templates` repository. Using `comment[sdrf template]` to reference a template name that does not exist in the catalog will produce a validation warning. + +When a base template is updated, community template maintainers are responsible for testing compatibility and releasing updated versions via pull request. The validator SHOULD emit a warning when a community template references a base template version older than the latest available. + +== Feature Lifecycle Policy + +This section details the policy for the introduction, testing, stabilization, and eventual removal of features (fields, ontologies, validation rules). + +=== Feature Categorization + +Features within the specification are categorized based on their intended audience and visibility. This categorization dictates the strictness of the stability guarantees applied during the lifecycle. + +==== Background Features +Background features consist of properties primarily consumed by software pipelines, validators, or scripts that do not get rendered in a table. Examples include validation hash columns. These features are rarely inspected manually by end-users. Changes to background features follow the standard lifecycle, and deprecation requires appropriate updates to technical documentation and validator warnings. + +==== Display Features +Display features are those that are rendered in the tabulated table (e.g., column names, value syntax). 
To avoid causing confusion, any changes that require the user to relearn the format should be carefully considered and given a large grace period (minimum 12 months) where both old and new formats coexist, ideally with converter tooling. Proposals affecting display features require discussion with the wider community. + +=== Feature Lifecycle Stages + +The lifecycle of a feature proceeds through six distinct stages: + +==== 1. Introduction +The Introduction stage represents the proposal phase. A contributor creates a GitHub Issue describing the use case and providing examples. The feature is categorized (Background vs. Display) and undergoes community review. Once approved, a Pull Request is drafted. + +==== 2. Test +Features in the Test stage are experimental, implemented in a feature branch. This stage is strictly for internal testing. Features in this state can change frequently. + +==== 3. Unstable +Features in this stage are merged into the `dev` branch and documented but are not yet recommended for public use. They are subject to change without notice. + +==== 4. Stable +Features in this stage are included in a numbered stable release and are recommended for production use. Stable features are protected by backward compatibility guarantees. The official validator enforces rules for stable features. + +==== 5. Deprecation +Deprecation indicates that a feature is close to end-of-life. It remains valid to ensure backward compatibility, but its use is discouraged. The validator issues a warning. +* **Background Features:** Deprecation period of at least one Minor version cycle or 6 months. +* **Display Features:** Deprecation period of at least two Minor version cycles or 12 months. + +==== 6. Removal +Removal is the final stage where a feature is designated as obsolete. Removed features are no longer valid, and their presence triggers a validation error. Removal is a breaking change and occurs only in a Major version release. 
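The deprecation and removal rules above can be sketched as a small check. This is a minimal illustration, not part of sdrf-pipelines: the names `Category`, `add_months`, and `removal_allowed` are hypothetical, and reading "N Minor version cycles or M months" as "either condition is enough" is an assumption about the policy text.

```python
import calendar
from datetime import date
from enum import Enum


class Category(Enum):
    BACKGROUND = "background"
    DISPLAY = "display"


# Minimum deprecation windows quoted in the Deprecation stage above.
MIN_MINOR_CYCLES = {Category.BACKGROUND: 1, Category.DISPLAY: 2}
MIN_MONTHS = {Category.BACKGROUND: 6, Category.DISPLAY: 12}


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's end."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def removal_allowed(
    category: Category,
    deprecated_on: date,
    today: date,
    minor_releases_since: int,
    is_major_release: bool,
) -> bool:
    """Hypothetical check: may a deprecated feature be removed in this release?"""
    if not is_major_release:
        # Removal is a breaking change, so it is limited to Major releases.
        return False
    cycles_ok = minor_releases_since >= MIN_MINOR_CYCLES[category]
    months_ok = today >= add_months(deprecated_on, MIN_MONTHS[category])
    # Assumption: either the cycle count or the elapsed time satisfies the policy.
    return cycles_ok or months_ok
```

Under these assumptions, a Display feature deprecated on 2025-01-15 could not be removed in a Major release on 2025-09-01 after a single Minor cycle, whereas a Background feature with the same dates could, because its six-month window has already passed.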
+ +[[deprecation-policy]] +== Deprecation Policy + +=== Core Principle: Versions Are Immutable + +Once a template version is released (e.g., `human/1.1.0/human.yaml`), it is never modified or deleted. Changes go into a **new version** (e.g., `human/1.2.0/human.yaml`). This means: + +* An SDRF file that was valid under `human v1.1.0` will always be valid under `human v1.1.0`. +* The validator loads whichever template version the file declares. +* No special deprecation metadata is added to template YAML files — each version is its own complete, self-contained truth. + +There is no `deprecated:` flag, no `spec_compatibility:` range, and no deprecation schedule file. The mechanism is simply: **old versions stay, new versions are added, and the validator picks the right one.** + +=== How It Works in Practice + +Suppose `characteristics[biological replicate]` is `required` in `human v1.1.0` and becomes `optional` in `human v1.2.0`: + +[cols="2,3", options="header"] +|=== +| SDRF file declares | What happens + +| `comment[sdrf template]` = `human v1.1.0` +| Validator loads `human/1.1.0/human.yaml`. Column is required. **File is validated under old rules.** + +| `comment[sdrf template]` = `human v1.2.0` +| Validator loads `human/1.2.0/human.yaml`. Column is optional. **File is validated under new rules.** + +| No `comment[sdrf template]` column +| Validator uses the **latest** template version (v1.2.0). Column is optional. A warning recommends declaring an explicit version. +|=== + +That is the entire deprecation mechanism. No YAML annotations, no extra tooling. The template versions do all the work. + +=== How Changes Are Communicated + +Changes are communicated through three channels. Each serves a different audience and timeline: + +==== 1. GitHub Issues (before the change) + +Every change that affects existing SDRF files starts as a **GitHub issue** in the https://github.com/bigbio/proteomics-sample-metadata[specification repository]. 
The issue describes the proposal, the rationale, and the impact. Community members comment and vote. No change is merged without community input. + +Issues that propose breaking changes MUST remain open for a minimum of **60 days** before being accepted, to give the community time to respond. + +==== 2. CHANGELOG (at the time of the change) + +When the change is accepted and a new version is released, the `CHANGELOG.md` records: + +* What changed (e.g., "column X moved from required to optional"). +* Why it changed (link to the GitHub issue). +* Which template versions are affected (e.g., "human v1.2.0, vertebrates v1.2.0"). +* What users need to do (e.g., "update your `comment[sdrf template]` to `human v1.2.0`; or keep `human v1.1.0` if you need the old behavior"). + +The CHANGELOG is the permanent human-readable record of all changes. + +==== 3. Validator Messages (after the change) + +The validator (sdrf-pipelines) is how most users discover that a newer version exists. Each sdrf-pipelines release ships with the full catalog of template versions. When it validates a file, it compares the declared template version to the latest available and reports: + +[source] +---- +INFO: Template 'human v1.2.0' is available. + Your file uses 'human v1.1.0' and is valid under that version. + See CHANGELOG for what changed: https://github.com/bigbio/proteomics-sample-metadata/blob/master/CHANGELOG.md +---- + +The validator **never fails** a file that is valid under its declared version. It only emits INFO messages pointing to newer versions. The user decides when to upgrade. + +=== Grace Period for Breaking Changes + +Not all changes are equal. The grace period depends on the severity: + +[cols="2,3,2", options="header"] +|=== +| Change type | Examples | Grace period + +| **Non-breaking** (new MINOR version) +| New optional column added; new template created; requirement relaxed (required → optional). +| None needed. Old files remain valid. New version is available immediately. 
+ +| **Breaking** (new MAJOR version) +| Required column removed; column renamed; reserved word changed; template structure reorganized. +| **Minimum 12 months.** The change is announced in a MINOR release. The old template version continues to work. The breaking change only takes effect in the next MAJOR release. +|=== + +In practice: + +* **MINOR releases** never break existing files. They only add new template versions alongside old ones. Users upgrade at their own pace. +* **MAJOR releases** are the only point where old template versions may stop being shipped with the validator. They happen at most **once per year** and are announced at least one full MINOR release cycle in advance. +* **Between MAJOR releases**, every template version ever released continues to work. + +IMPORTANT: A MAJOR release does not delete old template files from the sdrf-templates repository — they remain in the git history and can always be retrieved. It only means the validator stops bundling them by default. Users can still validate against old templates by providing the YAML file path directly. + +=== What the Validator Reports + +[cols="2,1,3", options="header"] +|=== +| Situation | Level | Example + +| File is valid under its declared version; a newer version exists. +| **INFO** +| `INFO: 'human v1.2.0' is available. Your file uses 'human v1.1.0' and is valid.` + +| File declares no version; validated against latest. +| **WARNING** +| `WARNING: No 'comment[sdrf template]' found. Validating against latest 'human v1.2.0'. Declare a version for reproducible validation.` + +| File declares a version not bundled with this validator release. +| **ERROR** +| `ERROR: Template 'human v0.9.0' is not available in sdrf-pipelines 0.2.0. Available versions: 1.0.0, 1.1.0, 1.2.0. To validate against 'human v0.9.0', use sdrf-pipelines 0.0.x (pip install "sdrf-pipelines<0.1.0"). 
To upgrade your file to the latest template, run: parse_sdrf migrate --to human:1.2.0` +|=== + +When a template version is no longer bundled, the validator MUST tell the user **which older validator version still supports it** and how to install it. This ensures that no file is ever left without a validation path, even if the current validator has moved on. + +=== Role of sdrf-pipelines Releases + +Each sdrf-pipelines release declares which template versions it ships with. This is the coupling point between the validator and the templates: + +[cols="2,3", options="header"] +|=== +| sdrf-pipelines version | Bundled template versions + +| 0.1.x +| human 1.0.0, 1.1.0; ms-proteomics 1.0.0, 1.1.0; ... + +| 0.2.x +| human 1.0.0, 1.1.0, 1.2.0; ms-proteomics 1.0.0, 1.1.0, 1.2.0; ... + +| 1.0.0 (MAJOR) +| human 1.1.0, 1.2.0; ms-proteomics 1.1.0, 1.2.0; ... _(v1.0.0 templates dropped)_ +|=== + +This means: + +* **Upgrading sdrf-pipelines within a MAJOR series** (any MINOR or PATCH release) never removes old templates. Your existing files keep validating. +* **Upgrading sdrf-pipelines to a new MAJOR version** may drop the oldest template versions. The release notes list exactly which versions are dropped and how to migrate. +* **Pinning your sdrf-pipelines version** guarantees reproducible validation. If you don't upgrade the validator, nothing changes for you. + +=== Reannotation of Public Datasets + +While old template versions are preserved for backward compatibility, the long-term goal of the SDRF-Proteomics community is to **move public datasets forward** to the latest specification version. Richer, more consistent metadata benefits everyone: it enables better reanalysis, cross-dataset integration, and automated pipeline configuration. + +We encourage: + +* **Data repositories** (PRIDE, ProteomicsDB, MassIVE) to periodically reannotate their SDRF files to the latest templates, adding newly recommended columns and updating ontology terms. 
+* **Data submitters** to use the latest template versions when creating new SDRF files, rather than targeting old versions for convenience. +* **Community contributors** to help reannotate existing public datasets in the https://github.com/bigbio/proteomics-metadata-standard/tree/master/annotated-projects[annotated-projects] collection via pull requests. +* **Pipeline developers** to support and encourage the latest SDRF version in their tools, while gracefully handling older versions. + +The validator's INFO messages (pointing users to newer template versions) are intentionally designed to nudge the ecosystem forward without forcing immediate action. Over time, the expectation is that most public SDRF files will converge on the latest version, and that older versions will become increasingly rare in practice — even though they remain technically valid. + +TIP: When a new specification version is released, the core team will open a tracking issue listing public datasets that would benefit from reannotation. Community members can claim datasets and submit updated SDRF files via pull request. + +=== Summary + +The deprecation policy is built on three things that already exist — template versions, sdrf-pipelines releases, and the CHANGELOG — with no extra metadata or tooling: + +1. **Template versions are immutable.** Old versions stay. New versions are added alongside them. The validator loads whichever version the file declares. +2. **Communication happens through GitHub issues, CHANGELOG, and validator INFO messages.** Users are never surprised — they see what changed and decide when to upgrade. +3. **Only sdrf-pipelines MAJOR releases can drop old template versions**, and they happen at most once per year with advance notice. Between MAJOR releases, every file keeps working. When a template is dropped, the validator tells the user which older validator version still supports it. +4. 
**The community actively reannotates public datasets** to keep the ecosystem moving forward. Backward compatibility is a safety net, not an excuse to stay on old versions indefinitely. + +== Ontology Versioning + +=== Ontology Updates Are Non-Breaking + +Ontology updates (new terms added, term labels changed, terms deprecated) are treated as **non-breaking changes** because: + +1. SDRF files store ontology term *names* (and optionally accessions), not ontology version references. +2. Validators resolve terms against the latest available ontology version. +3. A term that was valid when the SDRF was created remains valid even if the ontology evolves. + +=== Handling Deprecated Ontology Terms + +When an ontology deprecates a term that is used in SDRF files: + +1. The validator emits a **warning** (not an error) suggesting the replacement term. +2. The deprecated term remains accepted for validation for at least **2 years** after the ontology deprecation. +3. The validator's `--strict` mode can optionally reject deprecated ontology terms. + +=== Ontology Version Pinning + +For reproducibility, the validator SHOULD support an `--ontology-date` flag that validates against ontology releases as of a specific date. This ensures that validation results are reproducible even as ontologies evolve. 
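The handling of deprecated ontology terms described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the sdrf-pipelines implementation: the function names, the returned status strings, and the `grace-expired` outcome are hypothetical, and the policy text does not say what happens once the two-year minimum has elapsed. The `as_of` argument plays the role of the proposed `--ontology-date` value.

```python
from datetime import date

GRACE_YEARS = 2  # minimum acceptance window stated in the policy


def grace_period_end(deprecated_on: date) -> date:
    """Return the first date on which the two-year minimum has elapsed."""
    try:
        return deprecated_on.replace(year=deprecated_on.year + GRACE_YEARS)
    except ValueError:
        # 29 February has no counterpart two years later; use 1 March instead.
        return date(deprecated_on.year + GRACE_YEARS, 3, 1)


def classify_deprecated_term(deprecated_on: date, as_of: date, strict: bool = False) -> str:
    """Classify a term that the ontology deprecated on `deprecated_on`.

    Status names are hypothetical labels chosen for this sketch.
    """
    if as_of < deprecated_on:
        return "ok"  # validating against an ontology state that predates the deprecation
    if strict:
        return "error"  # strict mode may reject deprecated terms
    if as_of < grace_period_end(deprecated_on):
        return "warning"  # still accepted; suggest the replacement term
    return "grace-expired"  # assumption: the policy leaves this case open
```

Passing an earlier `as_of` date reproduces the result a file would have received at that time, which is the reproducibility property that pinning the ontology date is meant to provide.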
+ +== Validator Behavior + +=== Version-Aware Validation + +The validator (sdrf-pipelines) MUST support version-aware validation: + +[source,bash] +---- +# Validate against a specific specification version +parse_sdrf validate-sdrf --sdrf_file data.sdrf.tsv --spec-version v1.1.0 + +# Validate using the version declared in the file (comment[sdrf version]) +parse_sdrf validate-sdrf --sdrf_file data.sdrf.tsv --use-declared-version + +# Validate with strict mode (reject all deprecated features) +parse_sdrf validate-sdrf --sdrf_file data.sdrf.tsv --strict + +# Validate with legacy mode (accept removed features from previous major version) +parse_sdrf validate-sdrf --sdrf_file data.sdrf.tsv --legacy v1.0.0 +---- + +=== Validation Output Levels + +The validator uses four severity levels for version-related messages: + +[cols="1,3,3", options="header"] +|=== +| Level | When used | Example message + +| **INFO** +| Feature announced for future deprecation. +| `INFO: characteristics[biological replicate] will become optional in v1.3.0` + +| **DEPRECATION** +| Feature is deprecated but still valid. +| `DEPRECATION: characteristics[biological replicate] is deprecated since v1.1.0. It will become optional in v1.3.0.` + +| **WARNING** +| Deprecated feature in enforcement phase, or ontology term deprecated. +| `WARNING: characteristics[biological replicate] is no longer required as of v1.3.0. Consider removing it if values are inferrable from source name.` + +| **ERROR** +| Removed feature, or validation failure. +| `ERROR: Column "old_column_name" was removed in v2.0.0. Use "new_column_name" instead.` +|=== + +=== Validation Hash and Provenance + +When validation succeeds, the validator can write a `comment[sdrf validation hash]` column containing a SHA-256 hash of the file content at validation time, along with the validator version and specification version. This provides an audit trail showing when and how the file was validated. 
+ +[source,tsv] +---- +comment[sdrf validation hash] +sha256:a1b2c3...;validator=sdrf-pipelines:0.1.0;spec=v1.1.0 +---- + +== Release Process + +=== Specification Releases + +1. All changes are proposed via pull requests to the `dev` branch. +2. Each PR must be labeled with its change type: `breaking`, `feature`, `fix`, `deprecation`. +3. The `CHANGELOG.md` is updated with every merged PR. +4. MINOR releases happen on an approximate **6-month cadence** (not strictly scheduled, but targeting twice per year). +5. MAJOR releases happen only when accumulated breaking changes warrant it, with a minimum **12-month gap** between MAJOR releases. + +=== Template Releases + +1. Templates are released independently via the https://github.com/bigbio/sdrf-templates[sdrf-templates] repository. +2. The `templates.yaml` manifest is auto-generated on each merge to the master branch. +3. Template releases follow the specification's semantic versioning rules. +4. A template MINOR release MUST NOT introduce breaking changes relative to its parent specification version. + +=== Validator Releases + +1. The validator (sdrf-pipelines) is released via PyPI. +2. Each validator release specifies the specification versions it supports. +3. The validator MUST support at least the current and previous MINOR specification version. + +== Migration Guide Template + +When a breaking change is introduced, a migration guide MUST be provided. The guide follows this structure: + +[source,asciidoc] +---- +=== Migrating from vX.Y to vX.Z + +**What changed:** [description] + +**Why:** [rationale] + +**Who is affected:** [submitters / curators / tool developers / all] + +**What to do:** + +1. [Step-by-step migration instructions] +2. [Automated migration command, if available] + +**Automated migration:** + + parse_sdrf migrate --from vX.Y --to vX.Z --sdrf_file data.sdrf.tsv + +**Timeline:** Deprecated in vX.Y, enforced in vX.Z, removed in vA.0.0. 
+---- + +== Summary of Key Guarantees + +[cols="1,3", options="header"] +|=== +| Guarantee | Description + +| **No surprise breakage** +| User-facing breaking changes require at least 2 MINOR releases (12 months minimum) of deprecation warnings before enforcement. + +| **Version pinning** +| SDRF files that declare `comment[sdrf version]` are always validated against the rules of that version, not the latest. + +| **Backward compatibility within MAJOR** +| Any SDRF file valid under v1.0.0 remains valid under any v1.x release. Breaking changes only happen at MAJOR boundaries. + +| **Ontology stability** +| Deprecated ontology terms remain accepted for at least 2 years. Ontology updates never cause validation errors. + +| **Migration tooling** +| Every breaking change includes a `parse_sdrf migrate` command to automate file updates. + +| **Machine-readable metadata** +| Template definitions and compatibility rules are provided in YAML formats that can be consumed by CI pipelines for automated monitoring. 
+|=== + +== See Also + +* link:README.adoc[SDRF-Proteomics Core Specification] — see <> section +* link:metadata-guidelines/template-definitions.adoc[Template Definitions Guide] — YAML schema structure +* https://github.com/bigbio/proteomics-sample-metadata/issues[Open Issues and Future Decisions] — active community discussions +* https://semver.org/[Semantic Versioning 2.0.0] +* https://github.com/bigbio/proteomics-sample-metadata/issues/771[Issue #771: Versioning strategy discussion] diff --git a/sdrf-proteomics/metadata-guidelines/enzymes.tsv b/sdrf-proteomics/metadata-guidelines/enzymes.tsv deleted file mode 100644 index d513fe69f..000000000 --- a/sdrf-proteomics/metadata-guidelines/enzymes.tsv +++ /dev/null @@ -1,21 +0,0 @@ -AC=MS:1001918;NT=2-iodobenzoate;CS='(?<=W)' -AC=MS:1001303;NT=Arg-C;CS='(?<=R)(?\!P)' -AC=MS:1001304;NT=Asp-N;CS='(?=[BD])' -AC=MS:1001305;NT=Asp-N ambic;CS='(?=[DE])' -AC=MS:1001307;NT=CNBr;CS='(?<=M)' -AC=MS:1001306;NT=Chymotrypsin;CS='(?<=[FYWL])(?\!P)' -AC=MS:1001308;NT=Formic acid;CS='((?<=D))|((?=D))' -AC=MS:1001309;NT=Lys-C;CS='(?<=K)(?\!P)' -AC=MS:1001310;NT=Lys-C/P;CS='(?<=K)' -AC=MS:1001311;NT=PepsinA;CS='(?<=[FL])' -AC=MS:1001312;NT=TrypChymo;CS='(?<=[FYWLKR])(?\!P)' -AC=MS:1001251;NT=Trypsin;CS='(?<=[KR])(?\!P)' -AC=MS:1001313;NT=Trypsin/P;CS='(?<=[KR])' -AC=MS:1001314;NT=V8-DE;CS='(?<=[BDEZ])(?\!P)' -AC=MS:1001315;NT=V8-E;CS='(?<=[EZ])(?\!P)' -AC=MS:1001917;NT=glutamyl endopeptidase;CS='(?<=[^E]E)' -AC=MS:1001915;NT=leukocyte elastase;CS='(?<=[ALIV])(?!P)' -AC=MS:1001916;NT=proline endopeptidase;CS='(?<=[HKR]P)(?!P)' - - - diff --git a/sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv b/sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv index f340c1cdc..1e365e5c3 100644 --- a/sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv +++ b/sdrf-proteomics/metadata-guidelines/sdrf-terms.tsv @@ -76,6 +76,14 @@ data file comment MS:1000577 minimum, default, human, cell-lines, single-cell, i modification parameters comment MS:1001055 
dia-acquisition, dda-acquisition Unimod, PSI-MOD Post-translational modifications searched true false false precursor mass tolerance comment PRIDE:0000575 dia-acquisition, dda-acquisition pattern: number + ppm/Da Precursor mass tolerance for database search true true false fragment mass tolerance comment PRIDE:0000576 dia-acquisition, dda-acquisition pattern: number + ppm/Da Fragment mass tolerance for database search true true false +precursor min mz comment PRIDE:0000476 dia-acquisition, dda-acquisition numeric (m/z) MS method-defined minimum precursor m/z setting used to acquire the data true true false +precursor max mz comment PRIDE:0000477 dia-acquisition, dda-acquisition numeric (m/z) MS method-defined maximum precursor m/z setting used to acquire the data true true false +precursor min charge comment PRIDE:0000472 dia-acquisition, dda-acquisition integer MS method-defined minimum precursor charge state setting used to acquire the data true true false +precursor max charge comment PRIDE:0000473 dia-acquisition, dda-acquisition integer MS method-defined maximum precursor charge state setting used to acquire the data true true false +min retention time comment PRIDE:0000474 dia-acquisition, dda-acquisition numeric (minutes) LC method-defined minimum retention time setting used to acquire the data true true false +max retention time comment PRIDE:0000475 dia-acquisition, dda-acquisition numeric (minutes) LC method-defined maximum retention time setting used to acquire the data true true false +min ion mobility comment PRIDE:0000841 dia-acquisition, dda-acquisition numeric (1/K0 or Vs/cm2) MS method-defined minimum ion mobility setting used to acquire the data true true false +max ion mobility comment PRIDE:0000842 dia-acquisition, dda-acquisition numeric (1/K0 or Vs/cm2) MS method-defined maximum ion mobility setting used to acquire the data true true false passage number comment to be filled cell-lines integer or range Passage number of the cell line true true 
false cell line source comment to be filled cell-lines free text Repository or source from which the cell line was obtained true true false authentication method comment to be filled cell-lines free text Method used to authenticate the cell line identity true true false diff --git a/sdrf-proteomics/metadata-guidelines/template-definitions.adoc b/sdrf-proteomics/metadata-guidelines/template-definitions.adoc index 70cdb6e32..502d33a41 100644 --- a/sdrf-proteomics/metadata-guidelines/template-definitions.adoc +++ b/sdrf-proteomics/metadata-guidelines/template-definitions.adoc @@ -479,85 +479,263 @@ ms-proteomics + human + dda-acquisition + cell-lines technology sample experiment experiment ---- -== Creating New Templates +== Creating a New Template: Step-by-Step Guide -=== Checklist +This section walks through creating a new YAML template from scratch. The example creates a hypothetical `top-down` template for top-down proteomics experiments. -When creating a new template: +=== Step 1: Choose the Parent Template -1. [ ] Choose appropriate parent template (`extends`) -2. [ ] Set correct layer (`layer`) -3. [ ] Define mutual exclusivity if needed (`mutually_exclusive_with`) -4. [ ] Use semantic versioning (`version`) -5. [ ] Document all columns with clear descriptions -6. [ ] Add appropriate validators (ontology, pattern, values) -7. [ ] Include examples in validators -8. [ ] Test with real SDRF files +Every template (except `base`) must extend a parent. Choose based on where your template fits in the hierarchy: -=== Best Practices +[cols="2,3,2", options="header"] +|=== +| You are defining... 
| Extend | Layer + +| A new proteomics technology (rare) +| `base` +| `technology` + +| A new organism type +| `base` +| `sample` + +| A new MS experiment type +| `ms-proteomics` +| `experiment` + +| A new affinity experiment type +| `affinity-proteomics` +| `experiment` +|=== + +For our example, top-down proteomics is a mass spectrometry technique, so we extend `ms-proteomics` and use the `experiment` layer. + +=== Step 2: Create the Directory Structure + +Templates live in the https://github.com/bigbio/sdrf-templates[sdrf-templates] repository. Each template has a versioned directory: + +[source] +---- +sdrf-templates/ +└── top-down/ + └── 1.0.0-dev/ + ├── top-down.yaml # Template schema (required) + └── top-down.sdrf.tsv # Example SDRF file (required) +---- -- **Follow naming conventions**: Use lowercase with hyphens for template names -- **Be specific in descriptions**: Include what values are expected and when to use `not applicable` -- **Use ontologies when possible**: Prefer ontology validators over pattern validators for controlled vocabularies -- **Provide examples**: Always include examples in validator definitions -- **Document special cases**: Explain when `not applicable` or `not available` are appropriate +Use the `-dev` suffix for the initial version until the template is reviewed and accepted by the community. -=== Example: Creating an Experiment Template +=== Step 3: Write the YAML Schema +Start with the template metadata, then define the columns your experiment type needs. Only define columns that are **new or different** from the parent — inherited columns from `ms-proteomics` do not need to be repeated. + +.top-down/1.0.0-dev/top-down.yaml [source,yaml] ---- -name: my-experiment -description: SDRF template for my specific experiment type. - Extends ms-proteomics with experiment-specific columns. 
-version: 1.0.0 +# =========================================================================== +# TEMPLATE METADATA +# =========================================================================== +name: top-down +description: > + SDRF template for top-down proteomics experiments where intact proteins + are analyzed without prior enzymatic digestion. Extends ms-proteomics + with top-down-specific columns for intact mass analysis. +version: 1.0.0-dev extends: ms-proteomics usable_alone: false layer: experiment +# =========================================================================== +# COLUMN DEFINITIONS — only columns new or changed from parent +# =========================================================================== columns: - - name: characteristics[my field] - description: Description of what this field captures - requirement: required - allow_not_applicable: false + + # --- New column: protein separation method --- + - name: comment[protein separation method] + description: > + Method used to separate intact proteins before MS analysis + (e.g., GELFrEE, SEC, RPLC, CZE). Use "not applicable" if no + separation was performed (direct infusion). + requirement: recommended + allow_not_applicable: true allow_not_available: true validators: - validator_name: ontology params: ontologies: - - efo + - ms + - pride error_level: warning - description: Should be a valid EFO term + description: > + Protein separation method should be a valid PSI-MS or PRIDE + ontology term. + examples: + - size exclusion chromatography + - gel electrophoresis + - capillary zone electrophoresis + - reversed-phase liquid chromatography + - not applicable + + # --- New column: intact mass range --- + - name: comment[precursor mass range] + description: > + The mass range of intact protein precursors analyzed, in Daltons. + Format: "min-max Da" (e.g., "5000-50000 Da"). 
+ requirement: optional + allow_not_available: true + validators: + - validator_name: pattern + params: + pattern: ^(\d+-\d+\s*Da|not available)$ + case_sensitive: false + description: > + Precursor mass range in format "min-max Da". examples: - - example value 1 - - example value 2 + - 5000-50000 Da + - 10000-100000 Da + - not available + + # --- Override inherited column: make cleavage agent "not applicable" --- + # In top-down experiments, proteins are not digested, so the cleavage + # agent column should always be "not applicable". We override the + # inherited column to make this explicit. + - name: comment[cleavage agent details] + description: > + Cleavage agent is not applicable for top-down experiments where + intact proteins are analyzed. Use "not applicable". + requirement: required + allow_not_applicable: true + allow_not_available: false + validators: + - validator_name: values + params: + values: + - not applicable + error_level: warning + description: > + Top-down experiments analyze intact proteins. Cleavage agent + should be "not applicable". ---- -== LinkML Schema +Key points demonstrated in this example: -For tools requiring a formal schema language, a https://linkml.io/[LinkML] representation is available: +* **New columns** (`comment[protein separation method]`, `comment[precursor mass range]`) are defined with their own validators. +* **Overriding an inherited column** (`comment[cleavage agent details]`) restricts the parent's definition — in this case, forcing the value to "not applicable" since top-down experiments don't use enzymatic digestion. +* **Descriptions explain the "why"** — not just what the field is, but when to use `not applicable` and what format to follow. +* **Examples are always provided** in validators — they serve as documentation and can be used by tools to generate autocomplete suggestions. 
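Before committing a `pattern` validator, it can help to try the regular expression against the values you expect to accept and reject. The sketch below checks the `comment[precursor mass range]` pattern from the example template with Python's standard `re` module. The names `PRECURSOR_MASS_RANGE` and `is_valid_mass_range` are illustrative, and mapping `case_sensitive: false` to `re.IGNORECASE` is an assumption about how a validator would interpret that parameter.

```python
import re

# Pattern copied from the top-down example template; IGNORECASE stands in
# for `case_sensitive: false` (an assumption for this sketch).
PRECURSOR_MASS_RANGE = re.compile(r"^(\d+-\d+\s*Da|not available)$", re.IGNORECASE)


def is_valid_mass_range(value: str) -> bool:
    """Return True if the whole cell value matches the template pattern."""
    return PRECURSOR_MASS_RANGE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["5000-50000 Da", "10000-100000Da", "Not Available",
                      "not applicable", "5000 Da", "5-50 kDa"]:
        print(f"{candidate!r}: {is_valid_mass_range(candidate)}")
```

Note that the pattern only checks the format: it does not verify that the minimum is smaller than the maximum, and it rejects `not applicable` because the column definition allows only `not available`.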
-**File:** link:../templates/sdrf-template-schema.linkml.yaml[sdrf-template-schema.linkml.yaml] +=== Step 4: Create an Example SDRF File -LinkML provides: +Every template must include an example `.sdrf.tsv` file that passes validation. This file demonstrates correct usage and serves as a starting point for users. -- Formal type definitions compatible with JSON Schema, SHACL, and OWL -- Code generation for Python, Java, and other languages -- Integration with semantic web tools +.top-down/1.0.0-dev/top-down.sdrf.tsv (tab-separated) +[source,tsv] +---- +source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[biological replicate] assay name technology type comment[proteomics data acquisition method] comment[label] comment[instrument] comment[cleavage agent details] comment[fraction identifier] comment[technical replicate] comment[data file] comment[protein separation method] comment[precursor mass range] factor value[disease] +sample_1 homo sapiens heart normal 1 run_1 proteomic profiling by mass spectrometry data-dependent acquisition label free sample LTQ Orbitrap Elite not applicable 1 1 sample_1.raw gel electrophoresis 10000-100000 Da normal +sample_2 homo sapiens heart dilated cardiomyopathy 1 run_2 proteomic profiling by mass spectrometry data-dependent acquisition label free sample LTQ Orbitrap Elite not applicable 1 1 sample_2.raw gel electrophoresis 10000-100000 Da dilated cardiomyopathy +---- -**Usage examples:** +=== Step 5: Create the Documentation -[source,bash] +Add a `README.adoc` file in the specification repository under `sdrf-proteomics/templates/top-down/`: + +.sdrf-proteomics/templates/top-down/README.adoc (outline) +[source,asciidoc] ---- -# Generate JSON Schema from LinkML -gen-json-schema sdrf-template-schema.linkml.yaml > sdrf-template-schema.json += Top-Down Proteomics Template + +== Overview +[describe when to use this template] -# Generate Python dataclasses -gen-python 
sdrf-template-schema.linkml.yaml > sdrf_template.py +== Columns +[table of columns with requirement levels, descriptions, and examples] -# Validate a template file against the schema -linkml-validate -s sdrf-template-schema.linkml.yaml templates/human/human.yaml +== Examples +[link to the example SDRF file] + +== See Also +* link:../../README.adoc[SDRF-Proteomics Specification] +* link:../ms-proteomics/README.adoc[MS-Proteomics Template] (parent) ---- +=== Step 6: Test Locally + +Before submitting, validate your example file against your template: + +[source,bash] +---- +# Install or upgrade the validator +pip install sdrf-pipelines + +# Validate the example SDRF file with the new template +parse_sdrf validate-sdrf \ + --sdrf_file top-down/1.0.0-dev/top-down.sdrf.tsv \ + --template ms-proteomics \ + --custom_template top-down/1.0.0-dev/top-down.yaml +---- + +Check that: + +* [ ] The example file passes validation with no errors. +* [ ] All required columns from the parent template are present. +* [ ] New columns validate correctly (ontology terms resolve, patterns match). +* [ ] The `not applicable` and `not available` values work where expected. + +=== Step 7: Submit a Pull Request + +Submit the template to the https://github.com/bigbio/sdrf-templates[sdrf-templates] repository via pull request. The PR must include: + +1. The YAML schema file (`top-down/1.0.0-dev/top-down.yaml`). +2. The example SDRF file (`top-down/1.0.0-dev/top-down.sdrf.tsv`). +3. A PR description explaining: what experiment type the template covers, why it is needed, and which parent it extends. + +The corresponding `README.adoc` documentation goes into a separate PR to the https://github.com/bigbio/proteomics-metadata-standard[specification repository] under `sdrf-proteomics/templates/top-down/`. + +Once reviewed and merged, the template appears in the `templates.yaml` manifest and becomes available to all users of sdrf-pipelines. 
The `-dev` suffix is removed when the template is promoted to a stable release (e.g., `1.0.0`). + +=== Quick Reference: Template YAML Checklist + +[cols="1,3", options="header"] +|=== +| Field | Checklist + +| `name` +| Lowercase with hyphens. Unique across all templates. + +| `description` +| One or two sentences. Mention what it extends and what experiment type it covers. + +| `version` +| Use `X.Y.Z-dev` for new templates. Remove `-dev` after community review. + +| `extends` +| Must be an existing template name (`ms-proteomics`, `affinity-proteomics`, or `base`). + +| `layer` +| One of: `technology`, `sample`, `experiment`. + +| `usable_alone` +| Almost always `false` for experiment and sample layer templates. + +| `columns` +| Only define columns that are **new** or that **override** a parent column. Do not repeat inherited columns unchanged. + +| Validators +| Use `ontology` for controlled vocabularies, `pattern` for structured text, `values` for fixed lists. Always include `examples`. + +| Reserved words +| Set `allow_not_applicable`, `allow_not_available`, `allow_pooled`, `allow_anonymized` as appropriate for each column. +|=== + +=== Best Practices + +- **Only define what is new.** Inherited columns from the parent template do not need to be repeated. Only add a column to your template if it is new or if you need to override the parent's definition (e.g., change `requirement` from `optional` to `required`, or restrict allowed values). +- **Use ontologies over patterns.** Prefer `ontology` validators for fields where controlled vocabulary terms exist. Use `pattern` validators only for structured free text (ages, mass ranges, file names). +- **Provide clear descriptions.** Explain not just what the field is, but when to use `not applicable` vs `not available`, and give format guidance. +- **Always include examples.** Examples in validators serve as documentation and help tools generate suggestions. 
+- **Test with real data.** Your example SDRF should represent a realistic experiment, not a toy file. If possible, base it on a real public dataset. + == Validation Commands Validate SDRF files using sdrf-pipelines: @@ -583,6 +761,5 @@ parse_sdrf check-templates --templates ms-proteomics,human,dda-acquisition == References - https://github.com/bigbio/sdrf-pipelines[sdrf-pipelines] - SDRF validation tool -- https://linkml.io/[LinkML] - Linked data modeling language - https://github.com/bigbio/proteomics-metadata-standard/tree/master/sdrf-proteomics/templates[Template files on GitHub] - link:../README.adoc[SDRF-Proteomics Specification] diff --git a/sdrf-proteomics/templates/affinity-proteomics/README.adoc b/sdrf-proteomics/templates/affinity-proteomics/README.adoc index 3ce9d2c2f..87301cc9b 100644 --- a/sdrf-proteomics/templates/affinity-proteomics/README.adoc +++ b/sdrf-proteomics/templates/affinity-proteomics/README.adoc @@ -579,9 +579,9 @@ NOTE: MS-specific validations (label, cleavage agent, etc.) 
do not apply to affi [[template-file]] == Template File -The affinity proteomics SDRF template file is available in this directory: +The affinity proteomics SDRF template file is available in the sdrf-templates repository: -- link:affinity-proteomics-template.sdrf.tsv[affinity-proteomics-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/affinity-proteomics/1.1.0/affinity-proteomics.sdrf.tsv[affinity-proteomics.sdrf.tsv] [[related-templates]] == Related Templates diff --git a/sdrf-proteomics/templates/base/README.adoc b/sdrf-proteomics/templates/base/README.adoc index 524a7a559..7c31277c2 100644 --- a/sdrf-proteomics/templates/base/README.adoc +++ b/sdrf-proteomics/templates/base/README.adoc @@ -28,5 +28,5 @@ Use one of the following technology templates instead: [[files]] == Template Files -* `base.yaml` - YAML template definition for validation -* `base-template.sdrf.tsv` - Column header template +- link:https://github.com/bigbio/sdrf-templates/blob/main/base/1.1.0/base.yaml[base.yaml] - YAML template definition for validation +- link:https://github.com/bigbio/sdrf-templates/blob/main/base/1.1.0/base.sdrf.tsv[base.sdrf.tsv] - Column header template diff --git a/sdrf-proteomics/templates/cell-lines/README.adoc b/sdrf-proteomics/templates/cell-lines/README.adoc index 24c7f3359..6d8b8ce41 100644 --- a/sdrf-proteomics/templates/cell-lines/README.adoc +++ b/sdrf-proteomics/templates/cell-lines/README.adoc @@ -582,9 +582,9 @@ For cell lines where donor information is unknown: [[template-file]] == Template File -The cell line SDRF template file is available in this directory: +The cell line SDRF template file is available in the sdrf-templates repository: -- link:cell-lines-template.sdrf.tsv[cell-lines-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/cell-lines/1.1.0/cell-lines.sdrf.tsv[cell-lines.sdrf.tsv] [[validation]] == Validation diff --git a/sdrf-proteomics/templates/crosslinking/README.adoc 
b/sdrf-proteomics/templates/crosslinking/README.adoc index 75f5db5b9..79410b1c5 100644 --- a/sdrf-proteomics/templates/crosslinking/README.adoc +++ b/sdrf-proteomics/templates/crosslinking/README.adoc @@ -493,6 +493,13 @@ parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv \ --template crosslinking ---- +[[template-file]] +== Template File + +The crosslinking SDRF template file is available in the sdrf-templates repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/crosslinking/1.1.0/crosslinking.sdrf.tsv[crosslinking.sdrf.tsv] + [[references]] == References diff --git a/sdrf-proteomics/templates/dda-acquisition/README.adoc b/sdrf-proteomics/templates/dda-acquisition/README.adoc index c9dd3c39c..395cff941 100644 --- a/sdrf-proteomics/templates/dda-acquisition/README.adoc +++ b/sdrf-proteomics/templates/dda-acquisition/README.adoc @@ -125,6 +125,62 @@ ifdef::backend-html5[] PSI-MS (MS:1000443) NT=orbitrap;AC=MS:1000484 + +comment[precursor min mz] +OPTIONAL +MS method-defined minimum precursor m/z setting used to acquire the data +PRIDE (PRIDE:0000476) +350, 400 + + +comment[precursor max mz] +OPTIONAL +MS method-defined maximum precursor m/z setting used to acquire the data +PRIDE (PRIDE:0000477) +1600, 2000 + + +comment[precursor min charge] +OPTIONAL +MS method-defined minimum precursor charge state setting used to acquire the data +PRIDE (PRIDE:0000472) +1, 2 + + +comment[precursor max charge] +OPTIONAL +MS method-defined maximum precursor charge state setting used to acquire the data +PRIDE (PRIDE:0000473) +4, 6 + + +comment[min retention time] +OPTIONAL +LC method-defined minimum retention time setting used to acquire the data +PRIDE (PRIDE:0000474) +0, 5 + + +comment[max retention time] +OPTIONAL +LC method-defined maximum retention time setting used to acquire the data +PRIDE (PRIDE:0000475) +120, 90 + + +comment[min ion mobility] +OPTIONAL +MS method-defined minimum ion mobility setting used to acquire the data +PRIDE (PRIDE:0000841) 
+0.6, 0.7 + + +comment[max ion mobility] +OPTIONAL +MS method-defined maximum ion mobility setting used to acquire the data +PRIDE (PRIDE:0000842) +1.4, 1.6 + ++++ @@ -140,6 +196,14 @@ ifndef::backend-html5[] |comment[precursor mass tolerance] |OPTIONAL |Mass tolerance for precursor ions in database search |Numeric value with unit |10 ppm, 20 ppm |comment[fragment mass tolerance] |OPTIONAL |Mass tolerance for fragment ions in database search |Numeric value with unit |0.02 Da, 20 ppm |comment[MS2 mass analyzer] |OPTIONAL |Mass analyzer used for MS2 acquisition |PSI-MS (MS:1000443) |NT=orbitrap;AC=MS:1000484 +|comment[precursor min mz] |OPTIONAL |MS method-defined minimum precursor m/z setting used to acquire the data |PRIDE (PRIDE:0000476) |350, 400 +|comment[precursor max mz] |OPTIONAL |MS method-defined maximum precursor m/z setting used to acquire the data |PRIDE (PRIDE:0000477) |1600, 2000 +|comment[precursor min charge] |OPTIONAL |MS method-defined minimum precursor charge state setting used to acquire the data |PRIDE (PRIDE:0000472) |1, 2 +|comment[precursor max charge] |OPTIONAL |MS method-defined maximum precursor charge state setting used to acquire the data |PRIDE (PRIDE:0000473) |4, 6 +|comment[min retention time] |OPTIONAL |LC method-defined minimum retention time setting used to acquire the data |PRIDE (PRIDE:0000474) |0, 5 +|comment[max retention time] |OPTIONAL |LC method-defined maximum retention time setting used to acquire the data |PRIDE (PRIDE:0000475) |120, 90 +|comment[min ion mobility] |OPTIONAL |MS method-defined minimum ion mobility setting used to acquire the data |PRIDE (PRIDE:0000841) |0.6, 0.7 +|comment[max ion mobility] |OPTIONAL |MS method-defined maximum ion mobility setting used to acquire the data |PRIDE (PRIDE:0000842) |1.4, 1.6 |=== endif::[] @@ -354,9 +418,9 @@ NOTE: For TMT experiments, multiple samples share the same raw file. 
Use `commen [[template-file]] == Template File -The DDA SDRF template file is available in this directory: +The DDA SDRF template file is available in the sdrf-templates repository: -- link:dda-acquisition-template.sdrf.tsv[dda-acquisition-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/dda-acquisition/1.1.0/dda-acquisition.sdrf.tsv[dda-acquisition.sdrf.tsv] [[validation]] == Validation diff --git a/sdrf-proteomics/templates/dia-acquisition/README.adoc b/sdrf-proteomics/templates/dia-acquisition/README.adoc index e5db1e397..0c95399a9 100644 --- a/sdrf-proteomics/templates/dia-acquisition/README.adoc +++ b/sdrf-proteomics/templates/dia-acquisition/README.adoc @@ -186,6 +186,62 @@ ifdef::backend-html5[] Numeric value with unit 0.02 Da, 20 ppm + +comment[precursor min mz] +OPTIONAL +MS method-defined minimum precursor m/z setting used to acquire the data +PRIDE (PRIDE:0000476) +350, 400 + + +comment[precursor max mz] +OPTIONAL +MS method-defined maximum precursor m/z setting used to acquire the data +PRIDE (PRIDE:0000477) +1600, 2000 + + +comment[precursor min charge] +OPTIONAL +MS method-defined minimum precursor charge state setting used to acquire the data +PRIDE (PRIDE:0000472) +1, 2 + + +comment[precursor max charge] +OPTIONAL +MS method-defined maximum precursor charge state setting used to acquire the data +PRIDE (PRIDE:0000473) +4, 6 + + +comment[min retention time] +OPTIONAL +LC method-defined minimum retention time setting used to acquire the data +PRIDE (PRIDE:0000474) +0, 5 + + +comment[max retention time] +OPTIONAL +LC method-defined maximum retention time setting used to acquire the data +PRIDE (PRIDE:0000475) +120, 90 + + +comment[min ion mobility] +OPTIONAL +MS method-defined minimum ion mobility setting used to acquire the data +PRIDE (PRIDE:0000841) +0.6, 0.7 + + +comment[max ion mobility] +OPTIONAL +MS method-defined maximum ion mobility setting used to acquire the data +PRIDE (PRIDE:0000842) +1.4, 1.6 + ++++ @@ -198,6 
+254,14 @@ ifndef::backend-html5[] |comment[precursor mass tolerance] |OPTIONAL |Mass tolerance for precursor ions in database search |Numeric value with unit |10 ppm, 20 ppm |comment[fragment mass tolerance] |OPTIONAL |Mass tolerance for fragment ions in database search |Numeric value with unit |0.02 Da, 20 ppm +|comment[precursor min mz] |OPTIONAL |MS method-defined minimum precursor m/z setting used to acquire the data |PRIDE (PRIDE:0000476) |350, 400 +|comment[precursor max mz] |OPTIONAL |MS method-defined maximum precursor m/z setting used to acquire the data |PRIDE (PRIDE:0000477) |1600, 2000 +|comment[precursor min charge] |OPTIONAL |MS method-defined minimum precursor charge state setting used to acquire the data |PRIDE (PRIDE:0000472) |1, 2 +|comment[precursor max charge] |OPTIONAL |MS method-defined maximum precursor charge state setting used to acquire the data |PRIDE (PRIDE:0000473) |4, 6 +|comment[min retention time] |OPTIONAL |LC method-defined minimum retention time setting used to acquire the data |PRIDE (PRIDE:0000474) |0, 5 +|comment[max retention time] |OPTIONAL |LC method-defined maximum retention time setting used to acquire the data |PRIDE (PRIDE:0000475) |120, 90 +|comment[min ion mobility] |OPTIONAL |MS method-defined minimum ion mobility setting used to acquire the data |PRIDE (PRIDE:0000841) |0.6, 0.7 +|comment[max ion mobility] |OPTIONAL |MS method-defined maximum ion mobility setting used to acquire the data |PRIDE (PRIDE:0000842) |1.4, 1.6 |=== endif::[] @@ -381,18 +445,20 @@ NOTE: The `...` column indicates omitted columns (organism part, biological repl [[diaPASEF-example]] == diaPASEF Example -For ion mobility-enhanced DIA (diaPASEF), additional columns may be relevant: +For ion mobility-enhanced DIA (diaPASEF), the following columns capture the ion mobility acquisition window: |=== -|Column Name |Description |Example Values +|Column Name |Description |PRIDE CV |Example Values -|comment[ion mobility] -|Ion mobility separation used 
-|TIMS, FAIMS, not applicable +|comment[min ion mobility] +|MS method-defined minimum ion mobility setting used to acquire the data +|PRIDE:0000841 +|0.6 -|comment[1/K0 range] -|Ion mobility range -|0.6-1.6 1/K0 +|comment[max ion mobility] +|MS method-defined maximum ion mobility setting used to acquire the data +|PRIDE:0000842 +|1.6 |=== [[best-practices]] @@ -417,9 +483,9 @@ For ion mobility-enhanced DIA (diaPASEF), additional columns may be relevant: [[template-file]] == Template File -The DIA SDRF template file is available in this directory: +The DIA SDRF template file is available in the sdrf-templates repository: -- link:dia-acquisition-template.sdrf.tsv[dia-acquisition-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/dia-acquisition/1.1.0/dia-acquisition.sdrf.tsv[dia-acquisition.sdrf.tsv] [[validation]] == Validation diff --git a/sdrf-proteomics/templates/human/README.adoc b/sdrf-proteomics/templates/human/README.adoc index ea9689fde..69c1f2b02 100644 --- a/sdrf-proteomics/templates/human/README.adoc +++ b/sdrf-proteomics/templates/human/README.adoc @@ -1655,3 +1655,10 @@ For female patients. 
Values: `pre-menopausal`, `peri-menopausal`, `post-menopaus === Staging Systems - TNM Classification: AJCC Cancer Staging Manual - Ann Arbor Staging: Lymphoma staging system + +[[template-file]] +== Template File + +The human SDRF template file is available in the sdrf-templates repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/human/1.1.0/human.sdrf.tsv[human.sdrf.tsv] diff --git a/sdrf-proteomics/templates/immunopeptidomics/README.adoc b/sdrf-proteomics/templates/immunopeptidomics/README.adoc index 80f21882b..436ca704d 100644 --- a/sdrf-proteomics/templates/immunopeptidomics/README.adoc +++ b/sdrf-proteomics/templates/immunopeptidomics/README.adoc @@ -284,9 +284,9 @@ Examples of immunopeptidomics datasets annotated with SDRF-Proteomics: [[template-file]] == Template File -The immunopeptidomics SDRF template file is available in this directory: +The immunopeptidomics SDRF template file is available in the sdrf-templates repository: -- link:immunopeptidomics-template.sdrf.tsv[immunopeptidomics-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/immunopeptidomics/1.0.0-dev/immunopeptidomics.sdrf.tsv[immunopeptidomics.sdrf.tsv] [[validation]] == Validation diff --git a/sdrf-proteomics/templates/invertebrates/README.adoc b/sdrf-proteomics/templates/invertebrates/README.adoc index 4228547c5..a70c842ae 100644 --- a/sdrf-proteomics/templates/invertebrates/README.adoc +++ b/sdrf-proteomics/templates/invertebrates/README.adoc @@ -200,6 +200,13 @@ pip install sdrf-pipelines sdrf_validate --sdrf_file your_file.sdrf.tsv --template invertebrates ---- +[[template-file]] +== Template File + +The invertebrates SDRF template file is available in the sdrf-templates repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/invertebrates/1.1.0/invertebrates.sdrf.tsv[invertebrates.sdrf.tsv] + [[examples]] == Examples diff --git a/sdrf-proteomics/templates/metaproteomics/README.adoc 
b/sdrf-proteomics/templates/metaproteomics/README.adoc index 887071343..a256e2634 100644 --- a/sdrf-proteomics/templates/metaproteomics/README.adoc +++ b/sdrf-proteomics/templates/metaproteomics/README.adoc @@ -505,9 +505,9 @@ For metaproteomics, consider documenting: [[template-file]] == Template File -The metaproteomics SDRF template file is available in this directory: +The metaproteomics SDRF template file is available in the sdrf-templates repository: -- link:metaproteomics-template.sdrf.tsv[metaproteomics-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/metaproteomics/1.0.0-dev/metaproteomics.sdrf.tsv[metaproteomics.sdrf.tsv] [[validation]] == Validation diff --git a/sdrf-proteomics/templates/ms-proteomics/README.adoc b/sdrf-proteomics/templates/ms-proteomics/README.adoc index 0be741c20..9a8bea3f4 100644 --- a/sdrf-proteomics/templates/ms-proteomics/README.adoc +++ b/sdrf-proteomics/templates/ms-proteomics/README.adoc @@ -473,6 +473,13 @@ endif::[] * link:../dia-acquisition/README.adoc[DIA Acquisition] - DIA-specific columns * link:../human/README.adoc[Human Template] - Human sample metadata +[[template-file]] +== Template File + +The MS-Proteomics SDRF template file is available in the sdrf-templates repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/ms-proteomics/1.1.0/ms-proteomics.sdrf.tsv[ms-proteomics.sdrf.tsv] + == Ontologies - PRIDE Ontology: https://www.ebi.ac.uk/ols4/ontologies/pride diff --git a/sdrf-proteomics/templates/olink/README.adoc b/sdrf-proteomics/templates/olink/README.adoc index 2f445087d..719d6c39e 100644 --- a/sdrf-proteomics/templates/olink/README.adoc +++ b/sdrf-proteomics/templates/olink/README.adoc @@ -287,7 +287,9 @@ parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv --template olink [[template-file]] == Template File -- link:olink-template.sdrf.tsv[olink-template.sdrf.tsv] +The Olink SDRF template file is available in the sdrf-templates repository: + +- 
link:https://github.com/bigbio/sdrf-templates/blob/main/olink/1.0.0/olink.sdrf.tsv[olink.sdrf.tsv] [[related-templates]] == Related Templates diff --git a/sdrf-proteomics/templates/plants/README.adoc b/sdrf-proteomics/templates/plants/README.adoc index c37b656e4..617f6e745 100644 --- a/sdrf-proteomics/templates/plants/README.adoc +++ b/sdrf-proteomics/templates/plants/README.adoc @@ -208,6 +208,13 @@ pip install sdrf-pipelines sdrf_validate --sdrf_file your_file.sdrf.tsv --template plants ---- +[[template-file]] +== Template File + +The plants SDRF template file is available in the sdrf-templates repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/plants/1.1.0/plants.sdrf.tsv[plants.sdrf.tsv] + [[examples]] == Examples diff --git a/sdrf-proteomics/templates/sdrf-template-schema.linkml.yaml b/sdrf-proteomics/templates/sdrf-template-schema.linkml.yaml deleted file mode 100644 index 4872b6db1..000000000 --- a/sdrf-proteomics/templates/sdrf-template-schema.linkml.yaml +++ /dev/null @@ -1,377 +0,0 @@ -# SDRF Template Schema - LinkML Representation -# This schema defines the structure for SDRF proteomics template YAML files -# See: https://linkml.io/ - -id: https://github.com/bigbio/proteomics-metadata-standard/sdrf-template-schema -name: sdrf-template-schema -title: SDRF Proteomics Template Schema -description: >- - LinkML schema for defining SDRF proteomics template validation rules. - Templates define column requirements, validators, and inheritance relationships - for Sample and Data Relationship Format (SDRF) files in proteomics. 
-license: Apache-2.0 -version: 1.1.0 - -prefixes: - linkml: https://w3id.org/linkml/ - sdrf: https://github.com/bigbio/proteomics-metadata-standard/ - schema: http://schema.org/ - OBI: http://purl.obolibrary.org/obo/OBI_ - EFO: http://www.ebi.ac.uk/efo/EFO_ - -default_prefix: sdrf -default_range: string - -imports: - - linkml:types - -# ============================================================================= -# ENUMERATIONS -# ============================================================================= - -enums: - RequirementLevel: - description: Column requirement levels - permissible_values: - required: - description: Column must be present in the SDRF file - recommended: - description: Column should be present for complete metadata - optional: - description: Column may be present for additional context - - TemplateLayer: - description: Template layers in the inheritance hierarchy - permissible_values: - base: - description: Core template with shared columns - technology: - description: Technology-specific template (MS or affinity) - sample: - description: Sample/organism-specific template - experiment: - description: Experiment methodology-specific template - - ColumnType: - description: Data types for column values - permissible_values: - string: - description: Free text or controlled vocabulary string - integer: - description: Whole number value - - Cardinality: - description: Value cardinality for column cells - permissible_values: - single: - description: Single value per cell - multiple: - description: Multiple semicolon-separated values allowed - - ErrorLevel: - description: Severity level for validation failures - permissible_values: - error: - description: Validation failure that must be fixed - warning: - description: Validation issue that should be reviewed - - ValidatorType: - description: Available validator types - permissible_values: - ontology: - description: Validate against ontology terms - pattern: - description: Validate against regex 
pattern - values: - description: Validate against fixed value list - single_cardinality_validator: - description: Ensure single value per cell - min_columns: - description: Ensure minimum column count - trailing_whitespace_validator: - description: Check for trailing whitespace - column_order: - description: Validate column ordering - empty_cells: - description: Check for empty cells - combination_of_columns_no_duplicate_validator: - description: Ensure unique column combinations - - OntologyPrefix: - description: Supported ontology prefixes for validation - permissible_values: - ncbitaxon: - description: NCBI Taxonomy (organism) - uberon: - description: Uberon Anatomy Ontology (organism part) - cl: - description: Cell Ontology (cell type) - clo: - description: Cell Line Ontology (cell line) - bto: - description: BRENDA Tissue Ontology (tissue, cell type) - mondo: - description: MONDO Disease Ontology (disease) - efo: - description: Experimental Factor Ontology (disease, experimental factors) - doid: - description: Disease Ontology (disease) - pato: - description: Phenotype and Trait Ontology (phenotypes) - ms: - description: PSI-MS Ontology (instrument, modifications) - pride: - description: PRIDE Ontology (acquisition method, labels) - hancestro: - description: Human Ancestry Ontology (ancestry) - hsapdv: - description: Human Developmental Stages (developmental stage) - cellosaurus: - description: Cellosaurus (cell line identifiers) - unimod: - description: Unimod (protein modifications) - -# ============================================================================= -# CLASSES -# ============================================================================= - -classes: - Template: - description: >- - Root class representing an SDRF template definition. - Templates define validation rules for SDRF files. 
- attributes: - name: - description: Unique template identifier used in validation commands - range: string - required: true - identifier: true - pattern: "^[a-z][a-z0-9-]*$" - examples: - - value: human - - value: ms-proteomics - - value: crosslinking - - description: - description: Human-readable description of the template purpose - range: string - required: true - - version: - description: Semantic version of the template (should match spec version) - range: string - required: true - pattern: "^\\d+\\.\\d+\\.\\d+(-[a-z0-9]+)?$" - examples: - - value: "1.1.0" - - value: "1.0.0-dev" - - extends: - description: Parent template to inherit columns and validators from - range: string - required: false - examples: - - value: base - - value: ms-proteomics - - usable_alone: - description: Whether template can be used without other templates - range: boolean - required: false - ifabsent: "true" - - layer: - description: Template layer in the inheritance hierarchy - range: TemplateLayer - required: false - - validators: - description: Template-level validators applied to entire SDRF file - range: Validator - multivalued: true - required: false - inlined: true - inlined_as_list: true - - columns: - description: Column definitions with validation rules - range: Column - multivalued: true - required: true - inlined: true - inlined_as_list: true - - Column: - description: >- - Definition of an SDRF column including requirements and validation rules. - attributes: - name: - description: >- - Column header name exactly as it appears in the SDRF file. - Uses format: characteristics[term], comment[term], or plain name. 
- range: string - required: true - examples: - - value: "source name" - - value: "characteristics[organism]" - - value: "comment[label]" - - description: - description: Human-readable description of column contents and usage - range: string - required: true - - requirement: - description: Whether the column is required, recommended, or optional - range: RequirementLevel - required: true - - type: - description: Data type for column values - range: ColumnType - required: false - ifabsent: string(string) - - cardinality: - description: Whether single or multiple values are allowed per cell - range: Cardinality - required: false - ifabsent: string(single) - - allow_not_applicable: - description: Allow "not applicable" as valid value - range: boolean - required: false - ifabsent: "false" - - allow_not_available: - description: Allow "not available" as valid value - range: boolean - required: false - ifabsent: "false" - - allow_pooled: - description: Allow "pooled" as valid value (typically for biological replicate) - range: boolean - required: false - ifabsent: "false" - - validators: - description: Column-level validators for value checking - range: Validator - multivalued: true - required: false - inlined: true - inlined_as_list: true - - Validator: - description: >- - Validation rule that checks SDRF content for compliance. - Can be applied at template level (entire file) or column level (specific values). - attributes: - validator_name: - description: Type of validator to apply - range: ValidatorType - required: true - - params: - description: Validator-specific parameters - range: ValidatorParams - required: true - inlined: true - - ValidatorParams: - description: >- - Parameters for validators. Different validators use different subsets. - This is a flexible container for all possible validator parameters. 
- attributes: - # For min_columns validator - min_columns: - description: Minimum number of columns required - range: integer - required: false - - # For ontology validator - ontologies: - description: List of ontology prefixes to validate against - range: OntologyPrefix - multivalued: true - required: false - - error_level: - description: Severity level for validation failures - range: ErrorLevel - required: false - ifabsent: string(error) - - # For pattern validator - pattern: - description: Regular expression pattern for validation - range: string - required: false - examples: - - value: "^\\d+$" - - value: "^[A-Za-z0-9_-]+$" - - case_sensitive: - description: Whether pattern matching is case sensitive - range: boolean - required: false - ifabsent: "true" - - # For values validator - values: - description: List of allowed values - range: string - multivalued: true - required: false - - # For combination_of_columns_no_duplicate_validator - column_name: - description: Column names that must have unique combinations (error level) - range: string - multivalued: true - required: false - - column_name_warning: - description: Column names that should have unique combinations (warning level) - range: string - multivalued: true - required: false - - # Documentation fields (used for all validators) - description: - description: Human-readable description of the validation rule - range: string - required: false - - examples: - description: Example valid values - range: string - multivalued: true - required: false - -# ============================================================================= -# SCHEMA METADATA -# ============================================================================= - -slots: - name: - description: Identifier name - range: string - - description: - description: Human-readable description - range: string - - version: - description: Semantic version - range: string - -types: - semver: - uri: xsd:string - base: str - description: Semantic version 
string (e.g., 1.0.0) - pattern: "^\\d+\\.\\d+\\.\\d+(-[a-z0-9]+)?$" diff --git a/sdrf-proteomics/templates/single-cell/README.adoc b/sdrf-proteomics/templates/single-cell/README.adoc index 1f9829030..f4be032c4 100644 --- a/sdrf-proteomics/templates/single-cell/README.adoc +++ b/sdrf-proteomics/templates/single-cell/README.adoc @@ -551,9 +551,9 @@ Following the Nature Methods SCP guidelines: [[template-file]] == Template File -The single cell proteomics SDRF template file is available in this directory: +The single cell proteomics SDRF template file is available in the sdrf-templates repository: -- link:single-cell-template.sdrf.tsv[single-cell-template.sdrf.tsv] +- link:https://github.com/bigbio/sdrf-templates/blob/main/single-cell/1.0.0/single-cell.sdrf.tsv[single-cell.sdrf.tsv] [[validation]] == Validation diff --git a/sdrf-proteomics/templates/somascan/README.adoc b/sdrf-proteomics/templates/somascan/README.adoc index 8ed4afb4a..5562f7b26 100644 --- a/sdrf-proteomics/templates/somascan/README.adoc +++ b/sdrf-proteomics/templates/somascan/README.adoc @@ -318,7 +318,9 @@ parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv --template somascan [[template-file]] == Template File -- link:somascan-template.sdrf.tsv[somascan-template.sdrf.tsv] +The SomaScan SDRF template file is available in the sdrf-templates repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/somascan/1.0.0/somascan.sdrf.tsv[somascan.sdrf.tsv] [[related-templates]] == Related Templates diff --git a/sdrf-proteomics/templates/vertebrates/README.adoc b/sdrf-proteomics/templates/vertebrates/README.adoc index c34705a06..f5296f2ff 100644 --- a/sdrf-proteomics/templates/vertebrates/README.adoc +++ b/sdrf-proteomics/templates/vertebrates/README.adoc @@ -154,6 +154,13 @@ pip install sdrf-pipelines sdrf_validate --sdrf_file your_file.sdrf.tsv --template vertebrates ---- +[[template-file]] +== Template File + +The vertebrates SDRF template file is available in the sdrf-templates 
repository: + +- link:https://github.com/bigbio/sdrf-templates/blob/main/vertebrates/1.1.0/vertebrates.sdrf.tsv[vertebrates.sdrf.tsv] + [[examples]] == Examples