Commit 35d388e

Merge remote-tracking branch 'origin/v1' into 191-documentation-refactor
2 parents 5617bd2 + b3e61ca commit 35d388e

56 files changed: +1783 -135 lines changed

.github/pull_request_template.md

Lines changed: 14 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,17 @@
1-
Link to the corresponding Issue.
1+
## Link to the corresponding Issue
22

3-
Summary of the Pull Request.
3+
## Summary of the Pull Request
44

5-
Pull Request checklist:
6-
- [ ] Does the title of this Pull Request reference the corresponding Issue?
7-
- [ ] Is the branch validating against pre-commit hooks? Run `pre-commit run --all-files` from the root directory.
8-
- [ ] Is the branch passing tests? Run `pytest tests/` from the root directory.
5+
## Pull Request checklist
96

10-
If the schema or examples were contributed to:
11-
- [ ] Were the schema def/ and json/ files recompiled and committed? Run `cd schema; make all` from the root directory.
12-
- [ ] If constraints or recipes were added, have they been added to the readthedocs? To do so, you can revise the appropriate file within `docs/source/concepts/`.
13-
- [ ] Has documentation been regenerated and committed? Run `cd docs; make clean watch &` from the root directory to compile documentation.
14-
- [ ] Have tests been created or updated?
7+
### Required
8+
- [ ] The title of this Pull Request accurately reflects the scope and content of the linked Issue.
9+
- [ ] The branch passes all pre-commit hooks (Run `pre-commit run --all-files` from the root directory).
10+
- [ ] The branch passes all tests (Run `pytest tests/` from the root directory).
11+
12+
### Required if the schema or examples were contributed to
13+
- [ ] The schema `def/` and `json/` files have been recompiled and committed (Run `cd schema; make all` from the root directory).
14+
- [ ] Tests have been created or updated.
15+
- [ ] Schema changes have been documented (existing files updated or new files created in `docs/source/`).
16+
- [ ] Any new schema definition `.rst` files have been registered in the documentation structure.
17+
- [ ] Documentation has been regenerated and committed (Run `cd docs; make clean watch &` from the root directory to compile documentation).

.gitmodules

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
11
[submodule "submodules/vrs"]
22
path = submodules/vrs
33
url = https://github.com/ga4gh/vrs.git
4-
branch = 2.0
4+
branch = v2

.readthedocs.yaml

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ version: 2
33
build:
44
os: "ubuntu-22.04"
55
tools:
6-
python: "3.11"
6+
python: "3.12"
77

88
submodules:
99
include: all
@@ -14,4 +14,4 @@ sphinx:
1414

1515
python:
1616
install:
17-
- requirements: docs/source/requirements.txt
17+
- requirements: .requirements.txt

.requirements.txt

Lines changed: 12 additions & 5 deletions
@@ -1,8 +1,15 @@
1-
pytest
2-
sphinx ~= 7.2
3-
sphinx-rtd-theme ~= 1.2
4-
pyyaml
5-
ga4gh.gks.metaschema==0.3.2
1+
# Schema validation
2+
ga4gh.gks.metaschema == 0.3.2
63
jsonschema
4+
pyyaml
75
referencing
6+
7+
# Development and testing
8+
pytest
89
pre-commit
10+
11+
# ReadtheDocs
12+
jinja2 == 3.1.6
13+
sphinx ~= 8.1.3
14+
sphinx-rtd-theme ~= 3.0.2
15+
sphinx_toolbox == 3.9.0

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
# Categorical Variation Representation Specification (Cat-VRS)
22

3-
[![Read the Docs](https://img.shields.io/readthedocs/vr-spec/1.1)](https://cat-vrs.readthedocs.io/en/latest/)
3+
[![Read the Docs](https://img.shields.io/readthedocs/cat-vrs)](https://cat-vrs.readthedocs.io/en/latest/)
44

55
**Cat-VRS 1.0.0 Trial Use Review November 2024 - join in [here](https://github.com/ga4gh/cat-vrs/discussions/86)**!
66

docs/source/appendices/design_decisions.rst

Lines changed: 40 additions & 51 deletions
@@ -1,7 +1,7 @@
11
.. _design-decisions:
22

33
Design Decisions
4-
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
4+
!!!!!!!!!!!!!!!!
55

66
Cat-VRS contributors confronted numerous trade-offs in developing this specification. As these trade-offs may not be apparent to outside readers, this section highlights the most significant ones and the rationale for our design decisions, including the following.
77

@@ -21,17 +21,9 @@ Decisions are labeled based on their maturity status based on the :ref:`maturity
2121
Because maturity is a function of (1) the breadth of model adoption and (2) expected stability, rather than a function of how fundamental a concept is to the model, the maturity status property is entirely orthogonal to the impact of a decision on Cat-VRS.
2222

2323

24-
.. toctree::
25-
:maxdepth: 3
26-
:includehidden:
27-
28-
major_impact
29-
medium_impact
30-
minor_impact
31-
general_principles
32-
33-
34-
24+
.. contents::
25+
:local:
26+
:depth: 1
3527

3628

3729
.. major_impact
@@ -42,7 +34,8 @@ Major Impact
4234

4335
.. hyperintensional_catvars
4436
45-
**Treatment of CatVars as ((Hyper)intensional) Set-Theoretic Objects**
37+
Treatment of CatVars as ((Hyper)intensional) Set-Theoretic Objects
38+
==================================================================
4639

4740
**Decision:**
4841
The group decided to model categorical variants as `hyperintensional <https://plato.stanford.edu/entries/hyperintensionality/>`_ set objects to address the complexities of categorical data representation.
@@ -77,7 +70,8 @@ For example, an extensional set describing *BRAF* p.V600E would need to include
7770

7871
.. constraint_model
7972
80-
**Adoption of a Constraint-Based Model Instead of a Fixed Top-Down Typology of Data Classes**
73+
Adoption of a Constraint-Based Model Instead of a Fixed Top-Down Typology of Data Classes
74+
=========================================================================================
8175

8276
**Decision:**
8377
The group decided to use a `constraint-based model <https://github.com/ga4gh/cat-vrs/discussions/22>`_, defining categorical variants dynamically in a bottom-up fashion based on set constraints rather than in a rigid top-down hierarchy of variant types.
@@ -111,7 +105,8 @@ Medium Impact
111105

112106
.. constraint_array_of_anded_elements
113107
114-
**Constraints as an Array of implicitly ANDed elements**
108+
Constraints as an Array of implicitly ANDed elements
109+
====================================================
115110

116111
**Decision:**
117112
The group decided that the individual *constraints* in the array of the constraints property are to be treated as implicitly ANDed together, and that no other boolean relations should be used in the context of the *CategoricalVariant* data class.
@@ -129,7 +124,8 @@ One property of the base *CategoricalVariant* class in the constraint model is c
129124

130125
.. including_recipes
131126
132-
**Including Recipes in the Cat-VRS Specification**
127+
Including Recipes in the Cat-VRS Specification
128+
==============================================
133129

134130
**Decision:**
135131
The group decided to include recipes in Cat-VRS which illustrate representation of genomic variant types under the constraint model.
@@ -147,7 +143,8 @@ It is intended that implementations of Cat-VRS will allow for variants to be sea
147143

148144
.. machine_readable_spec
149145
150-
**Machine Readable Specifications**
146+
Machine Readable Specifications
147+
===============================
151148

152149
**Decision:**
153150
The group decided to adopt several repository and organizational conventions to ensure a single source of truth during development and ensure that the schema is readily computable:
@@ -169,7 +166,8 @@ These decisions bring Cat-VRS development in line with accepted best practices i
169166

170167
.. separating_copycount_and_copychange
171168
172-
**Separating CopyNumberConstraint into CopyCountConstraint and CopyChangeConstraint**
169+
Separating CopyNumberConstraint into CopyCountConstraint and CopyChangeConstraint
170+
=================================================================================
173171

174172
**Decision:**
175173
The original model had a single copy number constraint, which was later split into two distinct constraints: the *CopyCountConstraint* (absolute copy numbers) and *CopyChangeConstraint* (relative changes such as amplifications and deletions).
@@ -190,7 +188,8 @@ Separating these two constraints ensures greater precision in representing categ
190188

191189
.. separating_definingallele_and_defininglocation
192190
193-
**Separating DefiningContextConstraint into DefiningAlelleConstraint and DefiningLocationConstraint**
191+
Separating DefiningContextConstraint into DefiningAlelleConstraint and DefiningLocationConstraint
192+
=================================================================================================
194193

195194
**Decision:**
196195
The group decided to split up the single combined *DefiningContextConstraint* into a *DefiningAlleleConstraint* and separate *DefiningLocationConstraint*.
@@ -204,45 +203,38 @@ This decision was driven by three primary considerations: (1) the need for great
204203

205204
#. **Compatibility with existing genomic standards:** Existing GKS standards like VRS and knowledgebases like ClinVar treat sequence (location-state) variants and location variants separately. A single *DefiningContextConstraint* was somewhat misaligned with these models, making interoperability more challenging.
206205

207-
208206
Splitting this constraint allows the model to explicitly define variants based on location, sequence, or both while allowing for smoother integration across implementations by mirroring representation in other well established resources.
209207

210208
**Citations:**
211209

212210
* `2024-11-19 meeting minutes <https://docs.google.com/document/d/1oI4ir4OzXFvhZNbMVEX-RHGAQ-d2K4lAKP-7lf-uzPc/edit?tab=t.0#heading=h.hd9lu8gw3jh9>`_, this was primarily discussed in person during a pre-conference hackathon before ASHG
213211

214-
215-
216212
.. using_gks_maturity_model
217213
218-
**Utilization of semantic versioning and the GKS maturity model**
214+
Utilization of semantic versioning and the GKS maturity model
215+
=============================================================
219216

220217
**Decision:**
221218
The group decided to adopt standard semantic versioning practices and to indicate data class maturity in compliance with the :ref:`maturity-model`.
222219

223-
224220
**Rationale:**
225221

226222
These decisions bring Cat-VRS in compliance with generally accepted best practices in the GKS workstream and improve transparency.
227223

228224
**Citations:**
229225

230-
231226
* `2023-10-25 meeting minutes <https://docs.google.com/document/d/1oI4ir4OzXFvhZNbMVEX-RHGAQ-d2K4lAKP-7lf-uzPc/edit?tab=t.0#heading=h.8xxp7lqoun48>`_
232227

233228
* `2023-10-11 meeting minutes <https://docs.google.com/document/d/1oI4ir4OzXFvhZNbMVEX-RHGAQ-d2K4lAKP-7lf-uzPc/edit?tab=t.0#heading=h.cmwm638mk3jb>`_
234229

235-
236-
237230
.. generalizing_genecontextconstraint
238231
239-
**Generalization of GeneContextConstriant into FeatureContextConstraint**
232+
Generalization of GeneContextConstriant into FeatureContextConstraint
233+
=====================================================================
240234

241235
**Decision:**
242236
The specification originally proposed a *GeneContextConstraint* to capture variation knowledge tied to a specific gene, but this constraint was later broadened into a *FeatureContextConstraint* to include regulatory elements, pseudogenes, and other sequence-related features.
243237

244-
245-
246238
**Rationale:**
247239

248240
This change was necessary to generalize the model and improve modularity, ensuring that Cat-VRS supports diverse genomic elements beyond strictly defined genes. It also aligns better with other genomic standardization efforts and accommodates structural variants that do not map directly to specific genes; for example, protein contexts such as “Estrogen Receptor (ER)”. Furthermore, FeatureContext better allows for catvar harmonization across different gene name-space conventions, as these change over time and between organizations. For example, in an older refseq version, *DUXL4* was considered as pseudogene, but in the current refseq version it is not recognized as a gene (or pseudogene) at all.
@@ -267,7 +259,8 @@ Minor Impact
267259

268260
.. relations_and_mappings
269261
270-
**Distinction between Relations and Mappings**
262+
Distinction between Relations and Mappings
263+
==========================================
271264

272265
**Decision:**
273266
Relations refer to structured transformations to the underlying variant, such as translating a transcript sequence into an amino acid sequence. Mappings refer to homomorphisms of coded variant concepts between different codings systems and ontologies, for example, mapping the property of protein gain-of-function EFO code to that of a protein hypermorphism in SO.
@@ -288,7 +281,8 @@ The group followed existing practices in other GKS standards for relations and m
288281

289282
.. members_are_non-exhaustive
290283
291-
**Inclusion of Members as non-exhaustive array of contextual variants**
284+
Inclusion of Members as non-exhaustive array of contextual variants
285+
===================================================================
292286

293287
**Decision:**
294288
Items in the *members* property constitute representative examples of GA4GH Variation Representation Specification (VRS) Variations that satisfy the constraints of a given categorical variant. It is neither required nor expected for *members* to contain an exhaustive list of representative VRS variants.
@@ -309,7 +303,8 @@ Because catvars are `defined by their properties (constraints), <https://docs.go
309303

310304
.. name_as_a_non-required_field
311305
312-
**Name as a non-required field**
306+
Name as a non-required field
307+
============================
313308

314309
**Decision:**
315310
The *name* property in the *CategoricalVariant* class is an optional (but not required) field for *CategoricalVariant*.
@@ -322,7 +317,8 @@ The *name* property is a string field, and is intended to hold a *name* for a ca
322317

323318
.. profiles_to_recipes
324319
325-
**Renaming “Profiles” to “Recipes” to represent standard categorical variants templates**
320+
Renaming “Profiles” to “Recipes” to represent standard categorical variants templates
321+
=====================================================================================
326322

327323
**Decision:**
328324
:ref:`recipes` were originally called Profiles, but the group decided to change the name to the current Recipes.
@@ -340,7 +336,8 @@ The term `profile is already used within the Variant Annotation Specification (V
340336

341337
.. function_variants_mullers_morphs
342338
343-
**Handling of Function Variants using Müller's Morphs**
339+
Handling of Function Variants using Müller's Morphs
340+
===================================================
344341

345342
**Decision:**
346343
The classification of functional impact on protein structure in the FunctionConstraint was standardized using terms like hypermorphic, amorphic, neomorphic, and antimorphic (based on `Müller’s morphs <https://en.wikipedia.org/wiki/Muller%27s_morphs>`_), rather than terms like "gain-of-function" or "loss-of-function".
@@ -351,8 +348,6 @@ This approach provides a more structured, ontology code-backed classification. A
351348

352349
We recognize that this terminology is inconsistent with current colloquial use of gain-of-function and loss-of-function descriptors. `A Discussion <https://github.com/ga4gh/cat-vrs/discussions/54>`_ was created on the Cat-VRS GitHub repository on October 6th, 2024 to promote discussion around this design decision. This decision will further be interrogated when this constraint is nominated to Trial Use as part of a GKS review ballot.
353350

354-
355-
356351
**Citations:**
357352

358353
* `"Terminology for function changes" GitHub Discussion <https://github.com/ga4gh/cat-vrs/discussions/23>`_
@@ -362,38 +357,32 @@ We recognize that this terminology is inconsistent with current colloquial use o
362357
* `“Handling Function Variants” GitHub Issue <https://github.com/ga4gh/cat-vrs/issues/14>`_
363358
* `“Generalizing Canonical allele and Categorical CNV to handle function / expression variants” GitHub Issue <https://github.com/ga4gh/cat-vrs/discussions/16>`_
364359

365-
366-
367-
368-
369360
.. mappable_concepts_for_relations
370361
371-
**Integration of Mappable Concepts for Variant Relations**
362+
Integration of Mappable Concepts for Variant Relations
363+
======================================================
372364

373365
**Decision:**
374366
For the relations property in the DefiningAlleleConstraint and DefiningLocationConstraint, the group decided to remove the explicit enum of possible relation methods (such as translates_to and translates_from) and instead refer to the :ref:`MappableConcept` data class.
375367

376-
377-
378368
**Rationale:**
379369
This decision was made for a number of reasons: First, it is more consistent with `DRY <https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_ best practices to have a single mechanism to handle relations rather than repeating lists of them multiple times throughout the specification. Second, the *gks.core:MappableConcept* class is a general-purpose data structure that holds codings of a concept and maps them to codings within other systems within a standardized way. Therefore, regardless of which coded methods are used by an implementation to relate one version of a variant to another, containerizing these coded methods in the *gks.core:MappableConcept* should make them easier to map to other coding systems.
380370

381-
382371
**Citations:**
383372

384373
* `“Should relation or relations be renamed?” GitHub discussion <https://github.com/ga4gh/cat-vrs/discussions/100>`_
385374
* `2025-02-05 meeting minutes <https://docs.google.com/document/d/1oI4ir4OzXFvhZNbMVEX-RHGAQ-d2K4lAKP-7lf-uzPc/edit?tab=t.0#heading=h.ujjbabr6rnl>`_
386375

387-
388-
389-
390-
391376
.. Error_Handling
392377
393-
**Error handling is intentionally unspecified and delegated to implementation.**
394-
Cat-VRS provides foundational data types that enable significant flexibility. Except where required by this specification, implementations may choose whether and how to validate data. For example, implementations MAY choose to validate that particular combinations of objects are compatible, but such validation is not required.
378+
Error handling is intentionally unspecified and delegated to implementation.
379+
============================================================================
395380

381+
Cat-VRS provides foundational data types that enable significant flexibility. Except where required by this specification, implementations may choose whether and how to validate data. For example, implementations MAY choose to validate that particular combinations of objects are compatible, but such validation is not required.
396382

397383
.. Text_Case
398384
385+
Text casing
386+
===========
387+
399388
**Cat-VRS uses** `PascalCase (a.k.a. CamelCaps) <https://simple.wikipedia.org/wiki/CamelCase>`__ **to represent compound words and** `snake_case <https://simple.wikipedia.org/wiki/Snake_case>`__ **to represent compound file names** Although the schema is currently JSON-based (which would typically use camelCase), Cat-VRS itself is intended to be neutral with respect to languages and database.

docs/source/appendices/index.rst

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,5 @@
11
.. _appendices:
2+
23
Appendices
34
!!!!!!!!!!
45

@@ -8,3 +9,4 @@ Appendices
89
maturity_model
910
design_decisions
1011
roadmap
12+
hyperintensional_catvars

docs/source/appendices/roadmap.rst

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
11
.. _roadmap:
2+
23
Product Roadmap
34
!!!!!!!!!!!!!!!
45

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
.. _additionalDataTypes:
2+
3+
Additional Data Types
4+
@@@@@@@@@@@@@@@@@@@@@
5+
6+
Below are the additional data types used by the Cat-VRS models.
7+
8+
.. _FunctionalDomain:
9+
10+
FunctionalDomain
11+
##################
12+
13+
The FunctionalDomain class is used to populate the `functionalDomains` property within the :ref:`AdjacencyConstraint`. It is intended to represent `Functional Domains <https://fusions.cancervariants.org/en/latest/information_model.html#categorical-elements>`_ from the VICC Gene Fusion Specification.
14+
15+
.. include:: ../def/cat-vrs/FunctionalDomain.rst
16+
17+
.. _UnspecifiedElement:
18+
19+
UnspecifiedElement
20+
##################
21+
22+
The UnspecifiedElement class is an available item to populate the `adjoinedElements` property within the :ref:`AdjacencyConstraint`. It is intended to represent both the `Multiple Possible Gene Component <https://fusions.cancervariants.org/en/latest/nomenclature.html#multiple-possible-gene-component>`_ and `Unknown Gene Component <https://fusions.cancervariants.org/en/latest/nomenclature.html#unknown-gene-component>`_ from the VICC Gene Fusion Specification.
23+
24+
.. include:: ../def/cat-vrs/UnspecifiedElement.rst
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
.. _ConceptSet:
2+
3+
ConceptSet
4+
!!!!!!!!!!!
5+
6+
.. include:: ../../def/gks-core/ConceptSet.rst
