-[ ] Does the title of this Pull Request reference the corresponding Issue?
7
-
-[ ] Is the branch validating against pre-commit hooks? Run `pre-commit run --all-files` from the root directory.
8
-
-[ ] Is the branch passing tests? Run `pytest tests/` from the root directory.
5
+
## Pull Request checklist
9
6
10
-
If the schema or examples were contributed to:
11
-
-[ ] Were the schema def/ and json/ files recompiled and committed? Run `cd schema; make all` from the root directory.
12
-
-[ ] If constraints or recipes were added, have they been added to the readthedocs? To do so, you can revise the appropriate file within `docs/source/concepts/`.
13
-
-[ ] Has documentation been regenerated and committed? Run `cd docs; make clean watch &` from the root directory to compile documentation.
14
-
-[ ] Have tests been created or updated?
7
+
### Required
8
+
-[ ] The title of this Pull Request accurately reflects the scope and content of the linked Issue.
9
+
-[ ] The branch passes all pre-commit hooks (Run `pre-commit run --all-files` from the root directory).
10
+
-[ ] The branch passes all tests (Run `pytest tests/` from the root directory).
11
+
12
+
### Required if the schema or examples were contributed to
13
+
-[ ] The schema `def/` and `json/` files have been recompiled and committed (Run `cd schema; make all` from the root directory).
14
+
-[ ] Tests have been created or updated.
15
+
-[ ] Schema changes have been documented (existing files updated or new files created in `docs/source/`).
16
+
-[ ] Any new schema definition `.rst` files have been registered in the documentation structure.
17
+
-[ ] Documentation has been regenerated and committed (Run `cd docs; make clean watch &` from the root directory to compile documentation).
docs/source/appendices/design_decisions.rst
+40 -51 (40 additions & 51 deletions)
@@ -1,7 +1,7 @@
1
1
.. _design-decisions:
2
2
3
3
Design Decisions
4
-
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
4
+
!!!!!!!!!!!!!!!!
5
5
6
6
Cat-VRS contributors confronted numerous trade-offs in developing this specification. As these trade-offs may not be apparent to outside readers, this section highlights the most significant ones and the rationale for our design decisions, including the following.
7
7
@@ -21,17 +21,9 @@ Decisions are labeled based on their maturity status based on the :ref:`maturity
21
21
Because maturity is a function of (1) the breadth of model adoption and (2) expected stability, rather than a function of how fundamental a concept is to the model, the maturity status property is entirely orthogonal to the impact of a decision on Cat-VRS.
22
22
23
23
24
-
.. toctree::
25
-
:maxdepth:3
26
-
:includehidden:
27
-
28
-
major_impact
29
-
medium_impact
30
-
minor_impact
31
-
general_principles
32
-
33
-
34
-
24
+
.. contents::
25
+
:local:
26
+
:depth: 1
35
27
36
28
37
29
.. major_impact
@@ -42,7 +34,8 @@ Major Impact
42
34
43
35
.. hyperintensional_catvars
44
36
45
-
**Treatment of CatVars as ((Hyper)intensional) Set-Theoretic Objects**
37
+
Treatment of CatVars as ((Hyper)intensional) Set-Theoretic Objects
The group decided to model categorical variants as `hyperintensional <https://plato.stanford.edu/entries/hyperintensionality/>`_ set objects to address the complexities of categorical data representation.
@@ -77,7 +70,8 @@ For example, an extensional set describing *BRAF* p.V600E would need to include
77
70
78
71
.. constraint_model
79
72
80
-
**Adoption of a Constraint-Based Model Instead of a Fixed Top-Down Typology of Data Classes**
73
+
Adoption of a Constraint-Based Model Instead of a Fixed Top-Down Typology of Data Classes
The group decided to use a `constraint-based model <https://github.com/ga4gh/cat-vrs/discussions/22>`_, defining categorical variants dynamically in a bottom-up fashion based on set constraints rather than in a rigid top-down hierarchy of variant types.
@@ -111,7 +105,8 @@ Medium Impact
111
105
112
106
.. constraint_array_of_anded_elements
113
107
114
-
**Constraints as an Array of implicitly ANDed elements**
108
+
Constraints as an Array of implicitly ANDed elements
The group decided that the individual *constraints* in the array of the constraints property are to be treated as implicitly ANDed together, and that no other boolean relations should be used in the context of the *CategoricalVariant* data class.
@@ -129,7 +124,8 @@ One property of the base *CategoricalVariant* class in the constraint model is c
129
124
130
125
.. including_recipes
131
126
132
-
**Including Recipes in the Cat-VRS Specification**
127
+
Including Recipes in the Cat-VRS Specification
128
+
==============================================
133
129
134
130
**Decision:**
135
131
The group decided to include recipes in Cat-VRS which illustrate representation of genomic variant types under the constraint model.
@@ -147,7 +143,8 @@ It is intended that implementations of Cat-VRS will allow for variants to be sea
147
143
148
144
.. machine_readable_spec
149
145
150
-
**Machine Readable Specifications**
146
+
Machine Readable Specifications
147
+
===============================
151
148
152
149
**Decision:**
153
150
The group decided to adopt several repository and organizational conventions to ensure a single source of truth during development and ensure that the schema is readily computable:
@@ -169,7 +166,8 @@ These decisions bring Cat-VRS development in line with accepted best practices i
169
166
170
167
.. separating_copycount_and_copychange
171
168
172
-
**Separating CopyNumberConstraint into CopyCountConstraint and CopyChangeConstraint**
169
+
Separating CopyNumberConstraint into CopyCountConstraint and CopyChangeConstraint
The original model had a single copy number constraint, which was later split into two distinct constraints: the *CopyCountConstraint* (absolute copy numbers) and *CopyChangeConstraint* (relative changes such as amplifications and deletions).
@@ -190,7 +188,8 @@ Separating these two constraints ensures greater precision in representing categ
190
188
191
189
.. separating_definingallele_and_defininglocation
192
190
193
-
**Separating DefiningContextConstraint into DefiningAlleleConstraint and DefiningLocationConstraint**
191
+
Separating DefiningContextConstraint into DefiningAlleleConstraint and DefiningLocationConstraint
The group decided to split up the single combined *DefiningContextConstraint* into a *DefiningAlleleConstraint* and separate *DefiningLocationConstraint*.
@@ -204,45 +203,38 @@ This decision was driven by three primary considerations: (1) the need for great
204
203
205
204
#. **Compatibility with existing genomic standards:** Existing GKS standards like VRS and knowledgebases like ClinVar treat sequence (location-state) variants and location variants separately. A single *DefiningContextConstraint* was somewhat misaligned with these models, making interoperability more challenging.
206
205
207
-
208
206
Splitting this constraint allows the model to explicitly define variants based on location, sequence, or both while allowing for smoother integration across implementations by mirroring representation in other well established resources.
209
207
210
208
**Citations:**
211
209
212
210
* `2024-11-19 meeting minutes <https://docs.google.com/document/d/1oI4ir4OzXFvhZNbMVEX-RHGAQ-d2K4lAKP-7lf-uzPc/edit?tab=t.0#heading=h.hd9lu8gw3jh9>`_, this was primarily discussed in person during a pre-conference hackathon before ASHG
213
211
214
-
215
-
216
212
.. using_gks_maturity_model
217
213
218
-
**Utilization of semantic versioning and the GKS maturity model**
214
+
Utilization of semantic versioning and the GKS maturity model
The specification originally proposed a *GeneContextConstraint* to capture variation knowledge tied to a specific gene, but this constraint was later broadened into a *FeatureContextConstraint* to include regulatory elements, pseudogenes, and other sequence-related features.
243
237
244
-
245
-
246
238
**Rationale:**
247
239
248
240
This change was necessary to generalize the model and improve modularity, ensuring that Cat-VRS supports diverse genomic elements beyond strictly defined genes. It also aligns better with other genomic standardization efforts and accommodates structural variants that do not map directly to specific genes; for example, protein contexts such as “Estrogen Receptor (ER)”. Furthermore, FeatureContext better allows for catvar harmonization across different gene name-space conventions, as these change over time and between organizations. For example, in an older RefSeq version, *DUXL4* was considered a pseudogene, but in the current RefSeq version it is not recognized as a gene (or pseudogene) at all.
@@ -267,7 +259,8 @@ Minor Impact
267
259
268
260
.. relations_and_mappings
269
261
270
-
**Distinction between Relations and Mappings**
262
+
Distinction between Relations and Mappings
263
+
==========================================
271
264
272
265
**Decision:**
273
266
Relations refer to structured transformations to the underlying variant, such as translating a transcript sequence into an amino acid sequence. Mappings refer to homomorphisms of coded variant concepts between different coding systems and ontologies, for example, mapping the EFO code for protein gain-of-function to that of a protein hypermorphism in SO.
@@ -288,7 +281,8 @@ The group followed existing practices in other GKS standards for relations and m
288
281
289
282
.. members_are_non-exhaustive
290
283
291
-
**Inclusion of Members as non-exhaustive array of contextual variants**
284
+
Inclusion of Members as non-exhaustive array of contextual variants
Items in the *members* property constitute representative examples of GA4GH Variation Representation Specification (VRS) Variations that satisfy the constraints of a given categorical variant. It is neither required nor expected for *members* to contain an exhaustive list of representative VRS variants.
@@ -309,7 +303,8 @@ Because catvars are `defined by their properties (constraints), <https://docs.go
309
303
310
304
.. name_as_a_non-required_field
311
305
312
-
**Name as a non-required field**
306
+
Name as a non-required field
307
+
============================
313
308
314
309
**Decision:**
315
310
The *name* property in the *CategoricalVariant* class is an optional field; it is not required.
@@ -322,7 +317,8 @@ The *name* property is a string field, and is intended to hold a *name* for a ca
322
317
323
318
.. profiles_to_recipes
324
319
325
-
**Renaming “Profiles” to “Recipes” to represent standard categorical variants templates**
320
+
Renaming “Profiles” to “Recipes” to represent standard categorical variants templates
The classification of functional impact on protein structure in the FunctionConstraint was standardized using terms like hypermorphic, amorphic, neomorphic, and antimorphic (based on `Müller’s morphs <https://en.wikipedia.org/wiki/Muller%27s_morphs>`_), rather than terms like "gain-of-function" or "loss-of-function".
@@ -351,8 +348,6 @@ This approach provides a more structured, ontology code-backed classification. A
351
348
352
349
We recognize that this terminology is inconsistent with current colloquial use of gain-of-function and loss-of-function descriptors. `A Discussion <https://github.com/ga4gh/cat-vrs/discussions/54>`_ was created on the Cat-VRS GitHub repository on October 6th, 2024 to promote discussion around this design decision. This decision will further be interrogated when this constraint is nominated to Trial Use as part of a GKS review ballot.
353
350
354
-
355
-
356
351
**Citations:**
357
352
358
353
* `"Terminology for function changes" GitHub Discussion <https://github.com/ga4gh/cat-vrs/discussions/23>`_
@@ -362,38 +357,32 @@ We recognize that this terminology is inconsistent with current colloquial use o
362
357
* `“Handling Function Variants” GitHub Issue <https://github.com/ga4gh/cat-vrs/issues/14>`_
363
358
* `“Generalizing Canonical allele and Categorical CNV to handle function / expression variants” GitHub Issue <https://github.com/ga4gh/cat-vrs/discussions/16>`_
364
359
365
-
366
-
367
-
368
-
369
360
.. mappable_concepts_for_relations
370
361
371
-
**Integration of Mappable Concepts for Variant Relations**
362
+
Integration of Mappable Concepts for Variant Relations
For the relations property in the DefiningAlleleConstraint and DefiningLocationConstraint, the group decided to remove the explicit enum of possible relation methods (such as translates_to and translates_from) and instead refer to the :ref:`MappableConcept` data class.
375
367
376
-
377
-
378
368
**Rationale:**
379
369
This decision was made for a number of reasons. First, it is more consistent with `DRY <https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_ best practices to have a single mechanism to handle relations rather than repeating lists of them multiple times throughout the specification. Second, the *gks.core:MappableConcept* class is a general-purpose data structure that holds codings of a concept and maps them to codings within other systems in a standardized way. Therefore, regardless of which coded methods are used by an implementation to relate one version of a variant to another, containerizing these coded methods in the *gks.core:MappableConcept* should make them easier to map to other coding systems.
380
370
381
-
382
371
**Citations:**
383
372
384
373
* `“Should relation or relations be renamed?” GitHub discussion <https://github.com/ga4gh/cat-vrs/discussions/100>`_
**Error handling is intentionally unspecified and delegated to implementation.**
394
-
Cat-VRS provides foundational data types that enable significant flexibility. Except where required by this specification, implementations may choose whether and how to validate data. For example, implementations MAY choose to validate that particular combinations of objects are compatible, but such validation is not required.
378
+
Error handling is intentionally unspecified and delegated to implementation.
Cat-VRS provides foundational data types that enable significant flexibility. Except where required by this specification, implementations may choose whether and how to validate data. For example, implementations MAY choose to validate that particular combinations of objects are compatible, but such validation is not required.
396
382
397
383
.. Text_Case
398
384
385
+
Text casing
386
+
===========
387
+
399
388
**Cat-VRS uses** `PascalCase (a.k.a. CamelCaps) <https://simple.wikipedia.org/wiki/CamelCase>`__ **to represent compound words and** `snake_case <https://simple.wikipedia.org/wiki/Snake_case>`__ **to represent compound file names.** Although the schema is currently JSON-based (which would typically use camelCase), Cat-VRS itself is intended to be neutral with respect to languages and databases.