Description
Some notes on extensions and standardisation for attribute-value set/pairs/whatever-you-want-to-call-them.
Add a type field to disambiguate between different types of value that may be stored and to allow extra validation.
The type should come from an enum so that we can control what's going in the field. Ideally the value in the attribute would dictate what appears in the type field, and we can use existing ontologies/controlled vocabularies to automatically populate it.
Examples:
properties:
  - attribute:
      id: MIXS:0000117
      label: total phosphorous
    raw_value: 2.2 ppm
    unit: ppm
    numeric_value: 2.2
    type: float
  - attribute:
      id: MIXS:0000011
      label: collection date
    raw_value: 12 Jun 2025
    value: 2025-06-12
    type: iso_datetime
  - attribute:
      label: n people on railway track
    value: 5
    type: integer
  - attribute:
      id: MIXS:0000012
      label: env_broad_scale
    value: terrestrial biome
    value_cv_id: ENVO:00000446
    type: cv_term # controlled vocab term
  - attribute:
      label: smell
    value: completely disgusting
    type: text
  - attribute:
      label: size_of_bear
    value: big
    type: BearSizeEnum
  - attribute:
      label: random json data
    value: '{"this": "that", "the other": [1,3,5]}'
    type: json

Eventually this could be a form of data validation provided by BERtron itself on data integration; in these early stages, we would have to rely on data providers to add the appropriate type fields.
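The kind of type-driven validation described above could be sketched as follows. This is a minimal illustration in Python, not BERtron code: the AttributeType enum and the validate_property function are hypothetical names, the enum members are simply the type strings used in the examples, and the sketch assumes each property carries its typed value in a value field. Project-specific enums such as BearSizeEnum are not handled and are reported as unknown.

```python
import json
from datetime import datetime
from enum import Enum


class AttributeType(Enum):
    """Hypothetical enum of allowed `type` values, taken from the examples."""
    FLOAT = "float"
    INTEGER = "integer"
    ISO_DATETIME = "iso_datetime"
    CV_TERM = "cv_term"
    TEXT = "text"
    JSON = "json"


def validate_property(prop: dict) -> list[str]:
    """Return a list of problems found in one property (empty if none)."""
    try:
        attr_type = AttributeType(prop.get("type"))
    except ValueError:
        return [f"unknown type: {prop.get('type')!r}"]

    value = prop.get("value")
    errors = []
    if attr_type is AttributeType.FLOAT:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("value is not a number")
    elif attr_type is AttributeType.INTEGER:
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("value is not an integer")
    elif attr_type is AttributeType.ISO_DATETIME:
        try:
            datetime.fromisoformat(str(value))
        except ValueError:
            errors.append("value is not an ISO 8601 date")
    elif attr_type is AttributeType.CV_TERM:
        if not prop.get("value_cv_id"):
            errors.append("cv_term is missing value_cv_id")
    elif attr_type is AttributeType.JSON:
        try:
            json.loads(value)
        except (TypeError, ValueError):
            errors.append("value is not valid JSON")
    return errors
```

Returning a list of messages rather than raising on the first problem would let an integration step report every issue in a submission at once.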
Capture units as text and as ontology IDs
Not everyone has an encyclopaedic knowledge of the unit ontology (more's the pity), so whilst it is useful to be able to use a controlled vocab to express units, it is not very user friendly. Providing both the controlled vocab ID for the unit and the text string might be a good compromise.
For example:
value: 2.2
unit: UO:0000008

would become

value: 2.2
unit: meter
unit_cv_id: UO:0000008

At a future time point, it would be good if the labels were populated on ingest into BERtron using preloaded reference ontologies. For now, we will have to rely on data providers to have the correct term names and term IDs.
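Populating labels on ingest might look something like the sketch below. It is only an illustration: UNIT_LABELS is a hypothetical stand-in for a preloaded reference ontology (containing just the one term used in the example above), and populate_unit_label is an assumed helper name, not an existing BERtron function.

```python
# Hypothetical stand-in for a preloaded reference ontology: term ID -> label.
UNIT_LABELS = {
    "UO:0000008": "meter",
}


def populate_unit_label(prop: dict, labels: dict[str, str] = UNIT_LABELS) -> dict:
    """Return a copy of `prop` with `unit` filled in from `unit_cv_id`.

    Raises ValueError if a supplied label disagrees with the ontology.
    """
    cv_id = prop.get("unit_cv_id")
    if cv_id is None or cv_id not in labels:
        return dict(prop)  # nothing to look up; leave the property as provided
    expected = labels[cv_id]
    supplied = prop.get("unit")
    if supplied is not None and supplied != expected:
        raise ValueError(f"unit {supplied!r} does not match {cv_id} ({expected!r})")
    return {**prop, "unit": expected}
```

The same lookup doubles as a consistency check while data providers are still supplying both fields by hand.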
Standardise representation of ontology IDs and their labels
At present, attribute is split into label and id, whilst value has an accompanying value_cv_id sibling. Choose one representation or the other. One possibility is to nest id and label under every field:
- attribute:
    id: MIXS:0000097
    label: depth
  value:
    raw_value: 2.2m
    value: 2.2 # value.label / value.id doesn't make sense here but 'value.value' is not ideal!
  unit:
    id: UO:0000008
    label: meter
  type: float
- attribute:
    id: MIXS:0000012
    label: env_broad_scale
  value:
    id: ENVO:00000446
    label: terrestrial biome
  type: cv_term # controlled vocab term - value is expressed as value.label and value.id

Another possibility:
- attribute_label: depth
  attribute_cv_id: MIXS:0000097
  value: 2.2
  unit_label: meter
  unit_cv_id: UO:0000008
  type: float
- attribute_label: env_broad_scale
  attribute_cv_id: MIXS:0000012
  value_label: terrestrial biome
  value_cv_id: ENVO:00000446
  type: cv_term

A simpler version:
- attribute: depth
  attribute_cv_id: MIXS:0000097
  value: 2.2
  unit: meter
  unit_cv_id: UO:0000008
  type: float
- attribute: env_broad_scale
  attribute_cv_id: MIXS:0000012
  value: terrestrial biome
  value_cv_id: ENVO:00000446
  type: cv_term
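
Whichever representation is chosen, converting between them is mechanical. The sketch below maps the nested form (id and label under each field) to the simpler flat form with _cv_id siblings. It assumes that attribute, value and unit are each nested mappings and that type is a plain string; flatten_property is a hypothetical helper name, and keeping raw_value as a top-level key is a choice made for this example only.

```python
def flatten_property(prop: dict) -> dict:
    """Convert a nested property to the flat `<field>` / `<field>_cv_id` layout."""
    flat = {}
    for key, val in prop.items():
        if not isinstance(val, dict):
            flat[key] = val  # already a scalar, e.g. `type`
            continue
        if "label" in val:
            flat[key] = val["label"]  # the label becomes the bare field
        if "id" in val:
            flat[f"{key}_cv_id"] = val["id"]  # the ontology ID becomes a sibling
        if "value" in val:
            flat[key] = val["value"]  # the awkward `value.value` case
        if "raw_value" in val:
            flat["raw_value"] = val["raw_value"]
    return flat


nested = {
    "attribute": {"id": "MIXS:0000097", "label": "depth"},
    "value": {"raw_value": "2.2m", "value": 2.2},
    "unit": {"id": "UO:0000008", "label": "meter"},
    "type": "float",
}
flat = flatten_property(nested)
```

Here flat carries the same attribute, value, unit and type keys as the depth entry in the simpler version, plus the preserved raw_value.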