Re-format the SSSOM section in the Makefile to make the Jinja conditionals easier to follow. Also move the validate_mappings target to the top of the section, right after the validate-sssom-% target that it uses.
The auto-generated SSSOM mapping set template contains a bogus declaration for the semapv prefix name (associated to the <https://w3id.org/semapv/> prefix instead of the expected <https://w3id.org/semapv/vocab/>). This is an error as per the SSSOM spec (which forbids associating a built-in prefix name to another prefix), so we fix that.
Add a new value for the 'maintenance' field of a SSSOM mapping set product: 'merged'. A "merged" mapping set is obtained simply by merging other mapping sets together. The sets that are to be merged should be listed in a new 'source_mappings' key; if that key is absent, then the set is made by merging all other (non-merged) sets.
For consistency with other product types, add an explicit 'custom' type of SSSOM mapping set product. The generated rule for a 'custom' SSSOM set does nothing but emit an error message, reminding the user that they must override the rule in their custom Makefile. (Users could also declare the set as being of type 'manual' -- which creates a rule that does nothing else but touch'ing the file --, but the 'custom' type makes it more explicit that the set is to be generated by some custom rule.)
When building a SSSOM mapping set by extracting mappings from an ontology (maintenance type set to 'extract'), if the ontology to extract the mappings from is not specified (no 'source_file' key), we default to extracting from the preprocessed edit file. We do not change that logic, but we move it from the Makefile template to the odk.py script, so that the template can just dereference 'mapping.source_file' without having to check whether that field had been set or not.
Add a new option to the SSSOM mapping set group: 'mapping_extractor'. That option determines the tool to be used to extract mappings from an ontology, to create the mapping sets of type 'extract'. The default extractor is 'sssom-py', which corresponds to the existing behaviour. The new value is 'robot', which causes the mappings to be extracted with the sssom:xref-extract command of the SSSOM ROBOT plugin.
Add a helper rule, 'normalize-sssom-%', to force the re-serialisation of a SSSOM mapping set using SSSOM-CLI. Can be useful especially for manually maintained sets. Also add a 'normalize_mappings' rule, which merely runs the above rule on all declared mapping sets.
Since SSSOM-Java can now possibly be used in a standard workflow, it belongs to the ODKLite image. Also upgrade to the latest 1.4.0 version.
The sssom-py mapping extractor generates a bogus mapping set if it does not find any mapping to extract (no header line), which prevents SSSOM-CLI from merging it with the other sets.
When using sssom:xref-extract to extract mappings from an ontology, preserve any existing metadata in the original SSSOM file.
matentzn
left a comment
Looks fantastic. Two optional comments!
        raise Exception(f"Unknown source mapping set '{source}'")
    elif product.maintenance == "extract":
        if product.source_file is None:
            product.source_file = "$(EDIT_PREPROCESSED)"
I am assuming you thought this through. I recently struggled hard with a pipeline that provided a merged SSSOM mapping set which was partially extracted from upstream, but partially generated by the pipeline itself. So, something like: build a bridge, then extract a mapping set from that, then merge that into a mapping set you publish. This caused some circularity. I don't think this is the case here, just wanted to share my experience.
I am not sure I understand your concern, and how it applies here.
In any case, the only change here is that the logic to handle the absence of a source_file key has been moved from within the Makefile template to the odk.py script. This is so the template can simply say
$(MAPPINGDIR)/{{ mapping.id }}.sssom.tsv: {{ mapping.source_file }}
instead of
$(MAPPINGDIR)/{{ mapping.id }}.sssom.tsv: {% if mapping.source_file is not none %}{{ mapping.source_file }}{% else %}$(EDIT_PREPROCESSED){% endif %}
This is functionally identical to what we had before, only with a (slightly) more readable template.
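To illustrate why moving the default into the script simplifies the template, here is a minimal sketch. The `MappingSetProduct` class, its default `maintenance` value, and both helper functions are hypothetical stand-ins rather than the actual odk.py code; only the `source_file` key, the `extract` maintenance type, and the `$(EDIT_PREPROCESSED)` fallback come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the mapping set product class in odk.py;
# only the fields discussed in this thread are modelled, and the
# default maintenance value is an assumption.
@dataclass
class MappingSetProduct:
    id: str
    maintenance: str = "manual"
    source_file: Optional[str] = None


def fill_default_source_file(product: MappingSetProduct) -> None:
    """Give 'extract' sets a default source so the template needs no fallback."""
    if product.maintenance == "extract" and product.source_file is None:
        product.source_file = "$(EDIT_PREPROCESSED)"


def render_rule_header(product: MappingSetProduct) -> str:
    """Mimic the simplified template line: target, colon, prerequisite."""
    return f"$(MAPPINGDIR)/{product.id}.sssom.tsv: {product.source_file}"


product = MappingSetProduct(id="my-set", maintenance="extract")
fill_default_source_file(product)
print(render_rule_header(product))
```

Because the default is filled in before rendering, the rendering step can dereference `source_file` unconditionally, which is the point of the change.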
odk/odk.py
Outdated
    """If set to True, mappings are copied to the release directory."""

    mapping_extractor : str = "sssom-py"
    """The tool to use to extract mappings from an ontology ('sssom-py' or 'robot')."""
Maybe, to be more future-proof, the values could be sssom-py and sssom-java (in case someone implements a stellar mapping extraction method in ROBOT itself :P). OK, I am fantasising now, but it is a suggestion.
> in case someone implements a stellar mapping extraction method in robot itself
I think that’s highly unlikely :D but OK, I have no objection to that.
Since the SSSOM-Py-based mapping extraction process is called 'sssom-py', we should call the SSSOM-Java-based one 'sssom-java', instead of 'robot'.
This PR primarily adds a new value for the maintenance field of SSSOM mapping sets declared in the sssom_mappingset_group: merged.
Given the following configuration:
The my-merged-set.sssom.tsv set will be created by merging the my-set-a.sssom.tsv and my-set-b.sssom.tsv sets.
If source_mappings is not set for a merged set, the default behaviour is to use all the other sets as source, so the example above is actually equivalent to:
(This is as discussed in #106.)
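The defaulting rule for merged sets can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: the function name and the dict-based representation of products are hypothetical, and the 'manual' maintenance value given to the two source sets is only an example. The keys (`id`, `maintenance`, `source_mappings`), the set names, and the error message are taken from the text and diff in this PR.

```python
def resolve_merged_sources(products):
    """Fill in 'source_mappings' for merged sets that do not declare it.

    'products' is a list of dicts standing in for the declared mapping
    sets (a hypothetical representation chosen for this sketch).
    """
    known_ids = {p["id"] for p in products}
    for product in products:
        if product.get("maintenance") != "merged":
            continue
        sources = product.get("source_mappings")
        if sources is None:
            # Default: merge every other set that is not itself a merged set.
            product["source_mappings"] = [
                other["id"]
                for other in products
                if other.get("maintenance") != "merged"
            ]
        else:
            # Explicit list: every entry must name a declared set.
            for source in sources:
                if source not in known_ids:
                    raise Exception(f"Unknown source mapping set '{source}'")
    return products


products = resolve_merged_sources([
    {"id": "my-set-a", "maintenance": "manual"},
    {"id": "my-set-b", "maintenance": "manual"},
    {"id": "my-merged-set", "maintenance": "merged"},
])
print(products[2]["source_mappings"])
```

Excluding merged sets from the default keeps a merged set from listing itself, or another merged set, as one of its own sources.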
This PR also brings a few other changes to the SSSOM workflows:
A. Another maintenance value, custom, to explicitly declare that a set is to be built using a custom rule in the custom Makefile (similarly to the module_type: custom for import modules).
B. The possibility to use the xref-extract command of the SSSOM ROBOT plugin, rather than sssom-py parse, to extract mappings from an ontology. This is done with a new group-level option, mapping_extractor. The default value for that option is sssom-py, which preserves the existing behaviour of using sssom-py parse. If set to robot, mappings would be extracted using robot sssom:xref-extract.
C. New helper rules to force mapping sets to be re-serialized, similarly to the normalize_src rule (normalize-sssom-X to re-serialize mapping set X, normalize_mappings to re-serialize all sets).
closes #106