Simplify the "Imports" section in the Makefile template #1194
Merged
In order to make the Makefile template more readable, we need to move
some of the logic it contains into the odk.py script.
The general principle is that the configuration object that is given to
the templates should contain "pre-computed" fields that can be used
directly by the templates. This pre-computation will be done by the new
`_derive_fields()` method in the `ProductGroup` class and its derived
classes.
In this commit, we migrate some of the logic needed to deal with
imports. For example, instead of making the Makefile template deal
with the possibility that the `module_type` of a given module might
not be set (in which case the template needs to fetch the value of the
`module_type` at the level of the entire ImportGroup), we ensure in the
`_derive_fields()` method of ImportGroup that all individual modules
have an explicit `module_type` (set from the group-level value if no
module type has been explicitly specified in the configuration file), so
that the Makefile template can simply use something like:
{% if 'filter' == ont.module_type %}
instead of
{% if ('filter' == ont.module_type) or
      (ont.module_type is none
       and 'filter' == project.import_group.module_type) %}
Likewise for the `slme_individuals` and `module_type_slme` parameters.
Also, we add a `special_products` derived field to the ImportGroup,
containing the list of all the modules that either (1) are marked as
"large", or (2) use a different type than the default, group-level
module type.
Overall, this makes the section of the template responsible for the
production of the rules for the individual modules easier to read.
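The pre-computation described above can be sketched as follows. This is an illustrative model only — the class names, field names, and `ImportModule` type are hypothetical stand-ins, not the actual `odk.py` definitions:

```python
# Hypothetical sketch of the "derived fields" idea: the group pre-computes
# values so that the templates never have to fall back to group-level defaults.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ImportModule:
    id: str
    module_type: Optional[str] = None  # None means "use the group default"
    is_large: bool = False

@dataclass
class ImportGroup:
    module_type: str = "slme"          # group-level default type
    products: List[ImportModule] = field(default_factory=list)

    def _derive_fields(self):
        # Give every module an explicit module_type, so that the Jinja2
        # template can test `ont.module_type` directly, with no fallback.
        for module in self.products:
            if module.module_type is None:
                module.module_type = self.module_type
        # A module is "special" if it is large or uses a non-default type;
        # the template generates dedicated rules for those modules only.
        self.special_products = [
            m for m in self.products
            if m.is_large or m.module_type != self.module_type
        ]

group = ImportGroup(products=[
    ImportModule("go"),
    ImportModule("pato", module_type="filter"),
    ImportModule("chebi", is_large=True),
])
group._derive_fields()
print([m.module_type for m in group.products])  # → ['slme', 'filter', 'slme']
print([m.id for m in group.special_products])   # → ['pato', 'chebi']
```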
The rules that produce the individual import modules are only generated
when use_base_merging is False. There is therefore no need, in every
single rule, to test for use_base_merging and to make the module depend
on `$(MIRRORDIR)/merged.owl` when use_base_merging is True -- that can
never happen: if we are generating a rule for an individual import
module, we are necessarily under use_base_merging=False.
Indent Jinja2 conditionals (within the {% ... %} blocks) to make the
template easier to read.
That is, instead of:
{% if foo %}
{% for b in bar %}
{% if b.is_baz %}
...
{% endif %}
{% endfor %}
{% endif %}
we do instead:
{% if foo %}
  {% for b in bar %}
    {% if b.is_baz %}
      ...
    {% endif %}
  {% endfor %}
{% endif %}
This does not solve *all* the readability issues of the Makefile
template, but at least it provides a relatively easy way to find, for
example, the `if` that corresponds to a given `endif`.
Reformat some import rules to make them more readable. Apply fine-grained control of Jinja2 whitespace to avoid generating spurious blank lines in the Makefile.
If we are generating a `$(IMPORTDIR)/%_import.owl` rule, then `use_base_merging` is necessarily False, so there is no need to test that field when generating the dependencies of the rule. (This should have been done at the same time as the corresponding change to the rules for the "special" import modules.)
Move the rule that produces the OBO version of an import module towards the end of the imports section, so that it is not mixed with the rules that produce the actual import modules. Apply some minor Jinja2 reformatting, and add a comment explaining why we need non-generic rules for the "special" imports. Include the type of the module in the generated Makefile comment, as advanced users might find that useful.
When generating a ROBOT command that takes repeated arguments from a
list, write each item of the list on a separate line. That is:
robot extract --base-iri IRI1 \
              --base-iri IRI2 \
              --base-iri IRI3 \
              ...
instead of:
robot extract --base-iri IRI1 --base-iri IRI2 --base-iri IRI3 ...
as the second form could produce arbitrarily long lines, depending on
how many items we need to expand.
This makes the generated Makefile easier to read, at the expense of
making the *template* slightly harder to read. But since the Makefile is
what the users are exposed to, its readability is more important than
the readability of its template (which only ODK developers need worry
about).
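The one-item-per-line expansion can be sketched with a small helper. This is an illustrative Python function, not the actual template code (the real template does this with Jinja2 loops and whitespace control):

```python
def format_repeated_option(option, values, indent="    "):
    # Join "--option value" pairs with a backslash-newline continuation,
    # so each repeated argument lands on its own line of the Makefile.
    return (" \\\n" + indent).join(f"{option} {v}" for v in values)

print(format_repeated_option("--base-iri", ["IRI1", "IRI2", "IRI3"]))
# Prints:
# --base-iri IRI1 \
#     --base-iri IRI2 \
#     --base-iri IRI3
```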
Though never officially documented, 'fast_slme' was a module type that was almost identical to 'slme'; the only difference was that a "fast" SLME module was produced without removing the original ontology annotations and without adding a 'dc:source' annotation. Since those tasks used to be performed by SPARQL queries, they were very memory-intensive and could be a bottleneck for large modules. In fact, SLME modules that were marked as "large" were automatically treated as "fast SLME" modules because of their high memory requirements.

Now that the 'dc:source' annotation is added by the ODK plugin in a way that does not involve SPARQL, that step is as fast as it can be and, more importantly, no longer depends on the size of the import module. There is therefore no longer any real need for a "fast track" for large SLME modules, so we remove that fast track entirely. All SLME modules are now equal, regardless of their size.

For backwards compatibility, if a module is explicitly configured as a 'fast_slme' module, we silently treat it as a standard 'slme' module. This also means that, for now, the only effect of the 'is_large' parameter is to allow selectively *not* refreshing large import modules while refreshing the other, non-large modules.
As suggested by a comment in the template, there is no real need for the Makefile to catenate, sort, and "uniquify" the lists of terms we use as a seed for imports (the automatically generated $(IMPORTSEED) and the manually maintained $(IMPORT)_terms.txt). ROBOT is perfectly able to accept:
* more than one --term-file;
* term files containing comments;
* term files containing unsorted terms;
* term files containing duplicated terms.
Anything we might do before passing the lists to ROBOT would in fact be redundant with what ROBOT does anyway, so we might as well let ROBOT do everything. This means that we no longer need any *_terms_combined.txt file.
Add a 'scan_signature' option in the ImportGroup. If true (the default), a "pre-seed" is extracted from the -edit file (and its components, if any). That pre-seed is added to the main import seed (tmp/seed.txt), along with any terms extracted from the DOSDP patterns (if any). This is the behaviour of the current version of the ODK.

If that option is set to false, the pre-seed is not extracted, and terms from DOSDP patterns are not added to the main seed either. This means that the main seed can then only contain the custom seed optionally set by 'use_custom_import_module'. This, in turn, means that the only terms that will effectively be imported are those actively selected by the user (either in the custom import module, or in the individual *_terms.txt files).

If we do not scan the signature and there is no custom import module, the tmp/seed.txt file would effectively be empty, so in that case we do not even generate the corresponding rule in the Makefile.

closes #638
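As a sketch, the option would be set in the project's YAML configuration. The exact layout below is illustrative and assumes the usual ODK `import_group` block; only the `scan_signature` key name is taken from this PR:

```yaml
import_group:
  # Do not extract a pre-seed from the -edit file; only terms explicitly
  # requested (custom import module or *_terms.txt files) are imported.
  scan_signature: False
  products:
    - id: ro
    - id: pato
```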
When refactoring the `$(IMPORTDIR)/merged_import.owl` rule, an error was introduced: the `remove` command that removes annotation properties only took as input (for the list of annotation properties to preserve) the properties listed in ANNOTATION_PROPERTIES. It should also take all the seeds as input, so that any annotation property listed in a seed is preserved as well (this is the original behaviour).
When creating the merged import module, there is no need to preserve the original ontology annotations. They are actually misleading since they are only the annotations of the _first_ mirror to have been merged. (Prior to the refactoring, those annotations were removed by the "postprocess" SPARQL script.) Conversely, when creating a generic SLME import module, during the extraction step we do want to preserve annotations -- otherwise the dc:source annotation we have explicitly added at the beginning of the command pipeline would be lost.
matentzn
reviewed
Mar 13, 2025
Contributor
matentzn
left a comment
I went through the code line by line and LGTM; I made some mostly irrelevant comments, and am currently running some tests. This may take a bit, but I just wanted to assure you that I am working on it :)
Contributor
Author
FYI I tested it on all FlyBase ontologies (FBdv, FBbt, DPO, FBcv) without issues. But of course they are all using a merged import module, so this did not test all the code paths.
Contributor
Author
@matentzn Are you still testing this?
Contributor
Sorry, I was buried in a backlog, now I am running some tests.
matentzn
approved these changes
Mar 25, 2025
Contributor
matentzn
left a comment
I tested with three ontologies I care about (HPO, RO and CHR) and the behaviour looks fine to me. Awesome! THANK you for this!
This PR attempts to make the "Imports" section in `Makefile.jinja2` easier to read and maintain.

First, it moves some of the logic into the `odk.py` script. Notably, the script now takes care of automatically assigning the default module type to all individual modules that do not have an explicit `module_type` parameter, so that we do not have to go fetch the default value in the template. That is, instead of having to do things like this (testing if the module is a SLME module, or if it does not have an explicit type and the default module type is SLME):

{% if ('slme' == ont.module_type) or
      (ont.module_type is none
       and 'slme' == project.import_group.module_type) %}

we can do simply:

{% if 'slme' == ont.module_type %}

That change alone should reduce the risks of blood clots forming in the brain of ODK developers trying to read the template by at least a few percent.

Second, it removes rules that were generated for nothing: when `use_base_merging` is used, there is no need to generate rules for the individual imports (see #1189).

Third, it removes the "fast track" for "large" SLME modules. This was a special rule invoked for (1) SLME modules marked as "large" or (2) modules using the (undocumented) type `fast_slme`. The only difference compared to the standard SLME rule is that the "preprocessing step" (removing original ontology annotations and inserting a `dc:source` annotation instead) is skipped in the fast track. This was because the preprocessing step used to be performed with SPARQL queries, which was memory-intensive for large modules. That step is now performed by the ODK plugin using pure Java code and has no excessive memory requirement. Therefore, it is no longer necessary to treat large modules differently.

Fourth, and though it is not really a "simplification" (on the contrary, it's a new feature), it adds the possibility of disabling the extraction of a seed from the -edit file (option `scan_signature = False`) (#638).

Fifth, it re-formats the template according to a few rules:
* always use long options in ROBOT commands (`--input` instead of `-i`, `--term-file` instead of `-T`, etc.);
* write repeated arguments on separate lines;
* indent Jinja2 conditionals;
* annotate closing tags with a comment indicating which block they close (e.g. `{% endif %}{# !use_base_merging -#}`).