Fix the update-repo system by gouttegd · Pull Request #1180 · INCATools/ontology-development-kit

gouttegd · 2025-02-19T00:39:30Z

This PR implements the idea outlined in this comment to avoid having to run the update_repo script twice.

Basically, instead of having the update code in a local script installed in the repo under src/scripts/update_repo.sh, we put all the update logic directly in the odk.py script. That script accepts a new command update. If executed from within the src/ontology directory of an ODK repository, that command will read the *-odk.yaml configuration file and re-instantiate the templates directly into the repository, without overwriting most files if they already exist – only a handful of files (most importantly the src/ontology/Makefile file) are forcibly overwritten.

The command can be invoked as sh run.sh odk.py update, or (preferably) as sh run.sh update_repo. For backwards compatibility, sh run.sh make update_repo is still supported, but is better avoided.

As said in the aforementioned comment, this approach has (at least) two benefits, which both stems from the fact that the updating code is now entirely within the Docker image rather than in a local file in the repository:

it is always the latest version of the code that is executed;
the updating code is under no risk of being overwritten while it is executed.

Another benefit is the logic to update the code is now at the same place as the logic that seeds an initial repository (in the odk.py script), instead of being in the separate update_repo.sh.jinja2 template – this should facilitate the maintenance.

closes #989

We plan to reuse the code that copies/instantiates the template-derived files for the upcoming `update` command, so we first need to isolate it in a separate function.

Use list comprehensions to iterate over only the plain files first, then over only the templates. This is no more efficient (we still need to iterate over all the files twice), but is more readable. When applying the templates, check whether the template is a "dynamic pack" first, and only open the "derived_file" if it is not. A further simplification would be to loop over all the files only once and test for the template suffix once inside the loop: for f in files: if not f.endswith(TEMPLATE_SUFFIX): # Copy plain file else: # Apply template (single file or pack file) but this would not allow to preserve the current behaviour of first copying all the plain files before instantiating the templates (plain files and templates would be processed in the order they are listed).

Make the install_template_files function accept an optional `policies` dictionary. A key is that dictionary is the pathname of a template file (relative to the top level of a ODK repository); the value should be one of: * IF_MISSING: install the corresponding file into the target repository, only if it does not already exist; * ALWAYS: always install the corresponding file; * NEVER: never install the corresponding file; The policy dictionary is queried for every single file to install (whether it is a plain file or a file from a Jinja2 template), and the file is installed or not based on the corresponding policy. If the dictionary does not contain an entry for a file, the default policy is IF_MISSING. In the `seed` command, there is no policy dictionary and so all files are installed (unless they happen to exist already, in the case the `seed` command was called without the `--clean` option) and the command behaves as before. But the policy dictionary will make it possible to implement an `update` command where we can selectively decide which files must be overwritten when we update an existing repository.

Now that we can selectively install instantiated template files in a directory depending on a per-file policy and on whether the target file already exists or not, we can implement a repository update method directly in the main odk.py script. The command simply: 1. finds the '*-odk.yaml' configuration file, aborting if it can't find it (this also acts as a check that we are within a ODK repository); 2. loads the configuration; 3. declares which files must be ignored or overwritten; 4. instantiates the templates. This is a much more straightforward process than the original `update_repo.sh` script, which involved seeding a brand new repository in a staging directory and then selectively copying files from the staging directory into the real repository. Also, because the update code is within the odk.py script, this means it is within the ODK image. Therefore, when using that command, it is always the latest available update code from the latest ODK version that is immediately used. With the previous method, the `update_repo.sh` script that was executed was the one that was installed with the previous version of the ODK, leading to the recommendation of running the script twice so that the second invocation would use the updated code.

That script will not be used anymore. We keep the `update_repo` target in the Makefile, but make it invoke the update command of the odk.py script. Using that target is not recommended, it is kept for backwards compatibility only. (Its problem is that the Makefile is overwritten during the update process.)

Updating a repo can now be done by invoking `sh run.sh odk.py update`. But for convenience, we install a dead simple script in /tools/update_repo that merely call the above command, so that an update can be triggered using `sh run.sh update_repo`. This is because until now, most people never had to deal with the `odk.py` script directly, since it was typically only used for seeding a repository (which is only ever done once). It was also indirectly used for updates (since an update involved seeding a brand new repository and then copying its files to the current repository), but this fact was hidden from the users. So, I believe suddenly introducing the need to call that odk.py script to update a repo could cause needless confusion in users' mind, so we hide it behing a single `update_repo` command. This also has the benefit of shielding us from any future potential change to the interface of the odk.py update command.

When updating, the shell executing the run.sh script (through which the update command is executing, as any other ODK process) could fail because the run.sh script is overwritten during the update. To avoid that, we detect at the very beginning of the script if the user is invoking the update command, and if it is, we make a copy of the script in the tmp/ directory and immediately execute that copy, so that any overwrite of the original script has no effect on the executing shell.

Change the type of the policies argument to `must_install_file` from a dictionary to a list of tuples, where the first item in each tuple is a shell-like globbing pattern. This is slightly less efficient as it requires iterating over the entire list to find a matching policy (instead of the on-average constant access time of a hashtable), but is more convenient as it allows to have a single policy for an entire subtree. Notably, it allows to do policies.add(('src/sparql/*', ALWAYS)) to apply a ALWAYS policy to all files under `src/sparql`, without having to enumerate them all.

matentzn

This looooooks good - from the general approach perspective :P Haha but my confidence with the python code is rather low. I would say, if you tested it, lets merge and once we get into ODK 1.6 release mode, this funcition will anyways undergo massive testing.

odk/odk.py

gouttegd added 7 commits February 17, 2025 22:06

Move the template-instanciating code out of the seed command.

b550496

We plan to reuse the code that copies/instantiates the template-derived files for the upcoming `update` command, so we first need to isolate it in a separate function.

gouttegd self-assigned this Feb 19, 2025

gouttegd requested a review from matentzn February 19, 2025 01:45

gouttegd marked this pull request as draft February 19, 2025 10:12

gouttegd marked this pull request as ready for review February 19, 2025 11:40

gouttegd mentioned this pull request Feb 20, 2025

Fix the odk-info script. #1184

Merged

matentzn approved these changes Feb 20, 2025

View reviewed changes

odk/odk.py Show resolved Hide resolved

odk/odk.py Show resolved Hide resolved

gouttegd merged commit 7e66a1c into master Feb 20, 2025
1 check passed

gouttegd deleted the fix-update branch February 20, 2025 10:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix the update-repo system#1180

Fix the update-repo system#1180
gouttegd merged 8 commits intomasterfrom
fix-update

gouttegd commented Feb 19, 2025

Uh oh!

matentzn left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gouttegd commented Feb 19, 2025

Uh oh!

matentzn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants