Skip to content

Fix the update-repo system#1180

Merged
gouttegd merged 8 commits intomasterfrom
fix-update
Feb 20, 2025
Merged

Fix the update-repo system#1180
gouttegd merged 8 commits intomasterfrom
fix-update

Conversation

@gouttegd
Copy link
Contributor

This PR implements the idea outlined in this comment to avoid having to run the update_repo script twice.

Basically, instead of having the update code in a local script installed in the repo under src/scripts/update_repo.sh, we put all the update logic directly in the odk.py script. That script accepts a new command update. If executed from within the src/ontology directory of an ODK repository, that command will read the *-odk.yaml configuration file and re-instantiate the templates directly into the repository, without overwriting most files if they already exist – only a handful of files (most importantly the src/ontology/Makefile file) are forcibly overwritten.

The command can be invoked as sh run.sh odk.py update, or (preferably) as sh run.sh update_repo. For backwards compatibility, sh run.sh make update_repo is still supported, but is better avoided.

As said in the aforementioned comment, this approach has (at least) two benefits, which both stems from the fact that the updating code is now entirely within the Docker image rather than in a local file in the repository:

  • it is always the latest version of the code that is executed;
  • the updating code is under no risk of being overwritten while it is executed.

Another benefit is the logic to update the code is now at the same place as the logic that seeds an initial repository (in the odk.py script), instead of being in the separate update_repo.sh.jinja2 template – this should facilitate the maintenance.

closes #989

We plan to reuse the code that copies/instantiates the template-derived
files for the upcoming `update` command, so we first need to isolate it
in a separate function.
Use list comprehensions to iterate over only the plain files first, then
over only the templates. This is no more efficient (we still need to
iterate over all the files twice), but is more readable.

When applying the templates, check whether the template is a "dynamic
pack" first, and only open the "derived_file" if it is not.

A further simplification would be to loop over all the files only once
and test for the template suffix once inside the loop:

    for f in files:
        if not f.endswith(TEMPLATE_SUFFIX):
            # Copy plain file
        else:
            # Apply template (single file or pack file)

but this would not allow to preserve the current behaviour of first
copying all the plain files before instantiating the templates (plain
files and templates would be processed in the order they are listed).
Make the install_template_files function accept an optional `policies`
dictionary. A key is that dictionary is the pathname of a template file
(relative to the top level of a ODK repository); the value should be one
of:

* IF_MISSING: install the corresponding file into the target repository,
  only if it does not already exist;
* ALWAYS: always install the corresponding file;
* NEVER: never install the corresponding file;

The policy dictionary is queried for every single file to install
(whether it is a plain file or a file from a Jinja2 template), and the
file is installed or not based on the corresponding policy. If the
dictionary does not contain an entry for a file, the default policy is
IF_MISSING.

In the `seed` command, there is no policy dictionary and so all files
are installed (unless they happen to exist already, in the case the
`seed` command was called without the `--clean` option) and the command
behaves as before.

But the policy dictionary will make it possible to implement an `update`
command where we can selectively decide which files must be overwritten
when we update an existing repository.
Now that we can selectively install instantiated template files in a
directory depending on a per-file policy and on whether the target file
already exists or not, we can implement a repository update method
directly in the main odk.py script.

The command simply:

1. finds the '*-odk.yaml' configuration file, aborting if it can't find
   it (this also acts as a check that we are within a ODK repository);
2. loads the configuration;
3. declares which files must be ignored or overwritten;
4. instantiates the templates.

This is a much more straightforward process than the original
`update_repo.sh` script, which involved seeding a brand new repository
in a staging directory and then selectively copying files from the
staging directory into the real repository.

Also, because the update code is within the odk.py script, this means it
is within the ODK image. Therefore, when using that command, it is
always the latest available update code from the latest ODK version that
is immediately used. With the previous method, the `update_repo.sh`
script that was executed was the one that was installed with the
previous version of the ODK, leading to the recommendation of running
the script twice so that the second invocation would use the updated
code.
That script will not be used anymore.

We keep the `update_repo` target in the Makefile, but make it invoke the
update command of the odk.py script. Using that target is not
recommended, it is kept for backwards compatibility only. (Its problem
is that the Makefile is overwritten during the update process.)
Updating a repo can now be done by invoking `sh run.sh odk.py update`.

But for convenience, we install a dead simple script in
/tools/update_repo that merely call the above command, so that an update
can be triggered using `sh run.sh update_repo`.

This is because until now, most people never had to deal with the
`odk.py` script directly, since it was typically only used for seeding a
repository (which is only ever done once). It was also indirectly used
for updates (since an update involved seeding a brand new repository and
then copying its files to the current repository), but this fact was
hidden from the users.

So, I believe suddenly introducing the need to call that odk.py script
to update a repo could cause needless confusion in users' mind, so we
hide it behing a single `update_repo` command. This also has the benefit
of shielding us from any future potential change to the interface of the
odk.py update command.
When updating, the shell executing the run.sh script (through which the
update command is executing, as any other ODK process) could fail
because the run.sh script is overwritten during the update.

To avoid that, we detect at the very beginning of the script if the user
is invoking the update command, and if it is, we make a copy of the
script in the tmp/ directory and immediately execute that copy, so that
any overwrite of the original script has no effect on the executing
shell.
@gouttegd gouttegd self-assigned this Feb 19, 2025
@gouttegd gouttegd requested a review from matentzn February 19, 2025 01:45
@gouttegd gouttegd marked this pull request as draft February 19, 2025 10:12
Change the type of the policies argument to `must_install_file` from a
dictionary to a list of tuples, where the first item in each tuple is a
shell-like globbing pattern.

This is slightly less efficient as it requires iterating over the entire
list to find a matching policy (instead of the on-average constant
access time of a hashtable), but is more convenient as it allows to have
a single policy for an entire subtree. Notably, it allows to do

  policies.add(('src/sparql/*', ALWAYS))

to apply a ALWAYS policy to all files under `src/sparql`, without having
to enumerate them all.
@gouttegd gouttegd marked this pull request as ready for review February 19, 2025 11:40
Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looooooks good - from the general approach perspective :P Haha but my confidence with the python code is rather low. I would say, if you tested it, lets merge and once we get into ODK 1.6 release mode, this funcition will anyways undergo massive testing.

@gouttegd gouttegd merged commit 7e66a1c into master Feb 20, 2025
1 check passed
@gouttegd gouttegd deleted the fix-update branch February 20, 2025 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

update_repo: Run 2x to alleviate errors

2 participants