Conversation
We plan to reuse the code that copies/instantiates the template-derived files for the upcoming `update` command, so we first need to isolate it in a separate function.
Use list comprehensions to iterate over only the plain files first, then
over only the templates. This is no more efficient (we still need to
iterate over all the files twice), but is more readable.
When applying the templates, check whether the template is a "dynamic
pack" first, and only open the "derived_file" if it is not.
A further simplification would be to loop over all the files only once
and test for the template suffix once inside the loop:
for f in files:
if not f.endswith(TEMPLATE_SUFFIX):
# Copy plain file
else:
# Apply template (single file or pack file)
but this would not allow to preserve the current behaviour of first
copying all the plain files before instantiating the templates (plain
files and templates would be processed in the order they are listed).
Make the install_template_files function accept an optional `policies` dictionary. A key is that dictionary is the pathname of a template file (relative to the top level of a ODK repository); the value should be one of: * IF_MISSING: install the corresponding file into the target repository, only if it does not already exist; * ALWAYS: always install the corresponding file; * NEVER: never install the corresponding file; The policy dictionary is queried for every single file to install (whether it is a plain file or a file from a Jinja2 template), and the file is installed or not based on the corresponding policy. If the dictionary does not contain an entry for a file, the default policy is IF_MISSING. In the `seed` command, there is no policy dictionary and so all files are installed (unless they happen to exist already, in the case the `seed` command was called without the `--clean` option) and the command behaves as before. But the policy dictionary will make it possible to implement an `update` command where we can selectively decide which files must be overwritten when we update an existing repository.
Now that we can selectively install instantiated template files in a directory depending on a per-file policy and on whether the target file already exists or not, we can implement a repository update method directly in the main odk.py script. The command simply: 1. finds the '*-odk.yaml' configuration file, aborting if it can't find it (this also acts as a check that we are within a ODK repository); 2. loads the configuration; 3. declares which files must be ignored or overwritten; 4. instantiates the templates. This is a much more straightforward process than the original `update_repo.sh` script, which involved seeding a brand new repository in a staging directory and then selectively copying files from the staging directory into the real repository. Also, because the update code is within the odk.py script, this means it is within the ODK image. Therefore, when using that command, it is always the latest available update code from the latest ODK version that is immediately used. With the previous method, the `update_repo.sh` script that was executed was the one that was installed with the previous version of the ODK, leading to the recommendation of running the script twice so that the second invocation would use the updated code.
That script will not be used anymore. We keep the `update_repo` target in the Makefile, but make it invoke the update command of the odk.py script. Using that target is not recommended, it is kept for backwards compatibility only. (Its problem is that the Makefile is overwritten during the update process.)
Updating a repo can now be done by invoking `sh run.sh odk.py update`. But for convenience, we install a dead simple script in /tools/update_repo that merely call the above command, so that an update can be triggered using `sh run.sh update_repo`. This is because until now, most people never had to deal with the `odk.py` script directly, since it was typically only used for seeding a repository (which is only ever done once). It was also indirectly used for updates (since an update involved seeding a brand new repository and then copying its files to the current repository), but this fact was hidden from the users. So, I believe suddenly introducing the need to call that odk.py script to update a repo could cause needless confusion in users' mind, so we hide it behing a single `update_repo` command. This also has the benefit of shielding us from any future potential change to the interface of the odk.py update command.
When updating, the shell executing the run.sh script (through which the update command is executing, as any other ODK process) could fail because the run.sh script is overwritten during the update. To avoid that, we detect at the very beginning of the script if the user is invoking the update command, and if it is, we make a copy of the script in the tmp/ directory and immediately execute that copy, so that any overwrite of the original script has no effect on the executing shell.
Change the type of the policies argument to `must_install_file` from a
dictionary to a list of tuples, where the first item in each tuple is a
shell-like globbing pattern.
This is slightly less efficient as it requires iterating over the entire
list to find a matching policy (instead of the on-average constant
access time of a hashtable), but is more convenient as it allows to have
a single policy for an entire subtree. Notably, it allows to do
policies.add(('src/sparql/*', ALWAYS))
to apply a ALWAYS policy to all files under `src/sparql`, without having
to enumerate them all.
matentzn
approved these changes
Feb 20, 2025
Contributor
matentzn
left a comment
There was a problem hiding this comment.
This looooooks good - from the general approach perspective :P Haha but my confidence with the python code is rather low. I would say, if you tested it, lets merge and once we get into ODK 1.6 release mode, this funcition will anyways undergo massive testing.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements the idea outlined in this comment to avoid having to run the
update_reposcript twice.Basically, instead of having the update code in a local script installed in the repo under
src/scripts/update_repo.sh, we put all the update logic directly in theodk.pyscript. That script accepts a new commandupdate. If executed from within thesrc/ontologydirectory of an ODK repository, that command will read the*-odk.yamlconfiguration file and re-instantiate the templates directly into the repository, without overwriting most files if they already exist – only a handful of files (most importantly thesrc/ontology/Makefilefile) are forcibly overwritten.The command can be invoked as
sh run.sh odk.py update, or (preferably) assh run.sh update_repo. For backwards compatibility,sh run.sh make update_repois still supported, but is better avoided.As said in the aforementioned comment, this approach has (at least) two benefits, which both stems from the fact that the updating code is now entirely within the Docker image rather than in a local file in the repository:
Another benefit is the logic to update the code is now at the same place as the logic that seeds an initial repository (in the
odk.pyscript), instead of being in the separateupdate_repo.sh.jinja2template – this should facilitate the maintenance.closes #989