-
Notifications
You must be signed in to change notification settings - Fork 7
[WIP] Test support for stable+dev builds and deployment on forks #135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is the first part of a major refactoring overhaul of how we handle indexing, input-markdown and output-HTML pathing, and sidebar generation. This is still a work in progress, and most functionality has been commented out for debugging/testing. Major changes: - `update_page_index.py` has been renamed to `create_indices.py`. This new file now creates two indices: 1. "hier_index", which is the same as the old "index.json". Its content has not changed. Instead of being saved to file every time and then loaded later during sidebar creation, it creates and returns the dictionary instead, and that is passed to sidebar for usage. I'm not sure if we need to save it as a file at all, since its value can always be easily debugged live, and we were not doing any kind of checking to skip its loading before (i.e. if we have to reconstruct it every time, then there's no gain from saving a copy to disk). 2. "flat_index". This will eventually replace "ordered_links" and is greatly expanded from what "ordered_links" does. This is treated like "hier_index" as an explicit "generate_page_html" function, but will replace much functionality/data of the rest of the code. This index includes the ordering of legitimate "pages" (i.e. not "sections"), and for every page contains a dictionary containing: A. title of the page B. absolute path of the "input" markdown file C. absolute path of the "output" HTML file (including processing dev_build, and including creation of any parent directories to this not-yet-created file) D. "website-root"-relative path of the "output" HTML file (including processing dev_build) The next major step is using "flat_index" to replace ordered_links and output path/filename lookups throughout the rest of generate_page_html. Minor changes: - Location of the saved index files has been moved to inside "scripts" from the root - Many functions and some script files have been renamed
This continues the use of "flat_index", including in every case of the main loop, except for the "ordered_links" which are next. This includes many minor changes as well for formatting (including a ruff run), documentation, and other small changes.
Among other minor changes, including adding a new optional "save_indices" CLI arg that does what it sounds like, this finishes applying the usage of the new "flat_index" to the remaining code, which was mostly the "ordered" links used in the per-page footers. Everything appears to be working correctly, including that the sidebar and footer navigation uses "dev" or "content" properly. Next up is general cleaning and documenting.
This begins a multi-step transition from the CLI "--build-on-dev" argument to the combination of "--code-version" (formerly the unused "--build-type" argument) and "--custom-owner-commit". The CLI args have already been created, and this commit focuses on applying their use inside `get_commit_hash.py`. This also includes a great deal of minor changes `get_commit_hash`, including better error messages, more explicit variable names, etc. There is also incomplete support for a new "--code-version=no-check" option that skips the version checking in case one is debugging or if one wants to run 'build.py' a lot, but doesn't want to spam HTTP requests. This use-case is useful in itself, but also related to a weak hypothesis I had: that the fact that version mismatches in the previous "default" case of 'build.py' usage provided only *warnings*, instead of RuntimeErrors, like in all other mismatch cases (see https://github.com/dylansdaniels/textbook/blob/workflow_refactor/build.py#L610 )
`dev_build` has finished being replaced by a combination of `code_version` and (either `custom_owner_commit` or `commit_hash`) depending on the function. Documentation has not yet been updated, but it was incomplete to begin with.
This comments out what I think is something that may have been added by accident. Compare this line to https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/scripts/convert_notebooks.py#L923 . What was originally going on at the time of the above commit is the following: `notebook_was_run` was used as whether or not the notebook was attempted to be run at all, regardless of if it succeeded (I have changed this variable to `execution_initiated`). `notebook_executed` was used as whether or not the notebook successfully finished its execution (I have changed this variable to `execution_successful`). In the control flow, which has not changed at all, in this particular case, we are in the case where `notebook_was_run`/`execution_initiated` is False, meaning no execution was attempted at all. Also in the original version, `dev_build` would have been activated, meaning we're in a dev state. However, even if we were in a dev state, but if the notebook execution was never initiated/attempted, then in this case, I do not think it makes sense to record the commit hash of the current HNN version into the notebook. If we did, it would imply that the last dev execution was using this commit, but that is not the case.
This adds a large description of the building program as a whole to the CLI for 'build.py' such that it is output when calling python build.py --help This includes description of the assumed file structure (which has many, many assumptions), the overall code path, some examples, and the important options along the way. Minor: this also adds a docstring for 'get_hnn_commit_hash', renames the 'write_standalone_html' variable to 'save_standalone_nb_html', and makes a new CLI argument of a similar name. This documentation was first generated with Claude AI before undergoing *heavy* modification of every word.
This partially reverts our post-hnn_commit_hash-validation code's checks of the "code-version" to use a new var like in the original, called "is_dev_build". The distinction between a "dev" and non dev build is now fully documented, apparent, and set in 'build.py'. "is_dev_build" is also a proper boolean flag (true/false only) and the actual commit hash to use (when necessary) is handled separately. More refactoring: this also renames the hnn-core commit hash module and splits its functions, so that the 'get_hnn_commit_hash' function is ONLY getting the installed version, and the rest of the code is validation of the installed version against various other versions.
There are also some minor variable name changes
Previously, we were loading the same notebook from disk twice, once to calculate the hash, then once to actually use it. It's more efficient to load from disk once.
This includes a special comment about how `_read_nb_json_output_metadata` looks in different places for the JSON output files depending on if you're doing a stable or dev build
This fixes what I believe is a bug introduced in the original "dev_build" changes. Here https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/build.py#L114 there is a bug at the later step when the final HTML page is being built, and when the HTML content of the notebook (from inside its JSON output file) is being loaded. In this case, because the JSON output file was being loaded using unique code (different from the JSON output loading code used at an earlier step inside `scripts/convert_notebooks.py` see here https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/scripts/convert_notebooks.py#L459 ), only the "content" version of the JSON output file was being loaded, and never the "dev" version. This meant that for "dev" final-page HTML output containing a notebook's contents, the wrong JSON output was being used to generate the page. This has been fixed in that both instances of loading the JSON output files have been changed to use the same function. Also, the later step has extra configuration added in order to reconstruct the path of the desired JSON output file location (which differs in the "dev") case. More generally, this is another instance of where the complexity introduced by trying to accomodate the different "dev" vs "stable" file output locations is a lot, and maybe too much. Perhaps in the future we can consider changing how he handle a separate "dev" build on the filesystem on a fundamental level. For example, for a "dev" build, we could make a full copy in the filesystem of the "content" directory into the "dev" directory directly, and then replace our "content_path" variable at the highest level with the "dev" directory.
This also moves the output-directory creation step outside of `create_flat_index` and overtly into the main loop of the HTML page building code. (If you have to write a ton of comments explaining that a step is done in a particular location, then maybe it shouldn't be in that location! I'm dumb and it never should have gone in the index creation func in the first place). As is typical, these docstrings were *initially* created with Claude but then inspected and edited substantially.
This removes the '--code-version' option 'no-check', and instead adds a standalone CLI argument called '--no-version-validation' that does the same thing. This allows both "stable" and "dev" builds to EITHER be run without code-version being checked. (Before, no-check was always a "dev" build, but this is an improvement since not validating is useful when inspecting or debugging a local stable build). This also replaces the confusing argparse not-recommended use of bool in favor of `action="store_true"`. This is the recommended way to treat CLI arguments as "presence flags" so to speak, albeit the naming is a little confusing (the default is False for "store_true").
SHOULD be no code changes...
You can probably guess where this is going... Note: this does not affect the Workshop page, since it's raw HTML.
My Claude's filesystem scan was out-of-date, and lightly "hallulcinated" a `templates/navbar.html` that no longer exists. This removes that.
Appears to work just as well.
|
This should definitely be done in a single action. How I imagined it would be done: If repo is jonescompneurolab, always do stable. Add dev folder to .gitignore. If repo is not jonescompneurolab, look for a tag or some flag in the commit message that would specify the type of build to be done. Do that. (We don't need both builds in a fork) |
|
This PR was primarily for testing how the deployment stuff would work. It definitely makes sense to finish our design of how everything is supposed to work, before we implement those changes on this PR and then merge. I'm currently writing another version of the proposal at #100. |
|
Makes sense, just throwing in my 2 cents so you know my thought process - doesn't mean it has to be that way :) |
This is based on top of the end of #134. This is intended to test how fork-based deployments would work alongside "stable"
content/folders versus "dev"dev/folders.I first tried making a separate Github Action for the "dev" build using
master. That, by itself, does appear to generate a "dev" build of the website at https://asoplata.github.io/textbook/dev/01_getting_started/template_model.html at first. (For reference, since I have domain-forwarding on, all my deployments will go to https://asoplata.com as in https://asoplata.com/textbook/dev/01_getting_started/template_model.html , but this is equivalent to deployments going to<user>.github.io).However, my first attempt failed, since two different Github Actions means two different websites being deployed. If my new "dev" action finishes first, then the "dev" version of my fork https://asoplata.com/textbook/dev/01_getting_started/template_model.html will get deployed. However, this is temporary, since when the "stable" action finishes later (or vice-versa, depending on timing), then the stable version of the website will be deployed (e.g. https://asoplata.com/textbook/content/preface.html ), and my "dev" version will no longer be accessible.
What we want, then, is a single website being deployed, one which has both the "stable" output at
contentand the "dev" output atdev, at the same time.Perhaps the easiest way to do this is to simply add to the existing Github Action, which I've done in the latest commit. First, we create the
textbook-stable-buildenvironment, then use it to run a stable build. Then, only if we're in a fork, we create thetextbook-dev-buildenvironment and use that to run a dev build. This should give us a website deployment that includes both the stable HTML builds atcontentand the dev HTML builds atdev, while allowing the dev pages to reference thecontentassets as needed (like for local images). Based on the most recent deployment (see https://asoplata.com/textbook/content/08_using_hnn_api/parallelism_mpi.html and https://asoplata.com/textbook/dev/08_using_hnn_api/parallelism_mpi.html ), this seems to be effective: we have bothcontentanddevavailable.Some issues with this approach: