Skip to content

Conversation

@asoplata
Copy link
Collaborator

@asoplata asoplata commented Nov 4, 2025

This is based on top of the end of #134. This is intended to test how fork-based deployments would work alongside "stable" content/ folders versus "dev" dev/ folders.

I first tried making a separate Github Action for the "dev" build using master. That, by itself, does appear to generate a "dev" build of the website at https://asoplata.github.io/textbook/dev/01_getting_started/template_model.html at first. (For reference, since I have domain-forwarding on, all my deployments will go to https://asoplata.com as in https://asoplata.com/textbook/dev/01_getting_started/template_model.html , but this is equivalent to deployments going to <user>.github.io).

However, my first attempt failed, since two different Github Actions means two different websites being deployed. If my new "dev" action finishes first, then the "dev" version of my fork https://asoplata.com/textbook/dev/01_getting_started/template_model.html will get deployed. However, this is temporary, since when the "stable" action finishes later (or vice-versa, depending on timing), then the stable version of the website will be deployed (e.g. https://asoplata.com/textbook/content/preface.html ), and my "dev" version will no longer be accessible.

What we want, then, is a single website being deployed, one which has both the "stable" output at content and the "dev" output at dev, at the same time.

Perhaps the easiest way to do this is to simply add to the existing Github Action, which I've done in the latest commit. First, we create the textbook-stable-build environment, then use it to run a stable build. Then, only if we're in a fork, we create the textbook-dev-build environment and use that to run a dev build. This should give us a website deployment that includes both the stable HTML builds at content and the dev HTML builds at dev, while allowing the dev pages to reference the content assets as needed (like for local images). Based on the most recent deployment (see https://asoplata.com/textbook/content/08_using_hnn_api/parallelism_mpi.html and https://asoplata.com/textbook/dev/08_using_hnn_api/parallelism_mpi.html ), this seems to be effective: we have both content and dev available.

Some issues with this approach:

  • These will take a while to run, since they're running serially. Maybe >20 minutes for the whole thing if most notebooks need to be re-run. This also means that if there's an error, it needs to be fixed, and then the whole thing needs to take its time again.
  • If the notebooks are immediately executed with "stable" (or at least have their last version recorded as have successfully run under the latest stable), then when we run a "dev-master" build, they will all have to be re-run again every time IIRC due to the way that the recent-execution-history is detected in the code (see https://github.com/asoplata/textbook/blob/09e00457dadd05fefa5f632ed3a1cf62c153fb75/scripts/execute_and_convert_nbs.py#L1197 ).
    • This also suggests that instead of notebooks' JSON output only recording the last HNN-commit used in case of a dev build, maybe we should change this. Maybe we can persist both the previous "version" used and the previous "commit" used, but have them refer to different things: the previous "version" is always the last "stable" that the notebook was executed on, but the last "commit" is always the last "master" that the notebook was executed on. This way, executing a notebook on "stable" doesn't interfere with the tracking of "what was the last 'master' commit it was run against".

This is the first part of a major refactoring overhaul of how we handle
indexing, input-markdown and output-HTML pathing, and sidebar
generation. This is still a work in progress, and most functionality has
been commented out for debugging/testing.

Major changes:

- `update_page_index.py` has been renamed to `create_indices.py`. This
new file now creates two indices:

1. "hier_index", which is the same as the old "index.json". Its content
has not changed. Instead of being saved to file every time and then
loaded later during sidebar creation, it creates and returns the
dictionary instead, and that is passed to sidebar for usage. I'm not
sure if we need to save it as a file at all, since its value can always
be easily debugged live, and we were not doing any kind of checking to
skip its loading before (i.e. if we have to reconstruct it every time,
then there's no gain from saving a copy to disk).

2. "flat_index". This will eventually replace "ordered_links" and is
greatly expanded from what "ordered_links" does. This is treated like
"hier_index" as an explicit "generate_page_html" function, but will replace
much functionality/data of the rest of the code. This index includes the
ordering of legitimate "pages" (i.e. not "sections"), and for every page
contains a dictionary containing:

A. title of the page
B. absolute path of the "input" markdown file
C. absolute path of the "output" HTML file (including processing
dev_build, and including creation of any parent directories to this
not-yet-created file)
D. "website-root"-relative path of the "output" HTML file (including processing
dev_build)

The next major step is using "flat_index" to replace ordered_links and
output path/filename lookups throughout the rest of generate_page_html.

Minor changes:
- Location of the saved index files has been moved to inside "scripts"
from the root
- Many functions and some script files have been renamed
This continues the use of "flat_index", including in every case of the
main loop, except for the "ordered_links" which are next.

This includes many minor changes as well for formatting (including a
ruff run), documentation, and other small changes.
Among other minor changes, including adding a new optional
"save_indices" CLI arg that does what it sounds like, this finishes
applying the usage of the new "flat_index" to the remaining code, which
was mostly the "ordered" links used in the per-page footers.

Everything appears to be working correctly, including that the sidebar
and footer navigation uses "dev" or "content" properly.

Next up is general cleaning and documenting.
This begins a multi-step transition from the CLI "--build-on-dev"
argument to the combination of "--code-version" (formerly the unused
"--build-type" argument) and "--custom-owner-commit".

The CLI args have already been created, and this commit focuses on
applying their use inside `get_commit_hash.py`.

This also includes a great deal of minor changes `get_commit_hash`,
including better error messages, more explicit variable names, etc.

There is also incomplete support for a new "--code-version=no-check"
option that skips the version checking in case one is debugging or if
one wants to run 'build.py' a lot, but doesn't want to spam HTTP
requests. This use-case is useful in itself, but also related to a weak
hypothesis I had: that the fact that version mismatches in the previous
"default" case of 'build.py' usage provided only *warnings*, instead of
RuntimeErrors, like in all other mismatch cases (see
https://github.com/dylansdaniels/textbook/blob/workflow_refactor/build.py#L610
)
`dev_build` has finished being replaced by a combination of
`code_version` and (either `custom_owner_commit` or `commit_hash`)
depending on the function. Documentation has not yet been updated, but
it was incomplete to begin with.
This comments out what I think is something that may have been added by
accident. Compare this line to
https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/scripts/convert_notebooks.py#L923
.

What was originally going on at the time of the above commit is the
following:
`notebook_was_run` was used as whether or not the notebook was attempted
to be run at all, regardless of if it succeeded (I have changed this
variable to `execution_initiated`). `notebook_executed` was used as
whether or not the notebook successfully finished its execution (I have
changed this variable to `execution_successful`).

In the control flow, which has not changed at all, in this particular
case, we are in the case where `notebook_was_run`/`execution_initiated`
is False, meaning no execution was attempted at all. Also in the
original version, `dev_build` would have been activated, meaning we're
in a dev state. However, even if we were in a dev state, but if the
notebook execution was never initiated/attempted, then in this case, I
do not think it makes sense to record the commit hash of the current HNN
version into the notebook. If we did, it would imply that the last dev
execution was using this commit, but that is not the case.
This adds a large description of the building program as a whole to the
CLI for 'build.py' such that it is output when calling

python build.py --help

This includes description of the assumed file structure (which has many,
many assumptions), the overall code path, some examples, and the
important options along the way.

Minor: this also adds a docstring for 'get_hnn_commit_hash', renames the
'write_standalone_html' variable to 'save_standalone_nb_html', and makes
a new CLI argument of a similar name. This documentation was first
generated with Claude AI before undergoing *heavy* modification of every
word.
This partially reverts our post-hnn_commit_hash-validation code's checks
of the "code-version" to use a new var like in the original, called
"is_dev_build". The distinction between a "dev" and non dev build is now
fully documented, apparent, and set in 'build.py'. "is_dev_build" is
also a proper boolean flag (true/false only) and the actual commit hash
to use (when necessary) is handled separately.

More refactoring: this also renames the hnn-core commit hash module and
splits its functions, so that the 'get_hnn_commit_hash' function is ONLY
getting the installed version, and the rest of the code is validation of
the installed version against various other versions.
There are also some minor variable name changes
Previously, we were loading the same notebook from disk twice, once to
calculate the hash, then once to actually use it. It's more efficient to
load from disk once.
This includes a special comment about how
`_read_nb_json_output_metadata` looks in different places for the JSON
output files depending on if you're doing a stable or dev build
This fixes what I believe is a bug introduced in the original
"dev_build" changes.

Here
https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/build.py#L114
there is a bug at the later step when the final HTML page is being
built, and when the HTML content of the notebook (from inside its JSON
output file) is being loaded. In this case, because the JSON output file
was being loaded using unique code (different from the JSON output
loading code used at an earlier step inside
`scripts/convert_notebooks.py` see here
https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/scripts/convert_notebooks.py#L459
), only the "content" version of the JSON output file was being loaded,
and never the "dev" version. This meant that for "dev" final-page HTML
output containing a notebook's contents, the wrong JSON output was being
used to generate the page.

This has been fixed in that both instances of loading the JSON output
files have been changed to use the same function. Also, the later step
has extra configuration added in order to reconstruct the path of the
desired JSON output file location (which differs in the "dev") case.

More generally, this is another instance of where the complexity
introduced by trying to accomodate the different "dev" vs "stable" file
output locations is a lot, and maybe too much. Perhaps in the future we
can consider changing how he handle a separate "dev" build on the
filesystem on a fundamental level. For example, for a "dev" build, we
could make a full copy in the filesystem of the "content" directory into
the "dev" directory directly, and then replace our "content_path"
variable at the highest level with the "dev" directory.
This also moves the output-directory creation step outside of
`create_flat_index` and overtly into the main loop of the HTML page
building code. (If you have to write a ton of comments explaining that a
step is done in a particular location, then maybe it shouldn't be in
that location! I'm dumb and it never should have gone in the index
creation func in the first place).

As is typical, these docstrings were *initially* created with Claude but
then inspected and edited substantially.
This removes the '--code-version' option 'no-check', and instead adds a
standalone CLI argument called '--no-version-validation' that does the
same thing. This allows both "stable" and "dev" builds to EITHER be run
without code-version being checked. (Before, no-check was always a "dev"
build, but this is an improvement since not validating is useful when
inspecting or debugging a local stable build).

This also replaces the confusing argparse not-recommended use of bool in
favor of `action="store_true"`. This is the recommended way to treat CLI
arguments as "presence flags" so to speak, albeit the naming is a little
confusing (the default is False for "store_true").
@dylansdaniels
Copy link
Collaborator

dylansdaniels commented Nov 4, 2025

This should definitely be done in a single action.

How I imagined it would be done:

If repo is jonescompneurolab, always do stable. Add dev folder to .gitignore.

If repo is not jonescompneurolab, look for a tag or some flag in the commit message that would specify the type of build to be done. Do that. (We don't need both builds in a fork)

@asoplata
Copy link
Collaborator Author

asoplata commented Nov 5, 2025

This PR was primarily for testing how the deployment stuff would work. It definitely makes sense to finish our design of how everything is supposed to work, before we implement those changes on this PR and then merge. I'm currently writing another version of the proposal at #100.

@dylansdaniels
Copy link
Collaborator

Makes sense, just throwing in my 2 cents so you know my thought process - doesn't mean it has to be that way :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants