MAINT/DOC: Designing notebook execution workflows #100

@asoplata

@dylansdaniels This is partially related to #96, but it also concerns how we manage the process of adding new notebooks, or editing existing ones to match development changes that are not yet on master (see jonescompneurolab/hnn-core#1064 (comment)). I think we need to discuss our overall approach to developing tutorials that use code (e.g. notebooks).

Currently

We have two places where we are deploying notebooks (or "notebook-like-content"):

  1. The distinct Jupyter notebooks at textbook, which are only built against the latest stable version.
  2. The notebook-like "scripts" in our examples at hnn-core, which are deployed and built for every version, including new development versions as of each PR.

There is also currently no way for a script/notebook to automatically "go between" these two repos. We must manually copy Jupyter notebooks, which is problematic for a number of reasons (e.g. the notebooks that hnn-core builds are malformed).
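
To make one of those problems checkable: the issue does not spell out what "malformed" covers, but one property it mentions is that the generated notebooks are un-executed and carry no output. A minimal sketch of a check for that, using only the standard library and assuming the standard nbformat 4 JSON layout (the function name and the example notebook are made up for illustration):

```python
import json


def unexecuted_code_cells(notebook_text: str) -> list:
    """Return indices of non-empty code cells with no execution count and no outputs."""
    nb = json.loads(notebook_text)
    missing = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # nbformat stores source either as one string or as a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # Empty code cells have nothing to run, so they are not reported.
        if not source.strip():
            continue
        if cell.get("execution_count") is None and not cell.get("outputs"):
            missing.append(index)
    return missing


# Hand-written example notebook: one executed code cell, one un-executed one.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
         "source": ["print('hi')"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1"]},
    ],
})

print(unexecuted_code_cells(example))
```

A check like this could run before any notebook is copied or consumed, so that an un-executed notebook is caught early rather than after it is deployed.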

Proposal

Essentially, I propose the following:

  1. We change hnn-core's doc system to use and execute true Jupyter notebooks. Currently it uses scripts, which are deployed both as (a) empty Jupyter notebooks (un-executed, with no output) and (b) webpages that sphinx builds from the script code and its output. Multiple sphinx extensions support executing notebooks, including myst-nb and nbsphinx. Here is some good documentation. Instead of the scripts getting tested via CircleCI for every PR (and release), the Jupyter notebooks would be tested. Honestly, even if we don't proceed with the rest of this proposal, this is a good idea for hnn-core regardless.
  2. We remove the "execution" part of Jupyter notebook processing in the textbook repo, but we still retain the code to take a Jupyter notebook, extract it into the JSON files you've made, and use them to assemble webpages that look good on the textbook website.
  3. Instead, the textbook repo will be changed to directly use the Jupyter notebooks from the hnn-core repo itself. There are at least 2 ways to do this:
    1. on deployment, textbook grabs notebook files directly from raw.githubusercontent and updates the hashes and its JSON files, or
    2. on deployment, textbook depends on and downloads hnn-core as a git submodule, then processes the newly-downloaded versions of the notebooks (at first glance I prefer this, and it has the added benefit that the only hash we need to track is that of the hnn-core submodule commit itself, rather than the hashes of multiple files).
    3. There are probably other ways to do this.
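
For item 1, here is a minimal sketch of what the relevant part of a Sphinx conf.py could look like with myst-nb. This is not hnn-core's actual configuration. The option names are the ones I understand recent myst-nb releases to use (older releases used different names), and the timeout value is a placeholder:

```python
# Fragment of a Sphinx conf.py (illustrative sketch, not hnn-core's real config).
extensions = [
    "myst_nb",  # lets Sphinx parse and execute Jupyter notebooks at build time
]

# Re-run every notebook on every build, so CI always tests the current code.
nb_execution_mode = "force"

# Maximum time in seconds that a single cell may run; placeholder value.
nb_execution_timeout = 600

# Turn a failing cell into a build error, so a broken notebook fails CI.
nb_execution_raise_on_error = True
```

With settings along these lines, the existing CircleCI docs build would double as the notebook test, since any notebook that fails to execute would fail the build.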

Pros:

  • This means that, if we want, textbook can point to BOTH stable and development notebook versions! For example, most of our pages would presumably point to the stable version of each notebook (which will be located on the hnn-core/gh-pages branch), but a page for a new feature that is still in development could, by itself, point to the new notebook on the hnn-core/master branch.
  • textbook could access development versions of notebooks, even including those of unfinished PRs. This is the issue I ran into in hnn-core#1064 (comment), "feat(network): Add spike_train_drive for explicit inter-network inputs", which made me come up with this solution.
  • There would no longer be any ambiguity about which version a notebook was successfully run on, or about differences between the two websites. textbook and hnn-core would never have alternate versions of a notebook that the other repo could not access. There is "one and only one" version of each notebook per hnn-core commit.
  • Notebooks would only need to be executed once for each version, rather than twice (once on hnn-core and again on textbook).
    • Similarly, textbook would also never have to double-check that each notebook actually works.
  • Similar to our code website, if we wanted to provide separate "stable" and "development" webpage versions of our textbook website, it's just a matter of pointing to different versions of the same notebooks.
  • The current method of manually copying Jupyter notebooks from hnn-core (which still need to be heavily changed, and do not necessarily work out-of-the-box!) is prone to issues: (a) it's manual; (b) the current notebooks on textbook are not the same as the notebooks on hnn-core; (c) updates to scripts (or new ones) on hnn-core need to be tested on hnn-core, then converted to notebooks, then edited, then manually copied over to textbook, and this process needs to be repeated every time. If we are planning to add major rewrites or new additions of notebooks, which we are, then this is not a good approach.
  • This would spare us from having to deal with multiple kinds of execution styles, such as those in #96 ("[WIP] Remove automatic forced-re-execution for versions").
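
To illustrate the first bullet together with the raw.githubusercontent option from the proposal, here is a rough sketch of how textbook could map each page to a notebook at a chosen ref and decide whether its JSON needs regenerating. The page names and notebook paths are hypothetical, and the table layout is only one possible design. The only parts taken from the issue are the repo name, the gh-pages/master branches, and the idea of tracking hashes:

```python
import hashlib
import urllib.request
from typing import Optional

OWNER_REPO = "jonescompneurolab/hnn-core"
STABLE_REF = "gh-pages"  # where the issue expects stable notebooks to live
DEV_REF = "master"       # development notebooks

# Hypothetical page table: page name -> (notebook path in hnn-core, git ref).
PAGES = {
    "stable_tutorial": ("notebooks/stable_tutorial.ipynb", STABLE_REF),
    "new_feature": ("notebooks/new_feature.ipynb", DEV_REF),
}


def raw_url(path: str, ref: str) -> str:
    """Build the raw-file URL for a path at a branch, tag, or commit."""
    return f"https://raw.githubusercontent.com/{OWNER_REPO}/{ref}/{path}"


def fetch(url: str) -> bytes:
    """Download the notebook bytes (performs a network request when called)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def needs_rebuild(content: bytes, recorded_hash: Optional[str]) -> bool:
    """True when the downloaded notebook differs from the hash recorded last time."""
    return hashlib.sha256(content).hexdigest() != recorded_hash


for page, (path, ref) in PAGES.items():
    print(page, "->", raw_url(path, ref))
```

Pointing a single page at a PR's notebook would then only mean changing its ref to that PR's branch or commit. With the submodule option, the per-file hashes would be replaced by the single submodule commit.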

Cons:

  • This is obviously a non-trivial amount of work. However, I think the current method is unsustainable, and we need to streamline (and connect the two repos) somehow.
  • Authors would have to edit notebooks on hnn-core rather than the textbook repo, even though they want the notebook content to show up on textbook. However, I think this is inevitable if we want textbook and hnn-core execution of every notebook to be synchronized.
  • Much of the execution code here would no longer be needed (but the rest of the code would still be).
  • ???
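
On the point about which code would remain: a minimal sketch of "extract without executing", assuming the notebooks arrive already executed from hnn-core in the standard nbformat 4 JSON layout. The output dictionary shape here is made up for illustration and is not the JSON format textbook currently uses:

```python
import json


def extract_cells(notebook_text: str) -> list:
    """Convert an already-executed notebook into a simple list of cell records."""
    nb = json.loads(notebook_text)
    extracted = []
    for cell in nb.get("cells", []):
        # nbformat stores source either as one string or as a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        record = {"type": cell.get("cell_type"), "source": source}
        if cell.get("cell_type") == "code":
            # Reuse the outputs stored by the upstream run; nothing is executed here.
            record["outputs"] = cell.get("outputs", [])
        extracted.append(record)
    return extracted


# Hand-written example of an executed notebook.
executed = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
         "source": ["print(1 + 1)"]},
    ],
})

records = extract_cells(executed)
print(json.dumps(records, indent=2))
```

The point is that this step needs no kernel and no hnn-core install on the textbook side; it only reads what the hnn-core build already produced.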
