fix: backtick-stripping regex in nb2md() #268
Merged
danielfrg merged 6 commits into danielfrg:main on Mar 5, 2026
Notebook conversion (nb2html + nb2md for TOC) runs on every build, even when notebooks haven't changed. This adds a SHA-256 content+config cache that skips conversion entirely on cache hit.

New plugin config options:
- cache (bool, default: True): enable/disable caching
- cache_dir (str, default: ".cache/mkdocs-jupyter"): cache location

The cache key incorporates the notebook file content and all config options that affect output (execute, kernel_name, theme, show_input, no_input, remove_tag_config, highlight_extra_classes, include_requirejs, custom_mathjax_url, toc_depth). Changing any of these invalidates the cache for that notebook, which triggers a rebuild.

Refactors get_nb_toc() to extract _get_nb_toc_tokens() so raw TOC tokens can be serialized to the cache without a redundant nb2md() call.

Adds 9 tests: unit tests for cache key determinism and invalidation, and integration tests verifying cache population, cache hit (nb2html not called on second build), and cache-disabled behavior.

Closes danielfrg#161
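The cache key described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `cache_key`, the constant `CACHE_KEY_OPTIONS`, and the choice to serialize each option with `repr()` are assumptions; only the option names and the use of SHA-256 come from the commit message.

```python
import hashlib

# Option names listed in the commit message; the constant itself is hypothetical.
CACHE_KEY_OPTIONS = (
    "execute", "kernel_name", "theme", "show_input", "no_input",
    "remove_tag_config", "highlight_extra_classes", "include_requirejs",
    "custom_mathjax_url", "toc_depth",
)


def cache_key(nb_bytes: bytes, config: dict) -> str:
    """Hash the notebook content together with every output-affecting option."""
    h = hashlib.sha256()
    h.update(nb_bytes)
    for name in CACHE_KEY_OPTIONS:
        # A NUL separator keeps adjacent fields from running together.
        h.update(f"\0{name}={config.get(name)!r}".encode("utf-8"))
    return h.hexdigest()
```

Because both the file bytes and the options feed the same digest, editing the notebook or changing any listed option yields a different key, so the old cache entry is simply never looked up again.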
- Add allow_errors to cache key config options
- Hash resolved exec_nb value instead of config["execute"] so notebooks matching execute_ignore get a distinct cache key
- Use repr() for config values in cache key for deterministic hashing of dicts
- Add stale cache eviction via on_post_build: tracks used cache paths during build and removes orphaned .json files
- Add tests for execute_ignore key sensitivity and stale eviction
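The stale-eviction step in the list above might look something like this sketch. The helper name `evict_stale` and its signature are hypothetical; the commit only states that used cache paths are tracked during the build and that orphaned .json files are removed afterwards.

```python
from pathlib import Path


def evict_stale(cache_dir: Path, used_paths: set[Path]) -> list[Path]:
    """Delete cached .json entries that were not touched during this build."""
    used = {p.resolve() for p in used_paths}
    removed = []
    for entry in sorted(cache_dir.glob("*.json")):
        if entry.resolve() not in used:
            entry.unlink()
            removed.append(entry)
    return removed
```

In a plugin, a function like this would presumably be called from the on_post_build hook with the set of paths recorded while pages were rendered. Restricting the glob to `*.json` leaves any unrelated files in the cache directory alone.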
- Fixes truncated Table of Contents for notebooks with inline code in markdown cells
- The regex `[.\s\S]*?` in backquote_text_regex matches any character including newlines, causing it to span across lines and destroy markdown headings used for TOC generation
- Replace with ``[^`\n]*`` to restrict matching to single lines
- The previously committed backtick fix was stripping #include which resulted in failing tests. Fixed.
Owner: ty!
This fix addresses a problem with generating the table of contents from notebook files that have backtick-heavy (`` ` ``) content. I noticed that some of the notebooks we convert in our documentation mysteriously drop sections from the generated ToC. The tests in test_toc.py are all still passing.

Here is a minimal example of the issue this PR addressed:

Old regex: `[.\s\S]*?` matches across newlines, scanning forward until it finds the next backtick, which is in Section C (`...\n\n## Section C\n\nMore text with`), destroying the Section C heading.

New regex: ``[^`\n]*`` cannot cross newlines, so all sections are included.

A follow-up commit stops the fix from stripping `#include` as well (noticed this while testing after the fix in the first commit).
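The difference between the two patterns can be reproduced with a small sketch. The sample markdown, the names `OLD`, `NEW`, and `headings`, and the assumption that matches are stripped with `re.sub(..., "")` are all illustrative; the two character classes are the ones quoted above, wrapped in literal backticks.

```python
import re

# Assumed shape of the patterns: a backtick, the quoted body, a closing backtick.
OLD = re.compile(r"`[.\s\S]*?`")   # body may span newlines
NEW = re.compile(r"`[^`\n]*`")     # body must stay on a single line

# Hypothetical notebook markdown with one unpaired backtick in Section A.
md = (
    "## Section A\n\n"
    "Text with a stray ` backtick.\n\n"
    "## Section B\n\n"
    "More text with `code`.\n"
)


def headings(text: str) -> list[str]:
    """Return the markdown heading lines that survive stripping."""
    return [line for line in text.splitlines() if line.startswith("#")]


print(headings(OLD.sub("", md)))  # Section B is swallowed by the cross-line match
print(headings(NEW.sub("", md)))  # both headings remain
```

With the old pattern, the unpaired backtick pairs up with the first backtick of `code` two paragraphs later, so everything in between, including the Section B heading, is removed. The new pattern cannot match the unpaired backtick at all, because the line ends before a closing backtick is found.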