Add format-ipy-cells builtin hook #1844

Open
janosh wants to merge 4 commits into j178:master from janosh:feat/format-ipy-cells-builtin

Contributor

@janosh janosh commented Mar 21, 2026

Summary

  • Adds a Rust-native builtin hook format-ipy-cells that formats # %% cell delimiters in VS Code interactive Python notebooks
  • Based on the Python format-ipy-cells hook, reimplemented using a structured line-based parser (parse -> format -> serialize) instead of chained regex substitutions, making it more robust and easier to reason about
  • Includes 21 tests covering parsing, individual formatting rules, merge behavior, idempotency, and async file I/O
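For readers skimming the thread, the parse -> format -> serialize shape described above can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the `Cell` struct and the `parse`, `format_cells`, and `serialize` names are hypothetical, delimiters are assumed to be already normalized to `# %%`, and the merge and triple-quote rules are left out.

```rust
/// One cell: an optional delimiter line plus the code lines that follow it.
/// `delimiter == None` marks the preamble before the first delimiter.
#[derive(Debug, PartialEq)]
struct Cell {
    delimiter: Option<String>,
    lines: Vec<String>,
}

/// Parse: split the text into cells at lines starting with `# %%`.
fn parse(text: &str) -> Vec<Cell> {
    let mut cells = vec![Cell { delimiter: None, lines: Vec::new() }];
    for raw in text.lines() {
        let line = raw.trim_end(); // strip trailing whitespace per line
        if line.starts_with("# %%") {
            cells.push(Cell { delimiter: Some(line.to_string()), lines: Vec::new() });
        } else {
            cells.last_mut().expect("cells is never empty").lines.push(line.to_string());
        }
    }
    cells
}

/// Format: trim blank lines at the edges of each cell, then drop cells
/// that have neither code nor a delimiter comment.
fn format_cells(cells: Vec<Cell>) -> Vec<Cell> {
    cells
        .into_iter()
        .map(|mut cell| {
            let is_blank = |l: &String| l.trim().is_empty();
            let start = cell.lines.iter().position(|l| !is_blank(l)).unwrap_or(cell.lines.len());
            let end = cell.lines.iter().rposition(|l| !is_blank(l)).map_or(start, |i| i + 1);
            cell.lines = cell.lines[start..end].to_vec();
            cell
        })
        .filter(|cell| {
            !cell.lines.is_empty() || cell.delimiter.as_deref().is_some_and(|d| d != "# %%")
        })
        .collect()
}

/// Serialize: emit two blank lines before every delimiter except at the
/// very start of the output.
fn serialize(cells: &[Cell]) -> String {
    let mut out = String::new();
    for cell in cells {
        if let Some(delim) = &cell.delimiter {
            if !out.is_empty() {
                out.push_str("\n\n");
            }
            out.push_str(delim);
            out.push('\n');
        }
        for line in &cell.lines {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Keeping the three stages separate means each rule can be tested on the `Vec<Cell>` value directly, and an idempotency check is just running the pipeline twice and comparing the two strings.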

Formatting rules

  • Normalize delimiter spacing: #%%, #   %% -> # %%
  • Normalize comments: # %%comment, # %%   comment -> # %% comment
  • Remove empty cells (no comment and no code)
  • Merge bare delimiters separated by only whitespace into the previous cell
  • Remove leading/trailing blank lines within each cell
  • Ensure exactly two blank lines before each cell delimiter
  • One blank line between a triple-quoted string and the following cell (ruff compatibility)
  • Remove trailing empty cell at end of file
  • Strip trailing whitespace from each line
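The first two rules in the list above (delimiter spacing and comment spacing) could look something like the sketch below. The function name `normalize_delimiter` is made up for illustration and is not necessarily how `parse_delimiter` in this PR is written; the sketch also assumes delimiters start at column 0.

```rust
/// Returns the normalized delimiter line if `line` is a cell delimiter,
/// or `None` for any other line (hypothetical helper, for illustration).
fn normalize_delimiter(line: &str) -> Option<String> {
    // A delimiter is `#`, optional whitespace, then `%%`.
    let rest = line.strip_prefix('#')?;
    let rest = rest.trim_start().strip_prefix("%%")?;
    // Whatever follows `%%` is the optional cell comment.
    let comment = rest.trim();
    if comment.is_empty() {
        Some("# %%".to_string())
    } else {
        Some(format!("# %% {comment}"))
    }
}
```

Returning `Option<String>` keeps "not a delimiter" distinct from "delimiter without a comment", since the latter still yields `Some("# %%")`.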

Usage

repos:
  - repo: builtin
    hooks:
      - id: format-ipy-cells

No Python environment needed -- runs as native Rust code inside prek.

Test plan

  • All 21 unit tests pass (cargo test -p prek format_ipy_cells)
  • Full fixture test validates end-to-end formatting against known good output
  • Idempotency test confirms formatting a clean file produces identical output
  • Mutation testing verified all 6 key code paths are caught by tests
  • Existing prek test suite passes (356 unit tests + all builtin hook integration tests)
  • Manual testing with real-world interactive Python notebooks

@janosh janosh requested a review from j178 as a code owner March 21, 2026 12:25
Add a Rust-native builtin hook that formats `# %%` cell delimiters in
VS Code interactive Python notebooks. Based on the Python
[format-ipy-cells](https://github.com/janosh/format-ipy-cells) hook,
reimplemented using a structured line-based parser instead of chained
regex substitutions.

The hook normalizes cell delimiter spacing, comment formatting, removes
empty cells, ensures consistent blank lines between cells, and handles
module docstring spacing.

Usage:
```yaml
repos:
  - repo: builtin
    hooks:
      - id: format-ipy-cells
```
@janosh janosh force-pushed the feat/format-ipy-cells-builtin branch from 8c76a97 to 6dca7df Compare March 21, 2026 12:26
janosh added 3 commits March 21, 2026 12:27
- Replace `Option<Option<String>>` with `Option<String>` in
  parse_delimiter (clippy::option_option)
- Use method references instead of redundant closures
- Add format-ipy-cells to list_builtins snapshot tests

codecov bot commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 98.08219% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.89%. Comparing base (9328863) to head (2018e5d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...s/prek/src/hooks/builtin_hooks/format_ipy_cells.rs | 98.28% | 6 Missing ⚠️ |
| crates/prek/src/hooks/builtin_hooks/mod.rs | 93.75% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1844      +/-   ##
==========================================
- Coverage   91.96%   91.89%   -0.08%     
==========================================
  Files         101      102       +1     
  Lines       20584    20949     +365     
==========================================
+ Hits        18931    19252     +321     
- Misses       1653     1697      +44     



prek-ci-bot bot commented Mar 21, 2026

📦 Cargo Bloat Comparison

Binary size change: +0.40% (24.9 MiB → 25.0 MiB)

Head Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_39_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_39_0_aes_gcm_decrypt_avx512
 0.3%   0.7%  81.4KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  74.4KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  69.3KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.5%  56.9KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  46.4KiB              prek prek::run::{{closure}}
 0.2%   0.3%  42.0KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  32.0KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_39_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_39_0_edwards25519_scalarmuldouble
 0.1%   0.2%  25.8KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.4KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  23.2KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  21.6KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.2%  86.0%  10.3MiB                   And 23479 smaller methods. Use -n N to show more.
47.9% 100.0%  12.0MiB                   .text section size, the file size is 25.0MiB

Base Branch Results

 File  .text     Size             Crate Name
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_39_0_aes_gcm_encrypt_avx512
 1.3%   2.7% 332.0KiB        aws_lc_sys aws_lc_0_39_0_aes_gcm_decrypt_avx512
 0.3%   0.6%  77.6KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  71.9KiB              prek prek::languages::<impl prek::config::Language>::run::{{closure}}::{{closure}}
 0.3%   0.6%  68.9KiB             prek? <prek::cli::Command as clap_builder::derive::Subcommand>::augment_subcommands
 0.2%   0.5%  56.8KiB              prek prek::languages::<impl prek::config::Language>::install::{{closure}}
 0.2%   0.4%  51.0KiB annotate_snippets annotate_snippets::renderer::render::render
 0.2%   0.4%  46.5KiB              prek prek::run::{{closure}}
 0.2%   0.3%  42.2KiB              prek prek::cli::run::run::run::{{closure}}
 0.1%   0.3%  31.8KiB             prek? <prek::cli::RunArgs as clap_builder::derive::Args>::augment_args
 0.1%   0.2%  28.0KiB        aws_lc_sys aws_lc_0_39_0_edwards25519_scalarmuldouble_alt
 0.1%   0.2%  27.8KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  27.5KiB        aws_lc_sys aws_lc_0_39_0_edwards25519_scalarmuldouble
 0.1%   0.2%  26.6KiB             prek? <prek::config::_::<impl serde_core::de::Deserialize for prek::config::Config>::deserialize::__Visitor as serde_core::de::Visitor>::visit_map
 0.1%   0.2%  26.3KiB              prek prek::cli::try_repo::try_repo::{{closure}}
 0.1%   0.2%  24.2KiB               std core::ptr::drop_in_place<prek::languages::<impl prek::config::Language>::install::{{closure}}>
 0.1%   0.2%  23.0KiB              prek prek::hooks::meta_hooks::MetaHooks::run::{{closure}}
 0.1%   0.2%  22.4KiB      serde_saphyr saphyr_parser_bw::scanner::Scanner<T>::fetch_more_tokens
 0.1%   0.2%  22.3KiB         [Unknown] Lp384_montjscalarmul_alt_p384_montjadd
 0.1%   0.2%  21.7KiB              prek prek::workspace::Project::init_hooks::{{closure}}
41.2%  86.0%  10.3MiB                   And 23418 smaller methods. Use -n N to show more.
47.9% 100.0%  11.9MiB                   .text section size, the file size is 24.9MiB


prek-ci-bot bot commented Mar 21, 2026

⚡️ Hyperfine Benchmarks

Summary: 0 regressions, 0 improvements above the 10% threshold.

Environment
  • OS: Linux 6.14.0-1017-azure
  • CPU: 4 cores
  • prek version: prek 0.3.6+25 (a0f56e2 2026-03-21)
  • Rust version: rustc 1.94.0 (4a4ef493e 2026-03-02)
  • Hyperfine version: hyperfine 1.20.0

CLI Commands

Benchmarking basic commands in the main repo:

prek --version

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base --version` | 2.4 ± 0.1 | 2.3 | 2.7 | 1.00 |
| `prek-head --version` | 2.4 ± 0.1 | 2.3 | 2.9 | 1.00 ± 0.06 |

prek list

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base list` | 9.7 ± 0.2 | 9.3 | 10.1 | 1.03 ± 0.03 |
| `prek-head list` | 9.4 ± 0.2 | 8.9 | 9.9 | 1.00 |

prek validate-config .pre-commit-config.yaml

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base validate-config .pre-commit-config.yaml` | 3.2 ± 0.0 | 3.1 | 3.4 | 1.00 |
| `prek-head validate-config .pre-commit-config.yaml` | 3.3 ± 0.1 | 3.1 | 3.4 | 1.02 ± 0.02 |

prek sample-config

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base sample-config` | 2.7 ± 0.1 | 2.6 | 2.9 | 1.00 |
| `prek-head sample-config` | 2.7 ± 0.1 | 2.6 | 2.8 | 1.00 ± 0.03 |

Cold vs Warm Runs

Comparing first run (cold) vs subsequent runs (warm cache):

prek run --all-files (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 160.9 ± 2.8 | 156.3 | 164.5 | 1.01 ± 0.05 |
| `prek-head run --all-files` | 159.4 ± 7.4 | 152.6 | 179.0 | 1.00 |

prek run --all-files (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 161.3 ± 2.2 | 158.1 | 165.8 | 1.00 |
| `prek-head run --all-files` | 162.8 ± 4.4 | 156.0 | 172.7 | 1.01 ± 0.03 |

Full Hook Suite

Running the builtin hook suite on the benchmark workspace:

prek run --all-files (full builtin hook suite)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 163.7 ± 3.2 | 158.2 | 174.6 | 1.00 |
| `prek-head run --all-files` | 168.6 ± 25.0 | 159.8 | 341.0 | 1.03 ± 0.15 |

Individual Hook Performance

Benchmarking each hook individually on the test repo:

prek run trailing-whitespace --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run trailing-whitespace --all-files` | 23.3 ± 0.5 | 22.4 | 24.7 | 1.01 ± 0.03 |
| `prek-head run trailing-whitespace --all-files` | 22.9 ± 0.6 | 21.8 | 24.3 | 1.00 |

prek run end-of-file-fixer --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run end-of-file-fixer --all-files` | 29.7 ± 2.7 | 26.3 | 38.6 | 1.00 |
| `prek-head run end-of-file-fixer --all-files` | 30.3 ± 2.0 | 27.2 | 34.2 | 1.02 ± 0.11 |

prek run check-json --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-json --all-files` | 13.2 ± 0.4 | 12.5 | 14.0 | 1.01 ± 0.04 |
| `prek-head run check-json --all-files` | 13.1 ± 0.3 | 12.6 | 13.7 | 1.00 |

prek run check-yaml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-yaml --all-files` | 12.6 ± 0.6 | 12.2 | 15.4 | 1.00 |
| `prek-head run check-yaml --all-files` | 12.8 ± 0.3 | 12.4 | 13.5 | 1.02 ± 0.05 |

prek run check-toml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-toml --all-files` | 13.0 ± 0.3 | 12.4 | 13.5 | 1.00 |
| `prek-head run check-toml --all-files` | 13.1 ± 0.4 | 12.5 | 13.9 | 1.01 ± 0.04 |

prek run check-xml --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-xml --all-files` | 13.0 ± 0.3 | 12.4 | 13.5 | 1.00 |
| `prek-head run check-xml --all-files` | 13.0 ± 0.4 | 12.3 | 13.8 | 1.01 ± 0.04 |

prek run detect-private-key --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run detect-private-key --all-files` | 19.8 ± 1.4 | 17.5 | 23.5 | 1.00 ± 0.09 |
| `prek-head run detect-private-key --all-files` | 19.8 ± 1.1 | 17.6 | 22.1 | 1.00 |

prek run fix-byte-order-marker --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run fix-byte-order-marker --all-files` | 25.1 ± 1.9 | 21.8 | 27.4 | 1.00 |
| `prek-head run fix-byte-order-marker --all-files` | 25.1 ± 1.9 | 22.2 | 28.1 | 1.00 ± 0.11 |

Installation Performance

Benchmarking hook installation (fast path hooks skip Python setup):

prek install-hooks (cold - no cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 5.3 ± 0.1 | 5.2 | 5.4 | 1.01 ± 0.02 |
| `prek-head install-hooks` | 5.2 ± 0.0 | 5.2 | 5.3 | 1.00 |

prek install-hooks (warm - with cache)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 5.1 ± 0.1 | 5.0 | 5.2 | 1.00 |
| `prek-head install-hooks` | 5.2 ± 0.0 | 5.1 | 5.2 | 1.02 ± 0.02 |

File Filtering/Scoping Performance

Testing different file selection modes:

prek run (staged files only)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run` | 55.0 ± 1.3 | 53.2 | 58.3 | 1.00 |
| `prek-head run` | 55.8 ± 1.5 | 53.5 | 59.8 | 1.02 ± 0.04 |

prek run --files '*.json' (specific file type)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --files '*.json'` | 9.6 ± 0.2 | 9.3 | 10.0 | 1.00 |
| `prek-head run --files '*.json'` | 9.9 ± 0.2 | 9.6 | 10.3 | 1.03 ± 0.03 |

Workspace Discovery & Initialization

Benchmarking hook discovery and initialization overhead:

prek run --dry-run --all-files (measures init overhead)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --dry-run --all-files` | 15.0 ± 0.2 | 14.5 | 15.5 | 1.00 |
| `prek-head run --dry-run --all-files` | 15.1 ± 0.3 | 14.5 | 15.6 | 1.00 ± 0.02 |

Meta Hooks Performance

Benchmarking meta hooks separately:

prek run check-hooks-apply --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-hooks-apply --all-files` | 14.9 ± 0.4 | 13.4 | 15.2 | 1.10 ± 0.03 |
| `prek-head run check-hooks-apply --all-files` | 13.5 ± 0.2 | 13.2 | 13.8 | 1.00 |

prek run check-useless-excludes --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-useless-excludes --all-files` | 13.2 ± 0.2 | 13.0 | 13.5 | 1.00 |
| `prek-head run check-useless-excludes --all-files` | 13.3 ± 0.2 | 13.1 | 13.9 | 1.01 ± 0.02 |

prek run identity --all-files

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run identity --all-files` | 11.7 ± 0.2 | 11.5 | 12.1 | 1.00 |
| `prek-head run identity --all-files` | 11.8 ± 0.3 | 11.2 | 12.4 | 1.01 ± 0.03 |

Contributor

Copilot AI left a comment

Pull request overview

Adds a new Rust-native builtin hook (format-ipy-cells) to prek for formatting # %% cell delimiters used in VS Code interactive Python notebooks, and wires it into the builtin hook registry, schema, docs, and builtin listing tests.

Changes:

  • Implement format-ipy-cells builtin hook with a structured parse → format → serialize pipeline plus unit/async tests.
  • Register the hook in builtin hook dispatch/metadata so it shows up in list-builtins and uses types: [python].
  • Update user-facing artifacts (JSON schema + docs) to include the new builtin hook.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
|---|---|
| prek.schema.json | Adds format-ipy-cells to the builtin hook id enum in the schema. |
| docs/builtin.md | Documents the new builtin hook and its formatting rules. |
| crates/prek/tests/list_builtins.rs | Updates builtin-list snapshots/expectations to include the new hook. |
| crates/prek/src/hooks/builtin_hooks/mod.rs | Registers the new builtin hook enum variant and dispatch/metadata. |
| crates/prek/src/hooks/builtin_hooks/format_ipy_cells.rs | New hook implementation + tests. |

Comment on lines +399 to +401
- Remove leading blank lines within each cell.
- Ensure exactly two blank lines before each cell delimiter.
- Ensure one blank line between a module docstring and the first cell.

Copilot AI Mar 21, 2026

The docs say “Ensure one blank line between a module docstring and the first cell”, but the implementation applies the one-blank-line rule after any preceding line ending with triple quotes and before any cell delimiter. Also, the implementation trims both leading and trailing blank lines within each cell, while the docs only mention removing leading blanks. Please update this section to accurately reflect the behavior.

Suggested change
Before:
- Remove leading blank lines within each cell.
- Ensure exactly two blank lines before each cell delimiter.
- Ensure one blank line between a module docstring and the first cell.
After:
- Remove leading and trailing blank lines within each cell.
- Ensure exactly two blank lines before each cell delimiter by default.
- If the line immediately before a cell delimiter ends with `"""`, ensure exactly one blank line between that line and the cell delimiter.
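
To make the spacing rule in this comment concrete, here is a minimal sketch of how the blank-line count could be chosen. The helper name `blank_lines_before_delimiter` is hypothetical, it only checks for `"""` as worded in the suggestion, and emitting no blank lines at the start of a file is an assumption rather than something confirmed in this thread.

```rust
/// Number of blank lines to emit before a cell delimiter, given the last
/// line written before it (`None` when the delimiter opens the file).
fn blank_lines_before_delimiter(prev_line: Option<&str>) -> usize {
    match prev_line {
        // Assumption: nothing precedes the delimiter, so no padding.
        None => 0,
        // One blank line after a line that ends a triple-quoted string.
        Some(prev) if prev.trim_end().ends_with("\"\"\"") => 1,
        // Default: two blank lines before each delimiter.
        Some(_) => 2,
    }
}
```

Because the check looks only at how the previous line ends, it applies to any triple-quoted string and not just a module docstring, which is the mismatch with the docs that the comment points out.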

Comment on lines +65 to +67
for raw_line in text.lines() {
    let line = raw_line.trim_end().to_string();

Copilot AI Mar 21, 2026

parse()/serialize() normalizes all line endings to \n (e.g., CRLF input becomes LF) because it uses text.lines() and then re-joins with \n. This will create large diffs on Windows and makes this hook implicitly act as a line-ending fixer. Consider preserving the original line endings (detect from input and re-emit), or operating on bytes like other fixers do (e.g., fix_trailing_whitespace).
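
One way to act on this, sketched under assumptions: detect the dominant line ending once and re-emit it when writing lines back. The names `detect_line_ending` and `rewrite_preserving_eol` are hypothetical and not part of prek; files with mixed endings would still be normalized to CRLF by this simple heuristic.

```rust
/// Pick the line ending to re-emit: CRLF if the input contains any, else LF.
fn detect_line_ending(text: &str) -> &'static str {
    if text.contains("\r\n") { "\r\n" } else { "\n" }
}

/// Apply `format_line` to every line and join with the detected ending.
/// `str::lines` already strips both `\n` and a preceding `\r`.
fn rewrite_preserving_eol(text: &str, format_line: impl Fn(&str) -> String) -> String {
    let eol = detect_line_ending(text);
    let mut out = String::with_capacity(text.len());
    for line in text.lines() {
        out.push_str(&format_line(line));
        out.push_str(eol); // note: always terminates the last line too
    }
    out
}
```

With this shape, a CRLF file that is otherwise clean round-trips unchanged, so the hook would not produce line-ending diffs on Windows checkouts.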

Comment on lines +168 to +176
        // Emit cell content
        for line in &cell.lines {
            output.push_str(line);
            output.push('\n');
        }
    }

    output
}

Copilot AI Mar 21, 2026

serialize() always appends a trailing \n after the last emitted line, so a file that originally had no newline at EOF will be rewritten even if it’s otherwise “clean”. If this is intended, it should be documented and ideally covered by a test; otherwise, preserve whether the input ended with a final newline when serializing.
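
If preserving the original EOF state is the preferred option, a minimal sketch could record whether the input ended with a newline and only re-add it in that case. Both function names below are hypothetical, and the per-line formatting is reduced to trailing-whitespace stripping to keep the example short.

```rust
/// Join lines with `\n`, adding a final newline only if the input had one.
fn join_lines(lines: &[&str], had_trailing_newline: bool) -> String {
    let mut out = lines.join("\n");
    if had_trailing_newline && !lines.is_empty() {
        out.push('\n');
    }
    out
}

/// Strip trailing whitespace per line without changing the EOF newline state.
fn strip_trailing_whitespace(text: &str) -> String {
    let had_trailing_newline = text.ends_with('\n');
    let lines: Vec<&str> = text.lines().map(str::trim_end).collect();
    join_lines(&lines, had_trailing_newline)
}
```

This leaves the missing-newline case to a dedicated hook such as end-of-file-fixer instead of rewriting the file here.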

Comment on lines +85 to +89
while lines.first().is_some_and(|line| line.trim().is_empty()) {
    lines.remove(0);
}
while lines.last().is_some_and(|line| line.trim().is_empty()) {
    lines.pop();

Copilot AI Mar 21, 2026

trim_blank_lines() repeatedly calls lines.remove(0) in a loop, which is O(n²) for many leading blank lines. This can become noticeably slow on large files. Consider finding the first/last non-blank indices and using drain(..start) / truncate(end) (or slicing) to trim in linear time.

Suggested change
Before:
    while lines.first().is_some_and(|line| line.trim().is_empty()) {
        lines.remove(0);
    }
    while lines.last().is_some_and(|line| line.trim().is_empty()) {
        lines.pop();
After:
    // Find the first non-blank line; if there isn't one, the whole vector is blank.
    let first_nonblank = match lines
        .iter()
        .position(|line| !line.trim().is_empty())
    {
        Some(idx) => idx,
        None => {
            lines.clear();
            return;
        }
    };
    // Find the last non-blank line; fall back to `first_nonblank` to avoid panics.
    let last_nonblank = lines
        .iter()
        .rposition(|line| !line.trim().is_empty())
        .unwrap_or(first_nonblank);
    // Remove trailing blank lines in a single truncate.
    if last_nonblank + 1 < lines.len() {
        lines.truncate(last_nonblank + 1);
    }
    // Remove leading blank lines in a single drain.
    if first_nonblank > 0 {
        lines.drain(0..first_nonblank);
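
Since the suggestion above is only a fragment of the function body, here is a compact standalone version of the same linear-time idea that can be compiled and tested on its own. It is a sketch, not the code from this PR; the signature `fn trim_blank_lines(lines: &mut Vec<String>)` is assumed from the snippet under review.

```rust
/// Remove leading and trailing blank lines in O(n): one `truncate` for the
/// tail and one `drain` for the head, instead of repeated `remove(0)`.
fn trim_blank_lines(lines: &mut Vec<String>) {
    let is_blank = |line: &String| line.trim().is_empty();

    // No non-blank line at all: the whole vector is blank.
    let last = lines.iter().rposition(|l| !is_blank(l));
    let Some(last) = last else {
        lines.clear();
        return;
    };

    // Drop the blank tail first so indices at the front stay valid.
    lines.truncate(last + 1);

    // A non-blank line is known to exist, so `position` always finds one.
    let first = lines.iter().position(|l| !is_blank(l)).unwrap_or(0);
    lines.drain(..first);
}
```

Blank lines in the middle of a cell are left alone; only the edges are trimmed.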

Owner

j178 commented Mar 21, 2026

Thanks for putting this together, the implementation and test coverage look solid. Before merging, I’d like to understand the expected adoption a bit better.

Adding a new builtin hook means committing to maintaining it long term in prek, so I want to make sure this is addressing a broadly used workflow rather than a niche case.

Do you have a sense of how widely this hook is used, or any signals that it has meaningful adoption beyond a smaller set of users?

Contributor Author

janosh commented Mar 21, 2026

there's next to no existing user base for this hook. this cell delimiter syntax is quite widely used though thanks to long-standing VS Code support https://code.visualstudio.com/docs/python/jupyter-support-py#_jupyter-code-cells

mostly used by people who do Python data analysis but want something lean to track in version control (i.e. not .ipynb) and easy to run in CI to ensure demo notebooks don't break over time as the code being demoed evolves (example). these are frequent use cases so i think this hook could garner a small to medium sized user base if offered in prek.

re maintenance effort, my experience with it has been that it's very low (see commit history in https://github.com/janosh/format-ipy-cells). that said, i understand if this is too niche. feel free to close
