Add a Rust-native builtin hook that formats `# %%` cell delimiters in VS Code interactive Python notebooks. Based on the Python [format-ipy-cells](https://github.com/janosh/format-ipy-cells) hook, reimplemented using a structured line-based parser instead of chained regex substitutions.

The hook normalizes cell delimiter spacing and comment formatting, removes empty cells, ensures consistent blank lines between cells, and handles module docstring spacing.

Usage:

```yaml
repos:
  - repo: builtin
    hooks:
      - id: format-ipy-cells
```
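A minimal sketch of the delimiter spacing and comment rules, assuming a delimiter is any line that starts at column 0 with `#`, optional spaces, then `%%`, and that everything after `%%` is the cell comment. `normalize_delimiter` is a hypothetical name for illustration, not necessarily the function used in the PR:

```rust
/// Hypothetical helper: returns `Some(normalized)` if `line` is a cell
/// delimiter, or `None` if it is an ordinary line.
fn normalize_delimiter(line: &str) -> Option<String> {
    // Must start at column 0 with `#` (assumption: indented `# %%` is ordinary code).
    let after_hash = line.strip_prefix('#')?;
    // Allow any spaces between `#` and `%%`, e.g. `#%%` or `#   %%`.
    let after_marker = after_hash.trim_start().strip_prefix("%%")?;
    // Anything after `%%` is the cell comment; trim it and emit exactly one space before it.
    let comment = after_marker.trim();
    if comment.is_empty() {
        Some("# %%".to_string())
    } else {
        Some(format!("# %% {comment}"))
    }
}
```

Returning `Option<String>` lets a caller tell ordinary lines (`None`) apart from delimiters without a nested `Option`.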
Updated from `8c76a97` to `6dca7df`:
- Replace `Option<Option<String>>` with `Option<String>` in `parse_delimiter` (`clippy::option_option`)
- Use method references instead of redundant closures
- Add `format-ipy-cells` to `list_builtins` snapshot tests
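The second cleanup refers to closures that only forward to a method, which clippy's pedantic `redundant_closure_for_method_calls` lint flags. A small illustration, using a hypothetical `trimmed_lines` helper rather than code from the PR:

```rust
/// Hypothetical helper: trim trailing whitespace from every line.
fn trimmed_lines(text: &str) -> Vec<&str> {
    // Closure form that clippy flags: text.lines().map(|l| l.trim_end()).collect()
    // Method-reference form: pass `str::trim_end` directly.
    text.lines().map(str::trim_end).collect()
}
```

Both forms behave identically; the method reference is simply shorter and avoids the extra closure.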
Codecov Report: ❌ Patch coverage is …

| | master | #1844 | +/- |
|---|---|---|---|
| Coverage | 91.96% | 91.89% | -0.08% |
| Files | 101 | 102 | +1 |
| Lines | 20584 | 20949 | +365 |
| Hits | 18931 | 19252 | +321 |
| Misses | 1653 | 1697 | +44 |
📦 Cargo Bloat Comparison: binary size change +0.40% (24.9 MiB → 25.0 MiB).
⚡️ Hyperfine Benchmarks
Summary: 0 regressions, 0 improvements above the 10% threshold.
CLI Commands
Benchmarking basic commands in the main repo:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base --version` | 2.4 ± 0.1 | 2.3 | 2.7 | 1.00 |
| `prek-head --version` | 2.4 ± 0.1 | 2.3 | 2.9 | 1.00 ± 0.06 |
prek list
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base list` | 9.7 ± 0.2 | 9.3 | 10.1 | 1.03 ± 0.03 |
| `prek-head list` | 9.4 ± 0.2 | 8.9 | 9.9 | 1.00 |
prek validate-config .pre-commit-config.yaml
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base validate-config .pre-commit-config.yaml` | 3.2 ± 0.0 | 3.1 | 3.4 | 1.00 |
| `prek-head validate-config .pre-commit-config.yaml` | 3.3 ± 0.1 | 3.1 | 3.4 | 1.02 ± 0.02 |
prek sample-config
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base sample-config` | 2.7 ± 0.1 | 2.6 | 2.9 | 1.00 |
| `prek-head sample-config` | 2.7 ± 0.1 | 2.6 | 2.8 | 1.00 ± 0.03 |
Cold vs Warm Runs
Comparing first run (cold) vs subsequent runs (warm cache):
prek run --all-files (cold - no cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 160.9 ± 2.8 | 156.3 | 164.5 | 1.01 ± 0.05 |
| `prek-head run --all-files` | 159.4 ± 7.4 | 152.6 | 179.0 | 1.00 |
prek run --all-files (warm - with cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 161.3 ± 2.2 | 158.1 | 165.8 | 1.00 |
| `prek-head run --all-files` | 162.8 ± 4.4 | 156.0 | 172.7 | 1.01 ± 0.03 |
Full Hook Suite
Running the builtin hook suite on the benchmark workspace:
prek run --all-files (full builtin hook suite)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --all-files` | 163.7 ± 3.2 | 158.2 | 174.6 | 1.00 |
| `prek-head run --all-files` | 168.6 ± 25.0 | 159.8 | 341.0 | 1.03 ± 0.15 |
Individual Hook Performance
Benchmarking each hook individually on the test repo:
prek run trailing-whitespace --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run trailing-whitespace --all-files` | 23.3 ± 0.5 | 22.4 | 24.7 | 1.01 ± 0.03 |
| `prek-head run trailing-whitespace --all-files` | 22.9 ± 0.6 | 21.8 | 24.3 | 1.00 |
prek run end-of-file-fixer --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run end-of-file-fixer --all-files` | 29.7 ± 2.7 | 26.3 | 38.6 | 1.00 |
| `prek-head run end-of-file-fixer --all-files` | 30.3 ± 2.0 | 27.2 | 34.2 | 1.02 ± 0.11 |
prek run check-json --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-json --all-files` | 13.2 ± 0.4 | 12.5 | 14.0 | 1.01 ± 0.04 |
| `prek-head run check-json --all-files` | 13.1 ± 0.3 | 12.6 | 13.7 | 1.00 |
prek run check-yaml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-yaml --all-files` | 12.6 ± 0.6 | 12.2 | 15.4 | 1.00 |
| `prek-head run check-yaml --all-files` | 12.8 ± 0.3 | 12.4 | 13.5 | 1.02 ± 0.05 |
prek run check-toml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-toml --all-files` | 13.0 ± 0.3 | 12.4 | 13.5 | 1.00 |
| `prek-head run check-toml --all-files` | 13.1 ± 0.4 | 12.5 | 13.9 | 1.01 ± 0.04 |
prek run check-xml --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-xml --all-files` | 13.0 ± 0.3 | 12.4 | 13.5 | 1.00 |
| `prek-head run check-xml --all-files` | 13.0 ± 0.4 | 12.3 | 13.8 | 1.01 ± 0.04 |
prek run detect-private-key --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run detect-private-key --all-files` | 19.8 ± 1.4 | 17.5 | 23.5 | 1.00 ± 0.09 |
| `prek-head run detect-private-key --all-files` | 19.8 ± 1.1 | 17.6 | 22.1 | 1.00 |
prek run fix-byte-order-marker --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run fix-byte-order-marker --all-files` | 25.1 ± 1.9 | 21.8 | 27.4 | 1.00 |
| `prek-head run fix-byte-order-marker --all-files` | 25.1 ± 1.9 | 22.2 | 28.1 | 1.00 ± 0.11 |
Installation Performance
Benchmarking hook installation (fast path hooks skip Python setup):
prek install-hooks (cold - no cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 5.3 ± 0.1 | 5.2 | 5.4 | 1.01 ± 0.02 |
| `prek-head install-hooks` | 5.2 ± 0.0 | 5.2 | 5.3 | 1.00 |
prek install-hooks (warm - with cache)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base install-hooks` | 5.1 ± 0.1 | 5.0 | 5.2 | 1.00 |
| `prek-head install-hooks` | 5.2 ± 0.0 | 5.1 | 5.2 | 1.02 ± 0.02 |
File Filtering/Scoping Performance
Testing different file selection modes:
prek run (staged files only)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run` | 55.0 ± 1.3 | 53.2 | 58.3 | 1.00 |
| `prek-head run` | 55.8 ± 1.5 | 53.5 | 59.8 | 1.02 ± 0.04 |
prek run --files '*.json' (specific file type)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --files '*.json'` | 9.6 ± 0.2 | 9.3 | 10.0 | 1.00 |
| `prek-head run --files '*.json'` | 9.9 ± 0.2 | 9.6 | 10.3 | 1.03 ± 0.03 |
Workspace Discovery & Initialization
Benchmarking hook discovery and initialization overhead:
prek run --dry-run --all-files (measures init overhead)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run --dry-run --all-files` | 15.0 ± 0.2 | 14.5 | 15.5 | 1.00 |
| `prek-head run --dry-run --all-files` | 15.1 ± 0.3 | 14.5 | 15.6 | 1.00 ± 0.02 |
Meta Hooks Performance
Benchmarking meta hooks separately:
prek run check-hooks-apply --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-hooks-apply --all-files` | 14.9 ± 0.4 | 13.4 | 15.2 | 1.10 ± 0.03 |
| `prek-head run check-hooks-apply --all-files` | 13.5 ± 0.2 | 13.2 | 13.8 | 1.00 |
prek run check-useless-excludes --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run check-useless-excludes --all-files` | 13.2 ± 0.2 | 13.0 | 13.5 | 1.00 |
| `prek-head run check-useless-excludes --all-files` | 13.3 ± 0.2 | 13.1 | 13.9 | 1.01 ± 0.02 |
prek run identity --all-files
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `prek-base run identity --all-files` | 11.7 ± 0.2 | 11.5 | 12.1 | 1.00 |
| `prek-head run identity --all-files` | 11.8 ± 0.3 | 11.2 | 12.4 | 1.01 ± 0.03 |
Pull request overview
Adds a new Rust-native builtin hook (`format-ipy-cells`) to prek for formatting `# %%` cell delimiters used in VS Code interactive Python notebooks, and wires it into the builtin hook registry, schema, docs, and builtin listing tests.
Changes:
- Implement the `format-ipy-cells` builtin hook with a structured parse → format → serialize pipeline plus unit/async tests.
- Register the hook in builtin hook dispatch/metadata so it shows up in `list-builtins` and uses `types: [python]`.
- Update user-facing artifacts (JSON schema + docs) to include the new builtin hook.
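The parse → format → serialize pipeline described above can be sketched with a small cell model. This is a simplified, hypothetical version: `Cell`, `parse`, `format_cells`, and `serialize` are illustrative names, delimiter detection only matches lines already starting with `# %%`, and the docstring spacing rule is omitted:

```rust
/// Hypothetical cell model: an optional delimiter line plus the lines that follow it.
/// Content before the first delimiter (e.g. imports or a docstring) has `delimiter: None`.
struct Cell {
    delimiter: Option<String>,
    lines: Vec<String>,
}

/// Parse: start a new cell at every line beginning with `# %%`.
fn parse(text: &str) -> Vec<Cell> {
    let mut cells = vec![Cell { delimiter: None, lines: Vec::new() }];
    for raw in text.lines() {
        let line = raw.trim_end();
        if line.starts_with("# %%") {
            cells.push(Cell { delimiter: Some(line.to_string()), lines: Vec::new() });
        } else if let Some(current) = cells.last_mut() {
            current.lines.push(line.to_string());
        }
    }
    cells
}

/// Format: trim blank lines at both ends of each cell, then drop delimited cells left empty.
fn format_cells(cells: &mut Vec<Cell>) {
    for cell in cells.iter_mut() {
        // Lines were `trim_end`-ed during parsing, so blank lines are empty strings.
        while cell.lines.last().is_some_and(|l| l.is_empty()) {
            cell.lines.pop();
        }
        let start = cell.lines.iter().position(|l| !l.is_empty()).unwrap_or(cell.lines.len());
        cell.lines.drain(..start);
    }
    cells.retain(|c| c.delimiter.is_none() || !c.lines.is_empty());
}

/// Serialize: two blank lines before every delimiter that is not at the top of the file.
fn serialize(cells: &[Cell]) -> String {
    let mut out: Vec<&str> = Vec::new();
    for cell in cells {
        if let Some(delim) = &cell.delimiter {
            if !out.is_empty() {
                out.extend(["", ""]);
            }
            out.push(delim.as_str());
        }
        out.extend(cell.lines.iter().map(String::as_str));
    }
    let mut text = out.join("\n");
    text.push('\n');
    text
}
```

Keeping the cells as data between stages is what lets rules like "remove empty cells" be a single `retain` instead of another regex pass.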
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| prek.schema.json | Adds format-ipy-cells to the builtin hook id enum in the schema. |
| docs/builtin.md | Documents the new builtin hook and its formatting rules. |
| crates/prek/tests/list_builtins.rs | Updates builtin-list snapshots/expectations to include the new hook. |
| crates/prek/src/hooks/builtin_hooks/mod.rs | Registers the new builtin hook enum variant and dispatch/metadata. |
| crates/prek/src/hooks/builtin_hooks/format_ipy_cells.rs | New hook implementation + tests. |
- Remove leading blank lines within each cell.
- Ensure exactly two blank lines before each cell delimiter.
- Ensure one blank line between a module docstring and the first cell.
The docs say “Ensure one blank line between a module docstring and the first cell”, but the implementation applies the one-blank-line rule after any preceding line ending with triple quotes and before any cell delimiter. Also, the implementation trims both leading and trailing blank lines within each cell, while the docs only mention removing leading blanks. Please update this section to accurately reflect the behavior.
Suggested change:
- Remove leading and trailing blank lines within each cell.
- Ensure exactly two blank lines before each cell delimiter by default.
- If the line immediately before a cell delimiter ends with `"""`, ensure exactly one blank line between that line and the cell delimiter.
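The blank-line rule in the suggested wording can be sketched as follows. Both helpers are hypothetical and assume the output is built as a `Vec<String>` of lines; only `"""` is checked, matching the suggestion text:

```rust
/// Hypothetical helper: blank lines required before a delimiter, given the
/// last non-blank line already emitted.
fn blank_lines_before_delimiter(prev_line: &str) -> usize {
    if prev_line.trim_end().ends_with("\"\"\"") {
        1
    } else {
        2
    }
}

/// Append `delimiter` to `out`, inserting exactly the required blank lines first.
fn push_delimiter(out: &mut Vec<String>, delimiter: &str) {
    // Drop blank lines already at the end so the final count is exact.
    while out.last().is_some_and(|l| l.trim().is_empty()) {
        out.pop();
    }
    // No blank lines at the very top of the file.
    let needed = out.last().map_or(0, |prev| blank_lines_before_delimiter(prev));
    out.resize(out.len() + needed, String::new());
    out.push(delimiter.to_string());
}
```

Because existing blank lines are removed before counting, the result is the same no matter how many blank lines the input had.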
for raw_line in text.lines() {
    let line = raw_line.trim_end().to_string();
parse()/serialize() normalizes all line endings to \n (e.g., CRLF input becomes LF) because it uses text.lines() and then re-joins with \n. This will create large diffs on Windows and makes this hook implicitly act as a line-ending fixer. Consider preserving the original line endings (detect from input and re-emit), or operating on bytes like other fixers do (e.g., fix_trailing_whitespace).
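One way to address the line-ending concern, sketched under the assumption that each file uses a single line-ending style: detect it from the first line break and re-emit it. `detect_line_ending` and `rejoin` are hypothetical helpers; operating on bytes, as the comment also suggests, is the alternative:

```rust
/// Hypothetical helper: pick the line ending from the file's first line break.
fn detect_line_ending(text: &str) -> &'static str {
    match text.find('\n') {
        // The first line break is preceded by `\r`: treat the file as CRLF.
        Some(i) if text[..i].ends_with('\r') => "\r\n",
        // LF, or no line break at all: default to `\n`.
        _ => "\n",
    }
}

/// Emit each line followed by `ending`, like the reviewed `push('\n')` loop
/// but with the detected ending instead of a hard-coded `\n`.
fn rejoin(lines: &[&str], ending: &str) -> String {
    let mut out = String::new();
    for line in lines {
        out.push_str(line);
        out.push_str(ending);
    }
    out
}
```

`str::lines()` already strips a trailing `\r` from each line, so only the re-emission step needs to change.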
        // Emit cell content
        for line in &cell.lines {
            output.push_str(line);
            output.push('\n');
        }
    }

    output
}
serialize() always appends a trailing \n after the last emitted line, so a file that originally had no newline at EOF will be rewritten even if it’s otherwise “clean”. If this is intended, it should be documented and ideally covered by a test; otherwise, preserve whether the input ended with a final newline when serializing.
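If the intent is to preserve the input's final newline, a sketch could record `text.ends_with('\n')` before parsing and pass it to the serializer. `serialize_lines` is hypothetical and joins with `\n` only, for brevity:

```rust
/// Hypothetical serializer: adds a final newline only if the input had one,
/// so an already-clean file without a trailing newline is left unchanged.
fn serialize_lines(lines: &[&str], had_final_newline: bool) -> String {
    let mut out = lines.join("\n");
    // Only restore the final newline if the input had one.
    if had_final_newline && !lines.is_empty() {
        out.push('\n');
    }
    out
}
```

Round-tripping both variants through this function returns the input unchanged, which is the property a test for this behavior would check.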
while lines.first().is_some_and(|line| line.trim().is_empty()) {
    lines.remove(0);
}
while lines.last().is_some_and(|line| line.trim().is_empty()) {
    lines.pop();
trim_blank_lines() repeatedly calls lines.remove(0) in a loop, which is O(n²) for many leading blank lines. This can become noticeably slow on large files. Consider finding the first/last non-blank indices and using drain(..start) / truncate(end) (or slicing) to trim in linear time.
Suggested change:

// Find the first non-blank line; if there isn't one, the whole vector is blank.
let first_nonblank = match lines
    .iter()
    .position(|line| !line.trim().is_empty())
{
    Some(idx) => idx,
    None => {
        lines.clear();
        return;
    }
};
// Find the last non-blank line; fall back to `first_nonblank` to avoid panics.
let last_nonblank = lines
    .iter()
    .rposition(|line| !line.trim().is_empty())
    .unwrap_or(first_nonblank);
// Remove trailing blank lines in a single truncate.
if last_nonblank + 1 < lines.len() {
    lines.truncate(last_nonblank + 1);
}
// Remove leading blank lines in a single drain.
if first_nonblank > 0 {
    lines.drain(0..first_nonblank);
}
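For reference, the suggested approach assembled into a complete function. This assumes the cell's lines are a `Vec<String>` (as in the reviewed code) and uses `let ... else` (Rust 1.65+) in place of the `match`; it is a sketch, not the merged implementation:

```rust
/// Linear-time trim: find the first and last non-blank lines once, then
/// truncate and drain instead of calling `remove(0)` in a loop.
fn trim_blank_lines(lines: &mut Vec<String>) {
    let is_content = |line: &String| !line.trim().is_empty();
    let Some(first) = lines.iter().position(is_content) else {
        // Every line is blank (or there are none).
        lines.clear();
        return;
    };
    // At least one non-blank line exists, so `rposition` finds it.
    let last = lines.iter().rposition(is_content).unwrap_or(first);
    // Truncate first so the drain below does not shift lines we then discard.
    lines.truncate(last + 1);
    lines.drain(..first);
}
```

`truncate` and `drain` each shift or drop elements at most once, so the whole trim is O(n) regardless of how many blank lines surround the content.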
Thanks for putting this together; the implementation and test coverage look solid. Before merging, I’d like to understand the expected adoption a bit better. Adding a new builtin hook means committing to maintaining it long term in prek, so I want to make sure this addresses a broadly used workflow rather than a niche case. Do you have a sense of how widely this hook is used, or any signals that it has meaningful adoption beyond a smaller set of users?
There's next to no existing user base for this hook. This cell delimiter syntax is quite widely used, though, thanks to long-standing VS Code support (https://code.visualstudio.com/docs/python/jupyter-support-py#_jupyter-code-cells). It's mostly used by people who do Python data analysis but want something lean to track in version control (i.e. not …). Re maintenance effort: my experience has been that it's very low (see the commit history in https://github.com/janosh/format-ipy-cells). That said, I understand if this is too niche. Feel free to close.
Summary
- `format-ipy-cells`: a builtin hook that formats `# %%` cell delimiters in VS Code interactive Python notebooks

Formatting rules
- `#%%`, `# %%` -> `# %%`
- `# %%comment`, `# %% comment` -> `# %% comment`

Usage
No Python environment needed: runs as native Rust code inside prek.

Test plan
- `cargo test -p prek format_ipy_cells`