Fix performance regression and simplify tag handling #23

jlevy · 2026-01-15T07:33:01Z

Summary

Fix major performance regression (29x slowdown) introduced in tag preprocessing
Add benchmarking tools (make benchmark, make profile) for performance testing
Simplify tag handling by removing the --tags CLI option (atomic mode is now always-on)
Refactor pattern matching using single-regex extraction with AtomicPattern dataclass
Consolidate all delimiter constants into the AtomicPattern dataclass

Performance

Before fix: ~2900ms (29x regression)
After fix: ~100ms (2.2x faster than v0.6.0 baseline of ~220ms)

Key Changes

Performance fix: Moved preprocess_tag_block_spacing before Markdown parsing to avoid repeated traversals
Benchmarking tools: New devtools/benchmark.py with --compare, --profile, and --semantic options
API simplification: Removed --tags option and TagWrapping enum - atomic tag handling is now the default and only mode
Code consolidation:
- New atomic_patterns.py module with AtomicPattern dataclass
- Each pattern contains its regex, delimiter strings, and escaped regex versions
- Removed ~500 lines of dead code (coalescing patterns, placeholder extraction)

Test plan

All 176 tests pass
Linting passes with 0 errors
Benchmark shows 54% faster than v0.6.0
No changes to expected test output

🤖 Generated with Claude Code

Group coalescing patterns by their expected first character ({, <, [, backtick) so we only check relevant patterns instead of all 891 pattern groups. Result: 19% faster (339ms vs 419ms in profile), now 54% faster than v0.6.0 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move tag-specific pattern categorization logic from text_wrapping.py to tag_handling.py. Add get_tag_coalescing_patterns_by_start() that returns patterns already grouped by expected start character. This improves separation of concerns: tag_handling knows which patterns match which delimiters, while text_wrapping just uses the grouped data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace the complex coalescing-based word splitting with a simpler single-pass regex extraction approach: 1. Define atomic patterns (code spans, links, tags) using frozen AtomicPattern dataclasses for clarity and maintainability 2. Use a single compiled regex that matches all atomic constructs in one pass, extracting them as placeholders before word splitting 3. Split on whitespace, then restore original constructs This approach is simpler (fewer lines), faster (one regex pass vs per-word pattern matching), and keeps Markdown links as atomic units which improves wrapping behavior. Performance: 57% faster than v0.6.0 (96ms vs 225ms median). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove `--tags` CLI option and `TagWrapping` enum (atomic mode is now always-on) - Simplify `AtomicPattern` dataclass to just `name` and `pattern` fields - Define delimiter constants directly as module-level variables - Remove dead code: coalescing patterns, placeholder extraction from tag_handling - Remove `tags` parameter from all API functions (reformat_text, fill_markdown, etc.) - Move completed atomic-tag-wrapping specs to done/ The atomic tag wrapping behavior is now the only mode - template tags, code spans, and markdown links are always kept together as indivisible tokens during line wrapping. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove module-level delimiter constants (JINJA_TAG_OPEN, etc.) - Add required `open_delim`, `close_delim`, `open_re`, `close_re` fields to AtomicPattern - Update tag_handling.py to use pattern properties directly (e.g., SINGLE_JINJA_TAG.open_delim) - Remove unnecessary `or ""` guards since fields are now always strings Each pattern now contains all its information in one place, making the code cleaner and easier to maintain. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jlevy and others added 5 commits January 14, 2026 22:40

jlevy merged commit 2557f43 into main Jan 15, 2026
5 checks passed

jlevy deleted the feature/benchmark-tooling branch January 15, 2026 07:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix performance regression and simplify tag handling #23

Fix performance regression and simplify tag handling #23

Uh oh!

jlevy commented Jan 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix performance regression and simplify tag handling #23

Fix performance regression and simplify tag handling #23

Uh oh!

Conversation

jlevy commented Jan 15, 2026

Summary

Performance

Key Changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants