@jlevy jlevy commented Jan 15, 2026

Summary

  • Fix major performance regression (29x slowdown) introduced in tag preprocessing
  • Add benchmarking tools (make benchmark, make profile) for performance testing
  • Simplify tag handling by removing the --tags CLI option (atomic mode is now always-on)
  • Refactor pattern matching using single-regex extraction with AtomicPattern dataclass
  • Consolidate all delimiter constants into the AtomicPattern dataclass

Performance

Before fix: ~2900ms (about 29x the post-fix time, roughly 13x the v0.6.0 baseline)
After fix: ~100ms (2.2x faster than v0.6.0 baseline of ~220ms)
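
For reference, the "2.2x faster" and "54% faster" figures used in this PR describe the same comparison in two different ways. A small sketch of the arithmetic, using the rounded timings quoted above (the helper names are illustrative and not part of the codebase):

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the new timing is (ratio of old time to new time)."""
    return before_ms / after_ms


def percent_faster(before_ms: float, after_ms: float) -> float:
    """Reduction in run time, as a percentage of the old timing."""
    return (before_ms - after_ms) / before_ms * 100.0


# Rounded figures from the Performance section above.
BASELINE_MS = 220.0   # v0.6.0
REGRESSED_MS = 2900.0  # before this fix
FIXED_MS = 100.0       # after this fix

if __name__ == "__main__":
    print(f"fixed vs baseline:  {speedup(BASELINE_MS, FIXED_MS):.1f}x")
    print(f"fixed vs baseline:  {percent_faster(BASELINE_MS, FIXED_MS):.1f}% less time")
    print(f"regressed vs fixed: {speedup(REGRESSED_MS, FIXED_MS):.0f}x")
```

Because the timings are rounded, the percentages computed here can differ by a point or so from the measured values reported elsewhere in the PR.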

Key Changes

  1. Performance fix: Moved preprocess_tag_block_spacing before Markdown parsing to avoid repeated traversals

  2. Benchmarking tools: New devtools/benchmark.py with --compare, --profile, and --semantic options

  3. API simplification: Removed --tags option and TagWrapping enum - atomic tag handling is now the default and only mode

  4. Code consolidation:

    • New atomic_patterns.py module with AtomicPattern dataclass
    • Each pattern contains its regex, delimiter strings, and escaped regex versions
    • Removed ~500 lines of dead code (coalescing patterns, placeholder extraction)

Test plan

  • All 176 tests pass
  • Linting passes with 0 errors
  • Benchmark shows 54% faster than v0.6.0
  • No changes to expected test output

🤖 Generated with Claude Code

jlevy and others added 5 commits January 14, 2026 22:40
Group coalescing patterns by their expected first character ({, <, [, backtick)
so we only check relevant patterns instead of all 891 pattern groups.

Result: 19% faster (339ms vs 419ms in profile), now 54% faster than v0.6.0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
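
The grouping idea in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the pattern list, the regexes, and the names `group_by_start` and `match_at` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical patterns: (name, expected first character, compiled regex).
PATTERNS = [
    ("jinja_tag", "{", re.compile(r"\{%.*?%\}")),
    ("jinja_var", "{", re.compile(r"\{\{.*?\}\}")),
    ("html_comment", "<", re.compile(r"<!--.*?-->")),
    ("md_link", "[", re.compile(r"\[[^\]]*\]\([^)]*\)")),
    ("code_span", "`", re.compile(r"`[^`]*`")),
]


def group_by_start(patterns):
    """Index patterns by the character each one must start with."""
    grouped = defaultdict(list)
    for name, start, regex in patterns:
        grouped[start].append((name, regex))
    return dict(grouped)


PATTERNS_BY_START = group_by_start(PATTERNS)


def match_at(text, pos):
    """Try only the patterns whose start character equals text[pos].

    Returns (name, matched_text), or None if nothing matches at pos.
    """
    if pos >= len(text):
        return None
    for name, regex in PATTERNS_BY_START.get(text[pos], ()):
        m = regex.match(text, pos)
        if m:
            return name, m.group(0)
    return None
```

With this layout, a position that starts with an ordinary letter costs one dictionary lookup and no regex calls, which is where the saving comes from when most words are plain text.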
Move tag-specific pattern categorization logic from text_wrapping.py
to tag_handling.py. Add get_tag_coalescing_patterns_by_start() that
returns patterns already grouped by expected start character.

This improves separation of concerns: tag_handling knows which patterns
match which delimiters, while text_wrapping just uses the grouped data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace the complex coalescing-based word splitting with a simpler
single-pass regex extraction approach:

1. Define atomic patterns (code spans, links, tags) using frozen
   AtomicPattern dataclasses for clarity and maintainability

2. Use a single compiled regex that matches all atomic constructs
   in one pass, extracting them as placeholders before word splitting

3. Split on whitespace, then restore original constructs

This approach is simpler (fewer lines), faster (one regex pass vs
per-word pattern matching), and keeps Markdown links as atomic units
which improves wrapping behavior.

Performance: 57% faster than v0.6.0 (96ms vs 225ms median).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
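
The three steps described in this commit (extract atomic constructs as placeholders, split on whitespace, restore) might look something like the sketch below. The regex alternatives, the NUL-delimited placeholder format, and the function name `split_words_atomic` are assumptions for illustration; the actual implementation in the PR may differ.

```python
import re

# Assumed, simplified patterns for the atomic constructs named in the commit.
ATOMIC_RE = re.compile(
    r"`[^`]*`"                 # inline code span
    r"|\[[^\]]*\]\([^)]*\)"    # Markdown link: [text](url)
    r"|\{%.*?%\}"              # Jinja-style block tag
    r"|\{\{.*?\}\}"            # Jinja-style variable tag
)

# Placeholders are NUL-delimited indexes; NUL is not whitespace, so
# str.split() never breaks a placeholder apart.
PLACEHOLDER_RE = re.compile(r"\x00(\d+)\x00")


def split_words_atomic(text):
    """Split text on whitespace, keeping atomic constructs intact."""
    saved = []

    def stash(match):
        # Step 1: replace each atomic construct with an indexed placeholder.
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"

    masked = ATOMIC_RE.sub(stash, text)

    # Step 2: ordinary whitespace split on the masked text.
    words = masked.split()

    # Step 3: put the original constructs back into each word.
    return [
        PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], word)
        for word in words
    ]
```

For example, a link whose URL or label contains spaces comes back as a single word, so a line wrapper working on the returned list cannot break inside it. The sketch assumes the input text does not itself contain NUL characters.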
- Remove `--tags` CLI option and `TagWrapping` enum (atomic mode is now always-on)
- Simplify `AtomicPattern` dataclass to just `name` and `pattern` fields
- Define delimiter constants directly as module-level variables
- Remove dead code: coalescing patterns, placeholder extraction from tag_handling
- Remove `tags` parameter from all API functions (reformat_text, fill_markdown, etc.)
- Move completed atomic-tag-wrapping specs to done/

The atomic tag wrapping behavior is now the only mode - template tags, code
spans, and markdown links are always kept together as indivisible tokens
during line wrapping.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove module-level delimiter constants (JINJA_TAG_OPEN, etc.)
- Add required `open_delim`, `close_delim`, `open_re`, `close_re` fields to AtomicPattern
- Update tag_handling.py to use pattern properties directly (e.g., SINGLE_JINJA_TAG.open_delim)
- Remove unnecessary `or ""` guards since fields are now always strings

Each pattern now contains all its information in one place, making the code
cleaner and easier to maintain.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
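
A rough sketch of what a pattern with these fields could look like. The field names come from the commit message above; everything else, including the `make_pattern` helper, the way `pattern` is built, and the delimiters chosen for `SINGLE_JINJA_TAG`, is an assumption for illustration and may not match `atomic_patterns.py`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicPattern:
    """One atomic construct, with its delimiters and regexes in one place."""

    name: str
    pattern: str      # regex matching the whole construct
    open_delim: str   # literal opening delimiter, e.g. "{%"
    close_delim: str  # literal closing delimiter, e.g. "%}"
    open_re: str      # regex-escaped opening delimiter
    close_re: str     # regex-escaped closing delimiter


def make_pattern(name: str, open_delim: str, close_delim: str) -> AtomicPattern:
    """Hypothetical helper: derive the escaped and full regexes from delimiters."""
    open_re = re.escape(open_delim)
    close_re = re.escape(close_delim)
    return AtomicPattern(
        name=name,
        pattern=f"{open_re}.*?{close_re}",
        open_delim=open_delim,
        close_delim=close_delim,
        open_re=open_re,
        close_re=close_re,
    )


# Assumed definition; the real constant may use different delimiters or regex.
SINGLE_JINJA_TAG = make_pattern("single_jinja_tag", "{%", "%}")
```

Since every field is a required string, callers can read `SINGLE_JINJA_TAG.open_delim` directly without `or ""` fallbacks, and `frozen=True` keeps the shared constants from being mutated by accident.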
@jlevy jlevy merged commit 2557f43 into main Jan 15, 2026
5 checks passed
@jlevy jlevy deleted the feature/benchmark-tooling branch January 15, 2026 07:37