Releases: coregx/coregex
v0.12.3: Cross-product literal expansion (14x speedup on regexdna)
Cross-Product Literal Expansion for Char Classes
Patterns with small char classes in the middle of concatenations (e.g., ag[act]gtaaa|tttac[agt]ct) were routed to UseDFA (pure lazy DFA scan) because the char class broke literal extraction — 120x slower than Rust, 1.3x slower than Go stdlib.
The literal extractor now computes the cross-product through small char classes (≤10 chars), producing full-length discriminating literals for the Teddy SIMD prefilter.
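As a rough illustration of the idea (not the library's actual extractor), the cross-product can be sketched as below. The `elem` type and the `expand` function are hypothetical names for this example; the class-size threshold and the product limit reuse the values quoted in these notes, while bailing out entirely on overflow is an assumption.

```go
package main

import "fmt"

// elem is one element of a concatenation: either a literal run or a
// small character class. Hypothetical type for this sketch only.
type elem struct {
	lit   string // literal bytes, used when class is empty
	class string // candidate bytes, e.g. "act" for [act]
}

const (
	maxClassSize      = 10  // "small char class" threshold from the notes
	crossProductLimit = 250 // CrossProductLimit value from the notes
)

// expand returns every literal the concatenation can match. It reports
// ok=false when a class is too large or the product would exceed the
// limit (assumed behavior; the real extractor may degrade differently).
func expand(elems []elem) (lits []string, ok bool) {
	lits = []string{""}
	for _, e := range elems {
		if e.class == "" {
			// A literal run extends every prefix collected so far.
			for i := range lits {
				lits[i] += e.lit
			}
			continue
		}
		if len(e.class) > maxClassSize || len(lits)*len(e.class) > crossProductLimit {
			return nil, false
		}
		// A class multiplies the prefix set: one copy per class member.
		next := make([]string, 0, len(lits)*len(e.class))
		for _, p := range lits {
			for i := 0; i < len(e.class); i++ {
				next = append(next, p+string(e.class[i]))
			}
		}
		lits = next
	}
	return lits, true
}

func main() {
	// First alternative of ag[act]gtaaa|tttac[agt]ct
	lits, ok := expand([]elem{{lit: "ag"}, {class: "act"}, {lit: "gtaaa"}})
	fmt.Println(lits, ok) // [agagtaaa agcgtaaa agtgtaaa] true
}
```

Each alternative of the alternation would be expanded separately, so the example pattern yields six full-length literals in total.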
Results (6MB DNA input, AMD EPYC)
| Pattern | Before (v0.12.2) | After (v0.12.3) | Speedup |
|---|---|---|---|
| `ag[act]gtaaa\|tttac[agt]ct` | 463ms (UseDFA) | 32ms (UseTeddy) | 14x |
| `agg[act]taaa\|ttta[agt]cct` | 455ms (UseDFA) | 33ms (UseTeddy) | 14x |
| `aggg[acg]aaa\|ttt[cgt]ccct` | 457ms (UseDFA) | 32ms (UseTeddy) | 14x |
| `agggt[cgt]aa\|tt[acg]accct` | 466ms (UseDFA) | 32ms (UseTeddy) | 14x |
All 9 regexdna patterns now use UseTeddy — 10-20x faster than stdlib across the board.
Safety
- Three-layer overflow protection (CrossProductLimit=250, MaxLiterals=64, truncate-to-4-bytes)
- FoldCase guard prevents case-insensitive matching bugs
- DNA correctness: byte-for-byte match parity with Go stdlib across 1KB/64KB/1MB
- Zero regressions on all existing benchmarks
Reported by @kostya via regexdna benchmark.
v0.12.2: fix alternation patterns misrouted to ReverseSuffixSet
Fixed
- Alternation patterns misrouted to ReverseSuffixSet (Issue #116): Patterns like `[cgt]gggtaaa|tttaccc[acg]` (alternation without a `.*` prefix) were incorrectly routed to the UseReverseSuffixSet strategy, producing wrong match boundaries. Fix: added `isSafeForReverseSuffix` guard.
- `matchStartZero` optimization too aggressive: Restricted to patterns with a `.*` prefix via `hasDotStarPrefix()`. It was previously enabled for all unanchored patterns, causing a wrong match start for patterns like `[^\s]+\.txt`.
Fixes #116
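The match-start difference can be seen with Go's standard `regexp` package, used here only as the reference behavior (coregex itself is not called). The helper `matchStart` is a hypothetical name for this sketch: a pattern with a `.*` prefix always starts at offset 0, while `[^\s]+\.txt` does not.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchStart returns the start offset of the leftmost match, or -1 if
// the pattern does not match. It uses the standard library as reference.
func matchStart(pattern, input string) int {
	loc := regexp.MustCompile(pattern).FindStringIndex(input)
	if loc == nil {
		return -1
	}
	return loc[0]
}

func main() {
	fmt.Println(matchStart(`[^\s]+\.txt`, "open notes.txt")) // 5
	fmt.Println(matchStart(`.*\.txt`, "open notes.txt"))     // 0
}
```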
v0.12.1: Bidirectional DFA fallback, bounded repetitions fix
Performance
- DFA bidirectional fallback for BoundedBacktracker: When BoundedBacktracker can't handle large inputs (exceeds the 32M entry limit), use forward DFA + reverse DFA instead of PikeVM. O(n) total vs PikeVM's O(n×states). `(\w{2,8})+` on 6MB: 654ms → 184ms (3.6x vs stdlib).
- Digit-run skip optimization: For `\d+`-leading patterns (IP addresses, version numbers), skip the entire digit run on DFA failure instead of advancing one byte at a time.
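A minimal sketch of the digit-run skip, assuming a `verify` callback that stands in for the anchored DFA attempt (the function names here are hypothetical, not coregex APIs). The skip is safe for `\d+`-leading patterns because any later start inside the same run needs the same continuation after the run that the first attempt already failed to find.

```go
package main

import "fmt"

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// findWithDigitSkip returns the first position where verify succeeds and
// the number of verify attempts made. After a failed attempt it jumps
// past the whole digit run instead of retrying at every digit.
func findWithDigitSkip(hay []byte, verify func(hay []byte, at int) bool) (pos, attempts int) {
	i := 0
	for i < len(hay) {
		if !isDigit(hay[i]) {
			i++
			continue
		}
		attempts++
		if verify(hay, i) {
			return i, attempts
		}
		// Failure: no other start inside this run can succeed either.
		for i < len(hay) && isDigit(hay[i]) {
			i++
		}
	}
	return -1, attempts
}

// matchesDecimal reports whether \d+\.\d+ matches starting exactly at `at`.
func matchesDecimal(hay []byte, at int) bool {
	j := at
	for j < len(hay) && isDigit(hay[j]) {
		j++
	}
	return j > at && j+1 < len(hay) && hay[j] == '.' && isDigit(hay[j+1])
}

func main() {
	pos, attempts := findWithDigitSkip([]byte("id 1234567 v 10.5"), matchesDecimal)
	fmt.Println(pos, attempts) // 13 2
}
```

Without the skip, the seven-digit run would cost seven failed attempts instead of one.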
Bug Fixes
- Bounded repetitions blocked ReverseSuffix strategy (#115): `isSafeForReverseSuffix` didn't recognize `OpRepeat{min>=1}` as a wildcard, blocking UseReverseSuffix for patterns with bounded repetitions. Fix: 2500ms → 0.5ms (5000x) on 100KB no-match.
- CompositeSequenceDFA overmatching for bounded patterns: Bare character classes like `\w` (maxMatch=1) were treated as unbounded by the DFA. `\w\w` on "000" returned "000" instead of "00".
- AVX2 Teddy assembly correctness (#74): Fixed `teddySlimAVX2_2` returning position -1 for valid candidates in short haystacks.
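The expected result for the overmatching case can be pinned against Go's standard `regexp` package, which serves as the reference here (the helpers below are hypothetical and do not call coregex):

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match text using the standard library.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

// allMatches returns all non-overlapping matches using the standard library.
func allMatches(pattern, input string) []string {
	return regexp.MustCompile(pattern).FindAllString(input, -1)
}

func main() {
	fmt.Println(firstMatch(`\w\w`, "000")) // 00
	fmt.Println(allMatches(`\w\w`, "000")) // [00]
}
```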
Benchmarks (regex-bench, AMD EPYC, 6MB input)
| Pattern | Go stdlib | coregex | Rust regex | vs stdlib |
|---|---|---|---|---|
| inner_literal | 231 ms | 0.25 ms | 0.31 ms | 926x |
| suffix | 234 ms | 0.89 ms | 1.09 ms | 263x |
| ip | 507 ms | 2.16 ms | 12.05 ms | 235x |
| char_class | 560 ms | 41 ms | 50 ms | 13.6x |
| word_repeat | 654 ms | 184 ms | 49 ms | 3.6x |
Extreme benchmarks (6MB no-match): ip 2542x, suffix 1945x, phone 863x, inner 598x vs stdlib.
v0.12.0: Rust-inspired optimizations
Performance
- Anti-quadratic guard for reverse suffix/inner/suffix-set searches — prevents O(n²) degradation on high false-positive suffix workloads, falls back to PikeVM when quadratic detected
- Lazy DFA 4x loop unrolling — process 4 state transitions per inner loop iteration, check special states between batches
- Prefilter `IsFast()` gate: skip reverse search optimizations when a fast SIMD-backed prefix prefilter already exists
- DFA cache clear & continue: on cache overflow, clear and fall back to PikeVM for the current search instead of permanently disabling the DFA
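The 4x unrolling can be pictured with a small dense DFA. This is only a sketch under stated assumptions: the table layout, the state numbering (special states have the lowest IDs), and the choice to replay a batch byte by byte when it contains a special state are all illustrative, not the library's implementation.

```go
package main

import "fmt"

const (
	dead       = 0 // no match possible from here
	matchState = 1 // a match just ended
	maxSpecial = 1 // states <= maxSpecial need attention
)

// dfa is a hypothetical dense DFA: trans[state][byte] = next state.
type dfa struct {
	start int
	trans [][256]int
}

// findEnd returns the offset just past the first match, or -1.
func (d *dfa) findEnd(hay []byte) int {
	s, i := d.start, 0
	// Fast path: four transitions per iteration, one special-state check.
	for i+4 <= len(hay) {
		s1 := d.trans[s][hay[i]]
		s2 := d.trans[s1][hay[i+1]]
		s3 := d.trans[s2][hay[i+2]]
		s4 := d.trans[s3][hay[i+3]]
		if s1 <= maxSpecial || s2 <= maxSpecial || s3 <= maxSpecial || s4 <= maxSpecial {
			break // replay this batch one byte at a time below
		}
		s, i = s4, i+4
	}
	// Slow path: handles the batch that hit a special state and the tail.
	for ; i < len(hay); i++ {
		s = d.trans[s][hay[i]]
		if s == matchState {
			return i + 1
		}
		if s == dead {
			return -1
		}
	}
	return -1
}

// newABDFA builds an unanchored DFA for the literal "ab".
// State 2 is the start state, state 3 means "just saw 'a'".
func newABDFA() *dfa {
	d := &dfa{start: 2, trans: make([][256]int, 4)}
	for b := 0; b < 256; b++ {
		d.trans[2][b] = 2
		d.trans[3][b] = 2
	}
	d.trans[2]['a'] = 3
	d.trans[3]['a'] = 3
	d.trans[3]['b'] = matchState
	return d
}

func main() {
	d := newABDFA()
	fmt.Println(d.findEnd([]byte("xxxxxxxab"))) // 9
	fmt.Println(d.findEnd([]byte("xaxb")))      // -1
}
```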
Fixed
- OnePass DFA capture limit: tighten from 17 to 16 capture groups (`uint32` slot mask = 32 bits)
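The arithmetic behind the limit: each capture group needs a start slot and an end slot, so a 32-bit mask covers 16 groups. The helpers below are hypothetical and only illustrate why a 17th group cannot be tracked in a `uint32`.

```go
package main

import "fmt"

// slotMaskBits is the width of the uint32 slot mask mentioned above.
const slotMaskBits = 32

// fitsSlotMask reports whether `groups` capture groups, at two slots
// each (start and end), fit into the mask.
func fitsSlotMask(groups int) bool { return 2*groups <= slotMaskBits }

// setSlot marks a slot in the mask. In Go, shifting a uint32 by 32 or
// more yields 0, so slots past 31 are silently dropped.
func setSlot(mask uint32, slot uint) uint32 { return mask | 1<<slot }

func main() {
	fmt.Println(fitsSlotMask(16), fitsSlotMask(17))         // true false
	fmt.Println(setSlot(0, 31) != 0, setSlot(0, 33) != 0) // true false
}
```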
Benchmark (AMD EPYC, regex-bench)
| Pattern | coregex | vs stdlib | vs Rust |
|---|---|---|---|
| suffix | 0.91ms | 257x | 1.4x faster |
| | 0.70ms | 383x | 1.9x faster |
| ip | 2.19ms | 225x | 5.5x faster |
| uri | 0.76ms | 340x | 1.2x faster |
| multiline_php | 0.60ms | 171x | 1.2x faster |
| anchored_php | 0.03ms | ~1x | 12.0x faster |
v0.11.9: Fix missing first-byte prefilter in FindAll
Fixed
- Missing first-byte prefilter in FindAll state-reusing path (#107)
  - `findIndicesBoundedBacktrackerAtWithState` was missing the `anchoredFirstBytes` O(1) check
  - Pattern `^/.*[\w-]+\.php` (without `$`) took 377ms instead of 40µs on 6MB input
  - Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)
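The idea of an O(1) first-byte check for a start-anchored pattern can be sketched as follows. The `anchoredMatcher` type is hypothetical, the standard `regexp` package stands in for the real engine, and the first-byte set is supplied by hand rather than derived from the pattern. The sketch also assumes the pattern cannot match the empty string.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchoredMatcher pairs a start-anchored pattern with the set of bytes
// that can begin a match. Hypothetical type for this sketch.
type anchoredMatcher struct {
	re    *regexp.Regexp
	first [256]bool
}

func newAnchoredMatcher(pattern, firstBytes string) *anchoredMatcher {
	m := &anchoredMatcher{re: regexp.MustCompile(pattern)}
	for i := 0; i < len(firstBytes); i++ {
		m.first[firstBytes[i]] = true
	}
	return m
}

// FindIndex rejects in O(1) when the first byte cannot start a match,
// and only then hands the input to the (much slower) engine.
func (m *anchoredMatcher) FindIndex(hay []byte) []int {
	if len(hay) == 0 || !m.first[hay[0]] {
		return nil
	}
	return m.re.FindIndex(hay)
}

func main() {
	m := newAnchoredMatcher(`^/.*[\w-]+\.php`, "/")
	fmt.Println(m.FindIndex([]byte("GET /index.php"))) // []
	fmt.Println(m.FindIndex([]byte("/a/index.php x"))) // [0 12]
}
```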
v0.11.8: Fix UseAnchoredLiteral regression
Fixed
- Critical regression in UseAnchoredLiteral strategy (#107)
  - `FindIndices*` and `findIndicesAtWithState` were missing the `UseAnchoredLiteral` case
  - Pattern `^/.*[\w-]+\.php$` fell through to the slow NFA path
  - Regression: 0.01ms → 408ms (40,000x slower)
  - Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)
v0.11.7: FindAll optimization - 1.08x faster than stdlib
Fixed
FindAll now uses optimized state-reusing path
- FindAll was using slow per-match loop instead of optimized findAllIndicesStreaming
- Results for `(\w{2,8})+` on 6MB: 2179ms → 834ms (2.6x faster)
- Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)
Full Changelog
See CHANGELOG.md
v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib
Performance
Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).
Key Changes
- Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
- SlotTable architecture: Rust-style per-state slot storage
- Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
- Lightweight searchThread: 16 bytes (was 40+ bytes)
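Dynamic slot sizing can be read as follows: the number of position slots tracked per thread depends on what the caller asked for. The names below are hypothetical, and "full" is assumed to mean two slots per group including the implicit whole-match group, which is the layout the standard library uses for submatch indexes.

```go
package main

import (
	"fmt"
	"regexp"
)

type searchMode int

const (
	modeIsMatch  searchMode = iota // only a yes/no answer is needed
	modeFind                       // only the overall match bounds
	modeCaptures                   // bounds of every capture group
)

// slotsNeeded returns how many position slots a search must track.
// numGroups excludes the implicit whole-match group.
func slotsNeeded(mode searchMode, numGroups int) int {
	switch mode {
	case modeIsMatch:
		return 0
	case modeFind:
		return 2
	default:
		return 2 * (numGroups + 1)
	}
}

func main() {
	// NumSubexp is used only to count the groups in the benchmark pattern.
	n := regexp.MustCompile(`(\w{2,8})+`).NumSubexp()
	fmt.Println(slotsNeeded(modeIsMatch, n), slotsNeeded(modeFind, n), slotsNeeded(modeCaptures, n)) // 0 2 4
}
```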
Benchmark Results
Pattern (\w{2,8})+ vs stdlib:
| Size | Speedup |
|---|---|
| 10KB | 1.68x faster |
| 50KB | 1.88x faster |
| 100KB | 2.04x faster |
| 1MB | 1.67x faster |
| 6MB | 1.68x faster |
6MB improvement: 1900ms → 628ms (3x faster)
Full Changelog
See CHANGELOG.md
v0.11.5: Fix checkHasWordBoundary catastrophic slowdown
Summary
Fixes catastrophic performance regression in patterns with \w{n,m} quantifiers (Issue #105).
Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)
Changes
Fixed
- checkHasWordBoundary catastrophic slowdown (Issue #105)
  - Root cause: O(N*M) complexity from scanning all NFA states per byte
  - Fix: Use `NewBuilderWithWordBoundary()`, add `hasWordBoundary` guards, anchored prefilter verification
Performance
- DFA state lookup: map → slice — 42% CPU time eliminated
- Literal extraction from capture/repeat groups: better prefilters
  - `=($\w...){2}` now extracts `=$` (2 bytes) instead of just `=`
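The map → slice change for state lookup relies on state IDs being small dense integers, so the ID can double as a slice index. A minimal sketch with hypothetical types (the actual cache layout in coregex may differ):

```go
package main

import "fmt"

// dfaState is a hypothetical cached DFA state.
type dfaState struct {
	id        int
	accepting bool
}

// sliceCache assigns IDs densely from 0, so a lookup is a bounds check
// plus an indexed load instead of a hash-map probe.
type sliceCache struct{ states []*dfaState }

// add stores a new state and gives it the next free ID.
func (c *sliceCache) add(accepting bool) *dfaState {
	s := &dfaState{id: len(c.states), accepting: accepting}
	c.states = append(c.states, s)
	return s
}

// get returns the state with the given ID, or nil if it is unknown.
func (c *sliceCache) get(id int) *dfaState {
	if id < 0 || id >= len(c.states) {
		return nil
	}
	return c.states[id]
}

func main() {
	c := &sliceCache{}
	c.add(false)
	s := c.add(true)
	fmt.Println(c.get(s.id).accepting, c.get(7) == nil) // true true
}
```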
Benchmarks (79KB input)
| Stage | Time | vs stdlib |
|---|---|---|
| Before fix | 3m 22s | 7,000,000x slower |
| After fix | 3.6 µs | 8.6x faster |
Credits
@danslo for root cause analysis and fix suggestions
Full Changelog: v0.11.4...v0.11.5
v0.11.4: FindAll multiline optimization
Fixed
- FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
  - `FindIndicesAt()` was missing dispatch for `UseMultilineReverseSuffix`
  - `IsMatch`/`Find` were fast (1µs), but `FindAll` was slow (78ms): a 100x gap vs Rust
  - After fix: `FindAll` on 6MB with 2000 matches: ~1ms (was 78ms)
Performance
| Operation | Before | After | Improvement |
|---|---|---|---|
| FindAll (6MB, 2000 matches) | 78ms | ~1ms | 78x faster |
| vs Rust gap | 100x slower | ~1.3x slower | Near parity! |
Changed
- Updated `golang.org/x/sys` v0.39.0 → v0.40.0
Full Changelog: v0.11.3...v0.11.4