Releases: coregx/coregex

v0.12.3: Cross-product literal expansion (14x speedup on regexdna)

16 Feb 19:46
6337003

Cross-Product Literal Expansion for Char Classes

Patterns with small char classes in the middle of concatenations (e.g., ag[act]gtaaa|tttac[agt]ct) were routed to UseDFA (pure lazy DFA scan) because the char class broke literal extraction. On these patterns coregex was 120x slower than Rust and 1.3x slower than Go stdlib.

The literal extractor now computes the cross-product through small char classes (≤10 chars), producing full-length discriminating literals for the Teddy SIMD prefilter.
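
The expansion described above can be sketched as follows. This is a minimal illustration, not coregex's actual extractor: the `part` type and the `expand` function are hypothetical names, and returning nil when a limit is hit is an assumption. The limits passed in `main` (class size 10, product 250) are the ones quoted in these notes.

```go
package main

import "fmt"

// part is one element of a concatenation: a fixed literal, or a small
// character class given by its member bytes (e.g. "act" for [act]).
// Hypothetical type for illustration only.
type part struct {
	literal string // used when class is empty
	class   string // ASCII member bytes of the class
}

// expand returns every literal the concatenation can match. It returns nil
// when a class has more than maxClass members or when the number of
// literals would exceed maxProduct (assumed give-up behavior).
func expand(parts []part, maxClass, maxProduct int) []string {
	out := []string{""}
	for _, p := range parts {
		if p.class == "" {
			// Fixed literal: append it to every prefix built so far.
			for i := range out {
				out[i] += p.literal
			}
			continue
		}
		if len(p.class) > maxClass || len(out)*len(p.class) > maxProduct {
			return nil
		}
		// Char class: multiply the prefixes by each member byte.
		next := make([]string, 0, len(out)*len(p.class))
		for _, prefix := range out {
			for i := 0; i < len(p.class); i++ {
				next = append(next, prefix+p.class[i:i+1])
			}
		}
		out = next
	}
	return out
}

func main() {
	// ag[act]gtaaa
	lits := expand([]part{{literal: "ag"}, {class: "act"}, {literal: "gtaaa"}}, 10, 250)
	fmt.Println(lits) // [agagtaaa agcgtaaa agtgtaaa]
}
```

Each alternation branch would be expanded separately and the results merged, so ag[act]gtaaa|tttac[agt]ct yields six 8-byte literals under this sketch.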

Results (6MB DNA input, AMD EPYC)

Pattern                    Before (v0.12.2)  After (v0.12.3)  Speedup
ag[act]gtaaa|tttac[agt]ct  463ms (UseDFA)    32ms (UseTeddy)  14x
agg[act]taaa|ttta[agt]cct  455ms (UseDFA)    33ms (UseTeddy)  14x
aggg[acg]aaa|ttt[cgt]ccct  457ms (UseDFA)    32ms (UseTeddy)  14x
agggt[cgt]aa|tt[acg]accct  466ms (UseDFA)    32ms (UseTeddy)  14x

All 9 regexdna patterns now use UseTeddy and run 10-20x faster than stdlib across the board.

Safety

  • Three-layer overflow protection (CrossProductLimit=250, MaxLiterals=64, truncate-to-4-bytes)
  • FoldCase guard prevents case-insensitive matching bugs
  • DNA correctness: byte-for-byte match parity with Go stdlib across 1KB/64KB/1MB
  • Zero regressions on all existing benchmarks
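
The notes name the limits but not how they interact, so the following is only one plausible reading: when the literal set exceeds MaxLiterals, shorten every literal to 4 bytes, deduplicate, and give up if the set is still too large. `capLiterals` and `manyLiterals` are hypothetical helpers; only the constants 64 and 4 come from the list above.

```go
package main

import "fmt"

const (
	maxLiterals = 64 // MaxLiterals from the release notes
	truncateLen = 4  // truncate-to-4-bytes from the release notes
)

// capLiterals applies the assumed policy: small sets pass through unchanged;
// oversized sets are truncated to 4-byte prefixes and deduplicated; nil is
// returned if even the truncated set is too large.
func capLiterals(lits []string) []string {
	if len(lits) <= maxLiterals {
		return lits
	}
	seen := make(map[string]bool)
	var out []string
	for _, l := range lits {
		if len(l) > truncateLen {
			l = l[:truncateLen]
		}
		if !seen[l] {
			seen[l] = true
			out = append(out, l)
		}
	}
	if len(out) > maxLiterals {
		return nil
	}
	return out
}

// manyLiterals builds n demo literals that all share the prefix "acgt".
func manyLiterals(n int) []string {
	lits := make([]string, n)
	for i := range lits {
		lits[i] = fmt.Sprintf("acgt%03d", i)
	}
	return lits
}

func main() {
	fmt.Println(capLiterals(manyLiterals(100))) // [acgt]
}
```

Truncated literals are still valid prefixes of the originals, so a prefilter built from them can only produce extra candidates for the engine to verify, never miss a match.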

Reported by @kostya via regexdna benchmark.

v0.12.2: fix alternation patterns misrouted to ReverseSuffixSet

16 Feb 14:13
f6f6a76

Fixed

  • Alternation patterns misrouted to ReverseSuffixSet (Issue #116) — Patterns like [cgt]gggtaaa|tttaccc[acg] (alternation without .* prefix) were incorrectly routed to UseReverseSuffixSet strategy, producing wrong match boundaries. Fix: added isSafeForReverseSuffix guard.
  • matchStartZero optimization too aggressive — Restricted to .* prefix only via hasDotStarPrefix(). Was previously enabled for all unanchored patterns, causing wrong match start for patterns like [^\s]+\.txt.
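
To see why a "match always starts at 0" shortcut is only safe behind a .* prefix, the two cases can be compared using Go's standard regexp package as the reference semantics. The `start` helper is a hypothetical name for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// start returns the start offset of the leftmost match, or -1 if none.
func start(pattern, s string) int {
	loc := regexp.MustCompile(pattern).FindStringIndex(s)
	if loc == nil {
		return -1
	}
	return loc[0]
}

func main() {
	s := "a b.txt"
	fmt.Println(start(`.*\.txt`, s))     // 0: .* absorbs everything before ".txt"
	fmt.Println(start(`[^\s]+\.txt`, s)) // 2: the match cannot cross the space
}
```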

Fixes #116

v0.12.1: Bidirectional DFA fallback, bounded repetitions fix

15 Feb 16:14
2516b6a

Performance

  • DFA bidirectional fallback for BoundedBacktracker — When BoundedBacktracker can't handle large inputs (exceeds 32M entry limit), use forward DFA + reverse DFA instead of PikeVM. O(n) total vs PikeVM's O(n×states). (\w{2,8})+ on 6MB: 654ms → 184ms (3.6x vs stdlib).
  • Digit-run skip optimization — For \d+-leading patterns (IP addresses, version numbers), skip entire digit run on DFA failure instead of advancing one byte at a time.
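
The digit-run skip can be sketched as below. The reasoning, as far as it can be inferred from the bullet: if a pattern starts with \d+ and an attempt beginning at some digit fails, an attempt beginning at any later digit of the same run sees a subset of the same continuations and fails too, assuming the rest of the pattern has no look-behind. `nextStart` is a hypothetical helper, not a coregex function.

```go
package main

import "fmt"

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// nextStart returns the next candidate start position after a failed match
// attempt that began at pos. If pos is inside a digit run, the whole run is
// skipped; otherwise the search advances by one byte.
func nextStart(haystack []byte, pos int) int {
	if pos >= len(haystack) || !isDigit(haystack[pos]) {
		return pos + 1
	}
	for pos < len(haystack) && isDigit(haystack[pos]) {
		pos++
	}
	return pos
}

func main() {
	h := []byte("1234567.x 10.0.0.1")
	fmt.Println(nextStart(h, 0)) // 7: skips the seven-digit run in one step
}
```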

Bug Fixes

  • Bounded repetitions blocked ReverseSuffix strategy (#115) — isSafeForReverseSuffix didn't recognize OpRepeat{min>=1} as a wildcard, blocking UseReverseSuffix for patterns with bounded repetitions. Fix: 2500ms → 0.5ms (5000x) on 100KB no-match.
  • CompositeSequenceDFA overmatching for bounded patterns — Bare character classes like \w (maxMatch=1) were treated as unbounded by the DFA. \w\w on "000" returned "000" instead of "00".
  • AVX2 Teddy assembly correctness (#74) — Fixed teddySlimAVX2_2 returning position -1 for valid candidates in short haystacks.
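
The expected result for the \w\w case can be checked against Go's standard regexp package, which serves as the reference here. `firstMatch` is a hypothetical helper for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s ("" if none).
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// Each bare \w matches exactly one character, so \w\w matches two.
	fmt.Println(firstMatch(`\w\w`, "000")) // 00
}
```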

Benchmarks (regex-bench, AMD EPYC, 6MB input)

Pattern        Go stdlib  coregex  Rust regex  vs stdlib
inner_literal  231 ms     0.25 ms  0.31 ms     926x
suffix         234 ms     0.89 ms  1.09 ms     263x
ip             507 ms     2.16 ms  12.05 ms    235x
char_class     560 ms     41 ms    50 ms       13.6x
word_repeat    654 ms     184 ms   49 ms       3.6x

Extreme benchmarks (6MB no-match): ip 2542x, suffix 1945x, phone 863x, inner 598x vs stdlib.

v0.12.0: Rust-inspired optimizations

06 Feb 01:15
a30fd70

Performance

  • Anti-quadratic guard for reverse suffix/inner/suffix-set searches — prevents O(n²) degradation on high false-positive suffix workloads, falls back to PikeVM when quadratic detected
  • Lazy DFA 4x loop unrolling — process 4 state transitions per inner loop iteration, check special states between batches
  • Prefilter IsFast() gate — skip reverse search optimizations when fast SIMD-backed prefix prefilter already exists
  • DFA cache clear & continue — on cache overflow, clear and fall back to PikeVM for current search instead of permanently disabling DFA
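
The 4x unrolling idea can be illustrated with a toy table-driven DFA that reports whether the input contains "ab". This is a sketch under simplifying assumptions, not coregex's lazy DFA: the states are hand-built, and the match state is made sticky (it loops to itself on every byte) so that checking it only once per batch of four transitions cannot miss a match.

```go
package main

import "fmt"

const (
	stateStart = 0
	stateSeenA = 1
	stateMatch = 2 // sticky: every byte loops back to stateMatch
)

// buildTable returns the transition table for "input contains ab".
func buildTable() [3][256]uint8 {
	var t [3][256]uint8
	for b := 0; b < 256; b++ {
		t[stateStart][b] = stateStart
		t[stateSeenA][b] = stateStart
		t[stateMatch][b] = stateMatch
	}
	t[stateStart]['a'] = stateSeenA
	t[stateSeenA]['a'] = stateSeenA
	t[stateSeenA]['b'] = stateMatch
	return t
}

// containsAB runs four transitions per loop iteration and checks for the
// special (match) state only between batches, then handles the tail.
func containsAB(t *[3][256]uint8, h []byte) bool {
	s := uint8(stateStart)
	i := 0
	for ; i+4 <= len(h); i += 4 {
		s = t[s][h[i]]
		s = t[s][h[i+1]]
		s = t[s][h[i+2]]
		s = t[s][h[i+3]]
		if s == stateMatch {
			return true
		}
	}
	for ; i < len(h); i++ {
		s = t[s][h[i]]
	}
	return s == stateMatch
}

func main() {
	t := buildTable()
	fmt.Println(containsAB(&t, []byte("xxxxaab"))) // true
	fmt.Println(containsAB(&t, []byte("ba")))      // false
}
```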

Fixed

  • OnePass DFA capture limit — tighten from 17 to 16 capture groups (uint32 slot mask = 32 bits)
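
The arithmetic behind the limit, assuming each capture group needs two slots (start and end) and each slot takes one bit of the uint32 mask: 32 / 2 = 16 groups. The names below are hypothetical, and whether the implicit whole-match group counts toward the 16 is not stated in these notes.

```go
package main

import "fmt"

const (
	slotMaskBits  = 32 // width of the uint32 slot mask
	slotsPerGroup = 2  // assumed: one start slot and one end slot per group
)

// fitsSlotMask reports whether all slots for the given number of capture
// groups fit in the mask.
func fitsSlotMask(groups int) bool {
	return groups*slotsPerGroup <= slotMaskBits
}

func main() {
	fmt.Println(fitsSlotMask(16)) // true: 32 slots
	fmt.Println(fitsSlotMask(17)) // false: 34 slots
}
```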

Benchmark (AMD EPYC, regex-bench)

Pattern        coregex  vs stdlib  vs Rust
suffix         0.91ms   257x       1.4x faster
email          0.70ms   383x       1.9x faster
ip             2.19ms   225x       5.5x faster
uri            0.76ms   340x       1.2x faster
multiline_php  0.60ms   171x       1.2x faster
anchored_php   0.03ms   ~1x        12.0x faster

v0.11.9: Fix missing first-byte prefilter in FindAll

01 Feb 21:33
8b528fa

Fixed

  • Missing first-byte prefilter in FindAll state-reusing path (#107)
    • findIndicesBoundedBacktrackerAtWithState was missing anchoredFirstBytes O(1) check
    • Pattern ^/.*[\w-]+\.php (without $) took 377ms instead of 40µs on 6MB input
    • Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)
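
The effect of an O(1) first-byte check on an anchored pattern can be sketched with the standard regexp package standing in for the engine. `matchAnchored` is a hypothetical wrapper, not the coregex code path named above; the point is only that an input not starting with '/' is rejected without scanning it.

```go
package main

import (
	"fmt"
	"regexp"
)

var phpPath = regexp.MustCompile(`^/.*[\w-]+\.php`)

// matchAnchored rejects in O(1) when the first byte cannot start a match,
// and only then runs the full engine.
func matchAnchored(s string) bool {
	if len(s) == 0 || s[0] != '/' {
		return false // ^/ can never match here, regardless of input size
	}
	return phpPath.MatchString(s)
}

func main() {
	fmt.Println(matchAnchored("/index.php")) // true
	fmt.Println(matchAnchored("index.php"))  // false, rejected by the first-byte check
}
```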

Full Changelog

v0.11.8...v0.11.9

v0.11.8: Fix UseAnchoredLiteral regression

01 Feb 20:41
f0f527d

Fixed

  • Critical regression in UseAnchoredLiteral strategy (#107)
    • FindIndices* and findIndicesAtWithState were missing UseAnchoredLiteral case
    • Pattern ^/.*[\w-]+\.php$ fell through to slow NFA path
    • Regression: 0.01ms → 408ms (40,000x slower)
    • Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)
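
A rough sketch of why a fully anchored pattern with literal ends admits constant-time rejection: ^/ fixes the first byte, \.php$ fixes the last four, and [\w-]+ constrains the byte just before the suffix. The function below is hypothetical and falls back to the standard regexp package for confirmation; how UseAnchoredLiteral verifies the middle of the input is not described in these notes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var phpExact = regexp.MustCompile(`^/.*[\w-]+\.php$`)

// isWordOrDash reports whether b is in the ASCII class [\w-].
func isWordOrDash(b byte) bool {
	return b == '-' || b == '_' ||
		(b >= '0' && b <= '9') || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// matchPHPPath applies constant-time necessary conditions first and runs
// the regexp only on inputs that pass them.
func matchPHPPath(s string) bool {
	const prefix, suffix = "/", ".php"
	if !strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, suffix) {
		return false
	}
	i := len(s) - len(suffix) - 1 // byte just before the suffix
	if i < 1 || !isWordOrDash(s[i]) {
		return false
	}
	return phpExact.MatchString(s)
}

func main() {
	fmt.Println(matchPHPPath("/a/b/index.php")) // true
	fmt.Println(matchPHPPath("/index.html"))    // false
}
```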

Full Changelog

v0.11.7...v0.11.8

v0.11.7: FindAll optimization - 1.08x faster than stdlib

01 Feb 19:50
1480f40

Fixed

FindAll now uses optimized state-reusing path

  • FindAll was using slow per-match loop instead of optimized findAllIndicesStreaming
  • Results for (\w{2,8})+ on 6MB: 2179ms → 834ms (2.6x faster)
  • Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)

Full Changelog

See CHANGELOG.md

v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib

01 Feb 18:56
fce1691

Performance

Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).

Key Changes

  • Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
  • SlotTable architecture: Rust-style per-state slot storage
  • Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
  • Lightweight searchThread: 16 bytes (was 40+ bytes)
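
The dynamic slot sizing rule can be written down as a small function. The `mode` type and `slotCount` are hypothetical names, and "full" is assumed to mean two slots per group with the whole-match group included in the count.

```go
package main

import "fmt"

type mode int

const (
	modeIsMatch  mode = iota // only a yes/no answer is needed
	modeFind                 // only overall match bounds are needed
	modeCaptures             // every group's bounds are needed
)

// slotCount returns how many slots each thread must track. groups is assumed
// to include the implicit whole-match group.
func slotCount(m mode, groups int) int {
	switch m {
	case modeIsMatch:
		return 0
	case modeFind:
		return 2
	default:
		return 2 * groups
	}
}

func main() {
	fmt.Println(slotCount(modeIsMatch, 3), slotCount(modeFind, 3), slotCount(modeCaptures, 3)) // 0 2 6
}
```

Tracking fewer slots means less copying per thread step, which is consistent with the speedup the notes attribute to this change for IsMatch and Find.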

Benchmark Results

Pattern (\w{2,8})+ vs stdlib:

Size   Speedup
10KB   1.68x faster
50KB   1.88x faster
100KB  2.04x faster
1MB    1.67x faster
6MB    1.68x faster

6MB improvement: 1900ms → 628ms (3x faster)

Full Changelog

See CHANGELOG.md

v0.11.5: Fix checkHasWordBoundary catastrophic slowdown

01 Feb 09:46
de173be

Summary

Fixes catastrophic performance regression in patterns with \w{n,m} quantifiers (Issue #105).

Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)

Changes

Fixed

  • checkHasWordBoundary catastrophic slowdown (Issue #105)
    • Root cause: O(N*M) complexity from scanning all NFA states per byte
    • Fix: Use NewBuilderWithWordBoundary(), add hasWordBoundary guards, anchored prefilter verification

Performance

  • DFA state lookup: map → slice — 42% CPU time eliminated
  • Literal extraction from capture/repeat groups — better prefilters
    • =($\w...){2} now extracts =$ (2 bytes) instead of just =

Benchmarks (79KB input)

Stage       Time    vs stdlib
Before fix  3m 22s  7,000,000x slower
After fix   3.6 µs  8.6x faster

Credits

@danslo for root cause analysis and fix suggestions

Full Changelog: v0.11.4...v0.11.5

v0.11.4: FindAll multiline optimization

16 Jan 15:59
8baa0ef

Fixed

  • FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
    • FindIndicesAt() was missing dispatch for UseMultilineReverseSuffix
    • IsMatch/Find were fast (1µs), but FindAll was slow (78ms) — 100x gap vs Rust
    • After fix: FindAll on 6MB with 2000 matches: ~1ms (was 78ms)

Performance

Operation                    Before       After         Improvement
FindAll (6MB, 2000 matches)  78ms         ~1ms          78x faster
vs Rust gap                  100x slower  ~1.3x slower  Near parity!

Changed

  • Updated golang.org/x/sys v0.39.0 → v0.40.0

Full Changelog: v0.11.3...v0.11.4