Releases · coregx/coregex

16 Feb 19:46

kolkov

v0.12.3

6337003

v0.12.3: Cross-product literal expansion (14x speedup on regexdna) Latest

Cross-Product Literal Expansion for Char Classes

Patterns with small char classes in the middle of concatenations (e.g., ag[act]gtaaa|tttac[agt]ct) were routed to UseDFA (pure lazy DFA scan) because the char class broke literal extraction. On these patterns coregex was 120x slower than Rust and 1.3x slower than Go stdlib.

The literal extractor now computes the cross-product through small char classes (≤10 chars), producing full-length discriminating literals for the Teddy SIMD prefilter.
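The expansion described above can be sketched in a few lines of Go. This is a minimal illustration, not the coregex implementation: the `segment` type and the `crossProduct` function are hypothetical names, and only the class-size threshold (≤10) comes from the notes.

```go
package main

import "fmt"

// maxClassSize mirrors the "≤10 chars" threshold from the release notes.
const maxClassSize = 10

// segment is a hypothetical type: either a fixed literal or a small set of
// alternative bytes for a single position, such as [act].
type segment struct {
	literal string // used when class is empty
	class   []byte // alternatives for one position
}

// crossProduct expands the segments into full-length literals.
// It reports false when a class is too large to expand.
func crossProduct(segs []segment) ([]string, bool) {
	lits := []string{""}
	for _, s := range segs {
		if len(s.class) == 0 {
			// A plain literal is appended to every candidate.
			for i := range lits {
				lits[i] += s.literal
			}
			continue
		}
		if len(s.class) > maxClassSize {
			return nil, false
		}
		// A char class multiplies the candidate set by its size.
		next := make([]string, 0, len(lits)*len(s.class))
		for _, l := range lits {
			for _, c := range s.class {
				next = append(next, l+string([]byte{c}))
			}
		}
		lits = next
	}
	return lits, true
}

func main() {
	// One branch of the regexdna pattern: ag[act]gtaaa
	segs := []segment{{literal: "ag"}, {class: []byte("act")}, {literal: "gtaaa"}}
	lits, ok := crossProduct(segs)
	fmt.Println(lits, ok) // [agagtaaa agcgtaaa agtgtaaa] true
}
```

Each branch of the alternation yields three 8-byte literals, so the two-branch pattern gives six literals in total, which is the kind of small, discriminating set a multi-literal prefilter can work with.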

Results (6MB DNA input, AMD EPYC)

Pattern	Before (v0.12.2)	After (v0.12.3)	Speedup
ag[act]gtaaa|tttac[agt]ct	463ms (UseDFA)	32ms (UseTeddy)	14x
agg[act]taaa|ttta[agt]cct	455ms (UseDFA)	33ms (UseTeddy)	14x
aggg[acg]aaa|ttt[cgt]ccct	457ms (UseDFA)	32ms (UseTeddy)	14x
agggt[cgt]aa|tt[acg]accct	466ms (UseDFA)	32ms (UseTeddy)	14x

All 9 regexdna patterns now use UseTeddy — 10-20x faster than stdlib across the board.

Safety

Three-layer overflow protection (CrossProductLimit=250, MaxLiterals=64, truncate-to-4-bytes)
FoldCase guard prevents case-insensitive matching bugs
DNA correctness: byte-for-byte match parity with Go stdlib across 1KB/64KB/1MB
Zero regressions on all existing benchmarks
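The three limits named in the first item could interact roughly as follows. The notes give only the constant values, so the order of the checks and the deduplication step are assumptions, and `canExpand` and `fitLiterals` are hypothetical names used for illustration.

```go
package main

import "fmt"

const (
	crossProductLimit = 250 // value from the release notes
	maxLiterals       = 64  // value from the release notes
	truncateLen       = 4   // "truncate-to-4-bytes"
)

// canExpand is the first guard (assumed): refuse an expansion whose result
// would exceed crossProductLimit literals.
func canExpand(current, classSize int) bool {
	return current*classSize <= crossProductLimit
}

// fitLiterals combines the second and third guards (assumed): when there are
// more than maxLiterals literals, shorten each one to truncateLen bytes and
// drop the duplicates that truncation creates.
func fitLiterals(lits []string) []string {
	if len(lits) <= maxLiterals {
		return lits
	}
	seen := make(map[string]bool)
	var out []string
	for _, l := range lits {
		if len(l) > truncateLen {
			l = l[:truncateLen]
		}
		if !seen[l] {
			seen[l] = true
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fmt.Println(canExpand(25, 10), canExpand(26, 10)) // true false

	var many []string
	for i := 0; i < 100; i++ {
		many = append(many, fmt.Sprintf("abcd%02d", i))
	}
	fmt.Println(fitLiterals(many)) // [abcd]
}
```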

Reported by @kostya via regexdna benchmark.

16 Feb 14:13

kolkov

v0.12.2

f6f6a76

v0.12.2: fix alternation patterns misrouted to ReverseSuffixSet

Fixed

Alternation patterns misrouted to ReverseSuffixSet (Issue #116) — Patterns like [cgt]gggtaaa|tttaccc[acg] (alternation without .* prefix) were incorrectly routed to UseReverseSuffixSet strategy, producing wrong match boundaries. Fix: added isSafeForReverseSuffix guard.
matchStartZero optimization too aggressive — Restricted to .* prefix only via hasDotStarPrefix(). Was previously enabled for all unanchored patterns, causing wrong match start for patterns like [^\s]+\.txt.
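A check like hasDotStarPrefix() can be approximated with the standard `regexp/syntax` package. The sketch below is an assumption about what such a guard might look at, not the coregex code; `startsWithDotStar` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// startsWithDotStar reports whether the parsed pattern begins with `.*`.
// It is a hypothetical stand-in for the hasDotStarPrefix() guard.
func startsWithDotStar(pattern string) (bool, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false, err
	}
	// For a concatenation, inspect only its first element.
	first := re
	if re.Op == syntax.OpConcat && len(re.Sub) > 0 {
		first = re.Sub[0]
	}
	if first.Op != syntax.OpStar {
		return false, nil
	}
	op := first.Sub[0].Op
	return op == syntax.OpAnyChar || op == syntax.OpAnyCharNotNL, nil
}

func main() {
	for _, p := range []string{`.*\.txt`, `[^\s]+\.txt`} {
		ok, err := startsWithDotStar(p)
		fmt.Println(p, ok, err)
	}
}
```

The second pattern is the one cited in the fix: it starts with a char class repetition rather than `.*`, so the guard returns false for it.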

Fixes #116

15 Feb 16:14

kolkov

v0.12.1

2516b6a

v0.12.1: Bidirectional DFA fallback, bounded repetitions fix

Performance

DFA bidirectional fallback for BoundedBacktracker: when the input is too large for BoundedBacktracker (exceeds its 32M entry limit), the engine now uses forward DFA + reverse DFA instead of PikeVM. This is O(n) total vs PikeVM's O(n×states). (\w{2,8})+ on 6MB: 184ms vs 654ms for Go stdlib (3.6x faster).
Digit-run skip optimization — For \d+-leading patterns (IP addresses, version numbers), skip entire digit run on DFA failure instead of advancing one byte at a time.
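The digit-run skip can be illustrated with a toy matcher for `\d+\.`. This is a sketch under the assumption that a failed attempt inside a digit run implies every later start in the same run fails too, which holds for an unbounded `\d+` prefix. The hand-written matcher stands in for the DFA, and `skipDigitRun` and `findDigitsDot` are hypothetical names.

```go
package main

import "fmt"

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// skipDigitRun returns the first position at or after pos that is not a digit.
func skipDigitRun(haystack []byte, pos int) int {
	for pos < len(haystack) && isDigit(haystack[pos]) {
		pos++
	}
	return pos
}

// findDigitsDot is a toy matcher for `\d+\.`.
// It returns the match bounds, or -1, -1 when there is no match.
func findDigitsDot(haystack []byte) (int, int) {
	for start := 0; start < len(haystack); {
		if !isDigit(haystack[start]) {
			start++
			continue
		}
		end := skipDigitRun(haystack, start)
		if end < len(haystack) && haystack[end] == '.' {
			return start, end + 1
		}
		// The run is not followed by '.', so no start inside it can match.
		// Jump past the whole run instead of advancing one byte.
		start = end
	}
	return -1, -1
}

func main() {
	fmt.Println(findDigitsDot([]byte("abc 12345x 67."))) // 11 14
}
```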

Bug Fixes

Bounded repetitions blocked ReverseSuffix strategy (#115) — isSafeForReverseSuffix didn't recognize OpRepeat{min>=1} as a wildcard, blocking UseReverseSuffix for patterns with bounded repetitions. Fix: 2500ms → 0.5ms (5000x) on 100KB no-match.
CompositeSequenceDFA overmatching for bounded patterns — Bare character classes like \w (maxMatch=1) were treated as unbounded by the DFA. \w\w on "000" returned "000" instead of "00".
AVX2 Teddy assembly correctness (#74) — Fixed teddySlimAVX2_2 returning position -1 for valid candidates in short haystacks.
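The expected result for the overmatching case can be confirmed against the Go standard library, which serves as the reference behavior here. `firstMatch` is a small helper written for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s using Go's regexp.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// Each bare \w consumes exactly one character, so \w\w matches two.
	fmt.Println(firstMatch(`\w\w`, "000")) // 00
}
```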

Benchmarks (regex-bench, AMD EPYC, 6MB input)

Pattern	Go stdlib	coregex	Rust regex	vs stdlib
inner_literal	231 ms	0.25 ms	0.31 ms	926x
suffix	234 ms	0.89 ms	1.09 ms	263x
ip	507 ms	2.16 ms	12.05 ms	235x
char_class	560 ms	41 ms	50 ms	13.6x
word_repeat	654 ms	184 ms	49 ms	3.6x

Extreme benchmarks (6MB no-match): ip 2542x, suffix 1945x, phone 863x, inner 598x vs stdlib.

06 Feb 01:15

kolkov

v0.12.0

a30fd70

v0.12.0: Rust-inspired optimizations

Performance

Anti-quadratic guard for reverse suffix/inner/suffix-set searches: prevents O(n²) degradation on workloads with many false-positive suffix candidates, and falls back to PikeVM when quadratic behavior is detected
Lazy DFA 4x loop unrolling — process 4 state transitions per inner loop iteration, check special states between batches
Prefilter IsFast() gate — skip reverse search optimizations when fast SIMD-backed prefix prefilter already exists
DFA cache clear & continue — on cache overflow, clear and fall back to PikeVM for current search instead of permanently disabling DFA
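The loop-unrolling item can be pictured with a toy table-driven DFA. This sketch is not the coregex lazy DFA: it uses a fixed three-state table that accepts any input containing "ab", and it assumes a sticky match state so that checking once per batch of four transitions is enough.

```go
package main

import "fmt"

const (
	stateStart = 0
	stateSawA  = 1
	stateMatch = 2 // special state; sticky once entered
)

// next is a dense transition table for the toy DFA.
var next [3][256]uint8

func init() {
	for b := 0; b < 256; b++ {
		next[stateStart][b] = stateStart
		next[stateSawA][b] = stateStart
		next[stateMatch][b] = stateMatch
	}
	next[stateStart]['a'] = stateSawA
	next[stateSawA]['a'] = stateSawA
	next[stateSawA]['b'] = stateMatch
}

// containsAB runs four transitions per iteration and checks for the
// special state only between batches.
func containsAB(h []byte) bool {
	s := uint8(stateStart)
	i := 0
	for ; i+4 <= len(h); i += 4 {
		s = next[s][h[i]]
		s = next[s][h[i+1]]
		s = next[s][h[i+2]]
		s = next[s][h[i+3]]
		if s == stateMatch {
			return true
		}
	}
	// Handle the remaining 0 to 3 bytes one at a time.
	for ; i < len(h); i++ {
		s = next[s][h[i]]
	}
	return s == stateMatch
}

func main() {
	fmt.Println(containsAB([]byte("xxaabxx")), containsAB([]byte("aaaa"))) // true false
}
```

Skipping the per-byte special-state check removes a branch from the hot loop, at the cost of possibly running up to three extra transitions after a match has already been reached.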

Fixed

OnePass DFA capture limit — tighten from 17 to 16 capture groups (uint32 slot mask = 32 bits)
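The arithmetic behind the limit is short. Assuming each capture group needs two slots (start and end) and the mask holds one bit per slot, a uint32 covers 16 groups and not 17. The function names below are hypothetical.

```go
package main

import "fmt"

// slotMaskBits is the number of bits available in a uint32 slot mask.
const slotMaskBits = 32

// slotsNeeded assumes one start slot and one end slot per capture group.
func slotsNeeded(groups int) int { return 2 * groups }

// fitsSlotMask reports whether every slot gets its own bit in the mask.
func fitsSlotMask(groups int) bool { return slotsNeeded(groups) <= slotMaskBits }

func main() {
	fmt.Println(fitsSlotMask(16), fitsSlotMask(17)) // true false
}
```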

Benchmark (AMD EPYC, regex-bench)

Pattern	coregex	vs stdlib	vs Rust
suffix	0.91ms	257x	1.4x faster
email	0.70ms	383x	1.9x faster
ip	2.19ms	225x	5.5x faster
uri	0.76ms	340x	1.2x faster
multiline_php	0.60ms	171x	1.2x faster
anchored_php	0.03ms	~1x	12.0x faster

01 Feb 21:33

kolkov

v0.11.9

8b528fa

v0.11.9: Fix missing first-byte prefilter in FindAll

Fixed

Missing first-byte prefilter in FindAll state-reusing path (#107)
- findIndicesBoundedBacktrackerAtWithState was missing anchoredFirstBytes O(1) check
- Pattern ^/.*[\w-]+\.php (without $) took 377ms instead of 40µs on 6MB input
- Fix: 377ms → 40µs (9000x improvement for non-matching anchored patterns)
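The idea of an O(1) first-byte check for an anchored pattern can be sketched as follows. Go's standard `regexp` stands in for the slower engine, and `matchWithFirstByteCheck` is a hypothetical name; the actual anchoredFirstBytes data structure is not described in the notes.

```go
package main

import (
	"fmt"
	"regexp"
)

// re is the anchored pattern from the release notes; it can only match
// inputs whose first byte is '/'.
var re = regexp.MustCompile(`^/.*[\w-]+\.php`)

// matchWithFirstByteCheck rejects an input in O(1) when its first byte
// cannot start a match, and only then runs the full engine.
func matchWithFirstByteCheck(input []byte) bool {
	if len(input) == 0 || input[0] != '/' {
		return false
	}
	return re.Match(input)
}

func main() {
	fmt.Println(matchWithFirstByteCheck([]byte("/index.php"))) // true
	fmt.Println(matchWithFirstByteCheck([]byte("index.php")))  // false
}
```

For a non-matching 6MB input that does not start with `/`, the check returns before any byte beyond the first is examined, which is where the large gap between the two timings comes from.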

Full Changelog

v0.11.8...v0.11.9

01 Feb 20:41

kolkov

v0.11.8

f0f527d

v0.11.8: Fix UseAnchoredLiteral regression

Fixed

Critical regression in UseAnchoredLiteral strategy (#107)
- FindIndices* and findIndicesAtWithState were missing UseAnchoredLiteral case
- Pattern ^/.*[\w-]+\.php$ fell through to slow NFA path
- Regression: 0.01ms → 408ms (40,000x slower)
- Fix: 408ms → 0.5ms (O(1) anchored literal matching restored)
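For this specific pattern, anchored literal matching amounts to checking both ends of the input. The sketch below hard-codes `^/.*[\w-]+\.php$` and is an illustration only: `matchAnchoredPHP` is a hypothetical name, and the newline scan is included so the result agrees with Go's default `.` semantics. How coregex handles that detail is not stated in the notes.

```go
package main

import (
	"fmt"
	"strings"
)

// isWordOrDash reports whether b is in the ASCII class [\w-].
func isWordOrDash(b byte) bool {
	return b == '-' || b == '_' ||
		(b >= '0' && b <= '9') || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// matchAnchoredPHP checks `^/.*[\w-]+\.php$` without running a regex engine.
func matchAnchoredPHP(s string) bool {
	const suffix = ".php"
	// Shortest possible match: '/', one [\w-] byte, then ".php".
	if len(s) < 2+len(suffix) || s[0] != '/' || !strings.HasSuffix(s, suffix) {
		return false
	}
	// [\w-]+ needs at least one class byte directly before the suffix.
	if !isWordOrDash(s[len(s)-len(suffix)-1]) {
		return false
	}
	// By default `.` does not match '\n', so a newline rules out a match.
	return strings.IndexByte(s, '\n') < 0
}

func main() {
	fmt.Println(matchAnchoredPHP("/index.php"), matchAnchoredPHP("//.php")) // true false
}
```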

Full Changelog

v0.11.7...v0.11.8

01 Feb 19:50

kolkov

v0.11.7

1480f40

v0.11.7: FindAll optimization - 1.08x faster than stdlib

Fixed

FindAll now uses optimized state-reusing path

FindAll was using a slow per-match loop instead of the optimized findAllIndicesStreaming path
Results for (\w{2,8})+ on 6MB: 2179ms → 834ms (2.6x faster)
Now 1.08x faster than stdlib (was 2.4x slower in regex-bench)

Full Changelog

See CHANGELOG.md

01 Feb 18:56

kolkov

v0.11.6

fce1691

v0.11.6: PikeVM 6MB optimization - 1.68x faster than stdlib

Performance

Major PikeVM optimization achieving 1.68x speedup over stdlib for large inputs (was 2.2x slower).

Key Changes

Windowed BoundedBacktracker (V12): Search in 914KB windows before PikeVM fallback
SlotTable architecture: Rust-style per-state slot storage
Dynamic slot sizing: 0 (IsMatch), 2 (Find), full (Captures)
Lightweight searchThread: 16 bytes (was 40+ bytes)

Benchmark Results

Pattern (\w{2,8})+ vs stdlib:

Size	Speedup
10KB	1.68x faster
50KB	1.88x faster
100KB	2.04x faster
1MB	1.67x faster
6MB	1.68x faster

6MB improvement: 1900ms → 628ms (3x faster)

Full Changelog

See CHANGELOG.md

01 Feb 09:46

kolkov

v0.11.5

de173be

v0.11.5: Fix checkHasWordBoundary catastrophic slowdown

Summary

Fixes catastrophic performance regression in patterns with \w{n,m} quantifiers (Issue #105).

Before: 3 minutes 22 seconds on 79KB input (7,000,000x slower than stdlib)
After: 3.6 µs on 79KB input (8.6x faster than stdlib)

Changes

Fixed

checkHasWordBoundary catastrophic slowdown (Issue #105)
- Root cause: O(N*M) complexity from scanning all NFA states per byte
- Fix: Use NewBuilderWithWordBoundary(), add hasWordBoundary guards, anchored prefilter verification
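One way to avoid per-byte work is to decide once, at compile time, whether a pattern contains a word boundary at all. The sketch below walks the `regexp/syntax` tree rather than the NFA states mentioned above, so it is an analogy for the hasWordBoundary guard and not the coregex code. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// containsWordBoundary walks the syntax tree looking for \b or \B.
func containsWordBoundary(re *syntax.Regexp) bool {
	if re.Op == syntax.OpWordBoundary || re.Op == syntax.OpNoWordBoundary {
		return true
	}
	for _, sub := range re.Sub {
		if containsWordBoundary(sub) {
			return true
		}
	}
	return false
}

// hasWordBoundary parses the pattern and runs the check once, so the search
// loop can consult a cached boolean instead of rescanning states per byte.
func hasWordBoundary(pattern string) (bool, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false, err
	}
	return containsWordBoundary(re), nil
}

func main() {
	for _, p := range []string{`\bfoo\b`, `(\w{2,8})+`} {
		ok, err := hasWordBoundary(p)
		fmt.Println(p, ok, err)
	}
}
```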

Performance

DFA state lookup: map → slice — 42% CPU time eliminated
Literal extraction from capture/repeat groups — better prefilters
- =($\w...){2} now extracts =$ (2 bytes) instead of just =

Benchmarks (79KB input)

Stage	Time	vs stdlib
Before fix	3m 22s	7,000,000x slower
After fix	3.6 µs	8.6x faster

Credits

@danslo for root cause analysis and fix suggestions

Full Changelog: v0.11.4...v0.11.5

Contributors

danslo

16 Jan 15:59

kolkov

v0.11.4

8baa0ef

v0.11.4: FindAll multiline optimization

Fixed

FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
- FindIndicesAt() was missing dispatch for UseMultilineReverseSuffix
- IsMatch/Find were fast (1µs), but FindAll was slow (78ms) — 100x gap vs Rust
- After fix: FindAll on 6MB with 2000 matches: ~1ms (was 78ms)

Performance

Operation	Before	After	Improvement
FindAll (6MB, 2000 matches)	78ms	~1ms	78x faster
vs Rust gap	100x slower	~1.3x slower	Near parity!

Changed

Updated golang.org/x/sys v0.39.0 → v0.40.0

Full Changelog: v0.11.3...v0.11.4
