Release v0.12.3: Cross-product literal expansion (14x speedup on regexdna) · coregx/coregex

Cross-Product Literal Expansion for Char Classes

Patterns with small char classes in the middle of concatenations (e.g., ag[act]gtaaa|tttac[agt]ct) were routed to UseDFA (pure lazy DFA scan) because the char class broke literal extraction — 120x slower than Rust, 1.3x slower than Go stdlib.

The literal extractor now computes the cross-product through small char classes (≤10 chars), producing full-length discriminating literals for the Teddy SIMD prefilter.

Results (6MB DNA input, AMD EPYC)

Pattern	Before (v0.12.2)	After (v0.12.3)	Speedup
`ag[act]gtaaa\|tttac[agt]ct`	463ms (UseDFA)	32ms (UseTeddy)	14x
`agg[act]taaa\|ttta[agt]cct`	455ms (UseDFA)	33ms (UseTeddy)	14x
`aggg[acg]aaa\|ttt[cgt]ccct`	457ms (UseDFA)	32ms (UseTeddy)	14x
`agggt[cgt]aa\|tt[acg]accct`	466ms (UseDFA)	32ms (UseTeddy)	14x

All 9 regexdna patterns now use UseTeddy — 10-20x faster than stdlib across the board.

Safety

Three-layer overflow protection (CrossProductLimit=250, MaxLiterals=64, truncate-to-4-bytes)
FoldCase guard prevents case-insensitive matching bugs
DNA correctness: byte-for-byte match parity with Go stdlib across 1KB/64KB/1MB
Zero regressions on all existing benchmarks

Reported by @kostya via regexdna benchmark.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.12.3: Cross-product literal expansion (14x speedup on regexdna)

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Cross-Product Literal Expansion for Char Classes

Results (6MB DNA input, AMD EPYC)

Safety

Uh oh!