feat: add pcre2 as optional feature #1959

Open
wheynelau wants to merge 1 commit into huggingface:main from wheynelau:perf-pcre2
wheynelau (Contributor) commented Mar 2, 2026

Motivation: While doing some performance profiling, I noticed onig showing up in the profiles and tried swapping it for pcre2. Happy to get some feedback, as I'm not deeply familiar with the tradeoffs.

I have validated that all tests pass, and the benchmarks show that pcre2 is faster for the GPT2 and Llama3 models:

| Benchmark | main (onig) time | main (onig) throughput | pcre2 time | pcre2 throughput |
|---|---|---|---|---|
| bpe-encode/BPE GPT2 encode | 1705.7±19.00ms | 3.6 MB/sec | 1422.8±10.89ms | 4.3 MB/sec |
| llama3-encode/llama3-encode | 1912.2±21.94ms | 3.2 MB/sec | 1601.6±5.81ms | 3.9 MB/sec |
| bpe-encode/BPE GPT2 encode, no cache | 2.5±0.04s | 2.5 MB/sec | 2.1±0.02s | 2.9 MB/sec |
| llama3-encode/llama3-offsets | 257.9±7.03ms | 24.0 MB/sec | 240.8±3.05ms | 25.7 MB/sec |
| llama3-encode/llama3-batch | 340.9±5.99ms | 18.2 MB/sec | 319.3±3.05ms | 19.4 MB/sec |
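
To make the size of the difference easier to read, here is a small sketch that turns the mean times from the table above into a percentage reduction. The helper name `time_reduction_pct` is hypothetical and not part of the PR, the error bars are ignored, and the "no cache" row uses the rounded 2.5s and 2.1s figures, so that row is only a coarse estimate.

```rust
/// Percentage reduction in mean time when going from `baseline_ms` to `candidate_ms`.
/// Hypothetical helper for illustration only; it is not part of the PR.
fn time_reduction_pct(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (baseline_ms - candidate_ms) / baseline_ms * 100.0
}

fn main() {
    // (benchmark name, onig mean in ms, pcre2 mean in ms), copied from the table.
    // The "no cache" row is converted from the rounded second values.
    let rows = [
        ("bpe-encode/BPE GPT2 encode", 1705.7, 1422.8),
        ("llama3-encode/llama3-encode", 1912.2, 1601.6),
        ("bpe-encode/BPE GPT2 encode, no cache", 2500.0, 2100.0),
        ("llama3-encode/llama3-offsets", 257.9, 240.8),
        ("llama3-encode/llama3-batch", 340.9, 319.3),
    ];

    for (name, onig_ms, pcre2_ms) in rows {
        println!(
            "{}: {:.1}% less time with pcre2",
            name,
            time_reduction_pct(onig_ms, pcre2_ms)
        );
    }
}
```

On these numbers the single-input encode benchmarks come out at roughly a 16% reduction in mean time, while the offsets and batch benchmarks are closer to 6%.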

Commands used:

cargo bench --no-default-features --features onig,progressbar,esaxx_fast -- --save-baseline main
cargo bench --no-default-features --features pcre2,progressbar,esaxx_fast -- --save-baseline pcre2

Based on perf, these were the CPU sample shares I observed:

| Function | onig | pcre2 |
|---|---|---|
| NormalizedString::split | 5.58% | 1.44% |
