feat: remove string fmt allocation in hot loop in BpeBuilder::build #2010

Merged
ArthurZucker merged 2 commits into main from feat/remove_allocation_bpebuilder_hot_loop
Apr 10, 2026

Conversation

@McPatate
Member

@McPatate McPatate commented Apr 7, 2026

I was running samply on encode_batch_fast and noticed that initialization took a long time. Digging deeper, I found that BpeBuilder::build was doing `let new_token = format!("{}{}", a, &b[prefix_len..]);` in a hot loop. Pre-allocating a buffer and writing into it directly resulted in a ~48% perf boost!
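A minimal sketch of the kind of change described (illustrative only, not the actual tokenizers code; `merge_tokens`, the sample pairs, and the fixed `prefix_len` are made up for this example):

```rust
// Instead of
//     let new_token = format!("{}{}", a, &b[prefix_len..]);
// which goes through the runtime formatting machinery on every iteration,
// build the token into an exactly-sized String with plain push_str calls.
// `merge_tokens` and its inputs are hypothetical, not the tokenizers API.
fn merge_tokens(pairs: &[(&str, &str)], prefix_len: usize) -> Vec<String> {
    pairs
        .iter()
        .map(|(a, b)| {
            let suffix = &b[prefix_len..];
            // Exact-size allocation: no reallocation, no formatter overhead.
            let mut new_token = String::with_capacity(a.len() + suffix.len());
            new_token.push_str(a);
            new_token.push_str(suffix);
            new_token
        })
        .collect()
}

fn main() {
    let pairs = [("he", "hello"), ("wo", "world")];
    println!("{:?}", merge_tokens(&pairs, 2)); // ["hello", "world"]
}
```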

Test script:

use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let tokenizer = Tokenizer::from_file("data/llama-3-tokenizer.json")?;
    let data = std::fs::read_to_string("data/big.txt")?;
    let lines: Vec<&str> = data.lines().collect();

    eprintln!("=== Batch encode_fast (lines) ===");
    let batch: Vec<_> = lines.clone();
    let _ = tokenizer.encode_batch_fast(batch, false)?;

    eprintln!("Done.");
    Ok(())
}

used:

$ cargo build --release --example profile_encode
$ samply record ./target/release/examples/profile_encode

to find & measure the change

Before:
(screenshot: samply profile, 2026-04-07 18:29)

After:
(screenshot: samply profile, 2026-04-07 18:27)

36ms -> 25ms, so ~44% faster (a ~31% reduction in wall time)

@McPatate McPatate requested a review from ArthurZucker April 7, 2026 16:29
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs remain available for 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


LGTM, down to adding a cargo bench for the full loading of a tokenizer from pretrained

@ArthurZucker
Collaborator

/benchmark

@github-actions

Python Benchmark results

Commit: b671e0d104c55542c8c791d3e94059e86ffd26ea

| Benchmark | Baseline (ms) | This run (ms) | Δ |
| --- | --- | --- | --- |
| test_async_encode_batch | 1305.2 | 1403.5 | +7.5% |
| test_async_encode_batch_fast | 1054.6 | 1129.2 | +7.1% |
| test_decode_batch | 2.4 | 2.8 | +20.6% |
| test_encode | 2545.9 | 2583.8 | +1.5% |
| test_encode_batch | 1301.0 | 1409.1 | +8.3% |
| test_encode_batch_multithreaded | 1289.6 | 1377.1 | +6.8% |
| test_encode_fast | 1043.3 | 1136.1 | +8.9% |
| test_from_file_albert | 45.4 | 49.6 | +9.3% |
| test_from_file_llama3 | 408.7 | 430.1 | +5.3% |
| test_from_file_roberta | 76.1 | 74.6 | -1.9% |
| test_from_str_llama3 | 389.0 | 412.2 | +6.0% |
| test_to_str_llama3 | 107.2 | 100.3 | -6.4% |
| test_train_bpe_small | 16.2 | 15.4 | -5.1% |

@github-actions
Copy link
Copy Markdown

Rust Benchmark results

Commit: b671e0d104c55542c8c791d3e94059e86ffd26ea

| Benchmark | Baseline (ns/iter) | This run (ns/iter) | Δ |
| --- | --- | --- | --- |
| bpe-gpt2/encode | 1815016018 | 1877243893 | +3% |
| bpe-gpt2/encode-batch | 883721924 | 859022111 | -2% |
| bpe-gpt2/encode-batch-no-cache | 1024733230 | 1005161750 | -1% |
| bpe-gpt2/encode-no-cache | 2345818394 | 2423803297 | +3% |
| llama3/concurrent-4t | 76814529 | 49998387 | -34% |
| llama3/encode | 1754898015 | 1757227046 | 0% |
| llama3/encode-batch | 867783684 | 849115206 | -2% |
| llama3/encode-char-offsets | 1067309310 | 1059360128 | 0% |
| llama3/encode-fast | 1672139715 | 1671739252 | 0% |
| serialization/bpe-from-file-gpt2 | 47651117 | 43944420 | -7% |
| serialization/deserialize-llama3 | 405279321 | 407202897 | 0% |
| serialization/deserialize-roberta | 74238789 | 75087068 | +1% |
| serialization/from-file-albert | 36663177 | 36297092 | 0% |
| serialization/from-file-llama3 | 371594895 | 369749169 | 0% |
| serialization/from-file-roberta | 62753817 | 63848716 | +1% |
| serialization/save-llama3 | 109097437 | 98212058 | -9% |
| train/bpe-small | 17622182 | 17126709 | -2% |

@ArthurZucker ArthurZucker merged commit 51a6e82 into main Apr 10, 2026
38 checks passed
@ArthurZucker ArthurZucker deleted the feat/remove_allocation_bpebuilder_hot_loop branch April 10, 2026 13:47
3 participants