Commit e29c7c5

feat: add fasttokens benchmarks and tokenizer backend docs
Benchmarks:
- Rename benches/tokenizer.rs to benches/tokenizer_simple.rs, add criterion fasttokens vs HF encode and batch-encode benchmarks
- Add benches/tokenizer_dataset.rs: dataset-driven benchmark using LongBench-v2 (503 real-world samples), sequential and batched modes with correctness verification (~24x sequential, ~27x batched speedup)

Docs:
- docs/components/frontend/tokenizer-backends.md: user guide with configuration, compatibility notes, and benchmark results
- docs/components/frontend/configuration.md: added Tokenizer section
- docs/index.yml: added Tokenizer Backends page under Frontend
1 parent a9de37b commit e29c7c5

6 files changed: +490 -2 lines changed

docs/components/frontend/configuration.md

Lines changed: 6 additions & 0 deletions
@@ -91,6 +91,12 @@ See the [Frontend Guide](frontend-guide.md) for KServe message formats and integ
 | `--metrics-prefix` | `DYN_METRICS_PREFIX` | `dynamo_frontend` | Prefix for frontend Prometheus metrics |
 | `--dump-config-to` | `DYN_DUMP_CONFIG_TO` || Dump resolved config to file path |
 
+## Tokenizer
+
+| CLI Argument | Env Var | Default | Description |
+|-------------|---------|---------|-------------|
+| `--dyn-tokenizer-backend` | `DYN_TOKENIZER_BACKEND` | `default` | Tokenizer backend: `default` (HuggingFace) or `fasttokens` (fastokens crate for high-performance BPE encoding). See [Tokenizer Backends](tokenizer-backends.md) |
+
 ## Experimental
 
 | CLI Argument | Env Var | Default | Description |
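
The new table row pairs a CLI flag with an environment variable and a default. A minimal Python sketch of how such a setting could be resolved is shown here; it is an illustration, not the Dynamo frontend's actual argument handling. The function name `resolve_tokenizer_backend` is hypothetical, the flag-over-environment precedence follows the Tokenizer Backends guide added in this commit, and raising `ValueError` on an unknown environment value is an assumption.

```python
import argparse

# The two backend names documented in the table above.
VALID_BACKENDS = ("default", "fasttokens")


def resolve_tokenizer_backend(argv, environ):
    """Return the backend name: CLI flag first, then env var, then `default`.

    Hypothetical helper for illustration only.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--dyn-tokenizer-backend",
        choices=VALID_BACKENDS,
        default=None,  # None means "flag not given", so the env var can apply
    )
    # Ignore unrelated flags; a real frontend defines many more options.
    args, _unknown = parser.parse_known_args(argv)
    if args.dyn_tokenizer_backend is not None:
        return args.dyn_tokenizer_backend

    value = environ.get("DYN_TOKENIZER_BACKEND", "default")
    if value not in VALID_BACKENDS:
        # Assumption: reject unknown values instead of silently ignoring them.
        raise ValueError(f"unsupported tokenizer backend: {value!r}")
    return value
```

In a real entry point the two arguments would be `sys.argv[1:]` and `os.environ`; passing them in explicitly keeps the helper easy to test.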

docs/components/frontend/tokenizer-backends.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+---
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+title: Tokenizer Backends
+---
+
+The Dynamo Frontend supports multiple tokenizer backends for BPE-based models. The backend controls how input text is tokenized before being sent to the inference engine.
+
+## Tokenizer Backends
+
+#### `default` HuggingFace Tokenizers
+
+The default backend uses the [HuggingFace `tokenizers`](https://github.com/huggingface/tokenizers) library (Rust).
+It supports features in `tokenizer.json` files (normalizers, pre-tokenizers, post-processors, decoders, added tokens with special-token flags, and byte-fallback).
+
+#### `fasttokens` High-Performance BPE Encoding
+
+The `fasttokens` backend uses the [`fastokens`](https://github.com/Atero-ai/fastokens) crate, a purpose-built BPE encoder optimized for throughput.
+It is a _hybrid_ backend: encoding uses `fastokens` while decoding falls back to HuggingFace so that incremental detokenization, byte-fallback, and special-token handling work correctly.
+
+Use this backend when tokenization is a measurable bottleneck, for example on high-concurrency prefill-heavy workloads.
+
+#### Compatibility notes:
+
+- Works with standard BPE `tokenizer.json` files (Qwen, LLaMA, GPT-family, Mistral, DeepSeek, etc.).
+- If `fastokens` cannot load a particular tokenizer file, the frontend logs a warning and transparently falls back to HuggingFace; requests are never dropped.
+- Has no effect on TikToken-format tokenizers (`.model` / `.tiktoken` files), which always use the TikToken backend.
+
+## Configuration
+
+Set the backend with a CLI flag or environment variable. The CLI flag takes precedence.
+
+| CLI Argument | Env Var | Valid values | Default |
+|---|---|---|---|
+| `--dyn-tokenizer-backend` | `DYN_TOKENIZER_BACKEND` | `default`, `fasttokens` | `default` |
+
+**Examples:**
+
+```bash
+# CLI flag
+python -m dynamo.frontend --dyn-tokenizer-backend fasttokens
+
+# Environment variable
+export DYN_TOKENIZER_BACKEND=fasttokens
+python -m dynamo.frontend
+```
+
+## Dynamo Frontend Behavior
+
+When `DYN_TOKENIZER_BACKEND=fasttokens` is set:
+
+1. The frontend passes the environment variable to the Rust runtime.
+2. When building the tokenizer for a model, `ModelDeploymentCard::tokenizer()` attempts to load `fastokens::Tokenizer` from the same `tokenizer.json` file.
+3. If loading succeeds, a hybrid `FastTokenizer` is created that encodes with `fastokens` and decodes with HuggingFace.
+4. If loading fails (unsupported tokenizer features, missing file, etc.), the frontend logs a warning and falls back to the standard HuggingFace backend; no operator intervention is needed.
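
The load-or-fall-back behavior in steps 2 to 4 can be sketched in a few lines. The actual logic lives in the Rust runtime (`ModelDeploymentCard::tokenizer()`), so the Python below is only a model of the control flow under stated assumptions: `BaselineTokenizer`, `HybridTokenizer`, `build_tokenizer`, and the `load_fast_encoder` callback are hypothetical names, and the byte-level toy tokenizer stands in for the HuggingFace backend.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("tokenizer_backend_sketch")

Encoder = Callable[[str], List[int]]


class BaselineTokenizer:
    """Toy stand-in for the HuggingFace backend: one token id per UTF-8 byte."""

    def encode(self, text: str) -> List[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: List[int]) -> str:
        return bytes(ids).decode("utf-8")


class HybridTokenizer:
    """Encodes with the fast encoder, decodes with the baseline tokenizer."""

    def __init__(self, fast_encode: Encoder, baseline: BaselineTokenizer):
        self._fast_encode = fast_encode
        self._baseline = baseline

    def encode(self, text: str) -> List[int]:
        return self._fast_encode(text)

    def decode(self, ids: List[int]) -> str:
        return self._baseline.decode(ids)


def build_tokenizer(backend: str, baseline: BaselineTokenizer,
                    load_fast_encoder: Callable[[], Encoder]):
    """Return a hybrid tokenizer when possible, otherwise the baseline."""
    if backend != "fasttokens":
        return baseline
    try:
        fast_encode = load_fast_encoder()  # may fail on unsupported files
    except Exception as err:
        # Mirrors step 4: warn and keep serving with the baseline backend.
        logger.warning("fast encoder unavailable (%s); using baseline", err)
        return baseline
    return HybridTokenizer(fast_encode, baseline)
```

Keeping decoding on the baseline object is what makes the fallback safe: whichever branch is taken, callers get the same `encode`/`decode` interface and the same decoding behavior.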

docs/index.yml

Lines changed: 2 additions & 0 deletions
@@ -200,6 +200,8 @@ navigation:
   contents:
     - page: Frontend Guide
       path: components/frontend/frontend-guide.md
+    - page: Tokenizer Backends
+      path: components/frontend/tokenizer-backends.md
 - section: Router
   path: components/router/README.md
   contents:

lib/llm/Cargo.toml

Lines changed: 5 additions & 1 deletion
@@ -30,7 +30,11 @@ bench = ["dynamo-kv-router/bench"]
 kv-router-stress = ["dep:clap", "dep:indicatif", "bench"]
 
 [[bench]]
-name = "tokenizer"
+name = "tokenizer_simple"
+harness = false
+
+[[bench]]
+name = "tokenizer_dataset"
 harness = false
 
 [[bench]]
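
The commit message describes `tokenizer_dataset` as a dataset-driven comparison with correctness verification. The benchmark itself is a Rust criterion target (hence `harness = false`) and is not shown in this diff, so the following Python is only a sketch of the general shape of such a comparison. The names `time_encoder` and `compare_encoders` are hypothetical, and the best-of-N timing is a simplification, not criterion's statistical method.

```python
import time
from typing import Callable, List, Sequence

Encoder = Callable[[str], List[int]]


def time_encoder(encode: Encoder, samples: Sequence[str], repeats: int = 3) -> float:
    """Best-of-N wall-clock seconds to encode every sample once, sequentially."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in samples:
            encode(text)
        best = min(best, time.perf_counter() - start)
    return best


def compare_encoders(baseline: Encoder, candidate: Encoder,
                     samples: Sequence[str], repeats: int = 3) -> float:
    """Verify both encoders agree on every sample, then return the speedup."""
    # Correctness first: a faster encoder that yields different ids is not a win.
    for index, text in enumerate(samples):
        if baseline(text) != candidate(text):
            raise AssertionError(f"token mismatch on sample {index}")
    baseline_s = time_encoder(baseline, samples, repeats)
    candidate_s = time_encoder(candidate, samples, repeats)
    # Guard against a zero denominator on very small inputs.
    return baseline_s / max(candidate_s, 1e-12)
```

A returned value above 1.0 means the candidate was faster on that sample set; the speedup figures quoted in the commit message come from the Rust benchmarks, not from this sketch.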
