ai-dynamo · biswapanda · Mar 14, 2026 · Mar 15, 2026 · Mar 15, 2026
diff --git a/docs/components/frontend/configuration.md b/docs/components/frontend/configuration.md
@@ -91,6 +91,12 @@ See the [Frontend Guide](frontend-guide.md) for KServe message formats and integ
 | `--metrics-prefix` | `DYN_METRICS_PREFIX` | `dynamo_frontend` | Prefix for frontend Prometheus metrics |
 | `--dump-config-to` | `DYN_DUMP_CONFIG_TO` | — | Dump resolved config to file path |
 
+## Tokenizer
+
+| CLI Argument | Env Var | Default | Description |
+|-------------|---------|---------|-------------|
+| `--tokenizer` | `DYN_TOKENIZER` | `default` | Tokenizer backend: `default` (HuggingFace) or `fastokens` (fastokens crate for high-performance BPE encoding). See [Tokenizer Backends](tokenizer-backends.md) |
+
 ## Experimental
 
 | CLI Argument | Env Var | Default | Description |

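Note on the `--tokenizer` / `DYN_TOKENIZER` row above (illustration only, not part of the patch): the tokenizer-backends page added in this PR states that the CLI flag takes precedence over the environment variable. A minimal sketch of that resolution order follows, assuming an `argparse`-style parser. The function name `resolve_tokenizer_backend` and the choice to raise `ValueError` on an unknown environment value are hypothetical and are not taken from the Dynamo source.

```python
import argparse

# Valid values as listed in the documentation table.
VALID_BACKENDS = ("default", "fastokens")


def resolve_tokenizer_backend(argv, env):
    """Pick the tokenizer backend: CLI flag first, then env var, then 'default'.

    `argv` is a list of CLI arguments and `env` is a mapping such as os.environ.
    Hypothetical helper for illustration; the real frontend may differ.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--tokenizer", choices=VALID_BACKENDS, default=None)
    # parse_known_args ignores unrelated frontend flags in this sketch.
    args, _unknown = parser.parse_known_args(argv)

    if args.tokenizer is not None:
        # An explicit CLI flag wins over the environment variable.
        return args.tokenizer

    value = env.get("DYN_TOKENIZER", "default")
    if value not in VALID_BACKENDS:
        # Assumption: reject unknown values rather than silently ignoring them.
        raise ValueError(f"unsupported DYN_TOKENIZER value: {value!r}")
    return value
```

Calling `resolve_tokenizer_backend(["--tokenizer", "default"], {"DYN_TOKENIZER": "fastokens"})` selects `default`, because the flag overrides the environment variable.
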
diff --git a/docs/components/frontend/tokenizer-backends.md b/docs/components/frontend/tokenizer-backends.md
@@ -0,0 +1,55 @@
+---
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+title: Tokenizer Backends
+---
+
+The Dynamo Frontend supports multiple tokenizer backends for BPE-based models. The backend controls how input text is tokenized before being sent to the inference engine.
+
+## Tokenizer Backends
+
+### `default`: HuggingFace Tokenizers
+
+The default backend uses the [HuggingFace `tokenizers`](https://github.com/huggingface/tokenizers) library (Rust).
+It supports the features defined in `tokenizer.json` files: normalizers, pre-tokenizers, post-processors, decoders, added tokens with special-token flags, and byte-fallback.
+
+### `fastokens`: High-Performance BPE Encoding
+
+The `fastokens` backend uses the [`fastokens`](https://github.com/Atero-ai/fastokens) crate, a purpose-built BPE encoder optimized for throughput.
+It is a _hybrid_ backend: encoding uses `fastokens`, while decoding always goes through HuggingFace so that incremental detokenization, byte-fallback, and special-token handling work correctly.
+
+Use this backend when tokenization is a measurable bottleneck, for example on high-concurrency, prefill-heavy workloads.
+
+### Compatibility notes
+
+- Works with standard BPE `tokenizer.json` files (Qwen, LLaMA, GPT-family, Mistral, DeepSeek, etc.).
+- If `fastokens` cannot load a particular tokenizer file, the frontend logs a warning and transparently falls back to the HuggingFace backend; requests are never dropped.
+- Has no effect on TikToken-format tokenizers (`.model` / `.tiktoken` files), which always use the TikToken backend.
+
+## Configuration
+
+Set the backend with a CLI flag or environment variable. The CLI flag takes precedence.
+
+| CLI Argument | Env Var | Valid values | Default |
+|---|---|---|---|
+| `--tokenizer` | `DYN_TOKENIZER` | `default`, `fastokens` | `default` |
+
+**Examples:**
+
+```bash
+# CLI flag
+python -m dynamo.frontend --tokenizer fastokens
+
+# Environment variable
+export DYN_TOKENIZER=fastokens
+python -m dynamo.frontend
+```
+
+## Dynamo Frontend Behavior
+
+When `DYN_TOKENIZER=fastokens` is set:
+
+1. The frontend passes the environment variable to the Rust runtime.
+2. When building the tokenizer for a model, `ModelDeploymentCard::tokenizer()` attempts to load `fastokens::Tokenizer` from the same `tokenizer.json` file.
+3. If loading succeeds, a hybrid `FastTokenizer` is created that encodes with `fastokens` and decodes with HuggingFace.
+4. If loading fails (unsupported tokenizer features, missing file, etc.), the frontend logs a warning and falls back to the standard HuggingFace backend; no operator intervention is needed.
diff --git a/docs/index.yml b/docs/index.yml
@@ -200,6 +200,8 @@ navigation:
         contents:
           - page: Frontend Guide
             path: components/frontend/frontend-guide.md
+          - page: Tokenizer Backends
+            path: components/frontend/tokenizer-backends.md
       - section: Router
         path: components/router/README.md
         contents:

@@ -30,7 +30,11 @@ bench = ["dynamo-kv-router/bench"]
 kv-router-stress = ["dep:clap", "dep:indicatif", "bench"]
 
 [[bench]]
-name = "tokenizer"
+name = "tokenizer_simple"
+harness = false
+
+[[bench]]
+name = "tokenizer_dataset"
 harness = false
 
 [[bench]]