dynamo/docs/components/frontend/tokenizer-backends.md at 34eddf7f34ba25ed9545789175579e91b0157e59 · ai-dynamo/dynamo

Tokenizer Backends

The Dynamo Frontend supports multiple tokenizer backends for BPE-based models. The backend controls how input text is tokenized before being sent to the inference engine.

Tokenizer Backends

`default` HuggingFace Tokenizers

The default backend uses the HuggingFace `tokenizers` library (Rust). It supports the features found in `tokenizer.json` files: normalizers, pre-tokenizers, post-processors, decoders, added tokens with special-token flags, and byte-fallback.

`fastokens` High-Performance BPE Encoding

The fastokens backend uses the fastokens crate, a purpose-built BPE encoder optimized for throughput. It is a hybrid backend: encoding uses fastokens while decoding falls back to HuggingFace so that incremental detokenization, byte-fallback, and special-token handling work correctly.

Use this backend when tokenization is a measurable bottleneck, for example on high-concurrency prefill-heavy workloads.
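To judge whether tokenization is a measurable bottleneck, one option is to time encoding over a representative set of prompts before switching backends. The sketch below is a minimal, backend-agnostic harness. `measure_encode_throughput` and the byte-level stand-in encoder are hypothetical names for illustration and are not part of Dynamo; substitute the `encode` callable of whichever tokenizer you are evaluating.

```python
import time
from typing import Callable, Dict, Sequence


def measure_encode_throughput(
    encode: Callable[[str], Sequence[int]],
    prompts: Sequence[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Time `encode` over `prompts` and report the fastest of `repeats` runs."""
    best_seconds = float("inf")
    total_tokens = 0
    for _ in range(repeats):
        start = time.perf_counter()
        total_tokens = sum(len(encode(p)) for p in prompts)
        elapsed = time.perf_counter() - start
        best_seconds = min(best_seconds, elapsed)
    rate = total_tokens / best_seconds if best_seconds > 0 else float("inf")
    return {
        "prompts": len(prompts),
        "tokens": total_tokens,
        "seconds": best_seconds,
        "tokens_per_second": rate,
    }


if __name__ == "__main__":
    # Stand-in encoder (assumption): one "token" per UTF-8 byte.
    # Replace with a real tokenizer's encode function when measuring.
    def byte_encode(text: str) -> Sequence[int]:
        return list(text.encode("utf-8"))

    sample_prompts = ["hello world " * 200] * 50
    print(measure_encode_throughput(byte_encode, sample_prompts))
```

Comparing this figure with end-to-end request latency under load gives a rough sense of how much of the prefill path is spent tokenizing.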

Compatibility notes:

- Works with standard BPE `tokenizer.json` files (Qwen, LLaMA, GPT-family, Mistral, DeepSeek, etc.).
- If `fastokens` cannot load a particular tokenizer file, the frontend logs a warning and transparently falls back to HuggingFace; requests are never dropped.
- Has no effect on TikToken-format tokenizers (`.model` / `.tiktoken` files), which always use the TikToken backend.
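The file-format rule in the notes above can be illustrated with a small sketch. This is not the frontend's actual selection code: `backend_to_attempt` and the `"tiktoken"` label are hypothetical, and the function only reports which backend would be tried first, before any load-failure fallback.

```python
from pathlib import Path

# File suffixes the notes above describe as TikToken-format.
TIKTOKEN_SUFFIXES = {".model", ".tiktoken"}


def backend_to_attempt(tokenizer_file: str, requested: str) -> str:
    """Return the backend that would be tried first for `tokenizer_file`.

    `requested` is the configured value ("default" or "fastokens").
    TikToken-format files ignore it, as described in the notes above.
    """
    if Path(tokenizer_file).suffix in TIKTOKEN_SUFFIXES:
        return "tiktoken"  # hypothetical label for the TikToken backend
    return requested
```

For example, `backend_to_attempt("tokenizer.json", "fastokens")` yields `"fastokens"`, while a `.tiktoken` file yields `"tiktoken"` regardless of the requested value.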

Configuration

Set the backend with a CLI flag or environment variable. The CLI flag takes precedence.

| CLI Argument | Env Var | Valid values | Default |
|---|---|---|---|
| `--tokenizer` | `DYN_TOKENIZER` | `default`, `fastokens` | `default` |

Examples:

```bash
# CLI flag
python -m dynamo.frontend --tokenizer fastokens

# Environment variable
export DYN_TOKENIZER=fastokens
python -m dynamo.frontend
```
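The precedence rule (CLI flag over environment variable, with `default` as the fallback) can be sketched as follows. `resolve_tokenizer_backend` is a hypothetical helper written for illustration, not the frontend's argument parser, and raising `ValueError` on an unrecognized value is an assumption; this page does not say how invalid values are handled.

```python
import os
from typing import Mapping, Optional

VALID_BACKENDS = ("default", "fastokens")


def resolve_tokenizer_backend(
    cli_value: Optional[str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the backend: CLI flag first, then DYN_TOKENIZER, then "default"."""
    env = os.environ if environ is None else environ
    if cli_value is not None:
        choice = cli_value
    else:
        choice = env.get("DYN_TOKENIZER", "default")
    if choice not in VALID_BACKENDS:
        # Assumption: reject unknown values rather than silently ignoring them.
        raise ValueError(
            f"unknown tokenizer backend {choice!r}; expected one of {VALID_BACKENDS}"
        )
    return choice
```

With `DYN_TOKENIZER=default` exported and `--tokenizer fastokens` passed, this sketch returns `"fastokens"`, matching the stated precedence.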

Dynamo Frontend Behavior

When DYN_TOKENIZER=fastokens is set:

1. The frontend passes the environment variable to the Rust runtime.
2. When building the tokenizer for a model, `ModelDeploymentCard::tokenizer()` attempts to load `fastokens::Tokenizer` from the same `tokenizer.json` file.
3. If loading succeeds, a hybrid `FastTokenizer` is created that encodes with `fastokens` and decodes with HuggingFace.
4. If loading fails (unsupported tokenizer features, missing file, etc.), the frontend logs a warning and falls back to the standard HuggingFace backend; no operator intervention is needed.
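The load-then-fall-back flow above can be modeled in a few lines. This is an illustrative Python sketch of the pattern only; the real logic lives in the Rust runtime. `HybridTokenizer`, `build_tokenizer`, and the injected `load_hf` / `load_fast` callables are hypothetical stand-ins, and the sketch assumes the fast loader signals failure by raising an exception.

```python
import logging
from typing import Any, Callable, List

logger = logging.getLogger("tokenizer_sketch")


class HybridTokenizer:
    """Encode with the fast encoder; decode with the HuggingFace tokenizer."""

    def __init__(self, fast_encoder: Any, hf_tokenizer: Any) -> None:
        self._fast = fast_encoder
        self._hf = hf_tokenizer

    def encode(self, text: str) -> List[int]:
        return self._fast.encode(text)

    def decode(self, ids: List[int]) -> str:
        return self._hf.decode(ids)


def build_tokenizer(
    path: str,
    backend: str,
    load_hf: Callable[[str], Any],
    load_fast: Callable[[str], Any],
) -> Any:
    """Return a hybrid tokenizer if the fast loader succeeds, else the HF one."""
    hf = load_hf(path)
    if backend != "fastokens":
        return hf
    try:
        fast = load_fast(path)
    except Exception as exc:  # assumption: any loader error triggers fallback
        logger.warning("fast encoder failed to load %s (%s); using HuggingFace", path, exc)
        return hf
    return HybridTokenizer(fast, hf)
```

Injecting the loaders as callables keeps the sketch independent of any particular tokenizer library and makes the fallback path easy to exercise with a loader that raises.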
