feat: integrate fastokens BPE tokenizer backend#7387

Open
biswapanda wants to merge 15 commits into main from bis/fast-tokens-dynamo

Conversation

@biswapanda biswapanda commented Mar 14, 2026

Overview:

Add the fastokens crate (v0.1.0 from github.com/Atero-ai/fastokens) as an always-on workspace dependency for high-performance BPE encoding.

Details:

Core integration:

  • lib/llm/src/tokenizers/fast.rs: hybrid FastTokenizer that encodes with fastokens and decodes with HuggingFace, with 4 unit tests
  • lib/llm/src/model_card.rs: tokenizer() checks DYN_TOKENIZER=fastokens env var, falls back to HuggingFace on load failure

Frontend CLI:

  • --tokenizer flag / DYN_TOKENIZER env var with values "default" (HuggingFace) or "fastokens"
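The env-driven selection with HuggingFace fallback described above can be sketched as follows; the enum and helper are illustrative, not the actual model_card.rs API:

```rust
// Illustrative sketch of env-driven backend selection; names are hypothetical.
#[derive(Debug, PartialEq)]
enum TokenizerBackend {
    HuggingFace, // "default"
    Fastokens,   // "fastokens"
}

// Pure helper so the mapping is easy to test; the caller reads DYN_TOKENIZER.
fn select_backend(env_val: Option<&str>) -> TokenizerBackend {
    match env_val {
        Some("fastokens") => TokenizerBackend::Fastokens,
        // "default", unset, or any other value falls back to HuggingFace
        _ => TokenizerBackend::HuggingFace,
    }
}

fn main() {
    let val = std::env::var("DYN_TOKENIZER").ok();
    println!("{:?}", select_backend(val.as_deref()));
}
```

Keeping the mapping in a pure function (rather than reading the environment inline) makes the fallback behavior trivially testable.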

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Added --dyn-tokenizer-backend command-line option to select different tokenizer backends.
    • New high-performance tokenizer implementation now available.
  • Tests

    • Added tokenizer test suite and validation data.

@biswapanda biswapanda self-assigned this Mar 14, 2026
@biswapanda biswapanda requested review from a team as code owners March 14, 2026 21:52
@github-actions github-actions bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Mar 14, 2026
@biswapanda biswapanda enabled auto-merge (squash) March 14, 2026 21:57
coderabbitai bot commented Mar 14, 2026

Walkthrough

This pull request introduces support for a new "fasttokens" tokenizer backend. Changes include: adding the fastokens workspace dependency from a Git repository, exposing CLI configuration for backend selection, implementing a hybrid FastTokenizer struct that uses fastokens for encoding and HuggingFaceTokenizer for decoding, and providing test data for validation.

Changes

  • Workspace Dependencies (Cargo.toml, lib/llm/Cargo.toml): Added the fastokens workspace dependency from its GitHub repository for high-performance tokenization.
  • CLI Configuration (components/src/dynamo/frontend/frontend_args.py): Added the --dyn-tokenizer-backend CLI argument and a tokenizer_backend field to FrontendConfig, with environment variable support.
  • Environment Setup (components/src/dynamo/frontend/main.py): Propagates the DYN_TOKENIZER_BACKEND environment variable when the backend is set to "fasttokens".
  • Tokenizer Abstraction (lib/llm/src/tokenizers.rs): Exposed the new fast module and the FastTokenizer type in the public API.
  • Tokenizer Implementation (lib/llm/src/tokenizers/fast.rs): Implemented the hybrid FastTokenizer, which encodes with fastokens (batched via rayon) and decodes with HuggingFaceTokenizer, including comprehensive unit tests.
  • Dynamic Backend Selection (lib/llm/src/model_card.rs): Added environment-driven tokenizer selection, attempting FastTokenizer when DYN_TOKENIZER_BACKEND=fasttokens before falling back to HuggingFace.
  • Test Data (lib/llm/tests/data/sample-models/minimal-bpe/tokenizer.json): Added a minimal BPE tokenizer configuration for testing round-trip encode/decode workflows.
  • Project Metadata (lib/bindings/python/pyproject.toml): Commented out the explicit license and license-files metadata declarations.
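The hybrid design - encode on the fast path, decode through HuggingFace - can be sketched with stand-in types (FastEncoder and HfDecoder below are mocks, not the real fastokens or tokenizers types):

```rust
// Illustrative sketch of the encode/decode split; the stand-in types below
// mock the real fastokens encoder and HuggingFace decoder.
struct FastEncoder; // stands in for the fastokens BPE encoder
struct HfDecoder;   // stands in for HuggingFace tokenizer decoding

impl FastEncoder {
    fn encode(&self, text: &str) -> Vec<u32> {
        // Toy encoding (one id per byte) just to make the sketch runnable.
        text.bytes().map(u32::from).collect()
    }
}

impl HfDecoder {
    fn decode(&self, ids: &[u32]) -> String {
        ids.iter().map(|&b| b as u8 as char).collect()
    }
}

/// Hybrid tokenizer: fast path for encoding, HF path for decoding.
struct FastTokenizer {
    encoder: FastEncoder,
    decoder: HfDecoder,
}

impl FastTokenizer {
    fn encode(&self, text: &str) -> Vec<u32> {
        self.encoder.encode(text)
    }
    fn decode(&self, ids: &[u32]) -> String {
        self.decoder.decode(ids)
    }
}

fn main() {
    let t = FastTokenizer { encoder: FastEncoder, decoder: HfDecoder };
    let ids = t.encode("hi");
    // Round-trip through both backends must reproduce the input.
    assert_eq!(t.decode(&ids), "hi");
    println!("round-trip ok");
}
```

The design choice here is that the two halves never need to share state: encoding correctness is validated by round-tripping through the decoder, which is exactly what the PR's minimal-BPE test data exercises.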

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, hop, the tokens fly so fast,
With fastokens here at last!
Encoding swift with fuzzy cheer,
New backends blooming far and near,
The code now hops with speed so bright! 🚀✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 72.73%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'feat: integrate fastokens BPE tokenizer backend' accurately summarizes the main change: integrating a new high-performance BPE tokenizer backend throughout the codebase.
  • Description check ✅ Passed: The PR description covers the overview, technical details, and references related issues, though some specifics lack depth.


@biswapanda changed the title from "feat: integrate fasttokens high-performance BPE tokenizer backend" to "feat: integrate fasttokens BPE tokenizer backend" on Mar 14, 2026
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 6

🧹 Nitpick comments (1)
lib/llm/src/tokenizers/fast.rs (1)

125-143: Make the decode-stream test assert emitted text, not just absence of errors.

Right now this only proves step() doesn't fail. Empty or duplicated chunks would still pass. Please accumulate the returned chunks and compare them with a reference decode for the continuation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/tokenizers/fast.rs` around lines 125 - 143, The test
test_fast_with_decode_stream currently only checks that stream.step() doesn't
error; change it to accumulate the emitted chunks from
wrapper.decode_stream(&prompt_ids, true) by collecting each step(...) return
(concatenate non-empty chunks) into a single string, then obtain the expected
text by decoding the continuation (e.g. via wrapper.decode(cont_ids) or
wrapper.decode(continuation)) and assert equality (or assert that the
accumulated string contains the expected continuation); update references in the
test around FastTokenizer::from_file, TokenizerWrapper::from,
wrapper.decode_stream, and stream.step to perform this accumulation and
comparison instead of a no-op loop.
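The accumulation pattern the reviewer asks for - collect every emitted chunk and compare against a reference string, rather than only checking that step() succeeds - looks like this with a mocked stream (MockDecodeStream is illustrative; the real stream comes from the tokenizer wrapper):

```rust
// Mock of an incremental decode stream; step() yields Option<String> chunks.
struct MockDecodeStream {
    chunks: Vec<Option<String>>,
    pos: usize,
}

impl MockDecodeStream {
    fn step(&mut self) -> Option<Option<String>> {
        // Outer None = stream exhausted; inner None = no text emitted this step.
        let out = self.chunks.get(self.pos).cloned();
        self.pos += 1;
        out
    }
}

fn main() {
    let mut stream = MockDecodeStream {
        chunks: vec![Some("Hel".into()), None, Some("lo".into())],
        pos: 0,
    };
    // Accumulate non-empty chunks instead of merely checking for errors.
    let mut accumulated = String::new();
    while let Some(chunk) = stream.step() {
        if let Some(text) = chunk {
            accumulated.push_str(&text);
        }
    }
    let expected = "Hello"; // reference decode of the same token ids
    assert_eq!(accumulated, expected);
    println!("accumulated = {accumulated}");
}
```

Asserting on the concatenated output catches both the empty-chunk and duplicated-chunk failure modes that a success-only loop would miss.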
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Cargo.toml`:
- Line 49: The fastokens Git dependency is floating and needs an immutable ref;
update the Cargo.toml dependency line for fastokens (the entry fastokens = { git
= "https://github.com/Atero-ai/fastokens", version = "0.1.0" }) to include a rev
(or tag) field with the specific commit hash or tag you want pinned (e.g., rev =
"COMMIT_SHA") so the manifest is reproducible; ensure you use the exact commit
SHA or tag from the fastokens repo and run cargo update -p fastokens if needed
to regenerate the lockfile.

In `@components/src/dynamo/frontend/frontend_args.py`:
- Around line 429-440: The CLI currently allows any string for the
--dyn-tokenizer-backend argument; update the add_argument call that defines
flag_name="--dyn-tokenizer-backend" (dest="tokenizer_backend") to restrict
values to the supported set by adding an explicit choices constraint (e.g.,
["default","fasttokens"]) so parsing fails fast for invalid CLI/env values and
documents the accepted options in the help text.

In `@components/src/dynamo/frontend/main.py`:
- Around line 168-169: The current logic only sets
os.environ["DYN_TOKENIZER_BACKEND"] = "fasttokens" when config.tokenizer_backend
== "fasttokens" and never clears it; update the branch around
config.tokenizer_backend in main.py so that when config.tokenizer_backend ==
"default" you remove/unset DYN_TOKENIZER_BACKEND (e.g., pop or del from
os.environ if present), keep setting it for "fasttokens" as before, and ensure
no stale environment value remains when the default backend is chosen.

In `@lib/bindings/python/pyproject.toml`:
- Around line 25-26: Uncomment and replace the outdated dict license entries in
pyproject.toml by adding explicit SPDX and license-files fields: remove the
commented lines and add license = "Apache-2.0" and license-files = ["LICENSE"]
so the project uses the PEP 639-compatible string expression and explicit
license file declaration (update the existing commented/old keys related to
license and license-files).

In `@lib/llm/src/model_card.rs`:
- Around line 384-386: The current logic silently treats any non-"fasttokens"
DYN_TOKENIZER_BACKEND as the default behavior; change the handling to explicitly
match allowed values: if the env var is "fasttokens" set use_fast = true, if it
is "default" (or an explicit "slow"/"rust" token value you support) set use_fast
= false, and for any other value emit a clear warning or return an error so
misconfiguration is visible; update the code that reads DYN_TOKENIZER_BACKEND
(the use_fast assignment) to perform a match on the string and log/propagate an
error on unsupported values rather than silently falling back.
- Around line 394-399: When attempting the fast tokenizer, don't call
p.to_str().ok_or_else(...) which returns early on non-UTF-8 paths and prevents
the HuggingFace fallback; instead check p.to_str() with an if let/ match and
only call crate::tokenizers::FastTokenizer::from_file(path_str) when to_str()
returns Some. If to_str() is None, skip the fast-tokenizer attempt (do not
return an error) so the existing HF loader/fallback logic can run; also when
FastTokenizer::from_file fails, allow the code to continue to the HuggingFace
fallback rather than short-circuiting.
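Taken together, the frontend-side suggestions above (a choices constraint so invalid values fail fast, and clearing the stale variable when the default backend is chosen) can be sketched in plain argparse; the flag and variable names follow the PR's final naming, while the helper functions themselves are illustrative:

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Restrict values so bad CLI input fails fast, as the review suggests.
    parser.add_argument(
        "--tokenizer",
        choices=["default", "fastokens"],
        default=os.environ.get("DYN_TOKENIZER", "default"),
        help="Tokenizer backend: 'default' (HuggingFace) or 'fastokens'.",
    )
    return parser


def apply_backend(backend: str, env: dict) -> None:
    # Set the env var for the fast backend, and clear any stale value
    # when the default backend is chosen, so no leftover setting leaks.
    if backend == "fastokens":
        env["DYN_TOKENIZER"] = "fastokens"
    else:
        env.pop("DYN_TOKENIZER", None)


if __name__ == "__main__":
    args = build_parser().parse_args()
    apply_backend(args.tokenizer, os.environ)
    print(os.environ.get("DYN_TOKENIZER"))
```

Passing the environment mapping into apply_backend (instead of touching os.environ directly) keeps the clear-on-default behavior unit-testable.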


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 14f36a69-69af-4531-bb2b-0dfeeb99d059

📥 Commits

Reviewing files that changed from the base of the PR and between 0b66515 and 13eccdd.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Cargo.toml
  • components/src/dynamo/frontend/frontend_args.py
  • components/src/dynamo/frontend/main.py
  • lib/bindings/python/pyproject.toml
  • lib/llm/Cargo.toml
  • lib/llm/src/model_card.rs
  • lib/llm/src/tokenizers.rs
  • lib/llm/src/tokenizers/fast.rs
  • lib/llm/tests/data/sample-models/minimal-bpe/tokenizer.json


add_argument(
    g,
    flag_name="--dyn-tokenizer-backend",
Contributor:

could this be just tokenizer - or is backend needed?

also @jh-nv I see we mix the dyn prefix and no dyn prefix - did we decide on a common naming?

Contributor Author:

Agreed, backend can be dropped.

@biswapanda biswapanda commented Mar 15, 2026

@jh-nv The DYN_ prefix appears consistently on the env var side, but the --dyn- prefix is absent from most CLI flags. OK to drop the dyn- prefix as well and keep it shorter: --tokenizer.

My initial thought was to use --dyn for CLI args and DYN_ in env vars for consistency.

Contributor Author:

Fixed - updated the arg to --tokenizer and the env var to DYN_TOKENIZER.

Contributor Author:

cc: @nnshah1

Contributor:

can you update the docs PR to reflect this?

Contributor Author:

Sounds good. I'll rebase the docs PR; I was waiting for this PR to stabilize.

Contributor Author:

Thanks @nealvaidya , I've updated #7388

@biswapanda (Contributor Author):

There are cargo-deny-related CI failures due to upstream transitive dependencies; a fix PR for fastokens is under review by the Crusoe team - Atero-ai/fastokens#5

Issue: The fastokens crate (v0.1.0) declares hf-hub = "0.4.3" with default features, which pulls in native-tls and openssl-sys. Dynamo's deny.toml explicitly bans both crates (lines 63-64), causing cargo-deny to fail in CI.

The dependency chain is:

fastokens -> hf-hub (default features) -> ureq -> native-tls -> openssl-sys
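One common way to break a dependency chain like this (an illustration, not necessarily the fix the fastokens PR takes; the exact feature names are assumptions and should be checked against hf-hub's manifest) is to declare hf-hub without default features and opt into a rustls-backed client:

```toml
# Hypothetical sketch for fastokens' Cargo.toml: disable hf-hub's default
# features (which bring in native-tls and openssl-sys) and select rustls.
[dependencies]
hf-hub = { version = "0.4.3", default-features = false, features = ["ureq", "rustls-tls"] }
```

This keeps the HTTP client functional while avoiding the banned native-tls and openssl-sys crates entirely.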

Cargo.toml Outdated
dynamo-kv-router = { path = "lib/kv-router", version = "1.0.0", features = ["metrics"] }
dynamo-async-openai = { path = "lib/async-openai", version = "1.0.0", features = ["byot"] }
dynamo-parsers = { path = "lib/parsers", version = "1.0.0" }
fastokens = { git = "https://github.com/biswapanda/fastokens", rev = "aed8b9f3ba024c689c72985d62506dfa79daec25", version = "0.1.0" }
Contributor:
shall we create a fork under ai-dynamo? cc @saturley-hall @dmitry-tokarev-nv

@biswapanda biswapanda commented Mar 15, 2026

Yes, let's create a fork / mirror of upstream https://github.com/Atero-ai/fastokens.

There is a cargo-deny issue with a fastokens transitive dependency: #7387 (comment)

I have a PR for the Crusoe team to upstream this change: Atero-ai/fastokens#5.

In the meantime, as a workaround, I've pointed the dependency at my repo with the fix - the fastokens team will review the PR by IST morning.

@biswapanda changed the title from "feat: integrate fasttokens BPE tokenizer backend" to "feat: integrate fastokens BPE tokenizer backend" on Mar 15, 2026

Labels

feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants