feat: Add support for skip_special_tokens parameter in v1/completions and v1/chat/completions endpoints #4175

rmccorm4 · 2025-11-07T02:36:38Z

Overview:

Adds support for skip_special_tokens bool parameter in completions and chat_completions requests
Default remains false when not provided to avoid any unintended breakages
There is a TODO left in the code for future follow-up to consider updating the default to true to match other frameworks like vLLM/TRTLLM - but this needs to be further verified that nothing breaks before updating.

Details:

Example request setting it to true on a Qwen3 model with reasoning parser enabled:

(dynamo) rmccormick@ced35d0-lcedt:~/dynamo/dynamo$ curl localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '
{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [{"role": "user", "content": "What is a good name for a dog?"}],
  "stream": false,
  "max_tokens": 1024,
  "skip_special_tokens": true
}' | jq

{
  "id": "chatcmpl-54ec61b1-8dfb-4eff-aac5-b7b5c620ba31",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "A good name for a dog could be something that reflects its traits, is easy to pronounce, and has a positive meaning. Here are a few suggestions:1. **Charlie** – Known for its friendly and loyal nature.2. **Buddy** – Symbolizes companionship and loyalty.3. **Max** – A classic and simple name.4. **Whiskers** – A playful and affectionate name.5. **Poodle** – A unique and popular choice for a dog.Choose a name that resonates with your dog’s personality and ensures it’s not taken by another dog. Let me know if you'd like more options!",
        "role": "assistant",
        "reasoning_content": "\nOkay, the user is asking for a good name for a dog. Let me start by thinking about common dog names. They might be looking for something catchy and memorable. Maybe something that's easy to spell and pronounce. Also, considering cultural aspects, some names are more popular in certain regions.\n\nI should think about the meaning behind the name. Maybe something that represents the dog's traits, like loyalty or companionship. Words like \"Buddy\" or \"Charlie\" come to mind. But I need to make sure the name is unique and not taken by another dog. \n\nWait, the user might be a dog owner looking for a name for their dog, so the name should be appropriate. Maybe include a word that means something positive, like \"Happy\" or \"Lucky.\" Also, considering different languages, maybe some names are more common in English, like \"Max\" or \"Buddy.\" \n\nI should also think about the name's length. Maybe two or three syllables. Let me brainstorm a few options: \"Buddy,\" \"Charlie,\" \"Max,\" \"Whiskers,\" \"Poodle.\" Are there any that I'm missing? Maybe \"Sphynx\" as a unique name, but that's more for a cat. \n\nI need to check if these names are commonly used. For example, \"Charlie\" is a popular name, and \"Buddy\" is also widely used. \"Max\" is another common choice. Maybe suggest a few options and explain why they're good. Also, mention that it's best to pick something that's not taken. \n\nWait, the user might not know what's popular, so providing a list of options with explanations would be helpful. Also, maybe include a note about considering the name's meaning and uniqueness. That should cover their needs.\n"
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1762482887,
  "model": "Qwen/Qwen3-0.6B",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 510,
    "total_tokens": 527
  }
}

Existing validation is supported on this new field to make sure it is a bool:

(dynamo) rmccormick@ced35d0-lcedt:~/dynamo/dynamo$ curl localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '
{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [{"role": "user", "content": "What is a good name for a dog?"}],
  "stream": false,
  "max_tokens": 1024,
  "skip_special_tokens": 123   
}' | jq

{
  "message": "Failed to deserialize the JSON body into the target type: invalid type: integer `123`, expected a boolean at line 9 column 1",
  "type": "Bad Request",
  "code": 400
}

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: [BUG]: There is no way to skip special tokens in a request #4062

Summary by CodeRabbit

New Features
- Added skip_special_tokens parameter to both chat completions and completions endpoints. Users can now optionally exclude special tokens from decoded output, with the default behavior unchanged.
Tests
- Added tests verifying special token filtering during decoding.

…dundant tests

…de to be safe

rmccorm4 · 2025-11-07T02:42:45Z

This has impact on user flows. For example, with reasoning parsers turned on, the content field of the response might contain content="<|im_end|>" instead of content="\n". This puts the onus on the clients to do model-specific postprocessing of outputs.

@2ez4bz can you give me an example model + curl request that would reproduce this <|im_end|> in the output so I can verify the scenario you described?

In the PR description I started a Qwen3 model with reasoning parser, but didn't see any special tokens in the responses, so I don't have an easily noticeable effect to demonstrate when setting skip_special_tokens to true.

python -m dynamo.vllm --model Qwen/Qwen3-0.6B --dyn-reasoning-parser qwen3 --enforce-eager --connector none

I was however able to see it working when set to true (due to your other bug about add_generation_prompt) on a tool call example with a llama model like so:

python -m dynamo.vllm --model meta-llama/Llama-3.1-8B-Instruct --dyn-tool-call-parser llama3_json --custom-jinja-template ~/dynamo/tool_calling/llama3/tool_chat_template_llama3.1_json.jinja --connector none --enforce-eager

With the add_generation_prompt bug, this would output special tokens like content='<|start_header_id|>assistant<|end_header_id|> which when I set skip_special_tokens=True after the changes in this PR would end up like content='assistant instead.

However, I think those special tokens shouldn't show up in the output at all after #4168 is merged, which is why I'm looking for a different test case that doesn't rely on a bug to observe.

coderabbitai · 2025-11-07T06:25:45Z

Walkthrough

A new skip_special_tokens feature is added to control token decoding behavior. The optional boolean field is introduced to the CommonExt struct, propagated through OpenAI protocol providers for chat and completions endpoints, and plumbed into the backend decoder function.

Changes

Cohort / File(s)	Summary
Core Backend Integration `lib/llm/src/backend.rs`	Added `skip_special_tokens` parameter to `Backend::decoder` function signature; updated generate flow to compute and pass `skip_special_tokens` from `request.output_options` to decoder.
Output Options Protocol Definition `lib/llm/src/protocols/openai/common_ext.rs`	Added new optional `skip_special_tokens: Option<bool>` field to `CommonExt` struct with serde and builder annotations; introduced `get_skip_special_tokens(&self) -> Option<bool>` accessor method to `CommonExtProvider` trait.
Documentation Updates `lib/llm/src/protocols/common.rs`	Removed inline documentation describing the "spaces_between_special_tokens" field from OutputOptions struct.
OpenAI Chat Completions Integration `lib/llm/src/protocols/openai/chat_completions.rs`	Added `get_skip_special_tokens` accessor to `CommonExtProvider` implementation for `NvCreateChatCompletionRequest`; updated `OpenAIOutputOptionsProvider` to delegate `get_skip_special_tokens` to the accessor; added unit tests covering None, Some(true), and Some(false) propagation scenarios.
OpenAI Completions Integration `lib/llm/src/protocols/openai/completions.rs`	Added `get_skip_special_tokens` accessor to `CommonExtProvider` implementation for `NvCreateCompletionRequest`; updated `OpenAIOutputOptionsProvider` to delegate `get_skip_special_tokens`; added unit tests verifying propagation for None, Some(true), and Some(false) cases.
Integration Testing `lib/llm/tests/tokenizers.rs`	Added `test_decode_with_skip_special_tokens` integration test verifying that tokens are decoded with and without special token markers based on the flag setting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Verify skip_special_tokens is correctly threaded from OpenAI request objects through providers to the backend decoder function.
Confirm the CommonExt struct field is properly serialized/deserialized by serde, and that builder annotations work correctly.
Validate that both NvCreateChatCompletionRequest and NvCreateCompletionRequest implementations are consistent and complete.
Review test coverage across all three propagation scenarios (None, Some(true), Some(false)) in both chat and completions modules.
Verify the integration test correctly asserts behavior differences when decoding with and without special tokens.

Poem

🐰 A flag hops through the code with care,
Special tokens skip through the air!
From CommonExt to decoder's call,
Skipping markers, one and all.
Tests confirm the journey's true! 🎉

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main feature: adding skip_special_tokens parameter support to both v1/completions and v1/chat/completions endpoints.
Description check	✅ Passed	The description covers the key sections with details about the feature, example requests/responses, validation behavior, and related issue reference.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

rmccorm4 · 2025-11-07T06:26:37Z

@coderabbitai review

coderabbitai · 2025-11-07T06:26:47Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f509493 and f498b0c.

📒 Files selected for processing (6)

lib/llm/src/backend.rs (2 hunks)
lib/llm/src/protocols/common.rs (0 hunks)
lib/llm/src/protocols/openai/chat_completions.rs (3 hunks)
lib/llm/src/protocols/openai/common_ext.rs (7 hunks)
lib/llm/src/protocols/openai/completions.rs (3 hunks)
lib/llm/tests/tokenizers.rs (1 hunks)

💤 Files with no reviewable changes (1)

lib/llm/src/protocols/common.rs

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings

Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2025-09-19T07:32:44.210Z
Learning: The skip_tokenizer_init=True path in SGLang backend bypasses tokenization but has array slicing overhead in _process_token_stream that creates O(n) memory copying on every stream chunk, potentially causing quadratic behavior for long sequences.

📚 Learning: 2025-09-02T16:46:54.015Z

Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

lib/llm/src/protocols/openai/common_ext.rs

📚 Learning: 2025-09-08T21:18:43.478Z

Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.

Applied to files:

lib/llm/src/protocols/openai/common_ext.rs

📚 Learning: 2025-08-22T19:55:41.608Z

Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

lib/llm/src/protocols/openai/chat_completions.rs
lib/llm/src/protocols/openai/completions.rs

📚 Learning: 2025-08-22T19:55:41.608Z

Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

lib/llm/src/protocols/openai/chat_completions.rs

📚 Learning: 2025-09-19T07:32:44.210Z

Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 0
File: :0-0
Timestamp: 2025-09-19T07:32:44.210Z
Learning: The skip_tokenizer_init=True path in SGLang backend bypasses tokenization but has array slicing overhead in _process_token_stream that creates O(n) memory copying on every stream chunk, potentially causing quadratic behavior for long sequences.

Applied to files:

lib/llm/src/backend.rs

🧬 Code graph analysis (5)

lib/llm/tests/tokenizers.rs (2)

lib/llm/src/tokenizers.rs (5)

tokenizer (340-342)

from_file (83-85)

text (348-353)

token_ids (38-43)

token_ids (344-346)

lib/llm/src/tokenizers/hf.rs (1)

from_file (16-21)

lib/llm/src/protocols/openai/common_ext.rs (3)

lib/llm/src/protocols/openai/completions.rs (2)

get_skip_special_tokens (193-195)

get_skip_special_tokens (370-372)

lib/llm/src/protocols/openai/chat_completions.rs (2)

get_skip_special_tokens (202-204)

get_skip_special_tokens (269-271)

lib/llm/src/protocols/openai.rs (1)

get_skip_special_tokens (85-85)

lib/llm/src/protocols/openai/chat_completions.rs (2)

lib/llm/src/protocols/openai/completions.rs (5)

get_skip_special_tokens (193-195)

get_skip_special_tokens (370-372)

test_skip_special_tokens_none (422-438)

test_skip_special_tokens_true_propagates (441-456)

test_skip_special_tokens_false_propagates (459-474)

lib/llm/src/protocols/openai/common_ext.rs (1)

get_skip_special_tokens (112-112)

lib/llm/src/backend.rs (1)

lib/llm/src/tokenizers.rs (4)

tokenizer (340-342)

new (176-191)

new (268-275)

new (495-503)

lib/llm/src/protocols/openai/completions.rs (3)

lib/llm/src/protocols/openai/chat_completions.rs (5)

get_skip_special_tokens (202-204)

get_skip_special_tokens (269-271)

test_skip_special_tokens_none (331-349)

test_skip_special_tokens_true_propagates (352-369)

test_skip_special_tokens_false_propagates (372-389)

lib/llm/src/protocols/openai/common_ext.rs (1)

get_skip_special_tokens (112-112)

lib/llm/src/protocols/openai.rs (1)

get_skip_special_tokens (85-85)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: trtllm (arm64)
GitHub Check: vllm (amd64)
GitHub Check: Mirror Repository to GitLab
GitHub Check: clippy (.)
GitHub Check: tests (lib/bindings/python)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: tests (.)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (9)

lib/llm/tests/tokenizers.rs (1)

182-255: LGTM! Comprehensive test for skip_special_tokens.

The test properly validates both decoding modes (with and without special tokens) and includes appropriate assertions for token markers, content preservation, and length comparison.

lib/llm/src/protocols/openai/common_ext.rs (2)

111-112: LGTM!

The trait method signature is consistent with other methods in CommonExtProvider.

238-284: LGTM! Comprehensive test coverage for skip_special_tokens.

The tests properly validate field assignment, serialization/deserialization, and the skip_serializing_if behavior for None values.

lib/llm/src/backend.rs (2)

92-112: LGTM! skip_special_tokens properly plumbed to decoder.

The parameter is correctly added to the decoder signature and passed through to the tokenizer's decode_stream.

133-144: LGTM! Default behavior is intentionally conservative.

The TODO comment appropriately documents the future consideration to change the default to true (matching vLLM/TRTLLM). The current default of false preserves existing behavior to prevent breaking changes.

lib/llm/src/protocols/openai/chat_completions.rs (2)

202-204: LGTM! Accessor correctly returns the field value.

Implementation is consistent with other CommonExtProvider methods.

269-271: LGTM! Delegation and test coverage are appropriate.

The OpenAIOutputOptionsProvider correctly delegates to CommonExtProvider, and the tests comprehensively validate propagation of None, Some(true), and Some(false) through the extract_output_options pipeline.

Also applies to: 324-390

lib/llm/src/protocols/openai/completions.rs (2)

193-195: LGTM! Accessor implementation is correct.

Consistent with the chat_completions.rs implementation.

370-372: LGTM! Implementation and tests are comprehensive.

The delegation pattern is correct, and tests thoroughly validate skip_special_tokens propagation for the completions endpoint, mirroring the coverage in chat_completions.rs.

Also applies to: 415-475

coderabbitai · 2025-11-07T06:29:00Z

lib/llm/src/protocols/openai/common_ext.rs

+    /// Whether to skip special tokens in the decoded output.
+    /// When true, special tokens (like EOS, BOS, PAD) are removed from the output text.
+    /// When false, special tokens are included in the output text.
+    /// Defaults to true if not specified (matching vLLM/TensorRT-LLM behavior).
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    #[builder(default, setter(strip_option))]
+    pub skip_special_tokens: Option<bool>,


⚠️ Potential issue | 🟠 Major

Fix documentation inconsistency with actual default behavior.

The documentation states "Defaults to true if not specified (matching vLLM/TensorRT-LLM behavior)" but the actual implementation in lib/llm/src/backend.rs Line 134 uses unwrap_or(false), defaulting to false. This creates confusion about the expected behavior.

Update the documentation to reflect the current default:

- /// Whether to skip special tokens in the decoded output. - /// When true, special tokens (like EOS, BOS, PAD) are removed from the output text. - /// When false, special tokens are included in the output text. - /// Defaults to true if not specified (matching vLLM/TensorRT-LLM behavior). + /// Whether to skip special tokens in the decoded output. + /// When true, special tokens (like EOS, BOS, PAD) are removed from the output text. + /// When false, special tokens are included in the output text. + /// Defaults to false if not specified. A future change may update this to true to match vLLM/TensorRT-LLM behavior.

🤖 Prompt for AI Agents

In lib/llm/src/protocols/openai/common_ext.rs around lines 76 to 82, the docstring incorrectly states the default is true while the implementation uses unwrap_or(false); update the documentation to state the default is false (i.e., "Defaults to false if not specified"), remove the parenthetical about matching vLLM/TensorRT-LLM or change it to reflect the actual behavior, and ensure serde/builder attributes remain unchanged.

rmccorm4 added 3 commits November 6, 2025 17:29

v1: propogate skip_special_tokens e2e and unit test, todo - remove re…

77d5be4

…dundant tests

Cleanup redundant tests, keep default as false with ability to overri…

4000589

…de to be safe

Add tokenizer unit test with skip_special_tokens true/false

e88e6fe

pull-request-size bot added the size/L label Nov 7, 2025

github-actions bot added the feat label Nov 7, 2025

lint

cae6fb9

copy-pr-bot bot temporarily deployed to GITLAB November 7, 2025 02:47 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 7, 2025 02:49 Inactive

rmccorm4 marked this pull request as ready for review November 7, 2025 06:24

rmccorm4 requested a review from a team as a code owner November 7, 2025 06:24

Merge branch 'main' into rmccormick/fix_4062

f498b0c

copy-pr-bot bot temporarily deployed to GITLAB November 7, 2025 06:24 Inactive

coderabbitai bot reviewed Nov 7, 2025

View reviewed changes

copy-pr-bot bot temporarily deployed to GITLAB November 7, 2025 06:29 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add support for skip_special_tokens parameter in v1/completions and v1/chat/completions endpoints #4175

feat: Add support for skip_special_tokens parameter in v1/completions and v1/chat/completions endpoints #4175

rmccorm4 commented Nov 7, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

rmccorm4 commented Nov 7, 2025

Uh oh!

coderabbitai bot commented Nov 7, 2025 •

edited

Loading

Uh oh!

rmccorm4 commented Nov 7, 2025

Uh oh!

coderabbitai bot commented Nov 7, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Add support for skip_special_tokens parameter in v1/completions and v1/chat/completions endpoints #4175

Are you sure you want to change the base?

feat: Add support for skip_special_tokens parameter in v1/completions and v1/chat/completions endpoints #4175

Conversation

rmccorm4 commented Nov 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

rmccorm4 commented Nov 7, 2025

Uh oh!

coderabbitai bot commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

rmccorm4 commented Nov 7, 2025

Uh oh!

coderabbitai bot commented Nov 7, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rmccorm4 commented Nov 7, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 7, 2025 •

edited

Loading