Hi TRL team, thanks for GOLD and the vLLM integration.
I’m trying to understand the intended behavior of GOLD’s on-policy generation with vLLM. The issue was originally identified by @simran135 and discussed in https://github.com/siyan-zhao/OPSD/issues/3.
In v0.29.0, this line says:
https://github.com/huggingface/trl/blob/v0.29.0/trl/experimental/gold/gold_trainer.py#L1680
"Decode prompts for vLLM (without special tokens - vLLM expects clean text)"
In `_generate_on_policy_outputs_vllm`, prompts are decoded with `skip_special_tokens=True` before being sent to vLLM:

```python
prompts_text_for_vllm = self.processing_class.batch_decode(
    inputs["prompts"],
    skip_special_tokens=True,  # strips <|im_start|>, <|im_end|>, etc.
)
```
This strips chat-template special tokens (e.g., `<|im_start|>`, `<|im_end|>`, role markers) before vLLM ever sees the prompt. vLLM then receives raw text and re-tokenizes it, rather than receiving the original templated prompt IDs, so the chat-template structure cannot be recovered on the vLLM side.
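To make the information loss concrete, here is a minimal self-contained sketch (a toy stand-in, not the real tokenizer) of why decoding with special tokens stripped and then re-tokenizing cannot round-trip back to the templated prompt:

```python
# Toy illustration only: SPECIAL_TOKENS and toy_decode are stand-ins for the
# tokenizer's special-token handling, not real transformers/vLLM APIs.
SPECIAL_TOKENS = {"<|im_start|>", "<|im_end|>"}

def toy_decode(tokens, skip_special_tokens):
    """Join token strings, optionally dropping special tokens (as
    skip_special_tokens=True does in a real batch_decode call)."""
    kept = [t for t in tokens
            if not (skip_special_tokens and t in SPECIAL_TOKENS)]
    return "".join(kept)

# A hypothetical chat-templated prompt, token by token:
templated = ["<|im_start|>", "user\n", "Hello", "<|im_end|>", "\n",
             "<|im_start|>", "assistant\n"]

clean = toy_decode(templated, skip_special_tokens=True)
full = toy_decode(templated, skip_special_tokens=False)

print(clean)  # "user\nHello\nassistant\n" -- role markers are gone
print(full)   # templated text preserved, including <|im_start|>/<|im_end|>
```

Once the role markers are dropped, no downstream tokenization of `clean` can reconstruct the original template boundaries.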
For chat-template-driven models (e.g., Qwen3), this may change rollout behavior. In particular, if thinking mode depends on template-time controls (such as the `enable_thinking` template kwarg), it may not be activated during on-policy vLLM generation.
Questions
- Is using `skip_special_tokens=True` here intentional?
- What is the reason for requiring “clean text” here, instead of preserving the templated prompt IDs/text?
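For context, one alternative I had in mind (a sketch, assuming vLLM’s token-IDs prompt format; `prompt_ids` is a placeholder for `inputs["prompts"]`) would be to hand vLLM the original IDs directly, bypassing decode/re-tokenize entirely:

```python
# Hedged sketch: pass the already-templated token IDs to vLLM instead of
# decoded text. The IDs below are placeholders for illustration only.
prompt_ids = [[151644, 872, 198]]

# vLLM accepts prompts given as token IDs via this dict shape:
token_prompts = [{"prompt_token_ids": ids} for ids in prompt_ids]

# An engine could then consume these without re-tokenizing, e.g.:
# outputs = llm.generate(token_prompts, sampling_params)
```

This would keep the chat-template special tokens intact during on-policy generation, if that is the desired behavior.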
Looking forward to your response. Thanks!