[GRPO/RLOO] Extract tokenize prompts from `_generate_single_turn` (#5240)
Merged
qgallouedec merged 75 commits into main on Mar 10, 2026
Conversation
albertvillanova (Member) approved these changes on Mar 10, 2026, leaving a comment:
Thanks. Just a comment below to align with the simplicity principle.
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via `apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call, never decoding and re-tokenizing.
Related PRs in this series: `prompts` in vLLM client and server (#5225); `rollout_func` from `_generate_single_turn` to `_generate` (#5232); `_generate_single_turn` (#5239).
The previous PR unified tokenization across all generation backends into a single block at the top of `_generate_single_turn`. This PR extracts that tokenization into a dedicated `_tokenize_prompts` method and moves the calls to it into the callers, so that `_generate_single_turn` accepts pre-tokenized inputs.
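The ambiguity can be seen with a toy merge-style vocabulary. The tokenizer below is a deliberately tiny stand-in for BPE, not TRL or tokenizer-library code; real tokenizers hit the same problem through their merge rules:

```python
# Toy illustration of the re-tokenization bug: two different token-ID
# sequences decode to the same string, but re-encoding that string can
# only recover one of them. Vocabulary and functions are hypothetical.

VOCAB = {0: "foo", 1: " bar", 2: " ", 3: "bar"}
TOKEN_OF = {s: i for i, s in VOCAB.items()}

def decode(ids):
    return "".join(VOCAB[i] for i in ids)

def encode(text):
    """Greedy longest-match encoding, standing in for BPE merges."""
    ids = []
    while text:
        for piece in sorted(TOKEN_OF, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(TOKEN_OF[piece])
                text = text[len(piece):]
                break
    return ids

generated = [0, 2, 3]                    # what the model actually sampled
roundtrip = encode(decode(generated))    # what decode + re-encode produces
assert decode(generated) == decode(roundtrip)  # same text: "foo bar"
assert generated != roundtrip                  # different IDs: [0,2,3] vs [0,1]
```

Any loss computed against the round-tripped IDs would silently diverge from the tokens the model was actually conditioned on, which is why the pipeline must carry token IDs end to end.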
Changes
- New `_tokenize_prompts(prompts)` method in both GRPO and RLOO: extracts the tokenization logic (image extraction, `apply_chat_template`, multimodal field extraction) into a reusable method. Returns `(prompt_ids, images, multimodal_fields)`.
- `_generate_single_turn` signature change: from `(prompts: list)` to `(prompt_ids, images=None, multimodal_fields=None)`. The method is now a pure generation method that accepts pre-tokenized inputs.
- GRPO `_generate`, the GRPO `_tool_call_loop`, and RLOO `_generate` now call `_tokenize_prompts` before `_generate_single_turn`.
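In simplified form, the new call structure looks roughly like this. The method names come from the PR; the bodies are illustrative stand-ins, not the actual TRL implementation:

```python
# Sketch of the refactor: tokenization happens once in the caller, and
# _generate_single_turn becomes a pure token-in / token-out method.

class TrainerSketch:
    def _tokenize_prompts(self, prompts):
        # Before this PR, this logic lived at the top of _generate_single_turn.
        # Stand-in tokenizer: one fake ID per whitespace-separated word.
        prompt_ids = [[hash(w) % 1000 for w in p.split()] for p in prompts]
        images = None             # image extraction would happen here
        multimodal_fields = None  # e.g. processor outputs for VLM prompts
        return prompt_ids, images, multimodal_fields

    def _generate_single_turn(self, prompt_ids, images=None, multimodal_fields=None):
        # Pure generation: accepts pre-tokenized inputs, returns token IDs.
        return [ids + [999] for ids in prompt_ids]  # fake "completion" token

    def _generate(self, prompts):
        # Caller tokenizes once, then passes raw IDs downstream.
        prompt_ids, images, mm = self._tokenize_prompts(prompts)
        return self._generate_single_turn(prompt_ids, images, mm)
```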
Why
This separation is a prerequisite for the token-in/token-out pipeline. In the tool-calling loop, the current code still re-tokenizes the full prompt + completion + tool-result messages on each iteration. With tokenization now happening outside `_generate_single_turn`, future PRs can replace the `_tokenize_prompts` call in the tool loop with direct token ID concatenation, avoiding re-tokenization entirely.
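A sketch of where this is headed: once generation takes token IDs, the tool loop can append completion and tool-result IDs directly instead of decoding and re-applying the chat template. The loop shape and the `tokenize_tool_result` helper below are assumptions for illustration, not code from this PR:

```python
# Hypothetical token-in / token-out tool loop. `generate` returns the
# completion's token IDs plus an optional tool call; tool results are
# tokenized once and concatenated, so nothing is ever decoded + re-encoded.

def tool_call_loop(prompt_ids, generate, run_tool, tokenize_tool_result, max_turns=4):
    ids = list(prompt_ids)
    for _ in range(max_turns):
        completion_ids, tool_call = generate(ids)  # token-in / token-out
        ids += completion_ids                      # no decode/re-encode step
        if tool_call is None:
            break
        ids += tokenize_tool_result(run_tool(tool_call))  # result appended as IDs
    return ids
```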
Backward compatibility
No user-facing API changes. `_generate_single_turn` and `_tokenize_prompts` are internal methods.
Note
Medium Risk
Touches the core generation path and tool-calling loop via a signature change to `_generate_single_turn`, so regressions would show up as incorrect tokenization/generation or broken multimodal/tool flows despite being an internal refactor.
Overview
Refactors GRPO and RLOO generation to separate prompt preprocessing from decoding/generation by introducing `_tokenize_prompts(prompts)` and changing `_generate_single_turn` to accept pre-tokenized `prompt_ids` plus extracted `images`/`multimodal_fields`. Call sites (GRPO `_generate` and the tool-calling loop, plus RLOO `_generate`) now tokenize once and pass token IDs through subsequent generation steps, reducing the chance of decode/re-tokenize mismatches and preparing for a token-in/token-out tool-calling pipeline.
Written by Cursor Bugbot for commit 6c8f55c.