Skip to content

[GRPO/RLOO] Extract tokenize prompts from _generate_single_turn#5240

Merged
qgallouedec merged 75 commits intomainfrom
extract-tokenize-prompts
Mar 10, 2026
Merged

[GRPO/RLOO] Extract tokenize prompts from _generate_single_turn#5240
qgallouedec merged 75 commits intomainfrom
extract-tokenize-prompts

Conversation

@qgallouedec
Copy link
Member

@qgallouedec qgallouedec commented Mar 7, 2026

Context

Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).

When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via apply_chat_template, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call — never decoding and re-tokenizing.

The previous PR unified tokenization across all generation backends into a single block at the top of _generate_single_turn. This PR extracts that tokenization into a dedicated _tokenize_prompts method and moves the calls to it into the callers, so that _generate_single_turn accepts pre-tokenized inputs.

Changes

  • New _tokenize_prompts(prompts) method in both GRPO and RLOO: Extracts tokenization logic (image extraction, apply_chat_template, multimodal field extraction) into a reusable method. Returns (prompt_ids, images, multimodal_fields).
  • _generate_single_turn signature change: From (prompts: list) to (prompt_ids, images=None, multimodal_fields=None). The method is now a pure generation method that accepts pre-tokenized inputs.
  • Updated call sites: GRPO _generate, GRPO _tool_call_loop, and RLOO _generate now call _tokenize_prompts before _generate_single_turn.

Why

This separation is a prerequisite for the token-in/token-out pipeline. In the tool-calling loop, the current code still re-tokenizes the full prompt+completion+tool-result messages on each iteration. With tokenization now happening outside _generate_single_turn, future PRs can replace the _tokenize_prompts call in the tool loop with direct token ID concatenation — avoiding re-tokenization entirely.

Backward compatibility

No user-facing API changes. _generate_single_turn and _tokenize_prompts are internal methods.


Note

Medium Risk
Touches the core generation path and tool-calling loop via a signature change to _generate_single_turn, so regressions would show up as incorrect tokenization/generation or broken multimodal/tool flows despite being an internal refactor.

Overview
Refactors GRPO and RLOO generation to separate prompt preprocessing from decoding/generation by introducing _tokenize_prompts(prompts) and changing _generate_single_turn to accept pre-tokenized prompt_ids plus extracted images/multimodal_fields.

Call sites (GRPO _generate and tool-calling loop, plus RLOO _generate) now tokenize once and pass token IDs through subsequent generation steps, reducing chances of decode/re-tokenize mismatches and preparing for a token-in/token-out tool-calling pipeline.

Written by Cursor Bugbot for commit 6c8f55c. This will update automatically on new commits. Configure here.

qgallouedec and others added 25 commits March 5, 2026 19:10
@qgallouedec qgallouedec changed the base branch from main to unify-tokenization-generate March 7, 2026 04:58
Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just a comment below to align with the simplicity principle.

qgallouedec and others added 18 commits March 10, 2026 09:06
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Base automatically changed from unify-tokenization-generate to main March 10, 2026 20:20
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@qgallouedec qgallouedec merged commit a65c830 into main Mar 10, 2026
14 checks passed
@qgallouedec qgallouedec deleted the extract-tokenize-prompts branch March 10, 2026 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants