[GRPO/RLOO] Extract tokenize prompts from `_generate_single_turn` (#5240)
Merged
qgallouedec merged 75 commits into main on Mar 10, 2026
Conversation
albertvillanova (Member) approved these changes on Mar 10, 2026, leaving a comment:
Thanks. Just a comment below to align with the simplicity principle.
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via `apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call, never decoding and re-tokenizing.
Related PRs in this series: `prompts` in vLLM client and server (#5225); `rollout_func` from `_generate_single_turn` to `_generate` (#5232); `_generate_single_turn` (#5239).
The previous PR unified tokenization across all generation backends into a single block at the top of `_generate_single_turn`. This PR extracts that tokenization into a dedicated `_tokenize_prompts` method and moves the calls to it into the callers, so that `_generate_single_turn` accepts pre-tokenized inputs.
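The ambiguity can be seen with a toy merge-style vocabulary. The tokenizer below is a deliberately tiny stand-in for BPE, not TRL or tokenizer-library code; real tokenizers hit the same problem through their merge rules:

```python
# Toy illustration of the re-tokenization bug: two different token-ID
# sequences decode to the same string, but re-encoding that string can
# only recover one of them. Vocabulary and functions are hypothetical.

VOCAB = {0: "foo", 1: " bar", 2: " ", 3: "bar"}
TOKEN_OF = {s: i for i, s in VOCAB.items()}

def decode(ids):
    return "".join(VOCAB[i] for i in ids)

def encode(text):
    """Greedy longest-match encoding, standing in for BPE merges."""
    ids = []
    while text:
        for piece in sorted(TOKEN_OF, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(TOKEN_OF[piece])
                text = text[len(piece):]
                break
    return ids

generated = [0, 2, 3]                    # what the model actually sampled
roundtrip = encode(decode(generated))    # what decode + re-encode produces
assert decode(generated) == decode(roundtrip)  # same text: "foo bar"
assert generated != roundtrip                  # different IDs: [0,2,3] vs [0,1]
```

Any loss computed against the round-tripped IDs would silently diverge from the tokens the model was actually conditioned on, which is why the pipeline must carry token IDs end to end.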
Changes
- New `_tokenize_prompts(prompts)` method in both GRPO and RLOO: extracts the tokenization logic (image extraction, `apply_chat_template`, multimodal field extraction) into a reusable method. Returns `(prompt_ids, images, multimodal_fields)`.
- `_generate_single_turn` signature change: from `(prompts: list)` to `(prompt_ids, images=None, multimodal_fields=None)`. The method is now a pure generation method that accepts pre-tokenized inputs.
- GRPO `_generate`, the GRPO `_tool_call_loop`, and RLOO `_generate` now call `_tokenize_prompts` before `_generate_single_turn`.
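In simplified form, the new call structure looks roughly like this. The method names come from the PR; the bodies are illustrative stand-ins, not the actual TRL implementation:

```python
# Sketch of the refactor: tokenization happens once in the caller, and
# _generate_single_turn becomes a pure token-in / token-out method.

class TrainerSketch:
    def _tokenize_prompts(self, prompts):
        # Before this PR, this logic lived at the top of _generate_single_turn.
        # Stand-in tokenizer: one fake ID per whitespace-separated word.
        prompt_ids = [[hash(w) % 1000 for w in p.split()] for p in prompts]
        images = None             # image extraction would happen here
        multimodal_fields = None  # e.g. processor outputs for VLM prompts
        return prompt_ids, images, multimodal_fields

    def _generate_single_turn(self, prompt_ids, images=None, multimodal_fields=None):
        # Pure generation: accepts pre-tokenized inputs, returns token IDs.
        return [ids + [999] for ids in prompt_ids]  # fake "completion" token

    def _generate(self, prompts):
        # Caller tokenizes once, then passes raw IDs downstream.
        prompt_ids, images, mm = self._tokenize_prompts(prompts)
        return self._generate_single_turn(prompt_ids, images, mm)
```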
Why
This separation is a prerequisite for the token-in/token-out pipeline. In the tool-calling loop, the current code still re-tokenizes the full prompt + completion + tool-result messages on each iteration. With tokenization now happening outside `_generate_single_turn`, future PRs can replace the `_tokenize_prompts` call in the tool loop with direct token ID concatenation, avoiding re-tokenization entirely.
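A sketch of where this is headed: once generation takes token IDs, the tool loop can append completion and tool-result IDs directly instead of decoding and re-applying the chat template. The loop shape and the `tokenize_tool_result` helper below are assumptions for illustration, not code from this PR:

```python
# Hypothetical token-in / token-out tool loop. `generate` returns the
# completion's token IDs plus an optional tool call; tool results are
# tokenized once and concatenated, so nothing is ever decoded + re-encoded.

def tool_call_loop(prompt_ids, generate, run_tool, tokenize_tool_result, max_turns=4):
    ids = list(prompt_ids)
    for _ in range(max_turns):
        completion_ids, tool_call = generate(ids)  # token-in / token-out
        ids += completion_ids                      # no decode/re-encode step
        if tool_call is None:
            break
        ids += tokenize_tool_result(run_tool(tool_call))  # result appended as IDs
    return ids
```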
Backward compatibility
No user-facing API changes. `_generate_single_turn` and `_tokenize_prompts` are internal methods.
Note
Medium Risk
Touches the core generation path and tool-calling loop via a signature change to `_generate_single_turn`, so regressions would show up as incorrect tokenization/generation or broken multimodal/tool flows despite being an internal refactor.
Overview
Refactors GRPO and RLOO generation to separate prompt preprocessing from decoding/generation by introducing `_tokenize_prompts(prompts)` and changing `_generate_single_turn` to accept pre-tokenized `prompt_ids` plus extracted `images`/`multimodal_fields`. Call sites (GRPO `_generate` and the tool-calling loop, plus RLOO `_generate`) now tokenize once and pass token IDs through subsequent generation steps, reducing the chance of decode/re-tokenize mismatches and preparing for a token-in/token-out tool-calling pipeline.
Written by Cursor Bugbot for commit 6c8f55c.