add multi-turn support for multi-modal RL #1703
Open
+514
−73
Add experimental support for multi-turn RL with VLMs.
Note
Medium Risk
Touches the multimodal data path that builds `pixel_values`/`image_grid_thw` for training samples; mistakes can silently misalign images with tokens across turns and degrade RL training stability.

Overview
Adds multi-turn multimodal (VLM) rollout support by extending the VLM image cache to extract and preprocess images across all trajectory steps, then supplying per-step cumulative `pixel_values`/`image_grid_thw` to `branch_rollout` (while `interleave_rollout` uses the final step's images).
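A minimal sketch of the per-step cumulative image handling described above, assuming a hypothetical `preprocess` callable that returns `(pixel_values, image_grid_thw)` tensors for one image; the step shape and field names are illustrative, not this PR's actual API:

```python
from dataclasses import dataclass

import torch


@dataclass
class StepImages:
    # Preprocessed tensors for all images visible up to and including a step.
    pixel_values: torch.Tensor    # concatenated along the patch dimension
    image_grid_thw: torch.Tensor  # one (t, h, w) row per image


def cumulative_step_images(steps, preprocess):
    """Accumulate images turn by turn so each step's entry covers every
    image from all earlier turns (hypothetical helper, not the PR's code)."""
    per_step = []
    pv_parts, grid_parts = [], []
    for step in steps:
        for image in step.get("images", []):
            pv, grid = preprocess(image)  # assumed to return both tensors
            pv_parts.append(pv)
            grid_parts.append(grid)
        if pv_parts:
            per_step.append(StepImages(
                pixel_values=torch.cat(pv_parts, dim=0),
                image_grid_thw=torch.cat(grid_parts, dim=0),
            ))
        else:
            per_step.append(None)  # no images seen yet in this trajectory
    return per_step
```

Under these assumptions, a branch at step `i` would consume `per_step[i]`, while the interleaved path only needs the final cumulative entry, `per_step[-1]`.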
Updates the orchestrator rollout-processing path to pass the shared `vlm_cache` object into rollout conversion, selects the longest trajectory per `example_id` when building the cache, and expands unit tests to cover multi-turn image extraction, cache accessors, and rollout image assignment; docs now note higher KL mismatch for multi-image inputs.

Written by Cursor Bugbot for commit fb42176.
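An illustrative sketch of the longest-trajectory selection when building the cache, presumably so the cached image set covers every shorter branch of the same example; the trajectory dict layout here is an assumption:

```python
def longest_trajectory_per_example(trajectories):
    """For each example_id, keep the trajectory with the most steps
    (hypothetical helper mirroring the cache-building rule described above)."""
    best = {}
    for traj in trajectories:
        key = traj["example_id"]
        if key not in best or len(traj["steps"]) > len(best[key]["steps"]):
            best[key] = traj
    return best
```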