`docs/source_en/Instruction/GRPO/DeveloperGuide/multi_turn.md` (9 additions, 0 deletions)
@@ -172,6 +172,15 @@ The complete trajectory can be accessed via `trajectory_inputs` in `kwargs`.
For a concrete implementation, see the [MultiTurnThinkingTips class](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/plugin/plugin.py)
### Multimodal Data Override
In multimodal, multi-turn interactions, you may need to add, remove, or modify multimodal data during the conversation and have those changes propagated to the trainer.

Implementation: set the corresponding keys in `rollout_infos` to override the original multimodal content from the dataset.

Supported override keys: `images`, `audios`, `videos`.

For details, see [DeepEyes Scheduler](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/plugin/deepeyes/deepeyes_plugin.py#L403-L404).
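
As a rough illustration of the override described above, the sketch below builds a `rollout_infos` dictionary that carries only the modalities to be replaced. It is standalone Python and does not import ms-swift. The helper names `build_rollout_infos` and `make_step_result`, the shape of the returned dictionary, and the convention that `None` means "leave the dataset value untouched" are all assumptions made for this example; the linked DeepEyes scheduler is the authoritative reference.

```python
from typing import Any, Dict, List, Optional

# Override keys named in this section of the guide.
OVERRIDE_KEYS = ("images", "audios", "videos")


def build_rollout_infos(
    images: Optional[List[Any]] = None,
    audios: Optional[List[Any]] = None,
    videos: Optional[List[Any]] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical helper: keep only the modalities that should be overridden.

    A value of None means "do not override this modality" (an assumption of
    this sketch, not documented behavior).
    """
    candidates = {"images": images, "audios": audios, "videos": videos}
    return {
        key: list(candidates[key])
        for key in OVERRIDE_KEYS
        if candidates[key] is not None
    }


def make_step_result(infer_request: Dict[str, Any], new_images: List[Any]) -> Dict[str, Any]:
    """Hypothetical shape of what a scheduler step might hand back.

    The 'infer_request' and 'rollout_infos' keys are assumed here for
    illustration; check the scheduler base class for the real contract.
    """
    return {
        "infer_request": infer_request,
        "rollout_infos": build_rollout_infos(images=new_images),
    }


if __name__ == "__main__":
    # Example: after a tool call crops the image, the crop replaces the original.
    result = make_step_result({"messages": []}, ["crop_0.png"])
    print(result["rollout_infos"])
```

Passing only the changed modality keeps the override narrow: in this sketch, keys that are absent from `rollout_infos` are simply not part of the override.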
### Returning response token IDs
In the default workflow, the scheduler returns text and the trainer re-encodes it into token IDs for training.