Support of trajectory aggregation for mrope multimodal model, and add multimodal prefix checks for trajectory merge #469

jackhu-bme · 2026-01-29T08:02:11Z

Summary

This PR enables trajectory aggregation for multimodal models using mRoPE, which was previously disallowed due to missing multimodal state and correctness concerns during aggregation.

Motivation

Trajectory aggregation is an important feature for accelerating training in multi-turn agent reinforcement learning. Compared to transition-level aggregation, it processes an entire interaction trajectory as a single training sample, eliminating redundant computation on repeated history prefixes and significantly improving training throughput.
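To make the savings concrete, here is a rough token-count sketch. It assumes every turn's sample contains the full history so far, which is the pattern described above; the function name and the turn lengths are illustrative only and are not taken from the repository.

```python
def tokens_processed(turn_lengths):
    """Compare token counts for the two aggregation modes.

    turn_lengths: number of NEW tokens contributed by each turn.
    Returns (transition_level_total, trajectory_level_total).
    """
    transition_total = 0  # each turn is a separate sample that repeats the history
    history = 0
    for new_tokens in turn_lengths:
        history += new_tokens
        transition_total += history
    trajectory_total = history  # one sample covering the whole trajectory
    return transition_total, trajectory_total


transition, trajectory = tokens_processed([100, 50, 50])
print(transition, trajectory)
```

Under this assumption the gap grows roughly quadratically with the number of turns, since each additional turn re-processes everything before it.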

This aggregation strategy is commonly used in large language model pre-training and fine-tuning, and is described in more detail in the Agent Lightning documentation: https://agent-lightning.github.io/posts/trajectory_level_aggregation/

In the current implementation, trajectory aggregation is explicitly disabled for multimodal models using mRoPE via an assertion. This restriction was introduced because the aggregated trajectories did not carry sufficient multimodal state to guarantee correctness, which could lead to incorrect mRoPE position assignment when image inputs are involved.

This PR revisits that restriction by making the required multimodal information explicit and introducing additional validation to ensure correctness.

Changes Made

Removed the hard assertion that disabled trajectory aggregation for mRoPE-based multimodal models, allowing aggregation to proceed when image grid metadata is available.
Propagated image_grid_thw during trajectory merges by deriving the multimodal grid metadata from the last merged trace and appending it to image_grid_thw_list, ensuring consistent position_ids computation and correct image token alignment.
Added multimodal prefix consistency checks during trajectory aggregation: trajectory merges now require both token-level prefixes and image URL prefixes to match. Image URLs are semantically normalized (e.g., file:// and data: URLs with identical content map to the same hash), and mismatches are logged for debugging.
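The prefix check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `image_key`, `is_prefix`, and `can_merge` are hypothetical names, the trace dictionaries with `token_ids` and `image_urls` keys are an assumed shape, and only base64 `data:` URLs and readable `file://` URLs are normalized to a content hash.

```python
import base64
import hashlib
from pathlib import Path
from urllib.parse import unquote, urlparse


def image_key(url: str) -> str:
    """Map an image reference to a content hash when the bytes are recoverable."""
    if url.startswith("data:"):
        header, _, payload = url.partition(",")
        if header.endswith(";base64"):
            data = base64.b64decode(payload)
            return "sha256:" + hashlib.sha256(data).hexdigest()
    elif url.startswith("file://"):
        path = Path(unquote(urlparse(url).path))
        try:
            return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError:
            pass  # unreadable file: fall back to comparing the raw URL
    return "url:" + url


def is_prefix(prefix, full) -> bool:
    """True if `full` starts with every element of `prefix`, in order."""
    return len(prefix) <= len(full) and list(full[: len(prefix)]) == list(prefix)


def can_merge(merged: dict, next_turn: dict) -> bool:
    """Require both the token prefix and the image prefix to match."""
    token_ok = is_prefix(merged["token_ids"], next_turn["token_ids"])
    image_ok = is_prefix(
        [image_key(u) for u in merged["image_urls"]],
        [image_key(u) for u in next_turn["image_urls"]],
    )
    return token_ok and image_ok
```

Hashing content rather than comparing URL strings means the same image referenced once by path and once inline still counts as a match, while a different image at the same turn position blocks the merge even if the token IDs happen to agree.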

Testing

These changes were not run against the official examples, because the repository does not yet include a multimodal, multi-turn trajectory example that exercises trajectory aggregation.

Existing examples fall into the following categories:

Pure-text, multi-turn agents (e.g., conversational or planning demos), which are suitable for trajectory aggregation but do not involve multimodal inputs.
Multimodal examples that are single-turn or stage-based pipelines, where images are provided per step but conversation history is not fed back into subsequent turns, so trajectory-level prefix merging is not exercised.
Single-shot or non-conversational tasks that do not involve trajectory aggregation.

The changes were verified locally using a custom multimodal, multi-turn workflow (X-ray image input with a crop tool), where conversation history is preserved across turns. Trajectory merges succeeded only when both token prefixes and image prefixes matched, and the new validation logic behaved as expected.

A standalone multimodal trajectory example (X-ray + crop tool) can be contributed in a follow-up PR if desired.

Breaking Changes / Risks

No breaking changes to public APIs.
Trajectory aggregation for multimodal mRoPE models is now enabled under stricter validation. In rare cases where token prefixes match but image prefixes do not, aggregation will be skipped to preserve correctness.


jackhu-bme · 2026-01-29T08:04:55Z

@microsoft-github-policy-service agree

Copilot

Pull request overview

This PR enables trajectory aggregation for multimodal models using M-RoPE (Multimodal Rotary Position Embedding), which was previously blocked by an assertion. The change adds multimodal state tracking and validation to ensure correctness when merging multi-turn trajectories that contain image inputs.

Changes:

Removed hard assertion blocking trajectory aggregation for M-RoPE models
Added image URL prefix matching during trajectory merge to ensure consistency across turns
Implemented image grid metadata propagation from merged trajectories for correct M-RoPE position computation


Copilot · 2026-01-29T08:08:09Z

agentlightning/verl/daemon.py

+        try:
+            with open(path, "rb") as handle:
+                data = handle.read()
+            import hashlib


The import statement for hashlib should be placed at the top of the file with other imports, rather than being imported locally within the function. This is inconsistent with Python best practices and the codebase's import patterns.

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-01-29T08:08:10Z

agentlightning/verl/daemon.py

+        import base64
+        import hashlib


The import statements for base64 and hashlib should be placed at the top of the file with other imports, rather than being imported locally within the function. This is inconsistent with Python best practices and the codebase's import patterns.

Copilot · 2026-01-29T08:08:10Z

agentlightning/verl/daemon.py

+                    image_prefix_ok = image_urls_startswith(trace.get("image_urls", []), current_image_urls)
+                    if not image_prefix_ok:
+                        image_mismatch_count += 1
+                        if self.trace_aggregator.get("debug", False) == True:


The condition self.trace_aggregator.get("debug", False) == True is redundant. Assuming the debug setting is stored as a boolean, the explicit comparison to True is unnecessary and less Pythonic. The condition should be simplified to self.trace_aggregator.get("debug", False).

Copilot · 2026-01-29T08:08:10Z

agentlightning/verl/daemon.py

+                                turn_index,
+                                self.trace_aggregator.get("mismatch_log_dir", None),
+                            )
+                    if not token_prefix_ok and self.trace_aggregator.get("debug", False) == True:


The condition self.trace_aggregator.get("debug", False) == True is redundant. Assuming the debug setting is stored as a boolean, the explicit comparison to True is unnecessary and less Pythonic. The condition should be simplified to self.trace_aggregator.get("debug", False). This is consistent with the same pattern that should be fixed on line 998.
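One caveat on this simplification, sketched below: the two forms agree only when the stored value is an actual boolean. The snippet assumes the aggregator config behaves like a plain dict; the helper names are hypothetical and exist only to compare the two conditions.

```python
def debug_enabled_strict(cfg: dict) -> bool:
    # Mirrors the original condition: only values equal to True pass.
    return cfg.get("debug", False) == True  # noqa: E712


def debug_enabled_truthy(cfg: dict) -> bool:
    # Mirrors the suggested simplification: any truthy value passes.
    return bool(cfg.get("debug", False))


for cfg in ({}, {"debug": True}, {"debug": False}, {"debug": "false"}):
    print(cfg, debug_enabled_strict(cfg), debug_enabled_truthy(cfg))
```

If the setting can arrive as a string (for example from a YAML or CLI override that was not parsed to a boolean), a non-empty string such as "false" is truthy, so the simplified condition would turn debug logging on. When the value is guaranteed to be a bool, the two forms are equivalent.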

Commit d5fb957: Support of trajectory aggregation for mrope multimodal model, and add multimodal prefix checks for trajectory merge

Copilot AI review requested due to automatic review settings January 29, 2026 08:02

Copilot started reviewing on behalf of jackhu-bme January 29, 2026 08:02

Copilot AI reviewed Jan 29, 2026
