
[Bug] Sequence Packing of Qwen3-VL #1776

@lostkevin

Description
Describe the bug

Hi devs! We recently attempted to train Megatron-Bridge-based Qwen3-VL with slime and found a large logprobs difference between the train and rollout backends. After digging into the modeling code, we found that:

  1. get_rope_index, as called in model.py, cannot correctly process packed input_ids.
  2. _apply_rotary_pos_emb_thd only supports the traditional RoPE mapping without offsets when apply_rope_fusion is enabled, so --no-rope-fusion must be set to get the correct behavior.

Here is a draft patch (covering dense and MoE, not strictly tested) against 5c7ebe7.
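To make the first point concrete, here is a minimal, self-contained sketch of the idea behind the fix (this is not the patch itself): position ids have to restart at every packed sub-sequence boundary instead of running monotonically across the whole pack. The helper name build_packed_position_ids, the flash-attn-style cu_seqlens convention, and the plain torch.arange stand-in for the real per-sequence mRoPE index builder (get_rope_index) are all illustrative assumptions.

```python
import torch


def build_packed_position_ids(input_ids: torch.Tensor,
                              cu_seqlens: torch.Tensor) -> torch.Tensor:
    """Illustrative helper, not the actual patch: build position ids for a
    packed batch by handling every packed sub-sequence independently.

    cu_seqlens follows the flash-attn convention, e.g. tensor([0, 5, 12, 20])
    for three packed sequences of lengths 5, 7 and 8.
    """
    chunks = []
    for start, end in zip(cu_seqlens[:-1].tolist(), cu_seqlens[1:].tolist()):
        # Positions must restart at 0 inside each packed sub-sequence.
        # For Qwen3-VL the per-sequence call would be the mRoPE index
        # builder (get_rope_index); a plain arange stands in for it here.
        chunks.append(torch.arange(end - start, device=input_ids.device))
    return torch.cat(chunks)


# Toy usage: 20 packed tokens made of three sub-sequences.
ids = torch.zeros(20, dtype=torch.long)
cu = torch.tensor([0, 5, 12, 20])
print(build_packed_position_ids(ids, cu))
# tensor([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7])
```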
