
[Bug] Sequence Packing of Qwen3-VL #1776

@lostkevin

Description
Describe the bug

Hi devs! We recently attempted to train Megatron-Bridge-based Qwen3-VL with slime and found a large logprobs difference between the train and rollout backends. After digging into the modeling code, we found that:

  1. get_rope_index, as called in model.py, cannot correctly process packed input_ids.
  2. _apply_rotary_pos_emb_thd only supports the traditional RoPE mapping without offsets when apply_rope_fusion is enabled, so --no-rope-fusion must be set to get the correct behavior.

Here is a draft patch (covering dense and MoE, not strictly tested) against 5c7ebe7.
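To make the first point concrete, here is a minimal, self-contained sketch of the idea behind the fix (this is not the patch itself): position ids have to restart at every packed sub-sequence boundary instead of running monotonically across the whole pack. The helper name build_packed_position_ids, the flash-attn-style cu_seqlens convention, and the plain torch.arange stand-in for the real per-sequence mRoPE index builder (get_rope_index) are all illustrative assumptions.

```python
import torch


def build_packed_position_ids(input_ids: torch.Tensor,
                              cu_seqlens: torch.Tensor) -> torch.Tensor:
    """Illustrative helper, not the actual patch: build position ids for a
    packed batch by handling every packed sub-sequence independently.

    cu_seqlens follows the flash-attn convention, e.g. tensor([0, 5, 12, 20])
    for three packed sequences of lengths 5, 7 and 8.
    """
    chunks = []
    for start, end in zip(cu_seqlens[:-1].tolist(), cu_seqlens[1:].tolist()):
        # Positions must restart at 0 inside each packed sub-sequence.
        # For Qwen3-VL the per-sequence call would be the mRoPE index
        # builder (get_rope_index); a plain arange stands in for it here.
        chunks.append(torch.arange(end - start, device=input_ids.device))
    return torch.cat(chunks)


# Toy usage: 20 packed tokens made of three sub-sequences.
ids = torch.zeros(20, dtype=torch.long)
cu = torch.tensor([0, 5, 12, 20])
print(build_packed_position_ids(ids, cu))
# tensor([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7])
```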
