[Bug] modelopt_quant.py: NVFP4 input_scale not sliced to local experts with EP > 1 #21602

@vroomfondel

Bug Description

ModelOptNvFp4FusedMoEMethod.process_weights_after_loading() in modelopt_quant.py crashes when EP > 1 and execution reaches the else branch (that is, when none of enable_flashinfer_cutlass_moe, enable_flashinfer_trtllm_moe, or enable_flashinfer_cutedsl_moe is active).

w13_input_scale and w2_input_scale are allocated globally (num_experts) but multiplied against EP-local w13_weight_scale_2 (num_local_experts), causing a shape mismatch.

The cutedsl branch handles this correctly via _slice_scale(), but that helper is scoped inside the elif block and not reachable from else.

Reproduction

  • Model: nvidia/MiniMax-M2.5-NVFP4 (256 experts)
  • Config: TP=2, EP=2, no explicit MoE runner backend (hits the else branch)
  • SGLang version: 0.5.9-dev2 (commit acab24a), also reproducible on current main

Error

File ".../sglang/srt/layers/quantization/modelopt_quant.py", line 1560, in process_weights_after_loading
    (w13_input_scale * w13_weight_scale_2).to(torch.float32),
RuntimeError: The size of tensor a (256) must match the size of tensor b (128) at non-singleton dimension 0

Suggested Fix

Add EP-aware slicing in the else branch, using the same logic as _slice_scale():

        else:
            w13_input_scale = layer.w13_input_scale.max(dim=-1).values.to(torch.float32)
            w2_input_scale = layer.w2_input_scale
            # EP-aware slicing (no-op when ep_size=1)
            if layer.moe_ep_size > 1:
                _ep_start = layer.moe_ep_rank * layer.num_local_experts
                _ep_end = _ep_start + layer.num_local_experts
                w13_input_scale = w13_input_scale[_ep_start:_ep_end]
                w2_input_scale = w2_input_scale[_ep_start:_ep_end]
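
The index arithmetic in the snippet above can be checked in isolation. The following is a minimal, dependency-free sketch, not the SGLang implementation: the helper name slice_to_local_experts is hypothetical, plain Python lists stand in for the scale tensors, and it assumes experts are partitioned contiguously across EP ranks, as the suggested fix does.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_to_local_experts(
    scale: Sequence[T], ep_size: int, ep_rank: int, num_local_experts: int
) -> List[T]:
    """Return the per-expert entries owned by one EP rank.

    Hypothetical helper: assumes rank r owns the contiguous expert range
    [r * num_local_experts, (r + 1) * num_local_experts).
    """
    if ep_size <= 1:
        # No expert parallelism: every expert is local, nothing to slice.
        return list(scale)
    start = ep_rank * num_local_experts
    end = start + num_local_experts
    return list(scale[start:end])


# Shapes from the report: 256 experts split across EP=2 gives 128 per rank.
num_experts, ep_size = 256, 2
num_local_experts = num_experts // ep_size
global_input_scale = [float(i) for i in range(num_experts)]  # stand-in values

for rank in range(ep_size):
    local = slice_to_local_experts(
        global_input_scale, ep_size, rank, num_local_experts
    )
    # After slicing, the length matches the EP-local weight scale.
    assert len(local) == num_local_experts
```

Defining the slicing once outside the branch bodies would let both the cutedsl path and the else path share it, which is one way to avoid the two paths drifting apart again.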

Note

PR #20963 (Nvidia modelopt refactoring) is currently migrating this code as-is into modelopt/schemes/modelopt_fp4.py, so the bug will carry over unless it is fixed.
