Bug Description
ModelOptNvFp4FusedMoEMethod.process_weights_after_loading() in modelopt_quant.py crashes when EP > 1 and execution falls into the else branch (i.e., none of enable_flashinfer_cutlass_moe, enable_flashinfer_trtllm_moe, or enable_flashinfer_cutedsl_moe is active).
w13_input_scale and w2_input_scale are allocated globally (num_experts entries) but multiplied against the EP-local w13_weight_scale_2 (num_local_experts entries), causing a shape mismatch.
The cutedsl branch handles this correctly via _slice_scale(), but that helper is defined inside the elif block and is not reachable from the else branch.
Reproduction
- Model: nvidia/MiniMax-M2.5-NVFP4 (256 experts)
- Config: TP=2, EP=2, no explicit MoE runner backend (hits the else branch)
- SGLang version: 0.5.9-dev2 (commit acab24a), also reproducible on current main
Error
File ".../sglang/srt/layers/quantization/modelopt_quant.py", line 1560, in process_weights_after_loading
(w13_input_scale * w13_weight_scale_2).to(torch.float32),
RuntimeError: The size of tensor a (256) must match the size of tensor b (128) at non-singleton dimension 0
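The mismatch can be reproduced outside SGLang with two stand-in arrays of the shapes from the traceback. This is a minimal sketch using NumPy in place of torch, purely to illustrate the broadcast failure; the array names mirror the tensors in the failing line:

```python
import numpy as np

num_experts = 256        # global expert count (MiniMax-M2.5-NVFP4)
num_local_experts = 128  # per-rank count with EP=2

# Stand-ins for the tensors in the failing multiplication:
w13_input_scale = np.ones(num_experts, dtype=np.float32)           # global shape (256,)
w13_weight_scale_2 = np.ones(num_local_experts, dtype=np.float32)  # EP-local shape (128,)

try:
    _ = w13_input_scale * w13_weight_scale_2  # (256,) vs (128,): not broadcastable
except ValueError as exc:
    print("broadcast failure:", exc)
```

NumPy raises ValueError where torch raises RuntimeError, but the rule is the same: a non-singleton leading dimension of 256 cannot broadcast against 128.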
Suggested Fix
Add EP-aware slicing in the else branch, same logic as _slice_scale():
else:
    w13_input_scale = layer.w13_input_scale.max(dim=-1).values.to(torch.float32)
    w2_input_scale = layer.w2_input_scale
    # EP-aware slicing (no-op when ep_size=1)
    if layer.moe_ep_size > 1:
        _ep_start = layer.moe_ep_rank * layer.num_local_experts
        _ep_end = _ep_start + layer.num_local_experts
        w13_input_scale = w13_input_scale[_ep_start:_ep_end]
        w2_input_scale = w2_input_scale[_ep_start:_ep_end]
Note
PR #20963 (Nvidia modelopt refactoring) is currently migrating this code as-is into modelopt/schemes/modelopt_fp4.py — the bug will carry over unless fixed.
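For concreteness, the slice arithmetic from the suggested fix, applied to the reproduction config (256 experts, EP=2), gives each rank a disjoint half of the global scale tensors. This is a plain-Python sketch with a hypothetical helper name, not code from the repository:

```python
def ep_slice(ep_rank: int, num_local_experts: int) -> tuple[int, int]:
    """Return the [start, end) range of global expert indices owned by ep_rank,
    matching the _ep_start/_ep_end computation in the proposed fix."""
    start = ep_rank * num_local_experts
    return start, start + num_local_experts

num_experts, ep_size = 256, 2
num_local_experts = num_experts // ep_size

print(ep_slice(0, num_local_experts))  # rank 0 -> (0, 128)
print(ep_slice(1, num_local_experts))  # rank 1 -> (128, 256)
```

After slicing, both input scales have num_local_experts entries and multiply cleanly against the EP-local w13_weight_scale_2; with ep_size=1 the slice covers the full tensor, so the added branch is a no-op there.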