[Bug] Intermittent segfault in Triton MoE kernel during piecewise CUDA graph warmup on B200

## Description

During piecewise CUDA graph warmup for `nvidia/Qwen3.5-397B-A17B-NVFP4` (4 GPUs, FP4 quantization) on B200, the process intermittently crashes with a segmentation fault in the Triton NVIDIA driver backend while running the fused MoE experts kernel (`matmul_ogs`).

In the failing run, the crash occurred at ~93% of the "Compiling num tokens" phase (iteration 69 of 74).

## Error Stack Trace

```
Fatal Python error: Segmentation fault

Current thread (most recent call first):
  File "triton/backends/nvidia/driver.py", line 668 in inner
  File "triton/backends/nvidia/driver.py", line 712 in __call__
  File "triton/runtime/jit.py", line 757 in run
  File "triton_kernels/matmul_ogs.py", line 467 in matmul_ogs
  File "sglang/srt/layers/moe/fused_moe_triton/triton_kernels_moe.py", line 306 in triton_kernel_fused_experts_with_bias
  File "sglang/srt/layers/moe/moe_runner/triton_kernels.py", line 115 in run
  File "sglang/srt/layers/moe/moe_runner/runner.py", line 117 in run
  File "sglang/srt/layers/quantization/unquant.py", line 423 in forward_cuda
  File "sglang/srt/layers/moe/fused_moe_triton/layer.py", line 1034 in run_moe_core
  File "sglang/srt/layers/moe/fused_moe_triton/layer.py", line 1013 in forward_impl
  File "sglang/srt/models/gpt_oss.py", line 269 in moe_impl
  ...
  File "sglang/srt/model_executor/piecewise_cuda_graph_runner.py", line 406 in warmup_compile
  File "sglang/srt/model_executor/piecewise_cuda_graph_runner.py", line 309 in __init__
  File "sglang/srt/model_executor/model_runner.py", line 2450 in init_piecewise_cuda_graphs
```

## Environment

- **GPU**: NVIDIA B200
- **Model**: `nvidia/Qwen3.5-397B-A17B-NVFP4` (FP4 quantization, TP=4)
- **Attention backend**: `trtllm_mha`
- **CI job**: `stage-c-test-4-gpu-b200 (0)` in [PR Test run](https://github.com/sgl-project/sglang/actions/runs/23680918199/attempts/7?pr=19915)

## Analysis

The segfault originates in the Triton NVIDIA driver backend (`triton/backends/nvidia/driver.py:668`, in `inner`, called from the driver's `__call__`) while running the MoE `matmul_ogs` kernel. The frames sit on the launch path invoked from `triton/runtime/jit.py` `run`. This appears to be a Triton + B200 (SM100) driver-level issue triggered during CUDA graph warmup.
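Because the crash is intermittent, a rough failure rate makes it easier to confirm a fix or bisect a Triton version. The trace header above ("Fatal Python error: Segmentation fault", "Current thread (most recent call first)") is the format Python's `faulthandler` produces. The following is a minimal sketch that assumes a POSIX host and a failing warmup that can be re-run as a standalone command. The `count_segfaults` helper is hypothetical and is not part of SGLang or the CI scripts.

```python
import os
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=10, timeout=None):
    """Run `cmd` `runs` times and return how many runs died from SIGSEGV.

    Assumes POSIX semantics: a child killed by a signal reports a negative
    return code (-signal.SIGSEGV, i.e. -11 on Linux).
    """
    # PYTHONFAULTHANDLER=1 makes a Python child dump its frames on a fatal
    # signal, producing a trace like the one in this issue.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    crashes = 0
    for i in range(runs):
        result = subprocess.run(cmd, env=env, timeout=timeout)
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
        print(f"run {i + 1}/{runs}: returncode={result.returncode}", file=sys.stderr)
    return crashes
```

Pass the same launch command the CI job uses as `cmd`. Each child's stderr (including any faulthandler dump) is left visible, so every crashing run still yields a full stack trace.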
