Commit cd33db6

bugfix: Verify num_experts greater or equal to local_experts + offset (#1469)
## 📌 Description

Verify that `num_experts >= local_num_experts + local_expert_offset` to avoid an illegal memory access. Currently, calling `fused_moe.trtllm_fp8_per_tensor_scale_moe` with `local_num_experts + local_expert_offset > num_experts` results in `CUDA: Illegal memory access`.

Signed-off-by: Amir Klein <[email protected]>
1 parent fe442a2 commit cd33db6

1 file changed: 2 additions, 0 deletions

csrc/trtllm_fused_moe_kernel_launcher.cu

Lines changed: 2 additions & 0 deletions
@@ -96,6 +96,8 @@ at::Tensor trtllm_fp8_per_tensor_scale_moe_launcher(
   TORCH_CHECK(num_experts % 4 == 0,
               "Routing kernel expects that num_experts must be divisible by 4");
   TORCH_CHECK(num_experts > top_k, "num_experts must be greater than top_k");
+  TORCH_CHECK(local_num_experts + local_expert_offset <= num_experts,
+              "num_experts must be greater or equal to local_num_experts + local_expert_offset");
 
   tensorrt_llm::kernels::trtllmgen_moe::MoE::MoERunnerArgs args;
   tensorrt_llm::kernels::trtllmgen_moe::MoE::MoEWorkspace workspace;
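
The three checks in this hunk can be tried out without PyTorch or CUDA. The sketch below is not code from the repository: `check_expert_partition` and `is_valid_expert_partition` are hypothetical helpers that mirror the conditions with plain C++ exceptions in place of `TORCH_CHECK`. It assumes the local experts occupy the half-open index range `[local_expert_offset, local_expert_offset + local_num_experts)`, which is an interpretation of the new check rather than something the commit states.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not part of the repository): mirrors the launcher's
// argument checks, throwing std::invalid_argument instead of using TORCH_CHECK.
inline void check_expert_partition(int64_t num_experts, int64_t top_k,
                                   int64_t local_num_experts,
                                   int64_t local_expert_offset) {
  if (num_experts % 4 != 0) {
    throw std::invalid_argument(
        "Routing kernel expects that num_experts must be divisible by 4");
  }
  if (num_experts <= top_k) {
    throw std::invalid_argument("num_experts must be greater than top_k");
  }
  // The condition added by this commit. Assumption: the local experts cover
  // indices [local_expert_offset, local_expert_offset + local_num_experts),
  // so the end of that range must not exceed num_experts.
  if (local_num_experts + local_expert_offset > num_experts) {
    throw std::invalid_argument(
        "num_experts must be greater or equal to "
        "local_num_experts + local_expert_offset");
  }
}

// Hypothetical convenience wrapper: true if all checks pass, false otherwise.
inline bool is_valid_expert_partition(int64_t num_experts, int64_t top_k,
                                      int64_t local_num_experts,
                                      int64_t local_expert_offset) {
  try {
    check_expert_partition(num_experts, top_k, local_num_experts,
                           local_expert_offset);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

For example, with `num_experts = 8` and `top_k = 2`, a rank holding 4 experts at offset 4 passes, while the same 4 experts at offset 6 is rejected up front instead of reaching the kernel with an out-of-range expert index.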
