Commit f5df544

Revert "Heuristics for mmq_id -> original threshold (ikawrakow#734)"
This reverts commit 966a6ce.
1 parent c0d27ea commit f5df544

1 file changed: +1 -9 lines changed
ggml/src/ggml-cuda.cu

Lines changed: 1 addition & 9 deletions
@@ -2687,15 +2687,7 @@ static bool ggml_cuda_moe_up_gate_unary(ggml_backend_cuda_context & ctx, ggml_te

     ggml_tensor dst_row = *dst;

-    // The heuristics src1->ne[2] <= 32*src0->ne[2] to use the mul_mat_id implementation instead of the original version
-    // is derived from
-    //    * DeepSeek-Lite: 64 total, 6 active experts
-    //    * GPT-OSS-20B  : 32 total, 4 active experts
-    //    * Qwen3-30B-A3B: 128 total, 8 active experts
-    // My original hypothesis was that it is dependent on the total/active experts ratio, but from these 3 it
-    // looks like it really depends just on the total number of experts.
-    // TODO: verify with more models, or perhaps make the magic constant '32' to be defined via a compile time define.
-    if (src1->ne[2] <= 32*src0->ne[2] &&
+    if (src1->ne[2] <= 2048 && // TODO: this depends on number of total vs number of active experts -> need to find optimum threshod
         ggml_is_quantized(src0_1->type) && src0_1->type == src0_2->type && src1->ne[1] == 1 && src1->ne[3] == 1 &&
         ggml_cuda_can_use_mmq_id(src0_1->type, ggml_cuda_info().devices[ctx.device].cc, src1->ne[2])) {
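
To make the effect of the revert concrete, the following is a minimal sketch of just the first condition in the `if`, before and after this commit. The helper names are hypothetical and do not exist in `ggml-cuda.cu`; only the two comparisons are taken from the diff. It assumes, as the removed comment implies, that `src0->ne[2]` is the total number of experts. The remaining conditions (quantized type, matching types, `ggml_cuda_can_use_mmq_id`) are unchanged by this commit and are left out.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of ggml.

// Upper bound on src1->ne[2] under the rule removed by this revert:
// it scales with the total number of experts (assumed to be src0->ne[2]).
constexpr int64_t scaled_limit(int64_t n_experts_total) {
    return 32 * n_experts_total;
}

// First condition as it was before the revert (commit 966a6ce).
constexpr bool below_scaled_threshold(int64_t src1_ne2, int64_t n_experts_total) {
    return src1_ne2 <= scaled_limit(n_experts_total);
}

// First condition as restored by this revert: a fixed constant.
constexpr bool below_fixed_threshold(int64_t src1_ne2) {
    return src1_ne2 <= 2048;
}
```

For the three models named in the removed comment, the scaled limit works out to 2048 for DeepSeek-Lite (64 experts), 1024 for GPT-OSS-20B (32 experts), and 4096 for Qwen3-30B-A3B (128 experts). Under these assumptions the two rules agree for a 64-expert model, while for the other two the revert changes which branch is taken whenever `src1->ne[2]` falls between 1024 and 2048 (32 experts) or between 2048 and 4096 (128 experts).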
