Various fused ops around expert selection #840

ikawrakow · 2025-10-19T12:01:49Z

Nothing earth shattering, just 1-2% performance gains.

ikawrakow · 2025-10-19T16:02:17Z

OK, here are some graphs for Ling-mini-2.0-Q4_K_M (RTX-4080) showing the cumulative effect of the changes since grouped expert routing was added for Ling/Ring in #838. Also shown are results for the not-yet-merged PR 16063 in mainline. The u-batch size is 2048 and FA is on.

Prompt processing

Token generation
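For readers unfamiliar with the grouped expert routing mentioned above, here is a minimal unfused sketch of the sigmoid + add + grouped top-k + get_rows chain. It is not the code from this PR. The function name, the parameters, the rule of scoring a group by the sum of its best `per_group` experts, and the choice to gather the unbiased probabilities as weights are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grouped_topk(logits, bias, n_groups, k_groups, k_experts, per_group=2):
    """Unfused reference: sigmoid -> add bias -> grouped top-k -> get_rows."""
    probs = [sigmoid(x) for x in logits]            # sigmoid
    scores = [p + b for p, b in zip(probs, bias)]   # add (selection bias)
    size = len(scores) // n_groups                  # experts per group

    # Assumption: a group is scored by the sum of its `per_group` best experts.
    group_score = []
    for g in range(n_groups):
        best = sorted(scores[g * size:(g + 1) * size], reverse=True)[:per_group]
        group_score.append(sum(best))

    # Keep only the k_groups best groups, then pick experts inside them.
    keep = set(sorted(range(n_groups), key=lambda g: -group_score[g])[:k_groups])
    candidates = [i for i in range(len(scores)) if i // size in keep]
    ids = sorted(candidates, key=lambda i: -scores[i])[:k_experts]

    # get_rows: gather the (unbiased) probabilities of the selected experts.
    weights = [probs[i] for i in ids]
    return ids, weights
```

Written this way, each step is a separate pass over the scores. A fused op would do the same work in a single kernel, which is presumably where the small gains come from.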

magikRUKKOLA · 2025-10-19T19:46:17Z

@ikawrakow
Is it possible to use Ling-mini-2.0-Q2_K.gguf as a draft model for speculative decoding with ubergarm/smol-IQ4-KSS?

ikawrakow · 2025-10-20T09:25:09Z

> Is it possible to use Ling-mini-2.0-Q2_K.gguf as a draft model for speculative decoding with ubergarm/smol-IQ4-KSS?

If the vocabulary is the same, it should be possible, but I haven't tried it myself.
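A rough sketch of what "the vocabulary is the same" could mean in practice: both models map every token id to the same token text. The helper below is hypothetical and takes two plain lists of token strings. How those lists are read from the GGUF files is not shown, and the check the runtime actually performs may be looser or stricter than this one.

```python
def vocabs_match(target_tokens, draft_tokens):
    """Strict check: same vocabulary size and identical token text at every id."""
    if len(target_tokens) != len(draft_tokens):
        return False
    return all(t == d for t, d in zip(target_tokens, draft_tokens))
```

If this returns `False`, token ids proposed by the draft model would not mean the same thing to the target model, so the drafted tokens could not be verified directly.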

Iwan Kawrakow added 9 commits October 18, 2025 10:09

Fuse sigmoid+add+grouped_topk+get_rows (CPU)

8f5f93e

Fix CPU + CUDA

2c66dc8

but CUDA is somehow not 100% correct as I get a slightly different PPL (lower!)

Minor

f3ff1a5

Fuse sigmoid+add+topk+get_rows (CUDA)

8fe2bb9

Fuse sigmoid+add+topk+get_rows (CPU)

18d9f4f

Fuse topk+view+get_rows+reshape+softmax (CPU)

c8ed454

Fuse topk+view+get_rows+reshape+softmax (CUDA)

b79aad9

cpu: turn off the openai topk fusing for now

0fb9d49

Something is not right and I don't see the bug. On the CPU one doesn't gain much if anything, so not a big loss.

Also fuse sum_rows and div

1d70b89
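For reference, here is a minimal unfused sketch of two of the op chains named in the commit titles above: top-k followed by a softmax over the selected values, and sum_rows followed by div. It is not the code from this PR. The function names are hypothetical, and the view and reshape steps are omitted on the assumption that they only change tensor shapes.

```python
import math

def topk_softmax(logits, k):
    """top-k on raw logits, gather the winners, then softmax over those k values."""
    ids = sorted(range(len(logits)), key=lambda i: -logits[i])[:k]  # topk
    picked = [logits[i] for i in ids]                               # get_rows
    m = max(picked)                                                 # for numerical stability
    exps = [math.exp(v - m) for v in picked]
    total = sum(exps)
    return ids, [e / total for e in exps]                           # softmax

def normalize(weights):
    """sum_rows followed by div: rescale the selected weights to sum to 1."""
    total = sum(weights)                                            # sum_rows
    return [w / total for w in weights]                             # div
```

For example, `topk_softmax([1.0, 3.0, 2.0, 0.0], 2)` selects experts 1 and 2 and returns weights that sum to 1, with the larger weight on expert 1.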

ikawrakow merged commit 28d3e63 into main Oct 19, 2025

ikawrakow mentioned this pull request Oct 23, 2025

Faster tensor name formatting #860

Merged
