@ikawrakow
Owner

Nothing earth shattering, just 1-2% performance gains.

@ikawrakow
Owner Author

OK, here are some graphs for Ling-mini-2.0-Q4_K_M (RTX-4080) showing the cumulative effect of the changes since grouped expert routing was added for Ling/Ring in #838. Also shown are results for the not yet merged PR 16063 in mainline. The u-batch size is 2048 and flash attention (FA) is on.

Prompt processing

[image: u1]

Token generation

[image: u1a]

@ikawrakow merged commit 28d3e63 into main on Oct 19, 2025

@magikRUKKOLA

@ikawrakow
Is it possible to use Ling-mini-2.0-Q2_K.gguf as a draft model for speculative decoding with ubergarm/smol-IQ4-KSS ?

@ikawrakow
Owner Author

> Is it possible to use Ling-mini-2.0-Q2_K.gguf as a draft model for speculative decoding with ubergarm/smol-IQ4-KSS ?

If the vocabulary is the same, it should be possible, but I haven't tried it myself.
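
To make the "same vocabulary" condition concrete, here is a minimal sketch of a strict check, assuming the token lists of the target and draft models have already been extracted from their GGUF files into plain Python lists (how to extract them is not shown). The helper names `first_vocab_mismatch` and `vocabs_match` are hypothetical and not part of any library. Strict equality is an assumption made for illustration; the runtime may apply a different or looser rule.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helper: not part of llama.cpp, ik_llama.cpp, or the gguf package.
def first_vocab_mismatch(
    target: Sequence[str], draft: Sequence[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (token_id, target_token, draft_token) for the first token id
    where the two vocabularies differ, or None if they are identical.
    A missing entry (one vocabulary is shorter) is reported as None."""
    for token_id in range(max(len(target), len(draft))):
        t = target[token_id] if token_id < len(target) else None
        d = draft[token_id] if token_id < len(draft) else None
        if t != d:
            return (token_id, t, d)
    return None


def vocabs_match(target: Sequence[str], draft: Sequence[str]) -> bool:
    """Strict check (an assumption): same size and same token at every id."""
    return first_vocab_mismatch(target, draft) is None
```

If `first_vocab_mismatch` returns a tuple, the token id it reports is a reasonable place to start looking at how the two tokenizers diverge.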

3 participants