@am17an (Collaborator) commented Oct 29, 2025

This helps slightly in token generation (TG).

Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes

| Model | Test | t/s master | t/s set-row | Speedup |
| --- | --- | ---: | ---: | ---: |
| gpt-oss 20B MXFP4 MoE | tg128 | 186.10 | 187.25 | 1.01 |
| gpt-oss 20B MXFP4 MoE | tg256 | 184.63 | 187.19 | 1.01 |
| gpt-oss 20B MXFP4 MoE | tg512 | 183.95 | 184.45 | 1.00 |
| qwen3moe 30B.A3B Q4_0 | tg128 | 161.55 | 163.48 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg256 | 162.39 | 163.33 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg512 | 159.14 | 160.86 | 1.01 |
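
For readers checking the numbers, the Speedup column appears to be the set-row throughput divided by the master throughput, rounded to two decimals. The sketch below recomputes it from the figures in the table above; the `rows` list and the `speedup` helper are illustrative names, not part of the PR or of any benchmark tool.

```python
# Rows copied from the benchmark table above:
# (model, test, t/s on master, t/s on the set-row branch)
rows = [
    ("gpt-oss 20B MXFP4 MoE", "tg128", 186.10, 187.25),
    ("gpt-oss 20B MXFP4 MoE", "tg256", 184.63, 187.19),
    ("gpt-oss 20B MXFP4 MoE", "tg512", 183.95, 184.45),
    ("qwen3moe 30B.A3B Q4_0", "tg128", 161.55, 163.48),
    ("qwen3moe 30B.A3B Q4_0", "tg256", 162.39, 163.33),
    ("qwen3moe 30B.A3B Q4_0", "tg512", 159.14, 160.86),
]


def speedup(master_tps: float, branch_tps: float) -> float:
    """Assumed definition: branch throughput relative to master."""
    return branch_tps / master_tps


# Round to two decimals, matching the precision shown in the table.
speedups = [round(speedup(m, b), 2) for _, _, m, b in rows]

for (model, test, _, _), s in zip(rows, speedups):
    print(f"{model} {test}: {s:.2f}x")
```

Under that assumed definition the recomputed values agree with the reported column, so the gain is roughly 0 to 1.4% in token generation on this GPU.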

@github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) Oct 29, 2025
@am17an requested a review from slaren as a code owner October 29, 2025 10:56
@am17an merged commit e41bcce into ggml-org:master Oct 29, 2025
68 of 69 checks passed
@am17an deleted the cuda-fast-div-setrows branch October 29, 2025 13:11