@jeffbolznv
Collaborator

See #15669.

I'll push tests in a separate PR because I suspect they'll break other backends.

@jeffbolznv requested a review from 0cc4m as a code owner on August 30, 2025 19:15
@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Aug 30, 2025
@0cc4m
Collaborator

0cc4m commented Aug 31, 2025

I see no issues. Looks good.

| gpu_info | backends | model_type | model_size | fa | test | avg_ts(master) | avg_ts(pr) | % |
| --- | --- | --- | ---: | ---: | --- | ---: | ---: | ---: |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | gpt-oss 20B Q8_0 | 11.27 GiB | 0 | pp512 | 573.07 | 572.15 | -0.2% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | gpt-oss 20B Q8_0 | 11.27 GiB | 0 | tg128 | 108.78 | 109.03 | +0.2% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | llama 13B Q4_0 | 12.56 GiB | 0 | pp512 | 260.80 | 260.26 | -0.2% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | llama 13B Q4_0 | 12.56 GiB | 0 | tg128 | 26.62 | 26.55 | -0.2% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | llama 7B Q4_0 | 3.56 GiB | 0 | pp512 | 835.60 | 831.11 | -0.5% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | llama 7B Q4_0 | 3.56 GiB | 0 | tg128 | 80.48 | 80.30 | -0.2% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | llama 8B Q4_K - Small | 4.36 GiB | 0 | pp512 | 293.51 | 292.27 | -0.4% |
| AMD Radeon (TM) Pro VII (RADV VEGA20) | Vulkan | llama 8B Q4_K - Small | 4.36 GiB | 0 | tg128 | 71.97 | 71.54 | -0.6% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | gpt-oss 20B Q8_0 | 11.27 GiB | 0 | pp512 | 184.80 | 184.76 | -0.0% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | gpt-oss 20B Q8_0 | 11.27 GiB | 0 | tg128 | 20.03 | 20.02 | -0.1% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | llama 13B Q4_0 | 12.56 GiB | 0 | pp512 | 276.56 | 273.86 | -1.0% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | llama 13B Q4_0 | 12.56 GiB | 0 | tg128 | 16.47 | 16.49 | +0.1% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | llama 7B Q4_0 | 3.56 GiB | 0 | pp512 | 658.03 | 657.06 | -0.1% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | llama 7B Q4_0 | 3.56 GiB | 0 | tg128 | 46.83 | 46.72 | -0.2% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | llama 8B Q4_K - Small | 4.36 GiB | 0 | pp512 | 100.39 | 100.39 | -0.0% |
| Intel(R) Arc(tm) A770 Graphics (DG2) | Vulkan | llama 8B Q4_K - Small | 4.36 GiB | 0 | tg128 | 30.55 | 30.52 | -0.1% |
| NVIDIA GeForce RTX 3090 | Vulkan | gpt-oss 20B Q8_0 | 11.27 GiB | 0 | pp512 | 3832.77 | 3816.03 | -0.4% |
| NVIDIA GeForce RTX 3090 | Vulkan | gpt-oss 20B Q8_0 | 11.27 GiB | 0 | tg128 | 147.77 | 146.95 | -0.6% |
| NVIDIA GeForce RTX 3090 | Vulkan | llama 13B Q4_0 | 12.56 GiB | 0 | pp512 | 1722.12 | 1718.14 | -0.2% |
| NVIDIA GeForce RTX 3090 | Vulkan | llama 13B Q4_0 | 12.56 GiB | 0 | tg128 | 52.11 | 51.55 | -1.1% |
| NVIDIA GeForce RTX 3090 | Vulkan | llama 7B Q4_0 | 3.56 GiB | 0 | pp512 | 4406.60 | 4371.66 | -0.8% |
| NVIDIA GeForce RTX 3090 | Vulkan | llama 7B Q4_0 | 3.56 GiB | 0 | tg128 | 143.50 | 143.31 | -0.1% |
| NVIDIA GeForce RTX 3090 | Vulkan | llama 8B Q4_K - Small | 4.36 GiB | 0 | pp512 | 4492.99 | 4488.97 | -0.1% |
| NVIDIA GeForce RTX 3090 | Vulkan | llama 8B Q4_K - Small | 4.36 GiB | 0 | tg128 | 118.45 | 117.96 | -0.4% |

@0cc4m merged commit bbbf5ec into ggml-org:master on Aug 31, 2025
46 of 48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025