Fixes for Adreno inference (Q8/Q4) and OUT_PROD #34

infinitalo · 2025-10-13T07:12:25Z


Fix GGML_VK_CHECK_RESULTS build option by properly supporting IM2COL_3D.

ggml/src/ggml-vulkan/ggml-vulkan.cpp

olyasir · 2025-10-16T08:26:01Z

ggml/src/ggml-vulkan/vulkan-shaders/out_prod_q8_0.comp

+        vec2 acc = vec2(0.0);
+
+        for (uint k = 0; k < p.ne01; k++) {
+            if (i0 + 1 >= p.ne20) { // XXX


What does XXX mean? Why i0+1? Won't it skip the last element?

Sorry, I forgot to remove this comment. In some codebases XXX is used to draw attention to a line of code that could be problematic and should be reviewed later, similar to TODO or FIXME.

In this case, the i0+1 you asked about is exactly why I marked it with XXX, so that I would remember to verify it later.

This shader uses i0 to calculate an index iqs into a Q8 data block. That index is then passed to dequantize(ib, iqs, 0), which retrieves two Q8 values from data_a[] and stores them in a vec2. This is why I use i0+1: otherwise the second value would be out of range.

The math here is a bit tricky: i0 is computed from get_dst_indices/data_d but is used to index into data_a, so it could be wrong. I believe it's correct at the moment, though. Let me know if you see a mistake.
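To make the indexing concrete, here is a CPU-side C++ sketch of what the thread describes: a Q8_0 block holds 32 int8 quants and one scale d, and dequantize(ib, iqs, 0) returns the pair qs[iqs], qs[iqs + 1] scaled by d. The names `BlockQ8_0`, `dequantize_pair` and `dequantize_row` are hypothetical, d is stored as float instead of fp16 for brevity, and the mapping ib = i0 / 32, iqs = i0 % 32 is an assumption about how the shader derives its indices. Walking the row two elements at a time from an even i0 keeps iqs even, so iqs + 1 never leaves the block, and an odd-length tail is read as a single element instead of being skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t QK8_0 = 32;  // quants per Q8_0 block, as in ggml

// Simplified Q8_0 block. ggml stores the scale d as fp16; float is used here for brevity.
struct BlockQ8_0 {
    float d;
    int8_t qs[QK8_0];
};

// Hypothetical CPU analogue of the shader's dequantize(ib, iqs, 0):
// returns qs[iqs] and qs[iqs + 1] of block ib, both scaled by d.
std::pair<float, float> dequantize_pair(const std::vector<BlockQ8_0>& a, uint32_t ib, uint32_t iqs) {
    assert(iqs + 1 < QK8_0);  // the second value must stay inside the same block
    const BlockQ8_0& b = a[ib];
    return {b.d * b.qs[iqs], b.d * b.qs[iqs + 1]};
}

// Dequantize the first ne elements (ne <= 32 * a.size()), two at a time.
// Starting at i0 = 0 and stepping by 2 keeps iqs even, so a pair never straddles two blocks.
std::vector<float> dequantize_row(const std::vector<BlockQ8_0>& a, uint32_t ne) {
    std::vector<float> out(ne);
    for (uint32_t i0 = 0; i0 < ne; i0 += 2) {
        const uint32_t ib = i0 / QK8_0;
        const uint32_t iqs = i0 % QK8_0;
        if (i0 + 1 < ne) {
            const auto [x, y] = dequantize_pair(a, ib, iqs);
            out[i0] = x;
            out[i0 + 1] = y;
        } else {
            // Odd-length tail: only element i0 exists, so read it alone instead of skipping it.
            out[i0] = a[ib].d * a[ib].qs[iqs];
        }
    }
    return out;
}
```

With ne = 33 spanning two blocks, elements 30 and 31 come from one pair inside block 0, and element 32 is handled by the tail branch rather than dropped.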

This is to avoid accessing index 32, right? Is p.ne20 the output index? What if the output is larger than 32? Also, if i0 is 31, isn't it still valid (only 32 is invalid)?

For example:
Scenario 1: p.ne20 = 64 (64 elements)

i0 = 31: Check: is 32 >= 64? NO → processes element 31 → BUG (accesses qs[32])

i0 = 63: Check: is 64 >= 64? YES → skips element 63 → WRONG RESULT (misses the last element)
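The scenario above can be checked mechanically. The C++ sketch below models the original guard `if (i0 + 1 >= p.ne20)` under the reviewer's assumptions that every i0 in [0, ne20) is visited and that iqs = i0 % 32. The names `GuardAudit` and `audit_original_guard` are hypothetical, and this is a model of the check, not the shader code. For ne20 = 64 it flags exactly the two cases listed: i0 = 31 needs qs[32], and i0 = 63 is dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t QK8_0 = 32;  // quants per Q8_0 block

struct GuardAudit {
    std::vector<uint32_t> straddling;  // i0 whose pair read needs qs[32], i.e. past its block
    std::vector<uint32_t> skipped;     // i0 that the guard drops entirely
};

// Models `if (i0 + 1 >= ne20) skip; else read qs[iqs] and qs[iqs + 1]`,
// assuming every i0 in [0, ne20) is visited and iqs = i0 % QK8_0.
GuardAudit audit_original_guard(uint32_t ne20) {
    GuardAudit r;
    for (uint32_t i0 = 0; i0 < ne20; ++i0) {
        if (i0 + 1 >= ne20) {
            r.skipped.push_back(i0);
            continue;
        }
        const uint32_t iqs = i0 % QK8_0;
        if (iqs + 1 >= QK8_0) {
            r.straddling.push_back(i0);
        }
    }
    return r;
}
```

If the shader instead only visits even i0 values, neither problem can occur for the pair read, which is one way to reconcile the two readings in this thread.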


infinitalo · 2025-10-16T20:40:29Z

@olyasir I've addressed your comments and pushed some cleanups. Please feel free to give it another look.

I've also edited the PR to merge into temp-latest-finetuning instead of temp-latest.

github-actions bot added Nvidia GPU Vulkan examples ggml testing build devops python script android labels Oct 13, 2025

infinitalo force-pushed the italo/adreno_inference branch 10 times, most recently from c53efd4 to 082fa25 on October 14, 2025 18:32

Vulkan: Fix IM2COL_3D vk_check_results

8a3af34

Fix GGML_VK_CHECK_RESULTS build option by properly supporting IM2COL_3D.

infinitalo force-pushed the italo/adreno_inference branch from 082fa25 to 0f376cd on October 14, 2025 18:33

infinitalo mentioned this pull request Oct 15, 2025

WIP: llama: Vulkan: Fix Adreno Q8_0 issues. #11

Closed

olyasir requested changes Oct 16, 2025


Italo Nicola added 6 commits October 16, 2025 16:31

Vulkan: Clean up OUT_PROD shader and pipelines

45ec3b5

Shouldn't change any behavior since currently nb00 is always 1. Robustness is usually disabled for Q8/Q4 shaders since having it enabled impacts performance more significantly for those types than F16/F32.

Vulkan: Add build option for Adreno-specific fixes

20662e2

Introduce a CMake option for disabling Adreno-specific shaders if needed. This improves build time, but the option should not be used when targeting Adreno devices.

Vulkan: Add QUALCOMM_ADRENO to vk_device_architecture list

54a58b7

Extend device detection to classify Qualcomm Adreno GPUs, enabling targeted workarounds and shader selection when those devices are present.
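A minimal sketch of that classification, assuming detection keys on the vendor ID plus the device name: Adreno drivers report vendor ID 0x5143 in VkPhysicalDeviceProperties::vendorID and typically use device names such as "Adreno (TM) 830". Only the QUALCOMM_ADRENO entry comes from the commit title; the enum shape, `VENDOR_ID_QUALCOMM` and `classify_architecture` are illustrative, not the actual ggml-vulkan code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative subset of an architecture enum; only QUALCOMM_ADRENO comes from the commit title.
enum class DeviceArchitecture {
    OTHER,
    QUALCOMM_ADRENO,
};

constexpr uint32_t VENDOR_ID_QUALCOMM = 0x5143;  // vendorID reported by Adreno Vulkan drivers

// Hypothetical classifier: Qualcomm vendor ID and an "Adreno" device name.
DeviceArchitecture classify_architecture(uint32_t vendor_id, const std::string& device_name) {
    if (vendor_id == VENDOR_ID_QUALCOMM && device_name.find("Adreno") != std::string::npos) {
        return DeviceArchitecture::QUALCOMM_ADRENO;
    }
    return DeviceArchitecture::OTHER;
}
```

Checking the name as well as the vendor ID keeps the classification narrow, so workarounds keyed on it cannot accidentally apply to other Qualcomm devices.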

Vulkan: disable subgroups on Adreno

a863b05

Avoid subgroup operations on Adreno by selecting safer paths to sidestep compiler/driver bugs while preserving behavior.

Vulkan: disable mul_mat_l on adreno

dcf4558

Similar to what we do for other vendors such as Intel.

Vulkan: disable rms_norm fusion on Adreno

9742768

This optimization broke inference on Adreno.
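The three workarounds above (no subgroup operations, no mul_mat_l, no rms_norm fusion) amount to per-device switches flipped once when the device is set up. The C++ sketch below shows that pattern; the `DeviceCaps` struct, its fields and `apply_adreno_workarounds` are hypothetical stand-ins, and reading mul_mat_l as the large-tile matmul variant is an assumption.

```cpp
#include <cassert>

// Hypothetical per-device switches; the real backend keeps its own fields for these decisions.
struct DeviceCaps {
    bool is_adreno = false;
    bool use_subgroups = true;   // subgroup operations in shaders
    bool use_mul_mat_l = true;   // assumed: large-tile matmul variant
    bool fuse_rms_norm = true;   // rms_norm fusion optimization
};

// Turn off the paths reported to misbehave on Adreno; other devices keep their defaults.
void apply_adreno_workarounds(DeviceCaps& caps) {
    if (!caps.is_adreno) {
        return;
    }
    caps.use_subgroups = false;  // sidestep compiler/driver bugs in subgroup code
    caps.use_mul_mat_l = false;  // similar to what is done for other vendors
    caps.fuse_rms_norm = false;  // this fusion broke inference on Adreno
}
```

Keeping the decisions in one place makes it easy to re-enable a path once a driver fix lands, without touching the shader selection code.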

Italo Nicola added 3 commits October 16, 2025 16:32

Vulkan: generate Adreno-specific shader variants

bb65a52

Add build-time generation of Adreno-targeted shader variants under a guard, so Adreno devices use safer code paths while other GPUs remain unaffected.
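Together with the CMake option from the earlier commit, the selection logic can be pictured as a compile-time guard around the Adreno variants plus a runtime choice per device. In this C++ sketch, the macro `USE_ADRENO_SHADERS`, the `ShaderBlob` type and the shader names are all hypothetical; the real option and generated symbol names live in the build scripts and are not reproduced here. In the sketch, when the guard is off, Adreno devices fall back to the default variants, which is why disabling it only makes sense when not targeting Adreno.

```cpp
#include <cassert>
#include <string>

// Hypothetical handle to a compiled SPIR-V module.
struct ShaderBlob {
    std::string name;
};

// Hypothetical build-time switch, enabled by default in this sketch.
#ifndef USE_ADRENO_SHADERS
#define USE_ADRENO_SHADERS 1
#endif

const ShaderBlob default_mul_mat{"mul_mat_q8_0"};
#if USE_ADRENO_SHADERS
const ShaderBlob adreno_mul_mat{"mul_mat_q8_0_adreno"};
#endif

// Pick the Adreno-specific variant only when it was built and the device is Adreno.
const ShaderBlob& select_mul_mat_shader(bool is_adreno) {
#if USE_ADRENO_SHADERS
    if (is_adreno) {
        return adreno_mul_mat;
    }
#else
    (void)is_adreno;  // variants not built: every device uses the default shader
#endif
    return default_mul_mat;
}
```

Non-Adreno devices take the same path regardless of the guard, which matches the intent that other GPUs remain unaffected.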

Vulkan: Improve Q8 OUT_PROD performance

1721689

Increase OUT_PROD Q8 performance by improving memory locality.
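The commit message does not say which access pattern changed, so the following is only a generic CPU illustration of what "memory locality" means for OUT_PROD, using plain floats instead of Q8 blocks. It assumes ggml's layout (the first index is contiguous) and the OUT_PROD definition dst(i0, i1) = sum over k of a(i0, k) * b(i1, k), matching the `k < p.ne01` loop in the shader excerpt. Both hypothetical functions compute the same result; the second keeps the innermost loop on contiguous memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout (first index contiguous, as in ggml):
// a:   ne00 x ne01, element (i0, k)  at a[k * ne00 + i0]
// b:   ne10 x ne01, element (i1, k)  at b[k * ne10 + i1]
// dst: ne00 x ne10, element (i0, i1) at dst[i1 * ne00 + i0]

// One output element at a time: each step of k jumps a whole row of a.
std::vector<float> out_prod_strided(const std::vector<float>& a, const std::vector<float>& b,
                                    size_t ne00, size_t ne01, size_t ne10) {
    std::vector<float> dst(ne00 * ne10, 0.0f);
    for (size_t i1 = 0; i1 < ne10; ++i1) {
        for (size_t i0 = 0; i0 < ne00; ++i0) {
            float acc = 0.0f;
            for (size_t k = 0; k < ne01; ++k) {
                acc += a[k * ne00 + i0] * b[k * ne10 + i1];
            }
            dst[i1 * ne00 + i0] = acc;
        }
    }
    return dst;
}

// Same sums in the same k order, but the innermost loop walks i0,
// which is contiguous in both a and dst.
std::vector<float> out_prod_contiguous(const std::vector<float>& a, const std::vector<float>& b,
                                       size_t ne00, size_t ne01, size_t ne10) {
    std::vector<float> dst(ne00 * ne10, 0.0f);
    for (size_t k = 0; k < ne01; ++k) {
        const float* a_row = &a[k * ne00];
        for (size_t i1 = 0; i1 < ne10; ++i1) {
            const float s = b[k * ne10 + i1];
            float* d_row = &dst[i1 * ne00];
            for (size_t i0 = 0; i0 < ne00; ++i0) {
                d_row[i0] += a_row[i0] * s;
            }
        }
    }
    return dst;
}
```

On a GPU the equivalent idea is usually to have neighbouring invocations read neighbouring elements (for Q8, the same block), but how the commit achieves that is not described here.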

Vulkan: Implement MUL_MAT tiling workaround

0f6d6da

This makes finetuning work without crashing on Adreno 830.
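The commit message does not describe how the tiling is done. One common form of such a workaround is splitting one large matmul into several smaller dispatches over ranges of output columns, each under some size limit. The C++ sketch below shows that idea on the CPU only; `matmul_cols`, `matmul_tiled` and the `max_cols_per_tile` knob are hypothetical, and nothing here claims this is what the commit implements or why it avoids the crash on Adreno 830.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C (m x n) = A (m x k) * B (k x n), all row-major, restricted to output columns [col_begin, col_end).
void matmul_cols(const std::vector<float>& A, const std::vector<float>& B, std::vector<float>& C,
                 size_t m, size_t k, size_t n, size_t col_begin, size_t col_end) {
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = col_begin; j < col_end; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}

// Hypothetical workaround: issue the same product as several smaller "dispatches",
// each covering at most max_cols_per_tile output columns.
std::vector<float> matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                                size_t m, size_t k, size_t n, size_t max_cols_per_tile) {
    assert(max_cols_per_tile > 0);
    std::vector<float> C(m * n, 0.0f);
    for (size_t col = 0; col < n; col += max_cols_per_tile) {
        const size_t col_end = std::min(n, col + max_cols_per_tile);
        matmul_cols(A, B, C, m, k, n, col, col_end);
    }
    return C;
}
```

Because every tile computes whole output elements, the tiled result is identical to the untiled one; only the size of each individual piece of work changes.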

infinitalo force-pushed the italo/adreno_inference branch from 0f376cd to 0f6d6da on October 16, 2025 20:15

infinitalo changed the base branch from temp-latest to temp-latest-finetuning October 16, 2025 20:39
