Fixes for Adreno inference (Q8/Q4) and OUT_PROD #34

infinitalo · 2025-10-13T07:12:25Z

Make sure to read the contributing guidelines before submitting a PR

infinitalo · 2025-10-16T20:40:29Z

@olyasir I've addressed your comments and pushed some cleanups. Please feel free to give it another look.

I've also edited the PR to merge into temp-latest-finetuning instead of temp-latest.

gianni-cor · 2025-10-24T11:25:07Z

@infinitalo can you please also check the failed pipelines?

github-actions bot added the Nvidia GPU, Vulkan, examples, ggml, testing, build, devops, python, script, and android labels Oct 13, 2025

infinitalo force-pushed the italo/adreno_inference branch 11 times, most recently from 082fa25 to 0f376cd Compare October 14, 2025 18:33

infinitalo mentioned this pull request Oct 15, 2025

WIP: llama: Vulkan: Fix Adreno Q8_0 issues. #11

Closed

olyasir requested changes Oct 16, 2025

ggml/src/ggml-vulkan/ggml-vulkan.cpp (resolved)

ggml/src/ggml-vulkan/vulkan-shaders/out_prod_q8_0.comp (outdated, resolved)

ggml/src/ggml-vulkan/vulkan-shaders/out_prod_q8_0.comp (outdated, resolved)

infinitalo force-pushed the italo/adreno_inference branch from 0f376cd to 0f6d6da Compare October 16, 2025 20:15

infinitalo changed the base branch from temp-latest to temp-latest-finetuning October 16, 2025 20:39

infinitalo force-pushed the italo/adreno_inference branch 2 times, most recently from e40fed4 to 3f57c82 Compare October 21, 2025 18:20

olyasir approved these changes Oct 22, 2025

olyasir approved these changes Oct 24, 2025

Italo Nicola added 11 commits October 24, 2025 18:12

Vulkan: add large tests

119c469

Vulkan: Clean up OUT_PROD shader and pipelines

81cae36

This should not change any behavior, since nb00 is currently always 1. Robustness is usually disabled for Q8/Q4 shaders because enabling it impacts performance more significantly for those types than for F16/F32.
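One way to read the note about nb00: if nb00 is the element stride of the innermost dimension of src0, then a general strided offset and a contiguous-only offset agree whenever nb00 is 1. The sketch below illustrates that equivalence on the CPU. The helper names are hypothetical, and treating nb00 as an element stride (rather than a byte stride) is an assumption, not something this PR states.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset (in elements) of column i0 in row i1,
// using an explicit innermost stride nb00 and a row stride nb01.
inline std::size_t src0_offset_strided(std::size_t i0, std::size_t i1,
                                       std::size_t nb00, std::size_t nb01) {
    return i0 * nb00 + i1 * nb01;
}

// Hypothetical helper: the same offset when the innermost dimension
// is assumed to be contiguous (nb00 == 1), so the multiply is dropped.
inline std::size_t src0_offset_contiguous(std::size_t i0, std::size_t i1,
                                          std::size_t nb01) {
    return i0 + i1 * nb01;
}
```

With nb00 fixed at 1 the two helpers return the same offset for every index, which is why a cleanup of this kind would not be expected to change results.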

Vulkan: Add build option for Adreno-specific fixes

74d3f0c

Introduce a CMake option for disabling Adreno-specific shaders if needed. Disabling them improves build time, but the option should not be used when targeting Adreno devices.
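A build option like this usually reaches the C++ side as a preprocessor definition. The following is a minimal sketch of that pattern only. The macro name `EXAMPLE_ADRENO_SHADERS`, the `_adreno` suffix, and the function are hypothetical; the actual option name and shader naming in this PR are not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical macro, normally set by the build system
// (for example via a compile definition). Defaults to enabled.
#ifndef EXAMPLE_ADRENO_SHADERS
#define EXAMPLE_ADRENO_SHADERS 1
#endif

// Returns the shader variants to build for one base shader name.
// The generic variant is always present; the Adreno variant is only
// added when the guard is enabled, which is what saves build time.
inline std::vector<std::string> shader_variants(const std::string & base) {
    std::vector<std::string> variants{base};
#if EXAMPLE_ADRENO_SHADERS
    variants.push_back(base + "_adreno");
#endif
    return variants;
}
```

Building with the macro set to 0 leaves only the generic variant, which matches the warning above: such a build has nothing Adreno-specific to select at run time.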

Vulkan: Add QUALCOMM_ADRENO to vk_device_architecture list

6fc930e

Extend device detection to classify Qualcomm Adreno GPUs, enabling targeted workarounds and shader selection when those devices are present.
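As a rough illustration of vendor-based classification, the sketch below maps a Vulkan `vendorID` to a small architecture enum. The enum and function are hypothetical, reduced stand-ins; the real detection in ggml-vulkan.cpp may inspect more properties than the vendor ID.

```cpp
#include <cassert>
#include <cstdint>

// PCI vendor IDs as reported in VkPhysicalDeviceProperties::vendorID.
constexpr uint32_t VENDOR_ID_INTEL    = 0x8086;
constexpr uint32_t VENDOR_ID_QUALCOMM = 0x5143;

// Hypothetical, reduced stand-in for the backend's architecture enum.
enum class device_arch { OTHER, INTEL, QUALCOMM_ADRENO };

// Classify a device by vendor ID only. Anything unrecognized falls
// back to OTHER, so unknown GPUs keep the default code paths.
inline device_arch classify_device(uint32_t vendor_id) {
    switch (vendor_id) {
        case VENDOR_ID_INTEL:    return device_arch::INTEL;
        case VENDOR_ID_QUALCOMM: return device_arch::QUALCOMM_ADRENO;
        default:                 return device_arch::OTHER;
    }
}
```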

Vulkan: disable subgroups on Adreno

dc97c4f

Avoid subgroup operations on Adreno by selecting safer paths to sidestep compiler/driver bugs while preserving behavior.

Vulkan: disable mul_mat_l on adreno

9354382

Similar to what we do for other vendors such as Intel.

Vulkan: disable rms_norm fusion on Adreno

1bdabf7

This optimization broke inference on Adreno.
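The three commits above (subgroups, mul_mat_l, rms_norm fusion) share one pattern: a feature the hardware reports is switched off once the device is classified as Adreno. A minimal sketch of that gating, with a hypothetical struct and function that are not taken from the PR:

```cpp
#include <cassert>

// Hypothetical feature flags; the real backend tracks these differently.
struct device_features {
    bool subgroup_ops;     // subgroup arithmetic in shaders
    bool mul_mat_l;        // large-tile matrix multiplication pipelines
    bool rms_norm_fusion;  // fusing rms_norm with the following op
};

// Start from what the hardware reports and clear the features that
// are disabled on Adreno. Other devices are returned unchanged.
inline device_features apply_adreno_workarounds(device_features reported,
                                                bool is_adreno) {
    if (!is_adreno) {
        return reported;
    }
    reported.subgroup_ops    = false;
    reported.mul_mat_l       = false;
    reported.rms_norm_fusion = false;
    return reported;
}
```

Keeping the decision in one place means pipeline selection only has to consult the resulting flags, not the vendor.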

Vulkan: generate Adreno-specific shader variants

25e2579

Add build-time generation of Adreno-targeted shader variants under a guard, so Adreno devices use safer code paths while other GPUs remain unaffected.

Vulkan: Improve Q8 OUT_PROD performance

32246b9

Increase OUT_PROD Q8 performance by improving memory locality.
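The commit message does not say how locality was improved in the shader. As a CPU analogy only, the sketch below shows the general idea for an outer-product accumulation: order the loops so the innermost one walks contiguous memory. The layout (row-major, unquantized floats) and the function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulates dst[i][j] += sum over k of a[k][i] * b[k][j].
// a is K x M, b is K x N, dst is M x N, all row-major.
// The innermost loop runs over j, so both b and dst are read and
// written at consecutive addresses.
inline void out_prod_f32(const std::vector<float> & a,
                         const std::vector<float> & b,
                         std::vector<float> & dst,
                         std::size_t M, std::size_t N, std::size_t K) {
    assert(a.size() == K * M && b.size() == K * N && dst.size() == M * N);
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t i = 0; i < M; ++i) {
            const float aki = a[k * M + i];  // reused across the whole row
            for (std::size_t j = 0; j < N; ++j) {
                dst[i * N + j] += aki * b[k * N + j];
            }
        }
    }
}
```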

Vulkan: Implement MUL_MAT tiling workaround

294942b

This makes finetuning work without crashing on Adreno 830.
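The commit does not describe which dimension is tiled or what tile size is used. Under that caveat, a tiling workaround can be sketched as splitting one large range into bounded chunks that are dispatched one at a time. The names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One chunk of work: start offset and number of elements.
struct tile_range {
    std::size_t offset;
    std::size_t count;
};

// Split [0, total) into consecutive tiles of at most max_tile
// elements. The last tile may be smaller than the others.
inline std::vector<tile_range> split_into_tiles(std::size_t total,
                                                std::size_t max_tile) {
    assert(max_tile > 0);
    std::vector<tile_range> tiles;
    for (std::size_t off = 0; off < total; off += max_tile) {
        tiles.push_back({off, std::min(max_tile, total - off)});
    }
    return tiles;
}
```

Each returned range would correspond to one smaller dispatch, trading a few extra submissions for staying under whatever limit the large dispatch exceeded.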

Vulkan: Add Q4_K Adreno variant for mul_mat_vec

2c9a0e8

infinitalo force-pushed the italo/adreno_inference branch from 3f57c82 to 2c9a0e8 Compare October 24, 2025 21:14

ci: increase llvmpipe tests timeout from 4200 to 6200

808e097

gianni-cor self-requested a review October 27, 2025 16:36

gianni-cor approved these changes Oct 27, 2025

zoq added 2 commits October 27, 2025 14:29

Fix Q4 OUT_PROD iq upper handling.

2cbf514

Signed-off-by: Marcus Edel <[email protected]>

Make sure we only return the supported types for the GET_ROWS kernel.

c94702b

Signed-off-by: Marcus Edel <[email protected]>

github-actions bot added the Apple Metal label Oct 27, 2025

olyasir merged commit f9e7293 into tetherto:temp-latest-finetuning Oct 27, 2025
39 of 47 checks passed


4 participants
