Hexagon Op queue & dispatch optimizations #16820

max-krasnyansky · 2025-10-28T15:47:33Z

Optimize dspqueue (used for sending Op requests to the NPU) by doing all response processing in-place.
This removes the need for the dedicated read threads (started internally by the dspqueue library) and essentially
eliminates all polling in that path.

(For the curious, the dspqueue CPU side sources can be found here
https://github.com/qualcomm/fastrpc/tree/main/src/dspqueue)
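
As a rough illustration of the in-place approach described above, here is a minimal, self-contained sketch. It does not use the real dspqueue API: `mock_device`, `session`, `enqueue` and `flush` are hypothetical names, and the "NPU" is simulated by a single worker thread. The point is only that the submitting thread reads the responses itself, so no separate reader thread or callback is involved on the host side.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the device side of the queue (NOT the dspqueue API).
// One worker thread turns every request into a response, in FIFO order.
struct mock_device {
    std::mutex              mtx;
    std::condition_variable cv;
    std::deque<uint32_t>    reqs;
    std::deque<uint32_t>    rsps;
    bool                    stop = false;
    std::thread             worker;  // declared last so the members above exist first

    mock_device() : worker([this] { run(); }) {}

    ~mock_device() {
        { std::lock_guard<std::mutex> lk(mtx); stop = true; }
        cv.notify_all();
        worker.join();
    }

    void run() {
        std::unique_lock<std::mutex> lk(mtx);
        for (;;) {
            cv.wait(lk, [&] { return stop || !reqs.empty(); });
            if (reqs.empty()) return;  // stop requested and nothing left to do
            rsps.push_back(reqs.front());
            reqs.pop_front();
            cv.notify_all();
        }
    }

    void write(uint32_t op_id) {
        { std::lock_guard<std::mutex> lk(mtx); reqs.push_back(op_id); }
        cv.notify_all();
    }

    // Blocking read, called directly by whoever needs the response.
    uint32_t read() {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] { return !rsps.empty(); });
        uint32_t op_id = rsps.front();
        rsps.pop_front();
        return op_id;
    }
};

// Hypothetical host-side session: responses are processed in place by the
// submitting thread, so there is no response callback and no reader thread.
struct session {
    mock_device           dev;
    uint32_t              op_pending = 0;  // only touched by the submitting thread
    std::vector<uint32_t> completed;

    void enqueue(uint32_t op_id, bool sync = false) {
        dev.write(op_id);
        op_pending++;
        if (sync) flush();
    }

    // Drain every outstanding response on the calling thread.
    void flush() {
        while (op_pending > 0) {
            completed.push_back(dev.read());
            op_pending--;
        }
        assert(op_pending == 0);
    }
};
```

A caller would queue several ops with `enqueue()` and call `flush()` once at a synchronization point; because only the submitting thread touches `op_pending`, the counter needs no locking in this sketch.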

We can also bump the CPU backend thread counts now for the default use-cases since we still rely on the
CPU for Flash Attention and a few other Ops.

We're not going to release the buffers without flushing the session queue. So there is no need to inc/dec the refcounts for every request. We also don't need to include those bufs in the response.
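
The reasoning above can be sketched with a toy model. This is not the ggml-hexagon code: `toy_session` and `toy_buffer` are hypothetical, requests complete immediately when flushed, and placing the flush in the buffer's destructor is an assumption made for illustration. It shows why per-request ref/deref becomes redundant once every release path drains the queue first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: requests carry raw pointers into a buffer and take
// no reference on it.
struct toy_session {
    std::vector<const uint8_t *> in_flight;  // requests not yet completed
    size_t                       completed = 0;

    void enqueue(const uint8_t * data) { in_flight.push_back(data); }

    // Wait for all outstanding requests (in this toy they finish instantly).
    void flush() {
        completed += in_flight.size();
        in_flight.clear();
    }
};

struct toy_buffer {
    toy_session &        sess;
    std::vector<uint8_t> data;

    toy_buffer(toy_session & s, size_t size) : sess(s), data(size) {}

    // Single synchronization point: the destructor body runs before `data`
    // is destroyed, so no request can still point at freed memory.
    ~toy_buffer() { sess.flush(); }
};
```

With this invariant, the cost moves from two refcount updates per request to one flush per buffer release, which is a much rarer event.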

We can use more CPU cores now that the dedicated dspqueue polling threads are not used (i.e. no contention). Also enable more aggressive polling for now since we still map Flash Attention (and a few other kernels) to the CPU and those dspqueue threads were keeping the CPU cores at higher clock freqs.

max-krasnyansky · 2025-10-28T15:50:30Z

@l3utterfly this would be interesting for your use-case (i.e. an APK with ggml-hexagon enabled).
Please give this a shot when you get the chance.

l3utterfly · 2025-10-28T16:00:27Z

@max-krasnyansky Thank you! I will test this out!

lhez

Looks good to me

lhez · 2025-10-29T05:48:45Z

Some server tests are failing, but should be unrelated.

max-krasnyansky added 3 commits October 28, 2025 08:28

hexagon: remove dspqueue callbacks and do all read processing inplace

2b86354

hexagon: there is no need to ref/deref the buffers at this point

ac7a334

We're not going to release the buffers without flushing the session queue. So there is no need to inc/dec the refcounts for every request. We also don't need to include those bufs in the response.

max-krasnyansky requested a review from lhez October 28, 2025 15:52

hexagon: add lhez as the second code owner

d884764

github-actions bot added the script (Script related) and ggml (changes relating to the ggml tensor library for machine learning) labels Oct 28, 2025

lhez approved these changes Oct 29, 2025


max-krasnyansky merged commit 3eb2be1 into ggml-org:master Oct 29, 2025
75 of 83 checks passed

max-krasnyansky deleted the hexagon-dspqueue-opts branch October 29, 2025 13:29

3 participants
