Hexagon dspqueue opts #83

l3utterfly · 2025-10-29T03:58:59Z

Make sure to read the contributing guidelines before submitting a PR

* model : add LightOnOCR-1B model * add test

* ggml : fix interpolate with align-corners and ne=1 * avoid division by zero if one of the spatial dimensions is 1 * cpu, cuda, opencl returned correct result anyway due to clamp * vulkan didn't clamp for align-corners so results were broken * fix clang warning

…ls (ggml-org#16748)

* mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite

@ykhrustalev

* Add LFM2 tool handling * fmt * Apply suggestion from @ykhrustalev

* feat: Add SYCL backend support for SSM_CONV operator * Implement State Space Model Convolution 1D for SYCL backend * Add optimized GPU kernel with parallel work distribution * Support various tensor dimensions and batch sizes * Full integration with existing SYCL infrastructure * All tests pass with CPU backend equivalence verification * feat: Implement SYCL backend support for SSM_CONV operation - Add ggml-sycl/ssm_conv.cpp and ssm_conv.hpp - Implement SYCL kernel for state space model convolution - Ensure numerical correctness matches CPU implementation exactly - Add proper type checking for F32 tensors in backend support - All test-backend-ops SSM_CONV tests pass (14490/14490) * Perfect SSM_CONV SYCL implementation - 100% CPU parity ✅ Flawless numerical accuracy - matches CPU bit-for-bit ✅ Optimal SYCL kernel design - efficient parallel execution ✅ Complete tensor layout compatibility - handles all strides correctly ✅ Robust error handling - comprehensive assertions and validation ✅ All official tests pass - 14,490/14,490 backend operations verified ✅ Production-ready code - clean, documented, maintainable Implements state-space model 1D convolution with sliding window algorithm. Eliminates blocking queue.wait() for better async performance. * Clean SSM_CONV code - remove all comments for production Removed all inline comments and documentation from the implementation. Clean, minimal code ready for production merge. * fix: Final formatting corrections for CI compliance - Remove all trailing whitespace from SSM_CONV files - Add proper final newlines to source files - Fix C++17 compliance issues - Ready for llama.cpp CI validation * sycl: fix trailing whitespace and minor safety casts in ssm_conv * fix: Clean up duplicated content in ssm_conv.hpp header file --------- Co-authored-by: tamarPal <[email protected]>

* cann: improve device ID handling and aclnnArange checks - Stop relying on CANN's internal device ID retrieval; use a global variable instead. - Enforce stricter dimension validation in aclnnArange for better compatibility across CANN versions. * cann: use thread local var

* grammar : support array references in json schema * Update json-schema-to-grammar.cpp Co-authored-by: Sigbjørn Skjæret <[email protected]> * grammar : improve regex when naming ref derived rules * grammar : replace non-conformant definitions array with anyOf test case --------- Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Add --embd-output-format raw for plain numeric embedding output This new option outputs embeddings as raw space-separated floats, without JSON or 'embedding N:' prefixes. Useful for downstream vector pipelines and scripting. * Move raw output handling into format handling section * Move raw output handling into else-if block with other format handlers * Use LOG instead of printf for raw embedding output * docs: document 'raw' embedding output format in arg.cpp and README

We're not going to release the buffers without flushing the session queue. So there is no need to inc/dec the refcounts for every request. We also don't need to include those bufs in the response.

We can use more CPU cores now that the dedicated dspqueue polling threads are not used (ie no contention). Also enable more agressive polling for now since we still map Flash Attention (and a few other kernels) to the CPU and those dspqueue threads were keeping the CPU cores are higher clock freqs.

ngxson and others added 17 commits October 27, 2025 16:02

model : add LightOnOCR-1B model (ggml-org#16764)

c55d53a

* model : add LightOnOCR-1B model * add test

HIP: fix AMDGPU_TARGETS, update documentation (ggml-org#16803)

80d28f1

llama : disable pipeline parallelism if compute buffer allocation fai…

5a4ff43

…ls (ggml-org#16748)

mtmd : fix idefics3 preprocessing (ggml-org#16806)

e1ab084

* mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite

chat: Add LFM2 tool handling (ggml-org#16763)

c053e18

* Add LFM2 tool handling * fmt * Apply suggestion from @ykhrustalev

CUDA: add unused vars to mmvf and mmvq (ggml-org#16807)

463bbf2

llama: consistent ctx <-> buf order for KV cache (ggml-org#16746)

7a0e900

initialise buffer.device in ggml_hexagon_session (ggml-org#16816)

8284efc

hexagon: remove dspqueue callbacks and do all read processing inplace

2b86354

hexagon: there is no need to ref/deref the buffers at this point

ac7a334

We're not going to release the buffers without flushing the session queue. So there is no need to inc/dec the refcounts for every request. We also don't need to include those bufs in the response.

hexagon: add lhez as the second code owner

d884764

l3utterfly merged commit 904fd34 into l3utterfly:hexagon-dspqueue-opts Oct 29, 2025
16 of 66 checks passed

github-actions bot added documentation Improvements or additions to documentation SYCL Nvidia GPU Vulkan testing examples python server ggml script Ascend NPU OpenCL labels Oct 29, 2025

max-krasnyansky deleted the hexagon-dspqueue-opts branch October 29, 2025 13:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Hexagon dspqueue opts #83

Hexagon dspqueue opts #83

Uh oh!

l3utterfly commented Oct 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants

Hexagon dspqueue opts #83

Hexagon dspqueue opts #83

Uh oh!

Conversation

l3utterfly commented Oct 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

12 participants