forked from ggml-org/llama.cpp
-
Notifications
You must be signed in to change notification settings - Fork 0
Hexagon dspqueue opts #83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
l3utterfly
merged 17 commits into
l3utterfly:hexagon-dspqueue-opts
from
qualcomm:hexagon-dspqueue-opts
Oct 29, 2025
Merged
Hexagon dspqueue opts #83
l3utterfly
merged 17 commits into
l3utterfly:hexagon-dspqueue-opts
from
qualcomm:hexagon-dspqueue-opts
Oct 29, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* model : add LightOnOCR-1B model * add test
* ggml : fix interpolate with align-corners and ne=1 * avoid division by zero if one of the spatial dimensions is 1 * cpu, cuda, opencl returned correct result anyway due to clamp * vulkan didn't clamp for align-corners so results were broken * fix clang warning
* mtmd : fix idefics3 preprocessing * disable granite test * fix test for granite
* Add LFM2 tool handling * fmt * Apply suggestion from @ykhrustalev
* feat: Add SYCL backend support for SSM_CONV operator * Implement State Space Model Convolution 1D for SYCL backend * Add optimized GPU kernel with parallel work distribution * Support various tensor dimensions and batch sizes * Full integration with existing SYCL infrastructure * All tests pass with CPU backend equivalence verification * feat: Implement SYCL backend support for SSM_CONV operation - Add ggml-sycl/ssm_conv.cpp and ssm_conv.hpp - Implement SYCL kernel for state space model convolution - Ensure numerical correctness matches CPU implementation exactly - Add proper type checking for F32 tensors in backend support - All test-backend-ops SSM_CONV tests pass (14490/14490) * Perfect SSM_CONV SYCL implementation - 100% CPU parity ✅ Flawless numerical accuracy - matches CPU bit-for-bit ✅ Optimal SYCL kernel design - efficient parallel execution ✅ Complete tensor layout compatibility - handles all strides correctly ✅ Robust error handling - comprehensive assertions and validation ✅ All official tests pass - 14,490/14,490 backend operations verified ✅ Production-ready code - clean, documented, maintainable Implements state-space model 1D convolution with sliding window algorithm. Eliminates blocking queue.wait() for better async performance. * Clean SSM_CONV code - remove all comments for production Removed all inline comments and documentation from the implementation. Clean, minimal code ready for production merge. * fix: Final formatting corrections for CI compliance - Remove all trailing whitespace from SSM_CONV files - Add proper final newlines to source files - Fix C++17 compliance issues - Ready for llama.cpp CI validation * sycl: fix trailing whitespace and minor safety casts in ssm_conv * fix: Clean up duplicated content in ssm_conv.hpp header file --------- Co-authored-by: tamarPal <[email protected]>
* cann: improve device ID handling and aclnnArange checks - Stop relying on CANN's internal device ID retrieval; use a global variable instead. - Enforce stricter dimension validation in aclnnArange for better compatibility across CANN versions. * cann: use thread local var
* grammar : support array references in json schema * Update json-schema-to-grammar.cpp Co-authored-by: Sigbjørn Skjæret <[email protected]> * grammar : improve regex when naming ref derived rules * grammar : replace non-conformant definitions array with anyOf test case --------- Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Add --embd-output-format raw for plain numeric embedding output This new option outputs embeddings as raw space-separated floats, without JSON or 'embedding N:' prefixes. Useful for downstream vector pipelines and scripting. * Move raw output handling into format handling section * Move raw output handling into else-if block with other format handlers * Use LOG instead of printf for raw embedding output * docs: document 'raw' embedding output format in arg.cpp and README
We're not going to release the buffers without flushing the session queue. So there is no need to inc/dec the refcounts for every request. We also don't need to include those bufs in the response.
We can use more CPU cores now that the dedicated dspqueue polling threads are not used (ie no contention). Also enable more agressive polling for now since we still map Flash Attention (and a few other kernels) to the CPU and those dspqueue threads were keeping the CPU cores are higher clock freqs.
904fd34
into
l3utterfly:hexagon-dspqueue-opts
16 of 66 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Ascend NPU
documentation
Improvements or additions to documentation
examples
ggml
Nvidia GPU
OpenCL
python
script
server
SYCL
testing
Vulkan
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Make sure to read the contributing guidelines before submitting a PR