Conversation

@l3utterfly
Contributor

It seems the size parameter is currently ignored when getting/setting tensors. This crashes when attempting to save/load state data, because only the filled KV cells are saved/loaded in llama_kv_cache::state_read_data and llama_kv_cache::state_write_data. The cache save/load therefore needs to read partial tensors, which fails the asserts in ggml_backend_hexagon_buffer_get_tensor and ggml_backend_hexagon_buffer_set_tensor.

This PR updates the get/set tensor paths to read and repack partial rows based on the passed-in size. Tested: KV caches save and load successfully on an S25+ Ultra.
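
To illustrate the idea, here is a minimal standalone sketch, not the actual ggml-hexagon code (fake_tensor and set_rows_partial are hypothetical names), of honoring the caller-supplied size and copying/repacking only the rows it covers instead of asserting that the whole tensor is requested:

// Minimal sketch (not the actual ggml-hexagon implementation): illustrates
// copying/repacking only the rows covered by the caller-supplied `size`,
// which is what a partial KV-cache save/restore needs.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

struct fake_tensor {
    size_t row_bytes;            // bytes per (quantized) row in host layout
    size_t n_rows;               // total number of rows
    std::vector<uint8_t> packed; // device-side buffer; here just a flat copy
};

// set_tensor-style helper: copy `size` bytes from `data` into the tensor,
// row by row. Only the rows actually covered by `size` are touched.
static void set_rows_partial(fake_tensor & t, const void * data, size_t size) {
    assert(size <= t.row_bytes * t.n_rows);      // may be smaller than the full tensor
    const size_t full_rows = size / t.row_bytes; // whole rows covered by the request
    const size_t tail      = size % t.row_bytes; // trailing partial row, if any
    const uint8_t * src = static_cast<const uint8_t *>(data);
    for (size_t r = 0; r < full_rows; ++r) {
        // a real backend would repack the row into its device layout here;
        // this sketch just copies the bytes
        std::memcpy(t.packed.data() + r * t.row_bytes, src + r * t.row_bytes, t.row_bytes);
    }
    if (tail) {
        std::memcpy(t.packed.data() + full_rows * t.row_bytes,
                    src + full_rows * t.row_bytes, tail);
    }
}

int main() {
    fake_tensor t { 32, 8, std::vector<uint8_t>(32 * 8, 0) };
    std::vector<uint8_t> host(32 * 8, 0xAB);
    set_rows_partial(t, host.data(), 3 * 32 + 5); // partial request: 3 rows + 5 bytes
    std::printf("copied %zu bytes\n", (size_t)(3 * 32 + 5));
    return 0;
}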

allows partial repacking/copying when get tensor size is smaller than the actual tensor
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Oct 29, 2025
@max-krasnyansky
Collaborator

@l3utterfly sorry for the delayed ack. I want to test it out on my setup but keep getting sidetracked.
I'm a bit surprised that we get set_tensor calls for the quantized tensors during save/restore.
btw, what's the easiest way to trigger/test this with the CLI tools, llama-cli ... --prompt-cache?

@l3utterfly
Contributor Author

@max-krasnyansky yeah, that would be the best way to trigger save/load cache.

I believe the run-cli.sh currently sets kv cache quants to Q8_0, so it reads and sets quantised tensors.
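
For reference, an invocation along these lines should exercise both the save and the load path (the exact flags in run-cli.sh may differ; the model path and prompt here are placeholders):

 ./llama-cli -m model.gguf -p "Hello" -n 32 \
     --prompt-cache session.bin --prompt-cache-all \
     -ctk q8_0 -ctv q8_0
 # run it a second time with the same --prompt-cache file to hit the load path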

@max-krasnyansky
Collaborator

> @max-krasnyansky yeah, that would be the best way to trigger save/load cache.
>
> I believe the run-cli.sh currently sets kv cache quants to Q8_0, so it reads and sets quantised tensors.

Ah, that makes sense. Thanks, will test ASAP and merge. Thank you thank you.

@max-krasnyansky
Collaborator

@l3utterfly
Everything looks good in my tests as well.
Tested --prompt-cache with GPT-OSS-20B (MXFP4) and Q4/Q8 models.

Somehow the commit ends up with a duplicate copy of the repack_mxfp4_mxfp4x4x2 function.
I had to remove it by hand to compile. Perhaps a rebase glitch?
Can you please rebase again? Then we'll merge it.

 ~/src/llama.cpp-hexagon$ git grep repack_mxfp4
convert_hf_to_gguf.py:    def repack_mxfp4(self, new_name: str, blocks: Tensor, scales: Tensor):
convert_hf_to_gguf.py:                self.repack_mxfp4(new_name, blocks0, data_torch)
convert_hf_to_gguf.py:                self.repack_mxfp4(new_name_gate, blocks0, scales0)
convert_hf_to_gguf.py:                self.repack_mxfp4(new_name_up, blocks1, scales1)
ggml/src/ggml-hexagon/ggml-hexagon.cpp:static void repack_mxfp4_mxfp4x4x2(ggml_tensor * t, const void * data, size_t size) { <<<
ggml/src/ggml-hexagon/ggml-hexagon.cpp:static void repack_mxfp4_mxfp4x4x2(ggml_tensor * t, const void * data, size_t size) { <<<
ggml/src/ggml-hexagon/ggml-hexagon.cpp:static void repack_mxfp4x4x2_mxfp4(void * data, const ggml_tensor * t, size_t size) {
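
For what it's worth, one way to redo the rebase and confirm the duplicate definition is gone (remote/branch names here are assumptions about the local setup):

 git fetch origin
 git rebase origin/master
 git grep -n "static void repack_mxfp4_mxfp4x4x2" ggml/src/ggml-hexagon/ggml-hexagon.cpp
 # the definition should now appear exactly once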

@l3utterfly l3utterfly requested a review from lhez as a code owner October 31, 2025 02:49
@l3utterfly
Contributor Author

@max-krasnyansky fixed! Thanks for testing!

@max-krasnyansky max-krasnyansky merged commit 13002a0 into ggml-org:master Oct 31, 2025
72 checks passed