forked from ggml-org/llama.cpp
merge from zhouwg ggml-hexagon #66
Merged: l3utterfly merged 186 commits into l3utterfly:ggml-hexagon from jeffzhou2000:pr_to_upstream on May 23, 2025
Conversation
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
ggml-ci
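For context on the ptrace_scope entry above: with /proc/sys/kernel/yama/ptrace_scope set to 1, only a direct ancestor may attach a debugger, so a process that wants a forked child to attach back to it must whitelist that child first. A minimal, illustrative sketch of that pattern (not the actual ggml code; names and the gdb invocation are placeholders):

```cpp
// Hedged sketch: the parent whitelists the forked child with PR_SET_PTRACER,
// then the child attaches gdb to the parent and dumps its stack. A pipe makes
// sure the whitelisting happens before the child tries to attach.
#include <sys/prctl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static void print_backtrace_via_child(void) {
    int sync_pipe[2];
    if (pipe(sync_pipe) != 0) {
        return;
    }
    const pid_t parent = getpid();
    const pid_t child  = fork();
    if (child == 0) {
        // child: wait until the parent has whitelisted us, then attach gdb to it
        char c;
        read(sync_pipe[0], &c, 1);
        char pid_str[32];
        snprintf(pid_str, sizeof(pid_str), "%d", (int) parent);
        execlp("gdb", "gdb", "--batch", "-ex", "bt", "-p", pid_str, (char *) nullptr);
        _exit(127); // exec failed
    } else if (child > 0) {
        // parent: allow the child to trace us despite ptrace_scope = 1
#ifdef PR_SET_PTRACER
        prctl(PR_SET_PTRACER, child, 0, 0, 0);
#endif
        write(sync_pipe[1], "x", 1);
        waitpid(child, nullptr, 0);
    }
}
```

The simplification in the last bullet avoids ptrace altogether: after fork the child carries a copy of the parent's stack, so it can print the symbols itself via ggml_print_backtrace_symbols.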
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
…rg#13482)
* Remove mmap workaround on windows. After some testing I found that mmap is supported on Windows and for many GPUs on Linux, so the workaround for Windows is not necessary and is removed.
* Update llama-bench README: the SYCL backend introduced a workaround that allows running llama-bench without specifying the `--mmp 0` flag.
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
* kv-cache : prepare for SWA ggml-ci
* kv-cache : initial iSWA implementation ggml-ci
* kv-cache : rework error recovery logic ggml-ci
* models : fix Phi-3 SWA parameters ggml-ci
* model : adjust Granite to rope factor changes ggml-ci
* server : check if context can do shifts ggml-ci
* iswa : for now, always enable shifts (experiment) ggml-ci
* kv-cache : simplify SWA logic ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch ggml-ci
* llama : update docs about llama_decode ggml-ci
* kv-cache : update warning logs when no space for the batch is available ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context ggml-ci
* llama : add param to control SWA cache size ggml-ci
* minor : clean-up ggml-ci
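To make the SWA cache behaviour above concrete: with a sliding window of width n_swa, a cached token can be dropped once the newest position of its sequence has moved more than n_swa positions past it. A conceptual sketch with illustrative types (not the llama.cpp kv-cache API):

```cpp
// Conceptual only: decide which cache cells fall outside the sliding window
// and can be reused for new tokens.
#include <cstdint>
#include <vector>

struct kv_cell {
    int32_t pos    = -1;   // token position, -1 = empty slot
    int32_t seq_id = -1;   // owning sequence
};

// free every cell of `seq_id` whose position is outside the window ending at `pos_max`
static int prune_swa_cells(std::vector<kv_cell> & cells, int32_t seq_id,
                           int32_t pos_max, int32_t n_swa) {
    int n_freed = 0;
    for (auto & cell : cells) {
        if (cell.seq_id == seq_id && cell.pos >= 0 && cell.pos < pos_max - n_swa + 1) {
            cell = kv_cell{};   // outside the window: safe to reuse
            n_freed++;
        }
    }
    return n_freed;
}
```

Once cells outside the window are gone, the cache can no longer reconstruct arbitrary earlier positions, which is presumably why this entry adds llama_kv_self_seq_pos_min() and disallows server use cases that rely on partial SWA context.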
…to fix infinity values in output (ggml-org#13639)
* CUDA: skip fully masked-out KV in FA vec kernel
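The optimization above rests on a simple observation: if every mask value of a KV tile is -inf, that tile contributes exp(-inf) = 0 to the softmax, so the kernel can skip its loads and FMAs entirely. A host-side C++ analogue of the check (the real change lives inside the CUDA flash-attention vector kernel):

```cpp
// Illustrative only: detect a KV tile whose mask is entirely -inf.
#include <cmath>
#include <cstddef>

static bool kv_tile_fully_masked(const float * mask, size_t tile_len) {
    for (size_t i = 0; i < tile_len; ++i) {
        if (!(std::isinf(mask[i]) && mask[i] < 0.0f)) {
            return false;   // at least one position is attendable
        }
    }
    return true;            // whole tile is masked out: skip it
}
```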
* Update mtmd-helper.cpp
* Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Xuan-Son Nguyen <[email protected]>
---------
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* small fixes
* remove ifdef
…ITY op to accelerate D2D memory copy (ggml-org#13647)
* musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <[email protected]>
* musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <[email protected]>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <[email protected]>
* Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <[email protected]>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <[email protected]>
---------
Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
* model : disable SWA for Phi models ggml-ci
* model : update warning message
* model : print warning only if n_swa > 0
* model : fix typo
* kv-cache : simplify the interface ggml-ci
* context : revert llama_batch_allocr position change ggml-ci
* server : fix first message identification. When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message. Co-authored-by: Piotr Stankiewicz <[email protected]> Signed-off-by: Dorin Geman <[email protected]>
* server : Fix checks for first role message for stream=True. Co-authored-by: Piotr Stankiewicz <[email protected]> Signed-off-by: Dorin Geman <[email protected]>
---------
Signed-off-by: Dorin Geman <[email protected]>
Co-authored-by: Piotr Stankiewicz <[email protected]>
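Background for the fix above: OpenAI-style streaming clients (including openai-node, linked in the commit) expect the first delta of each assistant message to carry `"role": "assistant"`, with subsequent deltas carrying only content. A hedged sketch of the intended shape of a streamed chunk, using nlohmann::json as the server does (illustrative helper, not the actual server code):

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// build the "choices" part of one SSE chunk; only the first chunk gets the role
static json make_stream_delta(const std::string & piece, bool is_first_chunk) {
    json delta = json::object();
    if (is_first_chunk) {
        delta["role"] = "assistant";   // what clients such as openai-node key off
    }
    if (!piece.empty()) {
        delta["content"] = piece;
    }
    return json{{"choices", json::array({ json{{"index", 0}, {"delta", delta}} })}};
}
```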
* Add the endpoints /api/tags and /api/chat, and improve the model metadata response
* Remove trailing whitespaces
* Remove code that is not needed for Copilot to work
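As a rough illustration of what the new endpoints involve (assuming cpp-httplib, which the llama.cpp server is built on; the handler bodies here are placeholders, not the real implementation):

```cpp
#include <httplib.h>

// Hypothetical registration helper: the Ollama-style routes are simply added
// next to the existing OpenAI-compatible ones.
static void register_ollama_routes(httplib::Server & svr) {
    svr.Get("/api/tags", [](const httplib::Request &, httplib::Response & res) {
        // real handler returns the loaded models and their metadata
        res.set_content(R"({"models": []})", "application/json");
    });
    svr.Post("/api/chat", [](const httplib::Request &, httplib::Response & res) {
        // real handler runs the chat completion and returns the assistant reply
        res.set_content(R"({"message": {"role": "assistant", "content": ""}})", "application/json");
    });
}
```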
* ggml : add ggml_gelu_na (not approximated)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
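For reference, the "not approximated" GELU that the renamed op (ggml_gelu_erf) computes, next to the common tanh approximation; plain scalar reference code, not the actual ggml kernels:

```cpp
#include <cmath>

// exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_erf_ref(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// widespread tanh approximation, for comparison
static float gelu_tanh_ref(float x) {
    const float c = 0.7978845608f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}
```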
Signed-off-by: Emmanuel Ferdman <[email protected]>
* switch retrieval to llama_encode
* enable --no-warmup for retrieval
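A hedged sketch of what using llama_encode for retrieval embeddings can look like (batch construction simplified; the actual example code and the current llama.h signatures may differ):

```cpp
#include "llama.h"
#include <vector>

// embed one tokenized chunk with an encoder/embedding model; n_embd is the
// model's embedding size, passed in to keep the sketch self-contained
static std::vector<float> embed_chunk(llama_context * ctx, int32_t n_embd,
                                      std::vector<llama_token> & tokens) {
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());
    if (llama_encode(ctx, batch) != 0) {
        return {};                                         // encode failed
    }
    const float * emb = llama_get_embeddings_seq(ctx, 0);  // pooled embedding for sequence 0
    return emb ? std::vector<float>(emb, emb + n_embd) : std::vector<float>();
}
```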
…orks in a standard Android APP)
Labels: Apple Metal, build, devops, documentation, examples, ggml, Nvidia GPU, python, script, server, SYCL, testing, Vulkan