
Conversation

@l3utterfly
Owner

No description provided.

foldl and others added 30 commits May 23, 2025 06:33
Reuse the f16/f32 copy shaders, and just scale the number of elements
according to the type size.
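A rough sketch of that element-count scaling (an illustration of the idea only, not the actual Vulkan dispatch code; the helper name is hypothetical):

```cpp
// Illustration only: when copying a contiguous tensor with the generic f16/f32
// copy shaders, no per-type shader is needed; the dispatch just needs the
// number of shader-native elements that cover the same number of bytes.
#include <cstddef>

// hypothetical helper: elements of size elt_size (2 for f16, 4 for f32)
// needed to cover n_bytes of contiguous data
static size_t copy_element_count(size_t n_bytes, size_t elt_size) {
    // assumes n_bytes is a multiple of elt_size (contiguous, aligned copy)
    return n_bytes / elt_size;
}
```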
* [CANN] Support MUL_MAT_ID Q8 && Q4

Signed-off-by: noemotiovon <[email protected]>

* codestyle adjustment

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
* server : support audio input

* add audio support on webui
* ggml : add ggml_gelu_erf() CUDA kernel

* missing semicolon
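For reference, the ggml_gelu_erf() kernel added above computes the exact (erf-based) GELU, gelu_erf(x) = 0.5 · x · (1 + erf(x / √2)). A minimal host-side C++ sketch of that per-element formula (not the CUDA kernel itself):

```cpp
#include <cmath>

// exact GELU using the Gauss error function:
//   gelu_erf(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_erf(float x) {
    const float kInvSqrt2 = 0.70710678118654752440f; // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * kInvSqrt2));
}
```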
…gml-org#12379)

* add common_json w/ support for truncated json healing (healing idea sketched after this list)

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)

---------

Co-authored-by: ochafik <[email protected]>
Co-authored-by: Olivier Chafik <[email protected]>
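One of the bullets above adds common_json with truncated-JSON healing. A minimal sketch of the healing idea, assuming the goal is simply to close whatever strings, objects, and arrays a cut-off stream left open (this is not the actual common_json implementation):

```cpp
#include <string>
#include <vector>

// Append the closers a truncated JSON string is missing so it parses again.
// Simplified: tracks open braces/brackets and unterminated strings, and ignores
// edge cases such as truncation in the middle of a literal like `tru`.
static std::string heal_truncated_json(std::string json) {
    std::vector<char> closers;
    bool in_string = false;
    bool escaped   = false;
    for (char c : json) {
        if (in_string) {
            if (escaped)        { escaped = false;   }
            else if (c == '\\') { escaped = true;    }
            else if (c == '"')  { in_string = false; }
            continue;
        }
        if      (c == '"') { in_string = true; }
        else if (c == '{') { closers.push_back('}'); }
        else if (c == '[') { closers.push_back(']'); }
        else if (c == '}' || c == ']') { if (!closers.empty()) closers.pop_back(); }
    }
    if (escaped)   { json.pop_back(); } // drop a dangling escape backslash
    if (in_string) { json += '"';     } // close the unterminated string
    while (!closers.empty()) {          // close objects/arrays innermost-first
        json += closers.back();
        closers.pop_back();
    }
    return json;
}
```

For example, `{"args": {"query": "weath` heals to `{"args": {"query": "weath"}}`, which is enough for a streaming client to parse the partial tool call.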
…-org#13752)

Temporarily reverted due to failing fp16 DIV operation

This reverts commit 02cdd2d.

ggml-ci
* Multimodal: Added Moondream2 model and fixed ggml.org link

* Apply suggestions from code review

---------

Co-authored-by: name <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* mtmd : add Qwen2-Audio support

* small clean up

* update discussion link

* clarify mtmd_get_output_embd

* clarification in multimodal.md

* fix ultravox bug

* ggml_cont
* kv-cache : rework kv_cell

ggml-ci

* kv-cells : use "shift" instead of "delta" consistently

ggml-ci

* llama : add llama_max_parallel_sequences()

ggml-ci

* kv-cells : update comments [no ci]

* context : fail upon construction if sequences exceed max value (see the sketch after this list)

ggml-ci

* kv-cells : get_pos() -> pos_get() + comments

ggml-ci

* kv-cells : fix tracking of "used" cells

ggml-ci
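A sketch of that construction-time check, assuming only that the new llama_max_parallel_sequences() returns the hard cap (the validation helper here is hypothetical, not the actual context constructor):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration of "fail upon construction if sequences exceed max
// value": validate the requested number of parallel sequences against the
// library-wide maximum before any KV-cache cells are allocated.
static void validate_n_seq_max(unsigned n_seq_max, size_t max_seq) {
    if (n_seq_max == 0 || n_seq_max > max_seq) {
        throw std::runtime_error(
            "n_seq_max must be within [1, " + std::to_string(max_seq) +
            "], got " + std::to_string(n_seq_max));
    }
}

// usage sketch: validate_n_seq_max(params.n_seq_max, llama_max_parallel_sequences());
```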
… w/ enable_thinking:false) (ggml-org#13771)

---------

Co-authored-by: ochafik <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* cann: add the basic FA support

* cann: update the readme

* cann: update the FlashAttention with PSEShift

* cann: update the input parameters in FA

* cann: update the alibi with max_bias

* cann: add the constraints of softcap (softcap formula sketched after this list)

* cann: update the docs CANN.md

* cann: update the docs CANN.md

* cann: fix typo of CANN.md

* cann: add some comments and update the CANN.md

* cann: update the CANN.md

* cann: update the inner precise for fusedInferAttention

* cann: update the constraints of flash_attn_ext in ggml-cann.cpp

* cann: clean the whitespace

* cann: clean the whitespace

* cann: add a new endline
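For context, the softcap referenced above is the standard soft-capping of attention logits, capped = softcap · tanh(score / softcap), as used by models such as Gemma-2. A minimal C++ sketch of that formula (the formula is standard; its use here is purely illustrative of what the CANN constraint is about):

```cpp
#include <cmath>

// standard attention logit soft-capping:
//   capped = softcap * tanh(score / softcap)
static float softcap_logit(float score, float softcap) {
    return softcap * std::tanh(score / softcap);
}
```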
