merge ggml-hexagon implementation #62

l3utterfly · 2025-04-30T03:06:04Z

No description provided.

* convert : experimental support for `--mmproj` flag
* fix bad ctrl+f replace
* fix style
* split into subclasses TextModel and VisionModel
* rename Mode --> ModelBase
* small fix
* correct CLIP_VISION arch name (because existing GGUF already use it)
* Apply suggestions from code review
  Co-authored-by: compilade
* fix Mistral3Model
* fix typo
  Co-authored-by: compilade

---------

Co-authored-by: compilade

…li` (ggml-org#13012)

* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
  Co-authored-by: Georgi Gerganov

---------

Co-authored-by: Georgi Gerganov

* metal : add memory pool for temp allocs (wip) [no ci]
* cont : free buffers from the heap
* cont : resize heap [no ci]
* cont : refactor heap [no ci]
* cont : heap for each cmd buffer [no ci]
* cont : fix free
* wip
* cont : fix alignment [no ci]
* cont : not working .. [no ci]
* cont : heap allocation now works [no ci]
* cont : use MTLHeapTypePlacement ggml-ci
* metal : use dynamic MTLHeap allocations ggml-ci
* metal : add comments
* metal : disable softmax use of mem_pool ggml-ci
* metal : final touches

* Sigint rework in mtmd vision example
* Applied suggestions on mtmd-cli PR
* Forgot to invert one of the conditions
* Update examples/llava/mtmd-cli.cpp
* Removed redundant exit check

---------

Co-authored-by: pl752
Co-authored-by: Xuan-Son Nguyen

* ggml-cpu : kernels for faster depthwise 2D convolution
* fix compile: remove static after moving to ops.cpp
* add dilation for depthwise_conv_2d
* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace
* review: rename depthwise_conv_2d -> conv_2d_dw everywhere

…org#12943)

RPC_CMD_SET_TENSOR always returns an empty response, and we send it 4 times per token. We can improve token generation (TG) speed by not waiting for this empty response. The performance impact of this change depends on the network latency.

jmorganca and others added 30 commits April 20, 2025 08:28

metal: add neg operator (ggml-org#13029)

4ba9d71

vulkan: support noncontiguous rms_norm (ggml-org#13031)

6616820

llava: fix errors in clip.h on certain compilers (ggml-org#13030)

6602304

SYCL: Add non-contiguous support in ROPE (ggml-org#12993)

5368ddd

ggml-ci

ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (ggml-org#12871)

1d735c0

* ggml : add SSE 4.2 variant for CPUs without AVX
* ggml : add x64 base ABI variant

llava : update documentations (ggml-org#13055)

2434535

* llava : update documentations
* fix typo

security : add note about RPC and server functionality (ggml-org#13061)

ab47dec

* security : add note about RPC functionality
* security : add note about llama-server

mtmd : support SmolVLM (version 1 and 2) (ggml-org#13050)

dc39a5e

* mtmd : support SmolVLM (version 1 and 2)
* correct chat template
* fix n_patches
* scale_factor is an int
* add more models to test

CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (ggml-org#13014)

658987c

* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
* fix logic for RoPE support, CUDA graphs

rpc : add command line option for number of threads for the CPU backend (ggml-org#13060)

2cca6c0

closes ggml-org#13051

convert : Append mult-eos,half-rope,bos to GLM4-0414 and Z (ggml-org#13021)

eb1776b

* append mult-eos,half-rope,bos to GLM4-0414
* remove unset var

mtmd : Support Pixtral 12B (ggml-org#13065)

ecda2ec

* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script

vulkan: matmul gcn tuning (ggml-org#13016)

b3b6d86

* tune matmul for gcn
* this one is more power efficient
* Update ggml/src/ggml-vulkan/ggml-vulkan.cpp
  Co-authored-by: 0cc4m
* disable this tune for the proprietary driver

---------

Co-authored-by: 0cc4m

metal : fix floating-point range of attention scores in FA kernels (ggml-org#13090)

7604a7d

ggml-ci

arg : clean up handling --mmproj with -hf (ggml-org#13082)

80982e8

* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
  This reverts commit 2cac8e0.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it

arg : add --no-mmproj-offload (ggml-org#13093)

7c727fb

* arg : add --no-mmproj-offload
* Update common/arg.cpp

clang-tidy : disable warning about missing math parenthesis (ggml-org#13091)

572b314

cmake : do not include ./src as public for libllama (ggml-org#13062)

13b4548

* cmake : do not include ./src as public for libllama ggml-ci
* cmake : rework tests ggml-ci
* llguidance : remove unicode include ggml-ci
* cmake : make c++17 private ggml-ci

CUDA: use switch statements in constexpr functions (ggml-org#13095)

b10d8bf

sync : ggml

63b4911

ggml-ci

ggml : fix trailing whitespaces (#0)

87616f0

embeddings : fix batch sizes (ggml-org#13076)

226251e

ggml-ci

clip : remove boi/eoi embeddings for GLM-edge model (ggml-org#13081)

13be08d

change the reorder tensor from init to execute OP (ggml-org#13003)

514c456

jeffzhou2000 added 17 commits April 29, 2025 21:07

ggml-hexagon: refine pinned-memory feature

cb0dfd7

ggml-hexagon: refine build system in ggml-hexagon

88acf6d

ggml-hexagon: remove redundant code in struct ggml_backend_hexagon_buffer_context

67c7d06

ggml-hexagon: upgrade Android NDK to android-ndk-r28

36c3ff6

ggml-dsp: split ggml-dsp.c into multiple files and cleanup

57cfbbe

ggml-dsp: refine ggml-dsp and make ggml-dsp more clear

bcb5012

ggml-hexagon: fix a minior issue in dev ops

6931510

ggml-hexagon: fix a build issue in CI

c45cd5e

ggml-dsp: cleanup code

7b55a46

ggml-hexagon: sync with upstream

6f11897

ggml-dsp: cleanup code

157b6b1

ggml-dsp:refine ggmlhexagon_dsp_add_f32

d4afea4

ggml-dsp: refine logic of thread_counts

2862e27

ggml-hexagon: release v1.06 and ready for code review

c36bd93

ggml-dsp: make GGML_OP_ADD more faster on cDSP side

4f70d23

ggml-hexagon: sync from project kantv(make ggml-hexagon backend can works in a standard Android APP)

7b00b51

sync with upstream llama.cpp and sync ggml-hexagon.cpp from project kantv

b6072fa

github-actions bot added the documentation, SYCL, Nvidia GPU, Vulkan, testing, build, examples, python, server, ggml, Apple Metal, and script labels Apr 30, 2025

l3utterfly merged commit fe88096 into l3utterfly:ggml-hexagon Apr 30, 2025
31 of 49 checks passed
