forked from ggml-org/llama.cpp
Sync master with upstream release b6475 #250
Merged: jan-service-account merged 24 commits into dev from update-dev-from-master-2025-09-15-00-36 on Sep 15, 2025
Conversation
…" (ggml-org#15910) * Revert "sycl: add usage of enqueue_functions extension (ggml-org#14244)" This reverts commit 8308f98. * fix missed revert code, format the code
…g#15947)
* vulkan: implement ggml igpu device type, implement pci id support
* fix compiler warning
* prevent printf overflow warning
…g#15948)
The function 'output_reserve' has return type 'uint32_t', so an explicit cast is needed.
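A minimal sketch of the kind of fix described above; the function name `output_reserve` is real, but the body and signature here are illustrative stand-ins, not the PR's code. The point is that a `size_t` flowing into a `uint32_t` return path triggers narrowing-conversion warnings on some compilers, so the cast is made explicit:

```cpp
#include <cstdint>
#include <cstddef>

// illustrative stand-in: returning a size_t through a uint32_t return type
// needs an explicit cast, which also documents the intended narrowing
uint32_t output_reserve_example(size_t n_outputs) {
    return static_cast<uint32_t>(n_outputs);
}
```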
…15790)
Pull and run models via: llama-server -dr gemma3. Adds validators and sanitizers for Docker Model URLs and metadata.
Signed-off-by: Eric Curtin <[email protected]>
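The commit text does not show its validation rules; the following is a hypothetical sketch of a character-whitelist sanitizer for Docker-style model references. The function name and the exact rules are assumptions for illustration, not taken from the PR:

```cpp
#include <cctype>
#include <string>

// hypothetical whitelist check: Docker-style references are commonly limited
// to lowercase alphanumerics plus a few separators
static bool is_valid_ref_char(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return std::islower(uc) || std::isdigit(uc) ||
           c == '.' || c == '_' || c == '-' || c == '/' || c == ':';
}

// rejects empty references and any reference with an unexpected character
bool validate_model_ref(const std::string & ref) {
    if (ref.empty()) {
        return false;
    }
    for (char c : ref) {
        if (!is_valid_ref_char(c)) {
            return false;
        }
    }
    return true;
}
```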
…d incorrect zTensor free (ggml-org#15839)
* metal : run graph ops concurrently
* cont : add flags for debugging and disabling concurrency
* cont : refactor and handle fusing
* cont : simplify - no need to use GPU address
* cont : prepare mem ranges for reuse + add ggml-metal-common.cpp
* cont : avoid redundant keywords in cpp [no ci]
* metal : reorder graph for better concurrency
* metal : fix race on mem pool buffers
* cont : add env GGML_METAL_GRAPH_OPTIMIZE_DISABLE
* cont : refactor, optimize, add comments
* cont : refactor ggml-metal.m
* minor : update logs [no ci]
* metal : refactor bin kernels loading
* metal : refactor rms kernel loading
* ci : try to add memory leaks check
* ci : try to enable memory leak detection for Mac
* cont : seems to be working
* llama : allow using iGPUs with --device
* mtmd : allow iGPU
* rpc-server : allow iGPU
…ers (ggml-org#15705)
Use this to query the register count for shader compiles on NVIDIA. Currently this is only for performance debugging, but it could eventually be used in heuristics such as split_k.
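The title above is truncated, but the description matches Vulkan's pipeline-executable-properties mechanism; treating that as an assumption, here is a minimal sketch of reading per-pipeline statistics such as register counts. It assumes VK_KHR_pipeline_executable_properties is enabled, the pipeline was created with VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR, and the extension entry points have been resolved (e.g. via vkGetDeviceProcAddr):

```cpp
#include <vulkan/vulkan.h>

#include <cstdio>
#include <vector>

// sketch only: query each pipeline executable and print its uint64 counters
void print_pipeline_stats(VkDevice device, VkPipeline pipeline) {
    VkPipelineInfoKHR pipeline_info = { VK_STRUCTURE_TYPE_PIPELINE_INFO_KHR };
    pipeline_info.pipeline = pipeline;

    uint32_t exec_count = 0;
    vkGetPipelineExecutablePropertiesKHR(device, &pipeline_info, &exec_count, nullptr);

    for (uint32_t i = 0; i < exec_count; ++i) {
        VkPipelineExecutableInfoKHR exec_info = { VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_INFO_KHR };
        exec_info.pipeline        = pipeline;
        exec_info.executableIndex = i;

        uint32_t stat_count = 0;
        vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &stat_count, nullptr);

        std::vector<VkPipelineExecutableStatisticKHR> stats(stat_count,
            { VK_STRUCTURE_TYPE_PIPELINE_EXECUTABLE_STATISTIC_KHR });
        vkGetPipelineExecutableStatisticsKHR(device, &exec_info, &stat_count, stats.data());

        // drivers report named counters here; NVIDIA includes register usage
        for (const VkPipelineExecutableStatisticKHR & s : stats) {
            if (s.format == VK_PIPELINE_EXECUTABLE_STATISTIC_FORMAT_UINT64_KHR) {
                std::printf("%s: %llu\n", s.name, (unsigned long long) s.value.u64);
            }
        }
    }
}
```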
* vulkan: fix failing dequant shaders
* add missing const
* ggml-zdnn: rm user mapped buffers
* ggml-zdnn: rm dead code
* ggml-zdnn: attempt to fix missing extra data buffer free
Signed-off-by: Aaron Teo <[email protected]>
* doc : update documentation for --tensor-split
* Update tools/main/README.md
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
* releases : update ROCM, add gfx1200, gfx1201, gfx1151
* releases : set target to 13.3 for macos-x64
* add hipblaslt.dll to release
* add hipblaslt/library to release
Fix regression introduced with commit 50f4281
* metal : fix kernel requirements
* cont : fix supports_op
* cont : fix supports_op for ARGMAX
)
* build: fix the cache keys for the Windows HIP release job. Update the cache keys to include the HIP SDK version, preventing the use of outdated ROCm installation caches.
* build: sync changes from release.yml to build.yml
  - Update HIP SDK version to 25.Q3 and ROCm version to 6.4.2
  - Update the cache keys to reflect the new versions
* build: remove the Windows HIP release for gfx1151, since the current stable rocWMMA does not support gfx1151.
* vulkan: move mul_mm dequantization steps into a separate file and functions
* improve mul_mm vector load code
* fix debug mode issues and warnings
…adeon RX 9000 series (ggml-org#15994)
* rocm.Dockerfile: added the gfx1200 and gfx1201 architectures to support the AMD Radeon RX 9000 series. https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html#rdna-os states that the Radeon RX 9000 series is supported from Ubuntu 24.04.2, while the Dockerfile uses 24.04, which is ROCm 6.4. This fixes the `ROCm error: invalid device function` I was getting when trying to use the ROCm container.
* metal : remove mem pool usage
* metal : remove mem pool implementation
* metal : take into account the actual allocated memory of the tensor
* cont : use ggml_backend_buft_get_alloc_size (see the sketch after this list)
* cont : improve, comments
* cont : add functions for the extra tensor sizes
* metal : add comments
* metal : implement .get_alloc_size for the rest of the buffer types
* metal : remove ggml_metal_heap
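On the ggml_backend_buft_get_alloc_size item: that function and ggml_nbytes are part of ggml's public API, while the helper below is illustrative only. It shows the distinction the commit relies on, namely the bytes a tensor logically holds versus what a buffer type actually allocates for it (which may include padding or backend-specific extras):

```cpp
#include <stdio.h>

#include "ggml.h"
#include "ggml-backend.h"

// illustrative helper: compare a tensor's logical size with what the
// buffer type would actually allocate for it
void report_tensor_sizes(ggml_backend_buffer_type_t buft, struct ggml_tensor * t) {
    size_t logical = ggml_nbytes(t);                            // raw data size
    size_t actual  = ggml_backend_buft_get_alloc_size(buft, t); // may be larger

    printf("%s: nbytes=%zu, alloc_size=%zu\n", t->name, logical, actual);
}
```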
* add grok-2 support
* type fix (×3)
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built-in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2) (see the sketch after this list)
* add yarn metadata, move defaults to hparams
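On the M_SQRT2 item above: the macro comes from POSIX <math.h> rather than standard C++ (MSVC only defines it with _USE_MATH_DEFINES), so computing the constant with std::sqrt is the portable choice. A minimal illustration; the variable name is hypothetical, not from the PR:

```cpp
#include <cmath>

// was: const float scale = 1.0f / M_SQRT2;  // POSIX-only macro
const float scale = 1.0f / std::sqrt(2.0f);  // portable standard C++
```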
Updates dev branch with latest release (b6475) from ggml-org/llama.cpp