forked from ggml-org/llama.cpp
Sync master with upstream release b6387 #242
Merged

jan-service-account merged 13 commits into dev from update-dev-from-master-2025-09-05-00-33 on Sep 5, 2025
Conversation
…) Fixes ggml-org#15330

Adjust the allocation size of acl_rstd: the parameter `dims` is set to 3 according to the CANN documentation.

Co-authored-by: Yuchuan <[email protected]>
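For readers unfamiliar with the fix, a minimal sketch of the shape logic, assuming RMS norm normalizes along the innermost ggml dimension `ne[0]` (names here are illustrative, not the actual CANN backend code):

```cpp
#include <cstdint>
#include <cstddef>

// Illustrative only: per the CANN documentation the rstd output of RMS norm
// is a 3-D tensor (one value per normalized row), so the buffer is sized
// from the three outer dimensions of the 4-D ggml source tensor instead of
// all four. `rstd_dims` mirrors the `dims = 3` parameter from the fix.
constexpr int rstd_dims = 3;

size_t acl_rstd_nbytes(const int64_t ne[4], size_t type_size) {
    size_t nbytes = type_size;
    for (int i = 0; i < rstd_dims; ++i) {
        nbytes *= ne[i + 1]; // skip ne[0], the dimension being normalized
    }
    return nbytes;
}
```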
* add conv3d support
* add ggml_pad_ext for cpu & cuda backend
* cuda/cpu: add im2col_3d support
* cuda: make im2col a little faster
* fix cuda pad/scale/im2col3d
* make im2col_3d faster
* gguf: support loading tensors which n_dims > GGML_MAX_DIMS
* fix cuda get_rows
* avoid ggml_conv_3d conflict
* correct GGML_OP_COUNT assertion
* avoid build failure
* avoid build failure on MacOS
* cuda: remove unnecessary MIN define
* fix cpu im2col_3d
* adjust the code style
* cuda: use simpler loop in get_rows
* add test_im2col_3d to test-backend-ops
* test-backend-ops.cpp: remove trailing whitespace
* cpu: im2col_3d: support non-contiguous src

  Co-authored-by: Jeff Bolz <[email protected]>

* fix test_im2col_3d
* remove unused variables
* cuda: get_rows: dfloat2 -> float2
* add test_pad_ext to test-backend-ops.cpp
* add gguf_init_from_file_ext impl
* Revert "gguf: support loading tensors which n_dims > GGML_MAX_DIMS"

  This reverts commit d8377a0.

* Revert "add gguf_init_from_file_ext impl"

  This reverts commit d9f1d13.

* update ggml_backend_vk_device_supports_op
* fix ggml_backend_vk_device_supports_op
* update the other backends' supports_op for ggml_pad_ext
* metal/opencl/sycl/vulkan: fix GGML_OP_PAD check in supports_op

---------

Co-authored-by: Jeff Bolz <[email protected]>
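For context, im2col_3d unfolds each 3-D convolution window into a column so the convolution reduces to a matrix multiply against the flattened kernel. A minimal, self-contained CPU sketch of the idea, not the actual ggml kernel; contiguous channel-major layout and a uniform stride/padding/dilation are simplifying assumptions:

```cpp
#include <cstdint>
#include <vector>

// Unfold a [IC, ID, IH, IW] volume into one column of size IC*KD*KH*KW per
// output position [OD, OH, OW]. Out-of-bounds taps stay zero (zero padding).
std::vector<float> im2col_3d(const float * src,
                             int64_t IC, int64_t ID, int64_t IH, int64_t IW,
                             int64_t KD, int64_t KH, int64_t KW,
                             int64_t s, int64_t p, int64_t d) {
    const int64_t OD = (ID + 2*p - d*(KD - 1) - 1) / s + 1;
    const int64_t OH = (IH + 2*p - d*(KH - 1) - 1) / s + 1;
    const int64_t OW = (IW + 2*p - d*(KW - 1) - 1) / s + 1;

    std::vector<float> dst(OD*OH*OW * IC*KD*KH*KW, 0.0f);

    for (int64_t od = 0; od < OD; ++od)
    for (int64_t oh = 0; oh < OH; ++oh)
    for (int64_t ow = 0; ow < OW; ++ow) {
        float * col = &dst[((od*OH + oh)*OW + ow) * IC*KD*KH*KW];
        int64_t idx = 0;
        for (int64_t ic = 0; ic < IC; ++ic)
        for (int64_t kd = 0; kd < KD; ++kd)
        for (int64_t kh = 0; kh < KH; ++kh)
        for (int64_t kw = 0; kw < KW; ++kw, ++idx) {
            const int64_t id = od*s - p + kd*d;
            const int64_t ih = oh*s - p + kh*d;
            const int64_t iw = ow*s - p + kw*d;
            if (id >= 0 && id < ID && ih >= 0 && ih < IH && iw >= 0 && iw < IW) {
                col[idx] = src[((ic*ID + id)*IH + ih)*IW + iw];
            }
        }
    }
    return dst;
}
```

A GEMM of the [OD*OH*OW, IC*KD*KH*KW] column matrix with the [OC, IC*KD*KH*KW] kernel matrix then yields the conv3d output, which is why making im2col_3d fast dominates the performance work listed above.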
This is a key change, just letting users know.

Signed-off-by: Eric Curtin <[email protected]>
* server: add exceed_context_size_error type
* change error code to 400
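As a sketch of what the new error might look like on the wire: only the `exceed_context_size_error` type string and the 400 status are from the commit; the remaining field names are assumptions modeled on llama-server's existing OpenAI-style error envelope, built here with the nlohmann::json library the server already vendors:

```cpp
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical helper: builds the HTTP 400 payload for a prompt that does
// not fit in the context window.
static json make_exceed_context_size_error(int n_prompt_tokens, int n_ctx) {
    return json{
        {"error", {
            {"code",    400},
            {"type",    "exceed_context_size_error"},
            {"message", "the request exceeds the available context size"},
            {"n_prompt_tokens", n_prompt_tokens}, // assumed field
            {"n_ctx",           n_ctx},           // assumed field
        }},
    };
}
```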
* CANN: refactor the ND→NZ workspace to be per-device in the Ascend backend

  - Replaced the previous single global ND→NZ workspace with a per-device cache using an unordered_map keyed by device ID.
  - The functions `release_nz_workspace`, `relloc_nz_workspace`, and `get_nz_workspace` now manage the workspace independently for each device, preventing memory conflicts in multi-device / pipeline-parallel scenarios.
  - This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.

  Co-authored-by: hipudding <[email protected]>

* refactor

  Signed-off-by: noemotiovon <[email protected]>

* rename

  Signed-off-by: noemotiovon <[email protected]>

* fix review comments

  Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>
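A minimal sketch of the per-device caching pattern described above, with `std::malloc`/`std::free` standing in for the CANN device allocator (the real backend would use aclrtMalloc/aclrtFree and the function names from the commit):

```cpp
#include <cstdlib>
#include <unordered_map>

// One scratch buffer per device ID instead of a single global one, so
// concurrent ND->NZ conversions on different devices cannot overwrite
// each other's workspace.
struct nz_workspace {
    void * ptr  = nullptr;
    size_t size = 0;
};

static std::unordered_map<int, nz_workspace> g_nz_workspaces;

static void * get_nz_workspace(int device, size_t size) {
    nz_workspace & ws = g_nz_workspaces[device];
    if (ws.size < size) {
        std::free(ws.ptr); // real code: device allocator bound to `device`
        ws.ptr  = std::malloc(size);
        ws.size = ws.ptr ? size : 0;
    }
    return ws.ptr;
}

static void release_nz_workspace(int device) {
    auto it = g_nz_workspaces.find(device);
    if (it != g_nz_workspaces.end()) {
        std::free(it->second.ptr);
        g_nz_workspaces.erase(it);
    }
}
```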
…15791)

* llama : set n_outputs to 1 to avoid 0 outputs mean-pooling

This commit modifies the llama_context constructor to set n_outputs to 1. The motivation for this is that when using pooling, and specifically mean pooling, for embeddings, having n_outputs set to 0 can lead to the following error:

```console
$ build/bin/llama-embedding -m models/nomic-embed-text-1.5-Q4_K_M.gguf \
    --pooling mean -p "Hello, how are you?"
...
llama_context: CPU output buffer size = 0.12 MiB
/home/danbev/work/ai/llama.cpp/ggml/src/ggml.c:3023: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
0x0000743c96d107e3 in __GI___wait4 (pid=292978, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30  ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
30      in ../sysdeps/unix/sysv/linux/wait4.c
196         waitpid(child_pid, NULL, 0);
230         ggml_print_backtrace();
3023        GGML_ASSERT(ggml_can_mul_mat(a, b));
1823        cur = ggml_mul_mat(ctx0, ggml_cont(ctx0, ggml_transpose(ctx0, inp)), inp_mean);
18983       llm->build_pooling(cls, cls_b, cls_out, cls_out_b);
1399        auto * gf = model.build_graph(gparams);
292         auto * gf = graph_reserve(1, n_seqs, n_outputs, mctx.get(), true);
2329        auto * ctx = new llama_context(*model, params);
913         llama_context * lctx = llama_init_from_model(model, cparams);
105         common_init_result llama_init = common_init_from_params(params);
[Inferior 1 (process 292976) detached]
Aborted (core dumped)
```

Co-authored-by: Georgi Gerganov <[email protected]>

* add comment about not reserving graphs with zero outputs
* add assert in graph_reserve to ensure n_outputs >= 1

---------

Co-authored-by: Georgi Gerganov <[email protected]>
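For context on why n_outputs == 0 trips the assert: as the backtrace shows, mean pooling is built as a matrix multiply of the token embeddings against an [n_tokens x n_outputs] weight matrix (inp_mean) whose entries are 1/n_tokens, so with zero outputs one operand is empty and ggml_can_mul_mat rejects it. A toy C++ illustration of the shapes, not the ggml code itself:

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n_tokens  = 4, n_embd = 3;
    const int n_outputs = 1; // with 0, the weight matrix below would be empty

    // token embeddings, [n_tokens][n_embd], all ones for the toy example
    std::vector<std::vector<float>> inp(n_tokens, std::vector<float>(n_embd, 1.0f));

    // inp_mean analogue: each of the n_outputs columns averages the tokens
    std::vector<float> pooled(n_outputs * n_embd, 0.0f);
    for (int o = 0; o < n_outputs; ++o) {
        for (int e = 0; e < n_embd; ++e) {
            for (int t = 0; t < n_tokens; ++t) {
                pooled[o * n_embd + e] += inp[t][e] / n_tokens;
            }
        }
    }
    std::printf("pooled[0] = %.2f\n", pooled[0]); // 1.00 for this input
    return 0;
}
```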
…-org#15799)

Branch: GGMLMetalNE20

Signed-off-by: Gabe Goodhart <[email protected]>
This commit adds support for EmbeddingGemma 300m. This model supports sliding window attention (SWA), and a new swa_type is introduced to support symmetric SWA masking.

This commit also extracts the code from the function llama_is_masked_swa in llama-impl.h so that the logic can be shared by both llm_graph_input_attn_no_cache::set_input and llama_kv_cache::set_input_kq_mask.

With this commit the EmbeddingGemma 300m model can be converted to GGUF and used with llama.cpp. Once the model has been uploaded to HuggingFace it can be used like this:

```console
./build/bin/llama-cli -hf ggml-org/embeddinggemma-300m-GGUF:Q8_0
```
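One plausible reading of "symmetric" SWA masking, as a hedged sketch (the authoritative logic is llama_is_masked_swa; the exact window convention here, inclusive bounds and n_swa positions per side, is an assumption): instead of the usual causal window where a token only sees the previous n_swa positions, a symmetric window also admits positions after it, which suits a bidirectional embedding model:

```cpp
#include <cstdint>

// Hedged sketch: returns true when position p1 should be masked out for the
// query at position p0, given a window of n_swa positions on each side.
static bool is_masked_swa_symmetric(int64_t n_swa, int64_t p0, int64_t p1) {
    const int64_t d = p0 > p1 ? p0 - p1 : p1 - p0;
    return d > n_swa; // outside the window on either side
}

// Causal SWA for comparison: future tokens and tokens older than the window
// are both masked.
static bool is_masked_swa_causal(int64_t n_swa, int64_t p0, int64_t p1) {
    return p1 > p0 || p0 - p1 > n_swa;
}
```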
* feat: add Jinja tester PySide6 simple app
* Linter fixes
* Pylint fixes
* Whitespace
* Add commandline support; add formatter; add extensions
* Remove testing actions
* Silence flake8 warnings for commandline mode
* Apply suggestions from code review

  Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Fix trailing whitespace/newline logic
* Update scripts/jinja/jinja-tester.py

  Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update scripts/jinja/jinja-tester.py

  Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
* feat: nemotron thinking & toolcalling support
* Trailing whitespaces
* Corrected template for Nemotron
* Template and parser fixes
* Final template and grammar changes
* Whitespace
* Always do lazy grammar processing since the </think> tag will always be there
* Allow extra content after toolcall
* Whitespace
* New tests: thinking + tools, tools + content, thinking + tools + content (new!)
* Whitespace
* Remove cURL test script
…gml-org#15639)

Co-authored-by: CNE Pierre FICHEPOIL <[email protected]>
Updates the dev branch with the latest upstream release (b6387) from ggml-org/llama.cpp.