forked from ggml-org/llama.cpp
Sync master with upstream release b6423 #246
Merged: jan-service-account merged 16 commits into dev from update-dev-from-master-2025-09-09-00-33 on Sep 9, 2025.
Conversation
* CANN: Switch to stream synchronization, because events are not effective
* CANN: add comments

Co-authored-by: hipudding <[email protected]>
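For context on the switch above, the two synchronization styles look roughly like this. This is a sketch assuming the AscendCL runtime API from the CANN toolkit, not the actual ggml-cann code; the function names and handles are illustrative.

```cpp
// Illustrative sketch only, assuming the AscendCL runtime (CANN toolkit);
// stream/event handles are created elsewhere. Not the actual ggml-cann code.
#include <acl/acl.h>

// event-based ordering: the consumer stream waits on a marker recorded in the
// producer stream (the mechanism the commit found "not effective")
void sync_with_event(aclrtStream producer, aclrtStream consumer, aclrtEvent ev) {
    aclrtRecordEvent(ev, producer);      // mark the point the producer reached
    aclrtStreamWaitEvent(consumer, ev);  // consumer stalls until that point
}

// stream synchronization: block the host until all work queued on the stream
// has finished; coarser, but does not rely on events
void sync_with_stream(aclrtStream s) {
    aclrtSynchronizeStream(s);
}
```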
* model : avoid ggml_cont_3d for fused QKV weights (see the sketch after this list)
* kv-cache : make the cpy_k and cpy_v implementation more readable
* cont : add comments
* cont : minor fix [no ci]
* cont : one more fix
* cont : clarity
* kv-cache : require contiguous heads of k_cur and v_cur
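A rough sketch of the idea behind the first bullet, against the public ggml API: when the QKV projection is fused into one tensor, the Q slice can be exposed as a strided 3D view instead of being copied into a fresh contiguous tensor. The shapes, layout, and helper name are assumptions for illustration, not the actual llama.cpp code.

```cpp
// Sketch only (assumed shapes/names): expose Q as a strided view into the
// fused QKV output rather than materializing it with ggml_cont_3d.
#include "ggml.h"

static ggml_tensor * q_from_fused_qkv(ggml_context * ctx, ggml_tensor * qkv,
                                      int64_t n_embd_head, int64_t n_head,
                                      int64_t n_tokens) {
    // qkv is assumed [n_embd_q + n_embd_k + n_embd_v, n_tokens], with Q first
    return ggml_view_3d(ctx, qkv,
            n_embd_head, n_head, n_tokens,
            ggml_row_size(qkv->type, n_embd_head), // nb1: byte step between heads
            qkv->nb[1],                            // nb2: byte step between tokens
            0);                                    // Q starts at offset 0
    // the copying alternative the commit avoids would wrap this view in
    // ggml_cont_3d(ctx, view, n_embd_head, n_head, n_tokens)
}
```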
…#15867)
* convert : force setting sliding_window from original config

  This commit modifies the set_gguf_parameters method for EmbeddingGemma so that it reads the sliding_window parameter from the original model's config.json and uses that value. The motivation for this change is that the Gemma3TextConfig constructor adjusts the sliding_window value, which can lead to inconsistencies when converting models, as we expect this value to match the original model's configuration.

  Refs: https://github.com/huggingface/transformers/blob/bb45d3631ec7026db04a77d33a52b31766372160/src/transformers/models/gemma3/configuration_gemma3.py#L230

* fix flake8 error
* add link to huggingface PR
* ggml: allow casting between f32 and i32 (see the sketch after this list)
* fix cuda
* add vulkan
* fix CPU non-cont
* add non-cont test case
* add note
* extend test number range
* correct note
* add cont version for vulkan
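A minimal sketch of what the new cast enables, using the public ggml_cast API; the helper name and the caller-provided context are illustrative.

```cpp
// Minimal sketch: build graph ops that cast f32 -> i32 -> f32 via ggml_cast.
// The helper name and the caller-supplied context are illustrative only.
#include "ggml.h"

static ggml_tensor * f32_i32_roundtrip(ggml_context * ctx, ggml_tensor * x_f32) {
    ggml_tensor * as_i32 = ggml_cast(ctx, x_f32, GGML_TYPE_I32); // f32 -> i32
    return ggml_cast(ctx, as_i32, GGML_TYPE_F32);                // i32 -> f32
}
```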
* metal : refactor
* cont : refactor FA-vec kernel
* cont : print metal library load time
* minor : warn to debug + better kernel names
* metal : optimize mul_mv q8_0
* metal : simplify FA pipeline creation functions
* metal : improve naming consistency
* metal : safer function constants offsets
* metal : comments
…s too large (ggml-org#15868)
* cuda : fix supports_op condition for get_rows when src1->ne2 > 1 (see the sketch after this list)
* ggml : add comment about ggml_get_rows
* cuda : add FIXME [no ci]
* cuda : update support condition
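To make the support condition concrete, here is a toy guard of the kind described. The toy_tensor type is a stand-in for ggml_tensor, and the 65535 bound is an assumption drawn from CUDA's grid y/z dimension limit; the real check lives in ggml-cuda's supports_op and may differ.

```cpp
// Toy stand-in for a supports_op guard: reject batched row-index tensors
// whose outer dims exceed what a single kernel launch can index. 65535 is
// CUDA's grid y/z limit, used here as an assumed example bound.
#include <cstdint>

struct toy_tensor { int64_t ne[4]; }; // stand-in for ggml_tensor's dims

static bool get_rows_supported(const toy_tensor & src1) {
    return src1.ne[2] <= 65535 && src1.ne[3] <= 65535;
}
```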
ggml-org#15533)
* Add DeepSeek V3.1 thinking mode support:
  - Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
  - Created common_chat_params_init_deepseek_v3_1() function (currently uses the R1 implementation)
  - Created common_chat_parse_deepseek_v3_1() function that handles the V3.1 thinking format (a sketch follows the commit list below):
    - Extracts reasoning content before the '</think>' tag into reasoning_content
    - Extracts regular content after the '</think>' tag into content
    - There is no opening '<think>' tag in the V3.1 format
  - Added detection logic for V3.1 templates based on the pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
  - Added a V3.1 case to the parsing switch statement

  This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content, without the opening '<think>' tag.
* Another attempt at V3.1 non-thinking
* Fix test, but it's not asserting anything
* Ignore vim swap files in the tests dir
* Update the test
* Try using try_find_literal instead of regex
* Passing test
* Revert "Try using try_find_literal instead of regex" (reverts commit c50d887)
* Remove unnecessary change
* Remove comment
* Add code to handle non-thinking mode
* Try to set message['prefix'] when thinking is enabled
* This fixes reasoning but breaks normal content; we need state in the chat parser
* DeepSeek V3.1 thinking is now the default; disable with `--reasoning-budget 0`
* Simplify (DeepSeek V3.1 reasoning)
* Fix sign inversion bug
* Add some tool calling code (not working)
* Tool calls working in non-reasoning mode
* Attempt a unit test for tool call parsing
* Passing test
* Add tests for both the happy-path and broken-fenced DeepSeek V3.1 tool call variants
* Passing DeepSeek V3.1 tool call tests, but the model is not working
* Revert assistant response prefill change; not my monkeys
* Add fenced_thinking unit test variant; it passes, but thinking tool calling still isn't working for some reason
* Tests pass in reasoning mode; the e2e tool test also passes
* Make a copy of the parse_json_tool_calls function for deepseek-v3.1 so as to not accidentally introduce regressions
* Fix thinking_forced_open logic; tool calling is still broken, need to add another test case
* That's what I get for cargo-culting a newline
* Add a multi tool call test for DeepSeek V3.1 non-reasoning
* Move the test, remove the .gitignore change
* Place the deepseek-v3.1 reasoning test directly into the existing reasoning function, per CISC's request
* Address whitespace CI failure
* Merge two assert_equals, per CISC's request
* Add DeepSeek-V3.1 tests to tests/test-chat.cpp, per CISC's request
* Merge the DeepSeek V3.1 and regular parse_json_tool_calls() behaviors by adding an optional update_cursor argument
* Update tests/test-chat-parser.cpp (nine review-suggestion commits; Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* DeepSeek V3.1: fix reasoning_format none
* Strip the grammar down to strictly what we expect based on the model card; throw out the parts we cargo-culted from R1 that don't make sense
* Update tests/test-chat-parser.cpp (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* DeepSeek V3.1: handle the edge case where thinking is forced open and there is a tool call in the reasoning content, but the model then just stops the output without closing the </think> tag, so it's not a partial; in this case, use the tool call in the reasoning content
* DeepSeek V3.1: simplify update_cursor
* Update common/chat.cpp (three review-suggestion commits; Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* Fix indent

Co-authored-by: openhands <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
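The V3.1 parsing described at the top of this commit list boils down to splitting on the first '</think>' with no opening tag. Here is a standalone sketch of that split; it is illustrative only, not the actual common_chat_parse_deepseek_v3_1() implementation.

```cpp
// Standalone sketch of the V3.1 split: reasoning runs up to the first
// </think>, everything after it is regular content, and there is no opening
// <think> tag. Not the actual llama.cpp chat parser.
#include <string>
#include <utility>

// returns {reasoning_content, content}
static std::pair<std::string, std::string> split_v3_1(const std::string & out) {
    const std::string tag = "</think>";
    const size_t pos = out.find(tag);
    if (pos == std::string::npos) {
        // no closing tag yet: with thinking forced open, treat it all as reasoning
        return {out, ""};
    }
    return {out.substr(0, pos), out.substr(pos + tag.size())};
}
```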
* vulkan: sort graph to allow more parallel execution

  Add a backend proc to allow the backend to modify the graph. The Vulkan implementation looks at which nodes depend on each other and greedily reorders them to group together nodes that don't depend on each other (sketched below); it only reorders the nodes and doesn't change the contents of any of them. With ggml-org#15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split
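As a rough picture of the greedy reordering: sweep the node list repeatedly, emitting every node whose dependencies were emitted in an earlier pass, so the nodes within one pass are mutually independent and end up adjacent. The Node type and all names are assumptions for illustration, and the graph is assumed to be a DAG with unique node ids; this is not the actual ggml-vulkan implementation.

```cpp
// Toy sketch of a greedy dependency-aware reorder. Nodes emitted within one
// pass are mutually independent, so a backend can run them without a
// synchronization in between. Assumes a DAG with unique ids.
#include <vector>
#include <unordered_set>

struct Node { int id; std::vector<int> deps; };

static std::vector<Node> greedy_reorder(const std::vector<Node> & nodes) {
    std::vector<Node> out;
    std::unordered_set<int> done;   // ids emitted in previous passes
    while (out.size() < nodes.size()) {
        std::vector<int> batch;     // ids emitted in this pass
        for (const Node & n : nodes) {
            if (done.count(n.id)) continue;      // already emitted
            bool ready = true;
            for (int d : n.deps) ready = ready && done.count(d) > 0;
            if (ready) { out.push_back(n); batch.push_back(n.id); }
        }
        // commit the batch only after the pass, so every node emitted in this
        // pass depended solely on earlier passes (i.e. the batch is independent)
        for (int id : batch) done.insert(id);
    }
    return out;
}
```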
* Add SVG and PNG based on llama1-icon.svg
Updates dev branch with latest release (b6423) from ggml-org/llama.cpp