Releases: ochafik/llama.cpp
Releases · ochafik/llama.cpp
b5495
`server`: fix format of streamed tool call deltas (diff name, fix id …
b5494
server: fix regression on streamed non-chat completion w/ stops (#13785) * more forgiving message diffs: partial stop words aren't erased, full stops are * Add (slow) server test for completion + stream + stop
b5493
examples : allow extracting embeddings from decoder contexts (#13797) ggml-ci
b5488
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3…
b5479
server: fix/test add_generation_prompt
b5478
`server`: streaming of tool calls and thoughts when `--jinja` is on (…
b5470
ci : enable winget package updates (#13734)
b5465
CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705) * [CANN]Support MUL_MAT_ID Q8 && Q4 Signed-off-by: noemotiovon <[email protected]> * codestyle adjustment Signed-off-by: noemotiovon <[email protected]> --------- Signed-off-by: noemotiovon <[email protected]>
b5410
llguidance : official v0.7.20 release (no actual changes) [noci] (#13…
b5401
minja: sync (qwen3) (#13573) * minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06 - https://github.com/google/minja/pull/67 (@grf53) - https://github.com/google/minja/pull/66 (@taha-yassine) - https://github.com/google/minja/pull/63 (@grf53) - https://github.com/google/minja/pull/58 --------- Co-authored-by: ochafik <[email protected]>