Releases: ochafik/llama.cpp

b5495

26 May 14:22
d74e94c

`server`: fix format of streamed tool call deltas (diff name, fix id …

b5494

26 May 14:03
f13847c

server: fix regression on streamed non-chat completion w/ stops (#13785)

* more forgiving message diffs: partial stop words aren't erased, full stops are

* Add (slow) server test for completion + stream + stop

b5493

26 May 11:57
79c137f

examples : allow extracting embeddings from decoder contexts (#13797)

ggml-ci

b5488

25 May 23:42
e121edc

`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3…

b5479

25 May 07:21

server: fix/test add_generation_prompt

b5478

25 May 07:18
f5cd27b

`server`: streaming of tool calls and thoughts when `--jinja` is on (…

b5470

24 May 08:30
b775345

ci : enable winget package updates (#13734)

b5465

23 May 09:53
faaaff5

CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705)

* [CANN]Support MUL_MAT_ID Q8 && Q4

Signed-off-by: noemotiovon <[email protected]>

* codestyle adjustment

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>

b5410

16 May 22:36
3e0be1c

llguidance : official v0.7.20 release (no actual changes) [noci] (#13…

b5401

15 May 22:49
bc098c3

minja: sync (qwen3) (#13573)

* minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06

- https://github.com/google/minja/pull/67 (@grf53)
- https://github.com/google/minja/pull/66 (@taha-yassine)
- https://github.com/google/minja/pull/63 (@grf53)
- https://github.com/google/minja/pull/58

---------

Co-authored-by: ochafik <[email protected]>