Skip to content

Releases: ggml-org/llama.cpp

b6138

12 Aug 10:17
60a7658
Compare
Choose a tag to compare
opencl: allow mixed f16/f32 `add` (#15140)

b6137

12 Aug 10:02
efe3a90
Compare
Choose a tag to compare
CUDA cmake: add `-lineinfo` for easier debug (#15260)

b6136

12 Aug 08:26
bbd57b7
Compare
Choose a tag to compare
CANN: GGML_OP_CPY optimization (#15070)

Signed-off-by: noemotiovon <[email protected]>

b6135

12 Aug 03:01
25ff6f7
Compare
Choose a tag to compare
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)

* musa: fix failures in test-backend-ops for mul_mat_id op

Signed-off-by: Xiaodong Ye <[email protected]>

* Address review comments

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>

b6134

11 Aug 15:25
be48528
Compare
Choose a tag to compare
CANN: Add broadcast for softmax and FA (#15208)

* refactor softmax

* fix fa

* fix mask shape

* format

* add comments

* Remove whitespace

b6133

11 Aug 15:30
cf9e564
Compare
Choose a tag to compare
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode.…

b6132

11 Aug 14:42
fba5c0d
Compare
Choose a tag to compare
chat : hotfix gpt-oss jinja raising an exception (#15243)

* chat : hotfix gpt-oss jinja raising an exception

* fix

b6131

11 Aug 13:06
53d0a12
Compare
Choose a tag to compare
server : allow specifying reasoning_format in HTTP request (#15238)

b6129

11 Aug 11:28
228f724
Compare
Choose a tag to compare
kv-cache : fix seq_rm with seq_id == -1 (#15226)

* kv-cache : fix seq_rm with seq_id == -1

ggml-ci

* cont : iterate over streams

ggml-ci

b6128

11 Aug 10:16
cd3069d
Compare
Choose a tag to compare
kv-cache : log (debug) all streams in find_slot (#15176)

This commit updates `llama_kv_cache_unified::find_slot` to log
information for all streams when debug is enabled.

The motivation for this change is that currently if a non-unified
kv-cache is used, then only one stream will be logged because the
code was currently uses `seq_to_stream[1]`.