Skip to content

Releases: ggml-org/llama.cpp

b6132

11 Aug 14:42
fba5c0d
Compare
Choose a tag to compare
chat : hotfix gpt-oss jinja raising an exception (#15243)

* chat : hotfix gpt-oss jinja raising an exception

* fix

b6131

11 Aug 13:06
53d0a12
Compare
Choose a tag to compare
server : allow specifying reasoning_format in HTTP request (#15238)

b6129

11 Aug 11:28
228f724
Compare
Choose a tag to compare
kv-cache : fix seq_rm with seq_id == -1 (#15226)

* kv-cache : fix seq_rm with seq_id == -1

ggml-ci

* cont : iterate over streams

ggml-ci

b6128

11 Aug 10:16
cd3069d
Compare
Choose a tag to compare
kv-cache : log (debug) all streams in find_slot (#15176)

This commit updates `llama_kv_cache_unified::find_slot` to log
information for all streams when debug is enabled.

The motivation for this change is that currently if a non-unified
kv-cache is used, then only one stream will be logged because the
code was currently uses `seq_to_stream[1]`.

b6124

11 Aug 09:52
002cb1b
Compare
Choose a tag to compare
kleidiai: fix unsigned overflow bug (#15150)

* kleidiai: fix unsigned overflow bug

* address review comments

b6123

09 Aug 18:42
79c1160
Compare
Choose a tag to compare
cuda: refactored ssm_scan and use CUB (#13291)

* cuda: refactored ssm_scan to use CUB

* fixed compilation error when when not using CUB

* assign L to constant and use size_t instead of int

* deduplicated functions

* change min blocks per mp to 1

* Use cub load and store warp transpose

* suppress clang warning

b6122

09 Aug 12:14
34c9d76
Compare
Choose a tag to compare
CUDA: add attention sinks for tile and wmma (#15178)

* CUDA: add attention sinks for tile and wmma

* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma

b6121

08 Aug 22:36
e54d41b
Compare
Choose a tag to compare
gguf-py : add Numpy MXFP4 de/quantization support (#15111)

* gguf-py : add MXFP4 de/quantization support

* ggml-quants : handle zero amax for MXFP4

b6119

08 Aug 12:54
cd6983d
Compare
Choose a tag to compare
ggml : fix field name when new ggml_backend (#14944)

b6118

08 Aug 10:07
6c7e9a5
Compare
Choose a tag to compare
vendor: sync minja (#15161)

* vendor: sync minja

* Update minja.hpp

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>