Releases · ggml-org/llama.cpp
b6132
chat : hotfix gpt-oss jinja raising an exception (#15243)
b6131
server : allow specifying reasoning_format in HTTP request (#15238)
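The PR exposes the reasoning-format choice per request rather than only via the server-wide startup flag. A minimal usage sketch, assuming the OpenAI-compatible `/v1/chat/completions` endpoint on the default port and that the field takes the same values as the `--reasoning-format` CLI flag (e.g. `none`); the field name comes from the PR title, everything else here is an assumption:

```python
# Sketch only: per-request reasoning_format override against llama-server's
# OpenAI-compatible endpoint. The endpoint path, port, and the "none" value
# are assumptions (mirroring the --reasoning-format CLI flag), not from the PR.
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "reasoning_format": "none",  # override the server-wide default for this request
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```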
b6129
kv-cache : fix seq_rm with seq_id == -1 (#15226)
* cont : iterate over streams
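In `llama.h`, passing a negative sequence id to the sequence-removal call is documented to match every sequence, and negative `p0`/`p1` make the position range open-ended; the fix makes the kv-cache honor this for `seq_id == -1`. A toy Python model of that convention, with hypothetical names and no relation to the real cell bookkeeping:

```python
# Toy model of the convention: seq_id == -1 matches every sequence, and a
# negative p0/p1 makes that end of the [p0, p1) range open-ended. Hypothetical
# names; this mirrors the documented llama.h behaviour, not the kv-cache code.
def seq_rm(cells, seq_id, p0, p1):
    """cells: list of (seq_id, pos) pairs; returns the cells that survive."""
    lo = 0 if p0 < 0 else p0
    hi = float("inf") if p1 < 0 else p1
    return [(s, pos) for (s, pos) in cells
            if not ((seq_id < 0 or s == seq_id) and lo <= pos < hi)]

cells = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 5)]
print(seq_rm(cells, seq_id=1, p0=0, p1=-1))   # removes only sequence 1
print(seq_rm(cells, seq_id=-1, p0=0, p1=-1))  # removes everything -> []
```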
b6128
kv-cache : log (debug) all streams in find_slot (#15176)
This commit updates `llama_kv_cache_unified::find_slot` to log information for all streams when debug is enabled. The motivation for this change is that currently, if a non-unified kv-cache is used, only one stream is logged because the code uses `seq_to_stream[1]`.
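A small illustrative sketch of the behavioral change: debug logging now walks every stream instead of reading a single hard-coded entry. The names below are hypothetical, not llama.cpp internals:

```python
# Illustrative only: iterate over all kv-cache streams when printing debug info
# instead of indexing one of them (the old code effectively looked only at
# seq_to_stream[1]). The names streams/used/size are hypothetical.
def log_all_streams(streams, debug=True):
    if not debug:
        return
    for i, stream in enumerate(streams):
        print(f"stream {i}: used cells = {stream['used']} / {stream['size']}")

log_all_streams([{"used": 12, "size": 4096}, {"used": 7, "size": 4096}])
```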
b6124
kleidiai: fix unsigned overflow bug (#15150)
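For context on the bug class (not the KleidiAI code itself): unsigned arithmetic wraps around instead of going negative, which NumPy's fixed-width integers can demonstrate, since native Python ints never overflow:

```python
# Generic demonstration of the bug class, unrelated to the actual KleidiAI code:
# subtracting a larger unsigned value from a smaller one wraps modulo 2**32
# instead of producing a negative number.
import numpy as np

a = np.array([2], dtype=np.uint32)
b = np.array([5], dtype=np.uint32)
print(a - b)                   # [4294967293] -- 2 - 5 wrapped around
print(int(a[0]) - int(b[0]))   # -3 -- the value the arithmetic presumably intended
```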
b6123
cuda: refactored ssm_scan and use CUB (#13291)
* cuda: refactored ssm_scan to use CUB
* fixed compilation error when not using CUB
* assign L to constant and use size_t instead of int
* deduplicated functions
* change min blocks per mp to 1
* use CUB load and store warp transpose
* suppress clang warning
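The refactor concerns how the CUDA kernel loads and stores data (via CUB block primitives); the operation itself is a first-order linear recurrence scanned over time. A simplified NumPy reference of that recurrence, with the parameterization reduced relative to ggml's actual `SSM_SCAN` op:

```python
# Simplified NumPy reference for the recurrence an SSM scan evaluates:
#     h[t] = a[t] * h[t-1] + b[t] * x[t]
# This is a sequential stand-in for ggml's SSM_SCAN op with the real
# parameterization stripped down; the CUDA kernel above parallelizes the
# surrounding loads/stores with CUB, but the scanned recurrence has this shape.
import numpy as np

def ssm_scan_ref(a, b, x, h0=0.0):
    h = h0
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a[t] * h + b[t] * x[t]
        out[t] = h
    return out

rng = np.random.default_rng(0)
T = 8
print(ssm_scan_ref(rng.uniform(0.9, 1.0, T), rng.normal(size=T), rng.normal(size=T)))
```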
b6122
CUDA: add attention sinks for tile and wmma (#15178)
* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
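Attention sinks add a per-head logit that takes part in the softmax normalization without contributing a value vector, which is what the tile and wmma attention kernels now account for. A conceptual NumPy sketch of softmax-with-sink, not the CUDA code:

```python
# Conceptual sketch of an attention sink: a per-head logit that joins the
# softmax normalization but carries no value vector, so it only absorbs
# probability mass. This mirrors the idea behind the kernels; it is not the
# tile/wmma CUDA code.
import numpy as np

def softmax_with_sink(logits, sink_logit):
    m = np.maximum(logits.max(axis=-1, keepdims=True), sink_logit)  # stable max incl. sink
    e = np.exp(logits - m)
    denom = e.sum(axis=-1, keepdims=True) + np.exp(sink_logit - m)
    return e / denom  # rows sum to < 1; the remainder went to the sink

scores = np.array([[2.0, 1.0, 0.5]])
print(softmax_with_sink(scores, sink_logit=1.5))
```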
b6121
gguf-py : add Numpy MXFP4 de/quantization support (#15111)
* gguf-py : add MXFP4 de/quantization support
* ggml-quants : handle zero amax for MXFP4
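MXFP4 packs blocks of 32 FP4 (E2M1) values that share one power-of-two scale. A hedged NumPy sketch of block de/quantization using the OCP E2M1 code table; the exact GGUF nibble layout and scale encoding in gguf-py are not reproduced, and the all-zero-block fallback illustrates the kind of edge case the "zero amax" bullet refers to:

```python
# Hedged sketch of MXFP4: blocks of 32 FP4 (E2M1) codes sharing one power-of-two
# scale. The E2M1 table is the OCP MX definition; the exact nibble packing and
# scale encoding used by gguf-py are not reproduced here.
import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                 -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0], dtype=np.float32)

def dequant_block(codes, scale_exp):
    """codes: 32 ints in [0, 15]; scale_exp: shared block exponent."""
    return E2M1[np.asarray(codes)] * np.float32(2.0) ** scale_exp

def quant_block(values):
    """Derive a shared exponent from amax, then round to the nearest E2M1 code.
    An all-zero block (amax == 0) falls back to exponent 0 -- the edge case the
    'zero amax' bullet above is about."""
    values = np.asarray(values, dtype=np.float32)
    amax = np.abs(values).max()
    scale_exp = 0 if amax == 0 else int(np.floor(np.log2(amax))) - 2  # map amax into the E2M1 range
    scaled = values / np.float32(2.0) ** scale_exp
    codes = np.abs(scaled[:, None] - E2M1[None, :]).argmin(axis=1)
    return codes, scale_exp

codes, e = quant_block(np.linspace(-1.0, 1.0, 32))
print(np.round(dequant_block(codes, e), 3))
```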
b6119
ggml : fix field name when creating a new ggml_backend (#14944)
b6118
vendor: sync minja (#15161)
* Update minja.hpp
* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>