Releases · ggml-org/llama.cpp
b6124
kleidiai: fix unsigned overflow bug (#15150)
* kleidiai: fix unsigned overflow bug
* address review comments
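For context, a minimal C++ sketch of this class of bug (illustrative only, not the actual KleidiAI code): unsigned subtraction wraps around instead of going negative.

```cpp
#include <cstddef>
#include <cstdio>

// Unsigned arithmetic wraps modulo 2^N instead of going negative: when
// b > a, (a - b) yields a value near SIZE_MAX, which then blows up any
// loop bound or buffer size computed from it.
size_t remaining_bad (size_t a, size_t b) { return a - b; }
size_t remaining_safe(size_t a, size_t b) { return a > b ? a - b : 0; }

int main() {
    std::printf("bad : %zu\n", remaining_bad (2, 3)); // 18446744073709551615 on 64-bit
    std::printf("safe: %zu\n", remaining_safe(2, 3)); // 0
}
```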
b6123
cuda: refactored ssm_scan and use CUB (#13291)
* cuda: refactored ssm_scan to use CUB
* fixed compilation error when not using CUB
* assign L to constant and use size_t instead of int
* deduplicated functions
* change min blocks per mp to 1
* Use cub load and store warp transpose
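A minimal CUDA sketch of the CUB warp-transpose load/store pattern the last two items refer to: each warp reads a contiguous tile with coalesced accesses, then exchanges it so every thread holds consecutive elements in registers. The kernel body and names are illustrative stand-ins, not the actual ssm_scan implementation.

```cuda
#include <cub/cub.cuh>

template <int THREADS, int ITEMS>
__global__ void scale_tiles(const float * in, float * out, float factor) {
    using Load  = cub::BlockLoad <float, THREADS, ITEMS, cub::BLOCK_LOAD_WARP_TRANSPOSE>;
    using Store = cub::BlockStore<float, THREADS, ITEMS, cub::BLOCK_STORE_WARP_TRANSPOSE>;
    __shared__ union {
        typename Load::TempStorage  load;
        typename Store::TempStorage store;
    } tmp;

    const int base = blockIdx.x * THREADS * ITEMS; // assumes full tiles for brevity
    float items[ITEMS];
    Load(tmp.load).Load(in + base, items); // coalesced per-warp reads, transposed into registers
    __syncthreads();                       // tmp is reused by the store below

    for (int i = 0; i < ITEMS; ++i) {
        items[i] *= factor;                // stand-in for the real per-element work
    }
    Store(tmp.store).Store(out + base, items);
}

// e.g. scale_tiles<256, 4><<<n / (256 * 4), 256>>>(d_in, d_out, 2.0f);
```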
b6122
CUDA: add attention sinks for tile and wmma (#15178)
* CUDA: add attention sinks for tile and wmma
* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
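An attention sink is a per-head logit that joins the softmax normalization without contributing a value row, so attention mass routed to it is simply absorbed. A minimal C++ sketch of the idea (illustrative only, not the CUDA kernel code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax where a "sink" logit joins the normalization but has no value
// row: the remaining weights are no longer forced to sum to 1.
std::vector<float> softmax_with_sink(const std::vector<float> & scores, float sink) {
    float m = sink;
    for (float s : scores) m = std::max(m, s);

    std::vector<float> w(scores.size());
    float denom = std::exp(sink - m);          // the sink's share of the partition sum
    for (size_t i = 0; i < scores.size(); ++i) {
        w[i] = std::exp(scores[i] - m);
        denom += w[i];
    }
    for (float & v : w) v /= denom;
    return w;
}
```

In fused attention kernels the sink is presumably folded into the running max and denominator the same way FlashAttention-style kernels track them, rather than materialized as above.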
b6121
gguf-py : add Numpy MXFP4 de/quantization support (#15111)
* gguf-py : add MXFP4 de/quantization support
* ggml-quants : handle zero amax for MXFP4
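MXFP4 stores blocks of 32 FP4 (E2M1) values under one shared E8M0 power-of-two scale. A minimal C++ dequantization sketch under those assumptions; the function name and the low-nibble-first packing order are illustrative, not the gguf-py API:

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// E2M1 (FP4) code points; the high bit of each 4-bit code is the sign.
static const std::array<float, 16> kFp4 = {
     0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f,
};

// Dequantize one MXFP4 block: 32 FP4 codes packed into 16 bytes plus one
// shared E8M0 scale, decoded as 2^(e8m0 - 127). E8M0's NaN code (0xFF)
// is ignored here for brevity; nibble order is an assumption.
void dequant_mxfp4_block(const uint8_t packed[16], uint8_t e8m0, float out[32]) {
    const float scale = std::ldexp(1.0f, int(e8m0) - 127);
    for (int i = 0; i < 16; ++i) {
        out[2*i + 0] = kFp4[packed[i] & 0x0F] * scale;
        out[2*i + 1] = kFp4[packed[i] >> 4]   * scale;
    }
}
```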
b6119
ggml : fix field name when creating a new ggml_backend (#14944)
b6118
vendor: sync minja (#15161)
* vendor: sync minja
* Update minja.hpp
* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>
b6117
CUDA: attention sinks for mma FlashAttention (#15157)
b6116
opencl: support sink in `soft_max` (attn sinks) (#15152)
b6115
convert : support non-mxfp4 HF model (#15153)
* convert : support non-mxfp4 HF model
* rm redundant check
* disable debug check
b6114
vulkan: support fattn sinks (#15126)