Releases · ggml-org/llama.cpp

14 Aug 06:43

3ea913f

b6153 Latest


perplexity: give more information about constraints on failure (#15303)

* perplexity: give more information about constraints on failure

This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.

* log formatting

* log error and return instead of storing max_seq_exceeded int

* check if s0 is zero for -np check

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-08-14T06:43:58Z

llama-b6153-bin-macos-arm64.zip
sha256:3cabf9a67bb62142d82d1c4c9ea72a385e45488bdbdce5c855bc6f27babe8026
10.8 MB 2025-08-14T06:44:12Z

llama-b6153-bin-macos-x64.zip
sha256:526f2e36a25fa063967e15e16c5e19a47931a95d9180fa426db5de8cc89845e5
27.7 MB 2025-08-14T06:44:13Z

llama-b6153-bin-ubuntu-vulkan-x64.zip
sha256:6dedf805fe2ad5d1038a6d533c4d58ee1fdb4525c9da9a78de9c80fe99683417
21.5 MB 2025-08-14T06:44:15Z

llama-b6153-bin-ubuntu-x64.zip
sha256:c87c86e5e8388f453481a5483ce4e381b76ccd173d113e32d94ffff607e91d33
12.8 MB 2025-08-14T06:44:17Z

llama-b6153-bin-win-cpu-arm64.zip
sha256:52c08bde1fb27d26ecca9a693fe17cc7f794c7599cec69994c3f7fbc109d9761
11 MB 2025-08-14T06:44:18Z

llama-b6153-bin-win-cpu-x64.zip
sha256:24fb4e8782c6af8d4e811692d556bd02b127e0088330c6a8df538dbe6c2014d4
13.9 MB 2025-08-14T06:44:19Z

llama-b6153-bin-win-cuda-12.4-x64.zip
sha256:68d2d683a6e3b82e7d95d77956642ba8bbe6709b5b15f9ad0ebc5e98e7d864ab
139 MB 2025-08-14T06:44:20Z

llama-b6153-bin-win-hip-radeon-x64.zip
sha256:799c45b7d75793ab46d673f90a562766e281496ac9f82bbf6afe82a574550ccb
287 MB 2025-08-14T06:44:27Z

llama-b6153-bin-win-opencl-adreno-arm64.zip
sha256:7a96745733d84552d8a40263ccae9d712b2a0ccf22b79d3c7f9ca25d774b5a1d
11.4 MB 2025-08-14T06:44:38Z

Source code (zip)
2025-08-14T06:16:32Z

Source code (tar.gz)
2025-08-14T06:16:32Z
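
Each binary asset above is published with a `sha256:` digest, so a download can be checked against it before use. The following is a minimal sketch of such a check in Python, using only the standard library. The helper names `sha256_of_file` and `matches_published_digest` are illustrative and are not part of llama.cpp or GitHub tooling.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published):
    """Compare a file against a digest written as 'sha256:<hex>' or '<hex>'."""
    expected = published.strip().removeprefix("sha256:").lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For example, after downloading `llama-b6153-bin-ubuntu-x64.zip`, calling `matches_published_digest("llama-b6153-bin-ubuntu-x64.zip", "sha256:c87c86e5e8388f453481a5483ce4e381b76ccd173d113e32d94ffff607e91d33")` should return `True` if the file is intact. `str.removeprefix` requires Python 3.9 or later.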

13 Aug 19:52

github-actions

b6152

29c8fbe


HIP: bump requirement to rocm 6.1 (#15296)

Assets 15

13 Aug 14:09

github-actions

b6150

b3e1666


server : enable -td and -tbd parameters (#15172)

Assets 15

13 Aug 12:23

github-actions

b6149

c24f4e2


ggml : update `ggml_rope_multi` (#12665)

* update `rope_multi`:

1. add `ggml_rope_multi_inplace`;
2. use `GGML_MROPE_SECTIONS` instead of 4.

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov

Assets 15

13 Aug 11:50

github-actions

b6148

d8914fc


common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-mo…

Assets 15

13 Aug 09:29

github-actions

b6144

00f35d5


ggml : repack block_iq4_nlx8 (#14904)

ggml-ci

Assets 15

13 Aug 09:19

github-actions

b6143

6028bf7


CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf impro…

Assets 15

13 Aug 06:20

github-actions

b6141

e71d48e


ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …
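
The title above describes splitting large socket transfers into bounded chunks. The actual change lives in the ggml RPC backend and is not shown on this page, so the following is only a general illustration of the technique in Python, not the project's implementation. The function names and the `MAX_CHUNK` value are assumptions made for the sketch.

```python
import socket

# Hypothetical per-call cap; the limit used by ggml-rpc is not stated here.
MAX_CHUNK = 1 << 20


def send_all_chunked(sock, data, max_chunk=MAX_CHUNK):
    """Send all of `data`, never passing more than `max_chunk` bytes per call."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        n = sock.send(view[sent:sent + max_chunk])
        if n == 0:
            raise ConnectionError("socket closed while sending")
        sent += n


def recv_exact_chunked(sock, size, max_chunk=MAX_CHUNK):
    """Receive exactly `size` bytes, requesting at most `max_chunk` per call."""
    buf = bytearray(size)
    view = memoryview(buf)
    got = 0
    while got < size:
        n = sock.recv_into(view[got:got + max_chunk])
        if n == 0:
            raise ConnectionError("peer closed before all bytes arrived")
        got += n
    return bytes(buf)
```

Capping the size of each call keeps every individual `send()`/`recv()` request small regardless of the total payload, which is the property the release title refers to when it mentions avoiding `EINVAL` for very large tensors.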

Assets 15

12 Aug 20:38

github-actions

b6140

b049315


HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…

Assets 15

12 Aug 12:16

github-actions

b6139

f4586ee


sycl: Fix and disable more configurations of mul_mat (#15151)

* sycl: Fix and disable more configurations of mul_mat

* Disable more configurations

Assets 15
