
Releases: ggml-org/llama.cpp

b6153

14 Aug 06:43
3ea913f
perplexity: give more information about constraints on failure (#15303)

* perplexity: give more information about constraints on failure

This checks whether -np is insufficient relative to the context size, and hints at how much of each is needed.

* log formatting

* log error and return instead of storing max_seq_exceeded int

* check if s0 is zero for -np check

b6152

13 Aug 19:52
29c8fbe
HIP: bump requirement to rocm 6.1 (#15296)

b6150

13 Aug 14:09
b3e1666
server : enable -td and -tbd parameters (#15172)

b6149

13 Aug 12:23
c24f4e2
ggml : update `ggml_rope_multi` (#12665)

* update `rope_multi`:

1. add `ggml_rope_multi_inplace`;
2. use `GGML_MROPE_SECTIONS` instead of 4.

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>


b6148

13 Aug 11:50
d8914fc
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-mo…

b6144

13 Aug 09:29
00f35d5
ggml : repack block_iq4_nlx8 (#14904)

ggml-ci

b6143

13 Aug 09:19
6028bf7
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf impro…

b6141

13 Aug 06:20
e71d48e
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …
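The b6141 entry refers to chunking `send()`/`recv()` so that no single call is handed an oversized buffer. The general technique can be sketched in C as follows, assuming POSIX sockets. `send_all`, `recv_all` and `MAX_CHUNK_SIZE` are hypothetical names, and the 1 MiB cap is an illustrative value, not the one ggml-rpc actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical per-call cap (illustrative only, not ggml-rpc's value). */
#define MAX_CHUNK_SIZE ((size_t)1 << 20) /* 1 MiB */

/* Send all `size` bytes, never passing more than MAX_CHUNK_SIZE to one send(). */
static int send_all(int sockfd, const void *data, size_t size) {
    assert(data != NULL || size == 0);
    const uint8_t *p = (const uint8_t *)data;
    size_t sent = 0;
    while (sent < size) {
        size_t want = size - sent;
        if (want > MAX_CHUNK_SIZE) want = MAX_CHUNK_SIZE;
        ssize_t n = send(sockfd, p + sent, want, 0);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n; /* send() may accept fewer bytes than requested */
    }
    return 0;
}

/* Receive exactly `size` bytes, reading at most MAX_CHUNK_SIZE per recv(). */
static int recv_all(int sockfd, void *data, size_t size) {
    assert(data != NULL || size == 0);
    uint8_t *p = (uint8_t *)data;
    size_t got = 0;
    while (got < size) {
        size_t want = size - got;
        if (want > MAX_CHUNK_SIZE) want = MAX_CHUNK_SIZE;
        ssize_t n = recv(sockfd, p + got, want, 0);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0) return -1; /* peer closed before all bytes arrived */
        got += (size_t)n;
    }
    return 0;
}
```

Both loops also handle short transfers, which stream sockets are allowed to produce even when the requested size is small.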

b6140

12 Aug 20:38
b049315
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…

b6139

12 Aug 12:16
f4586ee
sycl: Fix and disable more configurations of mul_mat (#15151)

* sycl: Fix and disable more configurations of mul_mat

* Disable more configurations