Releases: Xarbirus/llama.cpp

b4513

20 Jan 10:34
ef6dada

cont : fix whitespaces (#11305)

b4493

16 Jan 18:26
9c8dcef

CUDA: backwards pass for misc. ops, add tests (#11257)

* CUDA: backwards pass for misc. ops, add tests

* remove restrict from pointers

b4393

27 Dec 00:20
d79d8f3

vulkan: multi-row k quants (#10846)

* multi-row k quant shaders

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default (the multi-row idea is sketched below)
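
A minimal CPU-side sketch of the multi-row idea, assuming rm_kq = 2 rows per workgroup as in the commit above: each workgroup accumulates several output rows at once, so every loaded element of the input vector is reused across rows instead of being re-fetched. The actual change is in the Vulkan GLSL shaders; everything here except the name rm_kq is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Rows computed together per "workgroup"; 2 by default per the commit.
constexpr std::size_t rm_kq = 2;

// Illustrative mat-vec y = A * x: each outer step handles rm_kq rows,
// so x[c] is loaded once and reused for every row in the group.
// (Tail rows when nrows % rm_kq != 0 are omitted for brevity.)
void mul_mat_vec_multirow(const std::vector<std::vector<float>> & A,
                          const std::vector<float> & x,
                          std::vector<float> & y) {
    const std::size_t nrows = A.size();
    for (std::size_t r0 = 0; r0 + rm_kq <= nrows; r0 += rm_kq) {
        float acc[rm_kq] = {};
        for (std::size_t c = 0; c < x.size(); ++c) {
            const float xc = x[c];           // load x once ...
            for (std::size_t r = 0; r < rm_kq; ++r) {
                acc[r] += A[r0 + r][c] * xc; // ... reuse it for every row
            }
        }
        for (std::size_t r = 0; r < rm_kq; ++r) {
            y[r0 + r] = acc[r];
        }
    }
}
```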

b4320

16 Dec 23:52
64ae065

vulkan: small mul_mat_vec optimizations (#10665)

* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for AMD and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for GCN (requires https://github.com/ggerganov/llama.cpp/pull/10721); the query is sketched below

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
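
A hedged sketch of what such a subgroup-based GCN check could look like on the host side, using the VK_EXT_subgroup_size_control properties query: GCN hardware exposes a single fixed wave size of 64, so the reported minimum and maximum subgroup sizes coincide at 64, whereas RDNA typically reports a 32..64 range. The function name is hypothetical; the real logic lives in ggml-vulkan.cpp.

```cpp
#include <vulkan/vulkan.h>

// Hypothetical helper: detect GCN-style hardware by its fixed wave size.
bool is_gcn_like(VkPhysicalDevice dev) {
    // Chain the subgroup-size-control properties into the properties2 query.
    VkPhysicalDeviceSubgroupSizeControlPropertiesEXT subgroup_props = {};
    subgroup_props.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SUBGROUP_SIZE_CONTROL_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props2 = {};
    props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props2.pNext = &subgroup_props;

    vkGetPhysicalDeviceProperties2(dev, &props2);

    // GCN: one fixed subgroup (wave) size of 64, so min == max == 64.
    return subgroup_props.minSubgroupSize == 64 &&
           subgroup_props.maxSubgroupSize == 64;
}
```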

b4240

02 Dec 22:45
64ed209

server: Add "tokens per second" information in the backend (#10548)

* add cmake RVV support

* add timings (the tokens-per-second arithmetic is sketched below)

* remove space

* update readme

* fix

* fix code

* remove empty line

* add test

Co-authored-by: Xuan Son Nguyen <[email protected]>
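
The tokens-per-second arithmetic behind these timings is simple; a minimal sketch, with a struct whose field names mirror the server's timings JSON (prompt_n, prompt_ms, predicted_n, predicted_ms) but which is otherwise illustrative:

```cpp
#include <cstdio>

// Illustrative timings record; the server reports similar fields in JSON.
struct timings {
    int    prompt_n;     // tokens in the prompt
    double prompt_ms;    // time spent on prompt processing
    int    predicted_n;  // tokens generated
    double predicted_ms; // time spent generating
};

int main() {
    const timings t = {32, 250.0, 128, 3200.0};

    // tokens per second = tokens / (milliseconds / 1000)
    const double prompt_tps    = 1e3 * t.prompt_n    / t.prompt_ms;
    const double predicted_tps = 1e3 * t.predicted_n / t.predicted_ms;

    printf("prompt:    %.2f tokens/s\n", prompt_tps);    // 128.00 tokens/s
    printf("predicted: %.2f tokens/s\n", predicted_tps); // 40.00 tokens/s
    return 0;
}
```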

b4061

09 Nov 17:33
6423c65

metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

b3969

23 Oct 20:35
190a37d

sync : ggml

b3917

14 Oct 14:22
a89f75e

server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>
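
A hedged sketch of the fix, assuming nlohmann::json (which llama.cpp vendors) and a hypothetical helper name: OpenAI-style clients may send "logprobs": false rather than omitting the field, so a boolean false must be treated like an absent field instead of failing to parse as a number.

```cpp
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical helper: how many logprobs the request asks for.
int parse_n_logprobs(const json & body) {
    if (!body.contains("logprobs") || body["logprobs"].is_null()) {
        return 0; // field absent or null: no logprobs requested
    }
    const json & v = body["logprobs"];
    if (v.is_boolean()) {
        // "logprobs": false must behave like an absent field, not an error.
        return v.get<bool>() ? 1 : 0;
    }
    return v.get<int>(); // numeric value: top-N logprobs
}
```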

b3810

23 Sep 19:53
1d48e98

readme : add programmable prompt engine language CLI (#9599)

b3767

16 Sep 08:44
5c3d0f1

ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)

* squashed

re-add my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049

have ggml_vec_dot_q4_0 do two blocks per loop for AVX (the unrolling idea is sketched below)

try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. As per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue

* shuffle

* remove f16c iq4_nl as I can't make it faster than before
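
A hedged illustration of the two-blocks-per-loop idea, shown on plain floats rather than Q4_0's quantized blocks: two independent accumulators per iteration keep more FMAs in flight and hide instruction latency. The real ggml_vec_dot_q4_0 additionally dequantizes 4-bit blocks on the fly; this sketch only demonstrates the unrolling.

```cpp
#include <immintrin.h>
#include <cstddef>

// Dot product processing two 8-wide "blocks" per loop iteration,
// with one independent accumulator per block to avoid a serial
// dependency chain on the FMA result.
float dot_two_blocks_per_loop(const float * x, const float * y, std::size_t n) {
    __m256 acc0 = _mm256_setzero_ps();
    __m256 acc1 = _mm256_setzero_ps();

    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) { // two blocks of 8 floats per iteration
        acc0 = _mm256_fmadd_ps(_mm256_loadu_ps(x + i),
                               _mm256_loadu_ps(y + i),     acc0);
        acc1 = _mm256_fmadd_ps(_mm256_loadu_ps(x + i + 8),
                               _mm256_loadu_ps(y + i + 8), acc1);
    }

    // Combine the accumulators and reduce horizontally.
    __m256 acc = _mm256_add_ps(acc0, acc1);
    __m128 lo  = _mm256_castps256_ps128(acc);
    __m128 hi  = _mm256_extractf128_ps(acc, 1);
    __m128 sum = _mm_add_ps(lo, hi);
    sum = _mm_hadd_ps(sum, sum);
    sum = _mm_hadd_ps(sum, sum);

    float result = _mm_cvtss_f32(sum);
    for (; i < n; ++i) { // scalar tail
        result += x[i] * y[i];
    }
    return result;
}
```

Compile with -mavx2 -mfma (or equivalent); the split into acc0/acc1 is the same latency-hiding trick the commit applies to quantized blocks.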