Releases: ggml-org/llama.cpp

b5038

03 Apr 08:23
193c3e0
fix MUSA compiler warning (#12704)

* fix MUSA compiler warning

* replace (void) with GGML_UNUSED

b5037

03 Apr 08:10
65cfe13
CANN: Support operator SIN COS ARGMAX (#12709)

* [CANN]support sin cos argmax

Signed-off-by: noemotiovon <[email protected]>

* [CANN]codestyle adjustment

Signed-off-by: noemotiovon <[email protected]>

* [CANN]Remove redundant code

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: noemotiovon <[email protected]>

b5036

03 Apr 02:34
3f9da22
Simplify and improve CUDA graphs through use of indirect copy pointer…

b5035

03 Apr 01:34
2a0dc97
CANN: Fix failed test cases (#12708)

* CANN: Fix memory waste in aclnn_tensor

* CANN: fix backend ops fail

* CANN: fix acl_tensor memory alloc.

* CANN: format

* CANN: remove trailing whitespace

b5034

03 Apr 00:42
97a20c0
opencl: use `max_alloc_size` in backend ctx instead of querying again…

b5033

02 Apr 20:15
f01bd02
vulkan: Implement split_k for coopmat2 flash attention. (#12627)

When using grouped query attention, we have one workgroup per KV batch, which
can mean very few workgroups (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.
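
The idea behind split_k can be illustrated with a minimal CPU sketch. This is not the Vulkan shader from the PR: the function names (`attention_row`, `partial_attention`, `split_k_attention`) are hypothetical, the 1/sqrt(d) score scaling is omitted for brevity, and the merge step assumes the usual running-max / sum-of-exponentials reduction for combining partial softmax results. Each chunk of the KV cache is processed independently (on a GPU, by a separate workgroup), and the partial results are then rescaled to a common maximum and summed.

```python
import math


def attention_row(q, keys, values):
    """Reference: softmax(q . K^T) @ V for a single query row, no splitting."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def partial_attention(q, keys, values):
    """Process one KV chunk; return (local max, sum of exps, unnormalized output)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def split_k_attention(q, keys, values, split_k):
    """Split the KV sequence into up to split_k chunks and merge the partials."""
    n = len(keys)
    chunk = (n + split_k - 1) // split_k  # ceil(n / split_k)
    # Each chunk is independent work (one workgroup per chunk on a GPU).
    partials = [partial_attention(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]

    # Reduction: rescale every partial to the global max, then normalize once.
    m_all = max(m for m, _, _ in partials)
    dim = len(values[0])
    total = 0.0
    out = [0.0] * dim
    for m, l, acc in partials:
        scale = math.exp(m - m_all)
        total += l * scale
        for d in range(dim):
            out[d] += acc[d] * scale
    return [o / total for o in out]
```

In this sketch, more splits mean more independent units of work but also a larger reduction step at the end, which is presumably why the benefit shows up mainly with a large KV cache.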

b5032

02 Apr 18:54
6f3bd38
cmake: remove caching from vulkan coopmat checks (#12719)

b5031

02 Apr 18:33
be0a0f8
vulkan: Implement grouped query attention in the coopmat2 FA shader (…

b5030

02 Apr 18:23
92e3006
Vulkan: Fix mmq int dot float cache size (#12722)

b5029

02 Apr 14:40
833e2b7
model : print tensor size during load (#12711)

* model : print tensor size during load

* cont : fix units MB -> MiB

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Diego Devesa <[email protected]>