Releases · ggml-org/llama.cpp
b5038
fix MUSA compiler warning (#12704)
* fix MUSA compiler warning
* replace (void) with GGML_UNUSED
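For context, a minimal sketch of the pattern the second commit describes: replacing a bare `(void)` cast with the `GGML_UNUSED` macro to silence unused-parameter warnings. The macro definition and the callback below are illustrative assumptions, not the actual MUSA backend code.

```cpp
// Sketch: silence an unused-parameter warning with a named macro instead of a
// bare (void) cast. ggml provides GGML_UNUSED for this; the definition here is
// an illustrative assumption.
#include <cstdio>

#define GGML_UNUSED(x) (void)(x)

static void example_callback(void * user_data, int status) {
    GGML_UNUSED(user_data); // required by the callback signature, not used in this path
    std::printf("status = %d\n", status);
}

int main() {
    example_callback(nullptr, 0);
}
```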
b5037
CANN: Support operator SIN COS ARGMAX (#12709)
* [CANN] support sin cos argmax
* [CANN] codestyle adjustment
* [CANN] remove redundant code
Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: noemotiovon <[email protected]>
b5036
Simplify and improve CUDA graphs through use of indirect copy pointer…
b5035
CANN: Fix failed test cases (#12708)
* CANN: fix memory waste in aclnn_tensor
* CANN: fix backend ops fail
* CANN: fix acl_tensor memory alloc
* CANN: format
* CANN: remove trailing whitespace
b5034
opencl: use `max_alloc_size` in backend ctx instead of querying again…
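A hedged sketch of the idea behind this change: query `CL_DEVICE_MAX_MEM_ALLOC_SIZE` once when the backend context is created and reuse the cached value, instead of calling `clGetDeviceInfo` again on every allocation. The struct and function names below are illustrative, not the real llama.cpp OpenCL backend API.

```cpp
// Sketch: cache the device's max allocation size in the backend context at
// init time, then consult the cached value on the hot path.
#include <CL/cl.h>
#include <cstddef>

struct backend_ctx {
    cl_device_id device;
    cl_ulong     max_alloc_size; // cached once at init
};

static bool backend_ctx_init(backend_ctx & ctx, cl_device_id device) {
    ctx.device = device;
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                                 sizeof(ctx.max_alloc_size), &ctx.max_alloc_size, nullptr);
    return err == CL_SUCCESS;
}

static bool can_allocate(const backend_ctx & ctx, size_t nbytes) {
    // no repeated clGetDeviceInfo call here - use the cached value
    return nbytes <= ctx.max_alloc_size;
}
```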
b5033
vulkan: Implement split_k for coopmat2 flash attention (#12627)
When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.
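As a rough illustration of what split_k buys here (not the coopmat2 shader itself): each split processes a slice of the KV cache and emits a partial result plus softmax statistics, and a final pass merges the partials. The one-dimensional toy below shows only the merge math; the names and shapes are assumptions.

```cpp
// Sketch: merge per-split partial attention results. Each split carries its
// local max logit m, the sum of exp(logit - m) l, and a partial weighted sum o.
#include <cmath>
#include <cstdio>
#include <vector>

struct partial {
    float m; // max logit seen in this split
    float l; // sum of exp(logit - m) within this split
    float o; // partial weighted sum of values (1-dim head for simplicity)
};

static float combine_splits(const std::vector<partial> & parts) {
    float m = -INFINITY;
    for (const auto & p : parts) m = std::max(m, p.m);

    float l = 0.0f, o = 0.0f;
    for (const auto & p : parts) {
        const float scale = std::exp(p.m - m); // rescale each split to the global max
        l += p.l * scale;
        o += p.o * scale;
    }
    return o / l;
}

int main() {
    // two splits of a toy attention row; more splits -> more workgroups in flight
    std::vector<partial> parts = { {0.5f, 1.8f, 0.9f}, {1.2f, 2.1f, 1.7f} };
    std::printf("combined output: %f\n", combine_splits(parts));
}
```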
b5032
cmake: remove caching from vulkan coopmat checks (#12719)
b5031
vulkan: Implement grouped query attention in the coopmat2 FA shader (…
b5030
Vulkan: Fix mmq int dot float cache size (#12722)
b5029
model : print tensor size during load (#12711)
* model : print tensor size during load
* cont : fix units MB -> MiB
Co-authored-by: Diego Devesa <[email protected]>
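A small illustration of the unit distinction fixed in the follow-up commit: MiB is 1024*1024 bytes, while MB is 10^6 bytes. The tensor size below is made up for the example and is not taken from the actual loader output.

```cpp
// Sketch: report a byte count in MiB (binary) versus MB (decimal).
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t nbytes = 262144000; // e.g. a 4096 x 32000 fp16 tensor (hypothetical)
    std::printf("size = %8.2f MiB (%8.2f MB)\n",
                nbytes / 1024.0 / 1024.0,   // 250.00 MiB
                nbytes / 1000.0 / 1000.0);  // 262.14 MB
}
```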