
Releases: ngxson/llama.cpp

b4909

18 Mar 07:06
fd123cf
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…
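As a rough illustration of what a 1 GB default cap means for buffer sizing, the sketch below clamps the per-allocation size to the smaller of a device limit and 1 GB (taken here as 1 GiB), then counts how many allocations a buffer of a given size would need. The names `DEFAULT_MAX_ALLOC_BYTES`, `effective_max_alloc` and `allocations_needed` are hypothetical and are not taken from ggml-vulkan.

```c
#include <stdint.h>

/* Hypothetical default cap mirroring the 1 GB figure in the release title,
   interpreted here as 1 GiB. */
#define DEFAULT_MAX_ALLOC_BYTES ((uint64_t)1 << 30)

/* Largest single allocation to request: the smaller of the device's
   reported limit and the default cap. */
static uint64_t effective_max_alloc(uint64_t device_limit) {
    return device_limit < DEFAULT_MAX_ALLOC_BYTES ? device_limit
                                                  : DEFAULT_MAX_ALLOC_BYTES;
}

/* Number of allocations needed to hold `total` bytes when each allocation
   is at most `max_alloc` bytes (ceiling division; max_alloc must be > 0). */
static uint64_t allocations_needed(uint64_t total, uint64_t max_alloc) {
    return (total + max_alloc - 1) / max_alloc;
}
```

With a device that reports a 4 GiB limit, a 3 GiB buffer would need three capped allocations instead of one, trading a few extra allocations for smaller individual blocks.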

b4908

18 Mar 01:38
a53f7f7
fixed compilation warnings in ggml-sycl (#12424)

b4907

18 Mar 00:22
7dfad38
llama: Add support for RWKV v7 architecture (#12412)

* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <[email protected]>

* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <[email protected]>

* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <[email protected]>

* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <[email protected]>

* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <[email protected]>

* Apply code-format changes

Signed-off-by: Molly Sophia <[email protected]>

* fix MUSA build

Signed-off-by: Molly Sophia <[email protected]>

* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>

b4905

17 Mar 19:12
b1b132e
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)

* Enable CUDA Graph on CTK < 12.x

The `cudaGraphExecUpdate` API changed in CUDA 12.x, so CUDA Graph support had been disabled on older CUDA toolkits. This change enables CUDA Graph support on CTK versions < 12.x by using the older API when building against CTK < 12.x.

* Fix compilation errors with MUSA

* Disable CUDA Graph for MUSA

b4904

17 Mar 17:22
01e8f21
ggml-vulkan: remove unused find_program(glslc) (#12416)

`glslc` is already located by FindVulkan.cmake in the parent CMakeLists.

b4903

17 Mar 15:13
484a8ab
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312)

b4902

17 Mar 12:25
cf2270e
vulkan: subgroup size tuning (#12087)

* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable warp 32 for RDNA3

* vulkan: fine tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <[email protected]>
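The "override subgroup size if required_subgroup_size = 0" step can be read as a simple precedence rule: an explicit per-pipeline requirement wins, otherwise a per-architecture tuned value is used, otherwise the device default. The sketch below is a hypothetical illustration of that rule; the enum members, `pick_subgroup_size` and the `tuned` table are assumptions, and no real tuned sizes from the commit are reproduced here.

```c
#include <stdint.h>

/* Hypothetical architecture enum in the spirit of the commit's
   AMD-generation detection. */
typedef enum {
    ARCH_OTHER = 0,
    ARCH_AMD_GCN,
    ARCH_AMD_RDNA1,
    ARCH_AMD_RDNA2,
    ARCH_AMD_RDNA3,
    ARCH_COUNT
} device_arch;

/* Choose a subgroup size for one pipeline.
   required: size demanded by the pipeline, 0 meaning "no requirement".
   tuned:    per-architecture preferred size, 0 meaning "no preference".
   fallback: the device's default subgroup size. */
static uint32_t pick_subgroup_size(uint32_t required, device_arch arch,
                                   const uint32_t tuned[ARCH_COUNT],
                                   uint32_t fallback) {
    if (required != 0) {
        return required; /* an explicit requirement is never overridden */
    }
    if ((unsigned)arch < (unsigned)ARCH_COUNT && tuned[arch] != 0) {
        return tuned[arch]; /* architecture-specific tuning applies */
    }
    return fallback;
}
```

Keeping the tuned values in a table indexed by architecture means adding a new GPU generation only touches the table, not the selection logic.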

b4901

17 Mar 10:42
f07690c
vulkan: use fp32 in coopmat2 q4_k dequant function (#12309)
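For context, Q4_K dequantization of one 32-value sub-block reduces to `y = d·sc·q − dmin·m`, where `d` and `dmin` are the super-block scales and `sc`/`m` are the sub-block's 6-bit scale and min. The sketch below keeps all of that arithmetic in fp32. It is a plain-C illustration, not the GLSL shader; the unpacking of the packed 6-bit scales and the 4-bit nibbles is omitted and assumed to have been done already, and `dequant_q4k_subblock` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Dequantize one 32-value Q4_K sub-block entirely in fp32 (sketch).
   d, dmin: super-block scale and min scale, already converted to float.
   sc, m:   this sub-block's unpacked 6-bit scale and min (0..63).
   q:       32 unpacked 4-bit quants (0..15). */
static void dequant_q4k_subblock(float d, float dmin, uint8_t sc, uint8_t m,
                                 const uint8_t q[32], float out[32]) {
    const float scale  = d * (float)sc;   /* per-sub-block multiplier */
    const float offset = dmin * (float)m; /* per-sub-block offset */
    for (size_t i = 0; i < 32; ++i) {
        out[i] = scale * (float)q[i] - offset;
    }
}
```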

b4900

17 Mar 10:33
891c639
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…
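Padding a matrix dimension for a tiled kernel usually means rounding it up to a multiple of the tile width so that every tile is full. A minimal sketch of that rounding follows; the tile width is a hypothetical parameter, not the coopmat2 shader's actual tile size.

```c
#include <stdint.h>

/* Round n up to the next multiple of tile (tile > 0).
   Note: n + tile - 1 can overflow for n close to UINT32_MAX. */
static uint32_t pad_to_multiple(uint32_t n, uint32_t tile) {
    return ((n + tile - 1) / tile) * tile;
}
```

For example, with a hypothetical tile width of 64, an N of 100 would be padded to 128, while an N that is already a multiple of 64 is left unchanged.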

b4899

17 Mar 10:29
2f21123
vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258)