Releases · ngxson/llama.cpp

18 Mar 07:06

fd123cf

b4909

Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…

Assets 26

18 Mar 01:38

github-actions

b4908

a53f7f7

b4908

fixed compilation warnings in ggml-sycl (#12424)

Assets 26

18 Mar 00:22

github-actions

b4907

7dfad38

b4907

llama: Add support for RWKV v7 architecture (#12412)

* ggml: Add op l2_norm

* ggml: Add op rwkv_wkv7

* llama: Add support for RWKV7 and ARWKV7 models

* llama: fix inference with RWKV6Qwen2

* llama: add more (a)rwkv7 variants in size

* Apply code-format changes

* fix MUSA build

* llama: fix shape error with rwkv using llama-parallel

---------

Signed-off-by: Molly Sophia
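For readers unfamiliar with the `l2_norm` op named above, the following is a minimal sketch of L2 normalization over a single row. It assumes the op divides each element by the row's Euclidean norm, with a small `eps` as a lower bound on that norm. The function name `l2_normalize` and the default `eps` are hypothetical and are not ggml's API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not the ggml API): scale `row` in place to unit L2 norm.
// `eps` is a lower bound on the norm so an all-zero row does not divide by zero.
inline void l2_normalize(std::vector<float> & row, float eps = 1e-12f) {
    assert(eps > 0.0f);
    double sum_sq = 0.0;
    for (float v : row) {
        sum_sq += static_cast<double>(v) * static_cast<double>(v);
    }
    const float norm  = std::sqrt(static_cast<float>(sum_sq));
    const float scale = 1.0f / std::fmax(norm, eps);
    for (float & v : row) {
        v *= scale;
    }
}
```

Calling `l2_normalize` on a row holding `3` and `4` scales it by `1/5`. An all-zero row is left unchanged because the `eps` bound keeps the scale finite.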

Assets 26

17 Mar 19:12

github-actions

b4905

b1b132e

b4905

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)

* Enable CUDA Graph on CTK < 12.x

The `cudaGraphExecUpdate` API changed in CUDA 12.x, so CUDA Graph support was previously disabled on older CUDA toolkits. This change enables CUDA Graph support on CTK versions < 12.x by using the older API when building against those toolkits.

* Fix compilation errors with MUSA

* Disable CUDA Graph for MUSA
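The version switch described above can be sketched as a compile-time guard on `CUDART_VERSION`. This is an illustrative sketch, not the code from the pull request. The names `uses_result_info_api` and `try_update_graph_exec` are hypothetical. The `__has_include` check is there only so that the snippet still compiles on a machine without the CUDA toolkit.

```cpp
// Detect the CUDA runtime header so the sketch also builds without the toolkit.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define SKETCH_HAVE_CUDA_RUNTIME 1
#  endif
#endif

// Hypothetical helper: true when the toolkit uses the 12.x signature that
// reports its status through cudaGraphExecUpdateResultInfo.
constexpr bool uses_result_info_api(int cudart_version) {
    return cudart_version >= 12000;  // CUDART_VERSION is 12000 for CUDA 12.0
}
static_assert(uses_result_info_api(12000), "12.0 takes the new signature");
static_assert(!uses_result_info_api(11080), "11.8 takes the old signature");

#ifdef SKETCH_HAVE_CUDA_RUNTIME
// Hypothetical wrapper: try to update an instantiated graph in place.
// Returns true on success; on failure the caller would re-instantiate.
static bool try_update_graph_exec(cudaGraphExec_t exec, cudaGraph_t graph) {
#if CUDART_VERSION >= 12000
    cudaGraphExecUpdateResultInfo info;
    const cudaError_t stat = cudaGraphExecUpdate(exec, graph, &info);
#else
    cudaGraphNode_t error_node;
    cudaGraphExecUpdateResult result;
    const cudaError_t stat = cudaGraphExecUpdate(exec, graph, &error_node, &result);
#endif
    return stat == cudaSuccess;
}
#endif
```

Hiding both signatures behind one wrapper means the calling code does not need to know which toolkit version it was built against.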

Assets 25

17 Mar 17:22

github-actions

b4904

01e8f21

b4904

ggml-vulkan: remove unused find_program(glslc) (#12416)

`glslc` is already found by FindVulkan.cmake in the parent CMakeLists.txt.

Assets 26

17 Mar 15:13

github-actions

b4903

484a8ab

b4903

vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312)

Assets 26

17 Mar 12:25

github-actions

b4902

cf2270e

b4902

vulkan: subgroup size tuning (#12087)

* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable warp 32 for RDNA3

* vulkan: fine tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m

Assets 26

17 Mar 10:42

github-actions

b4901

f07690c

b4901

vulkan: use fp32 in coopmat2 q4_k dequant function (#12309)

Assets 25

17 Mar 10:33

github-actions

b4900

891c639

b4900

vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…

Assets 25

17 Mar 10:29

github-actions

b4899

2f21123

b4899

vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258)

Assets 26
