Releases: ngxson/llama.cpp
b4909
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentat…
b4908
fixed compilation warnings in ggml-sycl (#12424)
b4907
llama: Add support for RWKV v7 architecture (#12412)
* ggml: Add op l2_norm
* ggml: Add op rwkv_wkv7
* llama: Add support for RWKV7 and ARWKV7 models
* llama: fix inference with RWKV6Qwen2
* llama: add more (a)rwkv7 variants in size
* Apply code-format changes
* fix MUSA build
* llama: fix shape error with rwkv using llama-parallel
Signed-off-by: Molly Sophia
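For readers unfamiliar with the l2_norm op named in the b4907 notes, L2 normalization rescales each row of a tensor to unit Euclidean length. The sketch below is a plain-Python illustration of that general technique, not the ggml implementation; the function names, the `eps` default, and the use of `max(norm, eps)` to guard against zero rows are assumptions made for this example.

```python
import math

def l2_norm(row, eps=1e-12):
    """Scale `row` to unit Euclidean length.

    `eps` is a hypothetical guard so an all-zero row does not divide by zero.
    """
    norm = math.sqrt(sum(v * v for v in row))
    scale = 1.0 / max(norm, eps)
    return [v * scale for v in row]

def l2_norm_rows(rows, eps=1e-12):
    """Apply l2_norm independently to every row of a 2-D list."""
    return [l2_norm(r, eps) for r in rows]
```

Calling `l2_norm_rows([[3.0, 4.0], [0.0, 0.0]])` leaves the first row with length 1 and the all-zero row unchanged, which is the behavior the `eps` guard is there to provide.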
b4905
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)
* Enable CUDA Graph on CTK < 12.x. The `cudaGraphExecUpdate` API changed in 12.x, so CUDA Graph support had been disabled on older CUDA Toolkits. This change enables CUDA Graph support on CTK < 12.x by using the older API there.
* Fix compilation errors with MUSA
* Disable CUDA Graph for MUSA
b4904
ggml-vulkan: remove unused find_program(glslc) (#12416)
It's already found by FindVulkan.cmake in the parent CMakeLists
b4903
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312)
b4902
vulkan: subgroup size tuning (#12087)
* vulkan: subgroup size test
* Vulkan: Add device architecture enum and logic to recognize AMD generations
* vulkan: use new architecture logic to specify subgroup size
* Initial vulkan subgroup size tuning for RDNA3
* vulkan: commonize RDNA subgroup tuning
* vulkan: override subgroup size if required_subgroup_size = 0
* vulkan: disable warp 32 for RDNA3
* vulkan: fine tuned RDNA1 subgroup sizes
* vulkan: adjusted subgroup size map
* vulkan: fixed RDNA2 subgroup map
Co-authored-by: 0cc4m
b4901
vulkan: use fp32 in coopmat2 q4_k dequant function (#12309)
b4900
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bound…
b4899
vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258)