Releases: ngxson/llama.cpp
b4881
llama : refactor llama_context, llama_kv_cache, llm_build_context (#1…
b4880
server : fix crash when using verbose output with input tokens that a…
b4879
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK…
b4877
sycl : variable sg_size support for mmvq kernels (#12336)
b4876
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need to avoid launching them with parameters for warp64.
b4875
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
b4874
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
b4873
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows …
b4872
ggml-backend : fix backend search path (#12330)
* Fix backend search path
* replace .native() with '/'
* reverted .native()
b4871
metal : Cache the Metal library at the device context level (#12265)