Releases: agray3/llama.cpp
b6428
CUDA: Add mul_mat_id support for the mmf kernel (#15767)
* CUDA: Add mul_mat_id support for the mmf: add support for mul_mat_id for bs < 16
* Review: use warp_size, fix should_use_mmf condition
* Launch one block per expert, stride along n_expert_used
* Templatize mul_mat_id
* Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids
* Reduce compile times by dividing mmf into f16, bf16 and f32 variants
* Divide mmf by ncols_dst
* Add missing files
* Fix MUSA/HIP builds
b6206
server : disable context shift by default (#15416)
* server : disable context shift by default ggml-ci
* server : make scope of test parameters local
b6144
ggml : repack block_iq4_nlx8 (#14904) ggml-ci
b6082
vulkan: fix build when using glslang that does not support coopmat2 (…
b6019
opencl : add ops docs (#14910)
b5967
ggml: fix loongarch quantize_row_q8_1 error (#14827)
b5958
opencl: remove unreachable `return` (#14806)
b5707
sycl: Cleanup codepaths in Get Rows in sycl backend (#14215)
Addresses unused reorder path
b5622
Vulkan: Don't default to CPU device (like llvmpipe), even if no other…
b5166
llava : update documentations (#13055)
* llava : update documentations
* fix typo