Releases · Xarbirus/llama.cpp
b4513
b4493
CUDA: backwards pass for misc. ops, add tests (#11257)
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
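To illustrate what a backward pass for one of these ops involves (not the PR's actual CUDA code), here is a minimal scalar C++ sketch for SiLU, one of the elementwise ops llama.cpp uses; a CUDA kernel would apply the same per-element formula. The finite-difference check mirrors the approach of gradient tests in general.

```cpp
#include <cmath>
#include <cstdio>

// forward: y = x * sigmoid(x)
float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// backward: given the upstream gradient dy, return dx = dy * d(silu)/dx.
// With s = sigmoid(x): d(silu)/dx = s * (1 + x * (1 - s))
float silu_backward(float x, float dy) {
    const float s = 1.0f / (1.0f + std::exp(-x));
    return dy * s * (1.0f + x * (1.0f - s));
}

int main() {
    // sanity-check the analytic gradient against a finite difference
    const float x = 0.7f, eps = 1e-3f;
    const float numeric = (silu(x + eps) - silu(x - eps)) / (2 * eps);
    std::printf("analytic=%f numeric=%f\n", silu_backward(x, 1.0f), numeric);
}
```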
b4393
vulkan: multi-row k quants (#10846)
* multi-row k quant shaders!
* better row selection
* more row choices
* readjust row selection
* rm_kq=2 by default
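A rough sketch of the multi-row idea, in scalar C++ rather than the actual shader code: each workgroup computes several rows of y = A*x instead of one, so each load of x is reused across rows. The number of rows (the rm_kq knob above, defaulting to 2) trades register pressure against bandwidth; names here are illustrative.

```cpp
#include <vector>

// Compute y = A * x, processing `num_rows` rows per "workgroup" so that
// each element of x is loaded once and reused num_rows times.
void mat_vec_multirow(const std::vector<std::vector<float>> & A,
                      const std::vector<float> & x,
                      std::vector<float> & y,
                      int num_rows) {
    const int nrows = (int) A.size();
    for (int first = 0; first < nrows; first += num_rows) {
        float acc[8] = {0}; // assumes num_rows <= 8 for this sketch
        for (size_t j = 0; j < x.size(); ++j) {
            const float xj = x[j]; // loaded once...
            for (int r = 0; r < num_rows && first + r < nrows; ++r) {
                acc[r] += A[first + r][j] * xj; // ...reused for every row
            }
        }
        for (int r = 0; r < num_rows && first + r < nrows; ++r) {
            y[first + r] = acc[r];
        }
    }
}
```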
b4320
vulkan: small mul_mat_vec optimizations (#10665)
* double the number of rows per workgroup
* Update ggml-vulkan.cpp
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
* only increase the number of rows for AMD and subgroup size 64
* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested
* use subgroup min and max to check for GCN (requires https://github.com/ggerganov/llama.cpp/pull/10721)
* manual merge ggml-vulkan.cpp
* set min and max subgroup size in any case
* also double the number of rows for Intel GPUs
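For context on the VK_EXT_subgroup_size_control item: the extension lets the host pin a compute stage to a specific subgroup size and require full subgroups, which matters on drivers that may otherwise launch partial ones. The struct and flag names below come from the Vulkan spec; the surrounding pipeline setup is omitted, and this is a sketch of the typical wiring rather than ggml-vulkan.cpp itself.

```cpp
#include <vulkan/vulkan.h>

VkPipelineShaderStageCreateInfo make_compute_stage(VkShaderModule module,
                                                   uint32_t subgroup_size) {
    // static so the pNext pointer stays valid in this sketch
    static VkPipelineShaderStageRequiredSubgroupSizeCreateInfoEXT req = {};
    req.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO_EXT;
    req.requiredSubgroupSize = subgroup_size; // e.g. 64 on AMD GCN

    VkPipelineShaderStageCreateInfo stage = {};
    stage.sType  = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stage.pNext  = &req;
    // ask the driver to only ever launch full subgroups for this stage
    stage.flags  = VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT;
    stage.stage  = VK_SHADER_STAGE_COMPUTE_BIT;
    stage.module = module;
    stage.pName  = "main";
    return stage;
}
```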
b4240
server: Add "tokens per second" information in the backend (#10548) * add cmake rvv support * add timings * remove space * update readme * fix * fix code * remove empty line * add test --------- Co-authored-by: Xuan Son Nguyen <[email protected]>
b4061
metal : reorder write loop in mul mat kernel + style (#10231)
* metal : reorder write loop
* metal : int -> short, style
ggml-ci
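The commit message is terse, so purely as an illustration of the two kinds of change it names: writing a result tile with the inner loop over the contiguous dimension keeps stores sequential, and small loop counters can be narrowed from int to short. The actual kernel is Metal; this scalar C++ only mirrors the access pattern.

```cpp
#include <cstddef>

// Write an accumulated rows x cols tile into dst, row by row, so the inner
// loop walks contiguous memory. `short` suffices for small tile bounds.
void write_tile(float * dst, const float * acc,
                short rows, short cols, size_t dst_stride) {
    for (short r = 0; r < rows; ++r) {      // one row at a time...
        for (short c = 0; c < cols; ++c) {  // ...contiguous columns within it
            dst[r * dst_stride + c] = acc[r * cols + c];
        }
    }
}
```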
b3969
sync : ggml
b3917
server : handle "logprobs" field with false value (#9871) Co-authored-by: Gimling <[email protected]>
b3810
readme : add programmable prompt engine language CLI (#9599)
b3767
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
* squashed re-add of my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049; have ggml_vec_dot_q4_0 do two blocks per loop for AVX; try out f16c ggml_vec_dot_iq4_nl, but it's not really faster, since as per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue
* shuffle
* remove f16c iq4_nl as I can't make it faster than before
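A scalar sketch of the "two blocks per loop" unrolling applied to ggml_vec_dot_q4_0, not the AVX intrinsics themselves: Q4_0 packs 32 weights per block as 4-bit values plus one fp16 scale, and processing two blocks per iteration keeps two independent accumulators in flight. The types are simplified (a float scale instead of fp16) for illustration.

```cpp
#include <cstdint>

constexpr int QK4_0 = 32;

struct block_q4_0_f {
    float   d;             // block scale (stored as fp16 in the real format)
    uint8_t qs[QK4_0 / 2]; // 32 x 4-bit quants, two per byte
};

// Dot product of one Q4_0 block against 32 floats. Low nibbles map to the
// first half of the block, high nibbles to the second, both centered on 8.
static float dot_block(const block_q4_0_f & b, const float * y) {
    float sum = 0.0f;
    for (int i = 0; i < QK4_0 / 2; ++i) {
        const int lo = (b.qs[i] & 0x0F) - 8;
        const int hi = (b.qs[i] >>   4) - 8;
        sum += lo * y[i] + hi * y[i + QK4_0 / 2];
    }
    return b.d * sum;
}

float vec_dot_q4_0(int nblocks, const block_q4_0_f * x, const float * y) {
    float acc0 = 0.0f, acc1 = 0.0f; // two accumulators, two blocks per loop
    int ib = 0;
    for (; ib + 1 < nblocks; ib += 2) {
        acc0 += dot_block(x[ib    ], y + (ib    ) * QK4_0);
        acc1 += dot_block(x[ib + 1], y + (ib + 1) * QK4_0);
    }
    if (ib < nblocks) { // odd tail block
        acc0 += dot_block(x[ib], y + ib * QK4_0);
    }
    return acc0 + acc1;
}
```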