Releases: ggml-org/llama.cpp
b6148
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-mo…
b6144
ggml : repack block_iq4_nlx8 (#14904)
b6143
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf impro…
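For context on what the optimized kernel computes: `reduce_rows_f32` sums each row of an f32 matrix. A minimal CPU reference sketch of that operation is below; the function name and layout (row-major, `ncols` contiguous floats per row) are assumptions for illustration, not the actual CUDA kernel from this release.

```c
#include <stddef.h>

/* Hypothetical CPU reference for the row-reduction operation:
 * given an nrows x ncols row-major f32 matrix, write each row's
 * sum into dst. The release above optimizes the GPU version;
 * this sketch only illustrates the semantics. */
static void reduce_rows_f32_ref(const float *x, float *dst, int ncols, int nrows) {
    for (int r = 0; r < nrows; r++) {
        float sum = 0.0f;
        for (int c = 0; c < ncols; c++) {
            sum += x[(size_t) r * ncols + c];
        }
        dst[r] = sum;
    }
}
```

The CUDA kernel parallelizes this across rows (and within a row via warp-level reductions); the reference above is only the correctness baseline such a kernel would be tested against.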
b6141
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …
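The idea behind this fix: some socket stacks reject a single `send()`/`recv()` call whose length is too large, so the transfer is split into bounded chunks and looped. A hedged sketch of that pattern follows; the callback, function name, and the 1 MiB limit are illustrative assumptions, not the actual ggml-rpc code.

```c
#include <stddef.h>

/* Sketch of chunked transfer: never hand the transport more than
 * MAX_CHUNK bytes per call. The write_fn callback stands in for
 * send(); in real code it would wrap the socket write. */
#define MAX_CHUNK ((size_t) 1 << 20) /* assumed per-call limit: 1 MiB */

/* writes up to len bytes, returns bytes written or -1 on error */
typedef long (*write_fn)(void *ctx, const void *buf, size_t len);

static int send_all_chunked(write_fn w, void *ctx, const void *buf, size_t len) {
    const char *p = (const char *) buf;
    while (len > 0) {
        size_t want  = len < MAX_CHUNK ? len : MAX_CHUNK;
        long   wrote = w(ctx, p, want);
        if (wrote <= 0) {
            return -1; /* error or unexpected zero-length write */
        }
        p   += (size_t) wrote;
        len -= (size_t) wrote;
    }
    return 0;
}
```

The loop also handles short writes (a call that sends fewer bytes than requested), which plain `send()` is allowed to do on stream sockets.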
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6139
sycl: Fix and disable more configurations of mul_mat (#15151)
b6138
opencl: allow mixed f16/f32 `add` (#15140)
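Mixed-precision `add` means one operand is stored as f16 and the other as f32; the f16 side is promoted to f32 before the elementwise sum. A portable C sketch of that promotion and add is below; the function names and the bit-twiddling conversion are illustrative assumptions, not the OpenCL kernel from this release.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: convert an IEEE 754 binary16 value (stored as uint16_t)
 * to float, handling zero, subnormals, and inf/NaN. */
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign; /* signed zero */
        } else {
            /* subnormal: shift mantissa up until the implicit bit appears */
            int e = -1;
            do { mant <<= 1; e++; } while (!(mant & 0x400u));
            mant &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13); /* inf / NaN */
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Mixed-precision elementwise add: dst = f32(a_f16) + b_f32 */
static void add_f16_f32(float *dst, const uint16_t *a, const float *b, int n) {
    for (int i = 0; i < n; i++) {
        dst[i] = f16_to_f32(a[i]) + b[i];
    }
}
```

An OpenCL kernel would do the same promotion per work-item (typically via the built-in `half`-to-`float` conversion) rather than in a scalar loop.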
b6137
CUDA cmake: add `-lineinfo` for easier debug (#15260)
b6136
CANN: GGML_OP_CPY optimization (#15070)
b6135
musa: fix failures in test-backend-ops for mul_mat_id op (#15236) * musa: fix failures in test-backend-ops for mul_mat_id op Signed-off-by: Xiaodong Ye <[email protected]> * Address review comments Signed-off-by: Xiaodong Ye <[email protected]> --------- Signed-off-by: Xiaodong Ye <[email protected]>