Releases · ggml-org/llama.cpp
b6141
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6139
sycl: Fix and disable more configurations of mul_mat (#15151)
b6138
opencl: allow mixed f16/f32 `add` (#15140)
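For context, a mixed f16/f32 `add` means one operand is stored in half precision and the other in single precision. Below is a minimal sketch, not taken from the PR, of what such an add looks like at the ggml graph level; it runs on the CPU backend only for illustration (the OpenCL path is not exercised), and the header split between `ggml.h` and `ggml-cpu.h`, the tensor sizes, and the arena size are assumptions that may differ across ggml versions.

```c
// Minimal sketch (illustrative, not from the PR): adding an F16 tensor to an
// F32 tensor through the ggml graph API, computed on the CPU backend.
#include "ggml.h"
#include "ggml-cpu.h"   // ggml_graph_compute_with_ctx (header location varies by ggml version)

#include <stdio.h>

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16u * 1024 * 1024,  // small arena for tensors + graph
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // a is half precision, b is single precision
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 8);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);

    ggml_fp16_t * a_data = (ggml_fp16_t *) a->data;
    float       * b_data = (float       *) b->data;
    for (int i = 0; i < 8; ++i) {
        a_data[i] = ggml_fp32_to_fp16(1.0f);
        b_data[i] = 2.0f;
    }

    // mixed f16/f32 add; the result takes the type of the first operand (F16 here)
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    const ggml_fp16_t * c_data = (const ggml_fp16_t *) c->data;
    printf("c[0] = %.1f\n", ggml_fp16_to_fp32(c_data[0]));  // expect 3.0

    ggml_free(ctx);
    return 0;
}
```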
b6137
CUDA cmake: add `-lineinfo` for easier debug (#15260)
b6136
CANN: GGML_OP_CPY optimization (#15070)
b6135
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)
b6134
CANN: Add broadcast for softmax and FA (#15208)
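For context, "broadcast for softmax" refers to letting a single attention mask be reused across the head (and batch) dimensions of the logits instead of requiring one mask per head. The sketch below only shows what such a graph looks like when built with the public ggml C API; the shapes, the scale value, and stopping at graph construction are illustrative assumptions, not code from the PR (which targets the CANN backend).

```c
// Minimal sketch (illustrative, not from the PR): a softmax whose 2D mask is
// broadcast across the head dimension of a 3D scores tensor. Graph construction only.
#include "ggml.h"

#include <stdio.h>

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 32u * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t n_kv     = 64;   // keys/values per attention row
    const int64_t n_tokens = 16;   // query tokens
    const int64_t n_head   = 8;    // attention heads

    // attention scores: one [n_kv, n_tokens] matrix per head
    struct ggml_tensor * scores = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, n_kv, n_tokens, n_head);

    // a single mask shared by all heads - this is where broadcasting applies
    struct ggml_tensor * mask = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_kv, n_tokens);

    // scaled, masked softmax; the scale and max_bias values are illustrative
    struct ggml_tensor * probs = ggml_soft_max_ext(ctx, scores, mask, 0.125f, 0.0f);

    printf("probs shape: [%lld, %lld, %lld]\n",
           (long long) probs->ne[0], (long long) probs->ne[1], (long long) probs->ne[2]);

    ggml_free(ctx);
    return 0;
}
```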
b6133
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode.…
b6132
chat : hotfix gpt-oss jinja raising an exception (#15243)