Releases: ggml-org/llama.cpp
b6138
opencl: allow mixed f16/f32 `add` (#15140)
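The entry above relaxes the OpenCL `add` kernel to accept one f16 and one f32 operand. As a rough illustration of what a mixed-precision elementwise add involves, here is a minimal CPU-side sketch in C; the helper names (`f16_to_f32`, `add_f16_f32`) are hypothetical and not ggml's own, and the conversion is a simplified binary16 decoder, not the backend's kernel.

```c
#include <stdint.h>

/* Decode an IEEE-754 binary16 value to float (hypothetical helper,
 * not ggml's conversion): handles normals, subnormals, zero, Inf, NaN. */
static float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) & 1;
    uint32_t exp  = (uint32_t)(h >> 10) & 0x1F;
    uint32_t mant = (uint32_t)h & 0x3FF;
    float v;
    if (exp == 0) {
        v = mant * (1.0f / 16777216.0f);       /* subnormal: mant * 2^-24 */
    } else if (exp == 31) {
        v = mant ? 0.0f / 0.0f : 1.0f / 0.0f;  /* NaN or Inf */
    } else {
        /* normal: (1 + mant/1024) * 2^(exp-15) */
        v = (1.0f + mant / 1024.0f) * ((float)(1u << exp) / 32768.0f);
    }
    return sign ? -v : v;
}

/* Elementwise add where src0 is f16 and src1/dst are f32 —
 * the mixed-type case the kernel now accepts. */
static void add_f16_f32(const uint16_t *src0, const float *src1, float *dst, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] = f16_to_f32(src0[i]) + src1[i];
    }
}
```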
b6137
CUDA cmake: add `-lineinfo` for easier debug (#15260)
b6136
CANN: GGML_OP_CPY optimization (#15070)
b6135
musa: fix failures in test-backend-ops for mul_mat_id op (#15236)
* musa: fix failures in test-backend-ops for mul_mat_id op
* Address review comments
b6134
CANN: Add broadcast for softmax and FA (#15208)
* refactor softmax
* fix fa
* fix mask shape
* format
* add comments
* Remove whitespace
b6133
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode.…
b6132
chat : hotfix gpt-oss jinja raising an exception (#15243)
* chat : hotfix gpt-oss jinja raising an exception
* fix
b6131
server : allow specifying reasoning_format in HTTP request (#15238)
b6129
kv-cache : fix seq_rm with seq_id == -1 (#15226)
* kv-cache : fix seq_rm with seq_id == -1
* cont : iterate over streams
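In llama.cpp's KV-cache API, a negative `seq_id` passed to sequence removal conventionally matches every sequence, and negative `p0`/`p1` mean an open-ended position range. The following is a self-contained toy sketch of that convention, assuming single-sequence cells; the struct and function (`kv_cell`, `kv_seq_rm`) are invented for illustration and are not llama.cpp's implementation.

```c
#include <stdint.h>

typedef int32_t llama_seq_id;
typedef int32_t llama_pos;

/* Toy KV cell: one sequence id per cell (occupied when seq >= 0). */
struct kv_cell {
    llama_seq_id seq;
    llama_pos    pos;
};

/* Sketch of the seq_rm convention (simplified, hypothetical):
 * seq_id < 0 matches any sequence; p0 < 0 / p1 < 0 mean an open range.
 * Returns the number of cells cleared. */
static int kv_seq_rm(struct kv_cell *cells, int n,
                     llama_seq_id seq_id, llama_pos p0, llama_pos p1) {
    if (p0 < 0) p0 = 0;
    if (p1 < 0) p1 = INT32_MAX;
    int removed = 0;
    for (int i = 0; i < n; ++i) {
        if (cells[i].seq < 0) continue;                      /* already empty */
        if (seq_id >= 0 && cells[i].seq != seq_id) continue; /* wrong sequence */
        if (cells[i].pos < p0 || cells[i].pos >= p1) continue;
        cells[i].seq = -1;                                   /* clear the cell */
        removed++;
    }
    return removed;
}
```

With `seq_id == -1`, every occupied cell in range is cleared regardless of which sequence owns it, which is the case the fix above addresses.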
b6128
kv-cache : log (debug) all streams in find_slot (#15176)
This commit updates `llama_kv_cache_unified::find_slot` to log information for all streams when debug is enabled. The motivation for this change is that with a non-unified kv-cache, only one stream would be logged, because the code currently uses `seq_to_stream[1]`.
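The shape of that change can be sketched as iterating over every stream rather than indexing a single one; the struct and function below are hypothetical stand-ins, not llama.cpp's code.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical per-stream debug state. */
struct kv_stream {
    uint32_t head;
    uint32_t used;
};

/* Log every stream instead of only the one behind seq_to_stream[1],
 * so non-unified caches produce complete debug output.
 * Returns the number of streams logged. */
static int debug_log_streams(const struct kv_stream *streams, uint32_t n_stream) {
    int logged = 0;
    for (uint32_t s = 0; s < n_stream; ++s) {
        fprintf(stderr, "find_slot: stream %u: head = %u, used = %u\n",
                s, streams[s].head, streams[s].used);
        logged++;
    }
    return logged;
}
```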