Releases: ServeurpersoCom/llama.cpp
b6689
rpc : check src buffer when copying tensor (#16421)
Only the dst buffer is guaranteed to be an RPC buffer; add a check for the src one as well.
b6688
rpc : add support for multiple devices (#16276)
Allow rpc-server to expose multiple devices from a single endpoint. Change the RPC protocol to include a device identifier where needed. Closes #15210.
* fixes
* use ggml_backend_reg_t
* address review comments
* fix llama-bench backend report
* address review comments, change device naming
* fix cmd order
b6686
chat : support Magistral thinking (#16413)
* feat: added a dedicated Magistral chat format that preserves [THINK] spans and parses reasoning before tool calls
* feat: new flow in the chat template test suite for Magistral
b6684
metal : fix loop bound in ggml_mem_ranges (#16412)
b6683
llama : fix shapes for bert/mpt q/k norm (#16409)
b6679
vulkan: Fix FA coopmat1 invalid array indexing (#16365)
When computing sinks, the cm1 shader was looping r from 0 to Br rather than to rows_per_thread. I must have copied this from the scalar path (where it is correct), and somehow it wasn't causing failures on current drivers.
b6676
vulkan: in flash attention, bounds check against nem1 (don't rely on …
b6673
test-barrier : do not use more threads than physically available (#16…
b6670
musa: update compile flags (#16265)
Signed-off-by: Xiaodong Ye <[email protected]>
b6668
ci: update vulkan ci (#16294)