Releases: jeffbolznv/llama.cpp
b6407
b6405
vulkan: Support pad_ext (#15794)
b6401
kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16…
b6381
CANN: Refactor ND to NZ workspace to be per-device (#15763)

* CANN: Refactor ND to NZ workspace to be per-device in Ascend backend
  - Replaced the previous single global ND→NZ workspace with a per-device cache using `unordered_map` keyed by device ID.
  - Functions `release_nz_workspace`, `relloc_nz_workspace`, and `get_nz_workspace` now manage the workspace independently for each device, preventing memory conflicts in multi-device / pipeline-parallel scenarios.
  - This change fixes potential precision issues caused by workspace overwrites when multiple devices perform ND→NZ conversions concurrently.
* refactor
* rename
* fix review comments

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>
b6373
vulkan: don't use std::string in load_shaders, to improve compile tim…
b6350
vulkan: use memory budget extension to read memory usage (#15545)

* vulkan: use memory budget extension to read memory usage
* fix: formatting and names
* formatting
* fix: detect and cache memory budget extension availability on init
* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available
* style: lints
b6337
server : enable /slots by default and make it secure (#15630)

* server : enable /slots by default and make it secure

  ggml-ci
* server : fix tests to pass `--no-slots` when necessary
* server : extend /props with info about enabled endpoints
b6319
server : add documentation for `parallel_tool_calls` param (#15647)

Co-authored-by: Pierre F <[email protected]>
b6314
fix: Compute the full sum in llama-eval-callback, not just the sum of…
b6285
mtmd : support Kimi VL model (#15458)

* convert : fix tensor naming conflict for llama 4 vision
* convert ok
* support kimi vision model
* clean up
* fix style
* fix calc number of output tokens
* refactor resize_position_embeddings
* add test case
* rename build fn
* correct a small bug