Releases: jeffbolznv/llama.cpp

b6407

07 Sep 21:46
3976dfb
vulkan: support im2col_3d (#15795)

b6405

07 Sep 17:41
c97b5e5
vulkan: Support pad_ext (#15794)

b6401

06 Sep 19:26
c4df49a
kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16…

b6381

04 Sep 13:34
c1c354e
CANN: Refactor ND to NZ workspace to be per-device (#15763)

* CANN: Refactor ND to NZ workspace to be per-device in the Ascend backend

- Replaced the previous single global ND→NZ workspace with a per-device
  cache using unordered_map keyed by device ID.
- Functions `release_nz_workspace`, `relloc_nz_workspace`, and
  `get_nz_workspace` now manage workspace independently for each device,
  preventing memory conflicts in multi-device / pipeline parallel scenarios.
- This change fixes potential precision issues caused by workspace
  overwrites when multiple devices perform ND→NZ conversions concurrently.

Co-authored-by: hipudding

* refactor

Signed-off-by: noemotiovon

* rename

Signed-off-by: noemotiovon

* fix review comments

Signed-off-by: noemotiovon

---------

Signed-off-by: noemotiovon
Co-authored-by: hipudding
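The per-device workspace idea in this release can be illustrated with a minimal sketch, assuming a cache of one scratch buffer per device ID held in an `std::unordered_map`. The names `NzWorkspace` and `get_workspace` are hypothetical and are not the actual CANN backend functions (`release_nz_workspace`, `relloc_nz_workspace`, `get_nz_workspace`), whose signatures are not shown here; `std::malloc` stands in for real device allocation so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for one device's ND->NZ scratch buffer.
// A real backend would allocate device memory; std::malloc keeps this runnable.
struct NzWorkspace {
    void*       ptr  = nullptr;
    std::size_t size = 0;

    NzWorkspace() = default;
    NzWorkspace(const NzWorkspace&) = delete;             // owning buffer: no copies
    NzWorkspace& operator=(const NzWorkspace&) = delete;
    ~NzWorkspace() { release(); }

    // Free the buffer and reset to empty.
    void release() {
        std::free(ptr);
        ptr  = nullptr;
        size = 0;
    }

    // Grow-only reallocation: a buffer that is already large enough is reused.
    void reserve(std::size_t new_size) {
        if (new_size > size) {
            release();
            ptr  = std::malloc(new_size);
            size = ptr ? new_size : 0;
        }
    }
};

// Hypothetical per-device cache: one workspace per device ID, so devices that
// convert concurrently never overwrite each other's buffer.
static std::unordered_map<int, NzWorkspace> g_nz_workspaces;
static std::mutex g_nz_workspaces_mutex;  // guards lookup/insertion into the map

// Returns the given device's workspace, creating an empty one on first use.
// References to unordered_map values stay valid across rehashing.
NzWorkspace& get_workspace(int device) {
    assert(device >= 0 && "device IDs are non-negative");
    std::lock_guard<std::mutex> lock(g_nz_workspaces_mutex);
    return g_nz_workspaces[device];
}
```

Keying by device ID means two devices never share a buffer, so a conversion on one device cannot clobber data another device is still reading, which is the overwrite scenario the commit message describes.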

b6373

03 Sep 19:45
0fce7a1
vulkan: don't use std::string in load_shaders, to improve compile tim…

b6350

01 Sep 23:03
d4d8dbe
vulkan: use memory budget extension to read memory usage (#15545)

* vulkan: use memory budget extension to read memory usage

* fix: formatting and names

* formatting

* fix: detect and cache memory budget extension availability on init

* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available

* style: lints
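The last two fixes describe a fallback: detect once at init whether the memory budget extension (`VK_EXT_memory_budget`) is available, then report the per-heap budget when it is and the raw heap size otherwise. The sketch below shows only that selection logic, assuming plain C++ stand-ins instead of Vulkan headers: `HeapInfo`, `DeviceMemoryInfo`, and `heap_available_bytes` are hypothetical names, with `size` mirroring `VkMemoryHeap::size` and `budget` mirroring the matching `heapBudget` entry of `VkPhysicalDeviceMemoryBudgetPropertiesEXT`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one memory heap's data.
struct HeapInfo {
    std::uint64_t size;    // mirrors VkMemoryHeap::size
    std::uint64_t budget;  // mirrors heapBudget; only meaningful with the extension
};

// Hypothetical per-device info; extension availability is detected once and cached.
struct DeviceMemoryInfo {
    bool has_memory_budget = false;
    std::vector<HeapInfo> heaps;
};

// Returns the number of bytes the process can expect to use from a heap:
// the driver-reported budget when the extension is available, else the heap size.
std::uint64_t heap_available_bytes(const DeviceMemoryInfo& dev, std::size_t heap_index) {
    assert(heap_index < dev.heaps.size());
    const HeapInfo& heap = dev.heaps[heap_index];
    return dev.has_memory_budget ? heap.budget : heap.size;
}
```

Caching the availability flag keeps the per-query path to a single branch instead of re-checking device extensions every time memory usage is read.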

b6337

31 Aug 18:24
0d161f0
server : enable /slots by default and make it secure (#15630)

* server : enable /slots by default and make it secure

ggml-ci

* server : fix tests to pass `--no-slots` when necessary

* server : extend /props with info about enabled endpoints

b6319

29 Aug 18:50
792b44f
server : add documentation for `parallel_tool_calls` param (#15647)

Co-authored-by: Pierre F

b6314

28 Aug 23:13
a8bca68
fix: Compute the full sum in llama-eval-callback, not just the sum of…

b6285

26 Aug 11:40
79a5462
mtmd : support Kimi VL model (#15458)

* convert : fix tensor naming conflict for llama 4 vision

* convert ok

* support kimi vision model

* clean up

* fix style

* fix calculation of the number of output tokens

* refactor resize_position_embeddings

* add test case

* rename build fn

* correct a small bug