Releases · jeffbolznv/llama.cpp

07 Sep 21:46

3976dfb

b6407 Latest

vulkan: support im2col_3d (#15795)

Assets 15

| Asset | SHA-256 | Size | Uploaded |
| --- | --- | --- | --- |
| cudart-llama-bin-win-cuda-12.4-x64.zip | 8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6 | 373 MB | 2025-09-07T21:46:17Z |
| llama-b6407-bin-macos-arm64.zip | 3f9db6699f2e95748525afe043b2b88d8b06d1f2173fb3ce0c8a849529e9740a | 11.1 MB | 2025-09-07T21:46:35Z |
| llama-b6407-bin-macos-x64.zip | 429f74602c7655f2ff971fca0e75ed65b769eba9219257fb1d0d998a26db5928 | 28.6 MB | 2025-09-07T21:46:36Z |
| llama-b6407-bin-ubuntu-vulkan-x64.zip | b247858b7aac15b3b7223b8729efb4a99da81f8ae8b88512658abf66cf99208a | 25.8 MB | 2025-09-07T21:46:38Z |
| llama-b6407-bin-ubuntu-x64.zip | 634cf618c6f89fc29bdc07b801c695c00f15ef4151ebe9a1573a6f32bff21314 | 13.1 MB | 2025-09-07T21:46:39Z |
| llama-b6407-bin-win-cpu-arm64.zip | c9fe8289a92bc28f451ba6036bcad63ec351849b15f6b587b6141a9079536999 | 11.3 MB | 2025-09-07T21:46:41Z |
| llama-b6407-bin-win-cpu-x64.zip | d8afa9842f099ec257266ef3488408b2165db45934cfd7973993729be5cff249 | 14.3 MB | 2025-09-07T21:46:41Z |
| llama-b6407-bin-win-cuda-12.4-x64.zip | 01e7be36ee8735cb526f9bd2ab2ac9c073c6cf13add2ab5dc22f86aad010709f | 138 MB | 2025-09-07T21:46:43Z |
| llama-b6407-bin-win-hip-radeon-x64.zip | 7ad9cda600bf7771bf05dc07364fd3a59336e9cc8dcd4c60597d72ab91d38a50 | 287 MB | 2025-09-07T21:46:49Z |
| llama-b6407-bin-win-opencl-adreno-arm64.zip | 40df1cc350e8cc0bdfe059ae3e86145e7ff47f5a6a68d3153189db04a219b043 | 11.7 MB | 2025-09-07T21:46:59Z |
| Source code (zip) | | | 2025-09-07T18:50:26Z |
| Source code (tar.gz) | | | 2025-09-07T18:50:26Z |

07 Sep 17:41

github-actions

b6405

c97b5e5

vulkan: Support pad_ext (#15794)

Assets 15

06 Sep 19:26

github-actions

b6401

c4df49a

kleidiai: generalize compute_forward_kv_cache to compute_forward_fp16…

Assets 15

04 Sep 13:34

github-actions

b6381

c1c354e

CANN: Refactor ND to NZ workspace to be per-device (#15763)

* CANN: Refactor ND to NZ workspace to be per-device in Ascend backend

- Replaced the previous single global ND→NZ workspace with a per-device
  cache using unordered_map keyed by device ID.
- Functions `release_nz_workspace`, `relloc_nz_workspace`, and
  `get_nz_workspace` now manage workspace independently for each device,
  preventing memory conflicts in multi-device / pipeline parallel scenarios.
- This change fixes potential precision issues caused by workspace
  overwrites when multiple devices perform ND→NZ conversions concurrently.

Co-authored-by: hipudding <[email protected]>

* refactor

Signed-off-by: noemotiovon <[email protected]>

* rename

Signed-off-by: noemotiovon <[email protected]>

* fix review comments

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
Co-authored-by: hipudding <[email protected]>
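
The per-device cache described in this release can be illustrated with a minimal sketch. It is not the Ascend backend's code: the names `Workspace`, `PerDeviceWorkspaces`, `ensure`, `size`, and `release` are hypothetical, a host-side `std::vector` stands in for device memory, and no locking is shown. The point is only that keying an `unordered_map` by device ID gives each device its own buffer, so one device growing or releasing its workspace leaves the others untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a device buffer; a real backend would hold
// device memory here rather than a host-side vector.
struct Workspace {
    std::vector<unsigned char> data;
};

// One workspace per device ID, instead of a single shared global buffer.
class PerDeviceWorkspaces {
public:
    // Grow the workspace for `device` to at least `size` bytes and return it.
    void * ensure(int device, std::size_t size) {
        assert(device >= 0);
        Workspace & ws = cache_[device];  // creates an empty entry on first use
        if (ws.data.size() < size) {
            ws.data.resize(size);
        }
        return ws.data.data();
    }

    // Current capacity for `device`, or 0 if nothing has been allocated.
    std::size_t size(int device) const {
        auto it = cache_.find(device);
        return it == cache_.end() ? 0 : it->second.data.size();
    }

    // Drop only this device's workspace; other devices are unaffected.
    void release(int device) { cache_.erase(device); }

private:
    std::unordered_map<int, Workspace> cache_;
};
```

With this layout, calling `ensure(0, 64)` and `ensure(1, 128)` yields two independent buffers, and `release(0)` leaves the buffer for device 1 in place.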

Assets 15

03 Sep 19:45

github-actions

b6373

0fce7a1

vulkan: don't use std::string in load_shaders, to improve compile tim…

Assets 15

01 Sep 23:03

github-actions

b6350

d4d8dbe

vulkan: use memory budget extension to read memory usage (#15545)

* vulkan: use memory budget extension to read memory usage

* fix: formatting and names

* formatting

* fix: detect and cache memory budget extension availability on init

* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available

* style: lints
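
The selection logic in these commits can be sketched roughly as follows. This is an assumption-laden illustration rather than the Vulkan backend's code: `HeapStats` and `MemoryReporter` are hypothetical names, the numbers are passed in as plain integers instead of being queried from Vulkan, and the `usage` field and `free_bytes` fallback are additions not stated in the release notes. It shows only the idea of caching extension availability once at init and preferring the heap budget over the raw heap size when the extension is present.

```cpp
#include <cstdint>

// Hypothetical per-heap numbers; in a real Vulkan backend these would come
// from the physical-device memory properties and, when available, the
// memory budget extension.
struct HeapStats {
    std::uint64_t size;    // total heap size
    std::uint64_t budget;  // budget reported by the extension (if any)
    std::uint64_t usage;   // usage reported by the extension (assumed field)
};

class MemoryReporter {
public:
    // Extension availability is detected once at init and cached.
    explicit MemoryReporter(bool has_memory_budget)
        : has_memory_budget_(has_memory_budget) {}

    // Prefer the budget over the raw heap size when the extension exists.
    std::uint64_t total(const HeapStats & h) const {
        return has_memory_budget_ ? h.budget : h.size;
    }

    // Assumed fallback: without the extension, report the whole heap as free.
    std::uint64_t free_bytes(const HeapStats & h) const {
        if (!has_memory_budget_) {
            return h.size;
        }
        return h.budget > h.usage ? h.budget - h.usage : 0;
    }

private:
    const bool has_memory_budget_;
};
```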

Assets 15

31 Aug 18:24

github-actions

b6337

0d161f0

server : enable /slots by default and make it secure (#15630)

* server : enable /slots by default and make it secure

ggml-ci

* server : fix tests to pass `--no-slots` when necessary

* server : extend /props with info about enabled endpoints

Assets 15

29 Aug 18:50

github-actions

b6319

792b44f

server : add documentation for `parallel_tool_calls` param (#15647)

Co-authored-by: Pierre F <[email protected]>

Assets 15

28 Aug 23:13

github-actions

b6314

a8bca68

fix: Compute the full sum in llama-eval-callback, not just the sum of…

Assets 15

26 Aug 11:40

github-actions

b6285

79a5462

mtmd : support Kimi VL model (#15458)

* convert : fix tensor naming conflict for llama 4 vision

* convert ok

* support kimi vision model

* clean up

* fix style

* fix calc number of output tokens

* refactor resize_position_embeddings

* add test case

* rename build fn

* correct a small bug

Assets 15
