Releases · ngxson/llama.cpp

01 Sep 19:48

d4d8dbe

b6350 Latest

vulkan: use memory budget extension to read memory usage (#15545)

* vulkan: use memory budget extension to read memory usage

* fix: formatting and names

* formatting

* fix: detect and cache memory budget extension availability on init

* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available

* style: lints
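The last two bullets describe a fallback: use the memory budget extension when it was detected at init, otherwise fall back to the raw heap size. The sketch below models that choice in Python under stated assumptions. It is not ggml's Vulkan code. `HeapInfo`, `BudgetProps`, and `device_memory` are hypothetical names, and their fields mirror `VkMemoryHeap.size` and the `heapBudget` / `heapUsage` arrays of `VkPhysicalDeviceMemoryBudgetPropertiesEXT` (from `VK_EXT_memory_budget`). Two further assumptions: free memory is reported as budget minus usage, and without the extension free is reported as the full heap size.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class HeapInfo:
    # Hypothetical stand-in for VkMemoryHeap; size is in bytes.
    size: int


@dataclass
class BudgetProps:
    # Hypothetical stand-in for VkPhysicalDeviceMemoryBudgetPropertiesEXT.
    heap_budget: Sequence[int]  # estimated bytes this process may use per heap
    heap_usage: Sequence[int]   # bytes this process currently uses per heap


def device_memory(
    heaps: Sequence[HeapInfo],
    heap_index: int,
    budget: Optional[BudgetProps] = None,
) -> Tuple[int, int]:
    """Return (free, total) bytes for one heap.

    `budget` is None when the extension was not found at init, so the
    availability check happens once and is not repeated per query.
    """
    if budget is not None:
        # Assumption: total = budget, free = budget minus current usage.
        total = budget.heap_budget[heap_index]
        free = max(total - budget.heap_usage[heap_index], 0)
    else:
        # Assumption: without the extension only the heap size is known.
        total = heaps[heap_index].size
        free = total
    return free, total
```

Reading `heapBudget` rather than `heap.size` matters because the budget accounts for memory already claimed by other processes, so it is a closer estimate of what can actually be allocated.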

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip

sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6

373 MB 2025-09-01T19:48:35Z
llama-b6350-bin-macos-arm64.zip

sha256:777c3edbe3eb078bfd48487f337877bb379de67c74a1d7a43226468ccb19eab7

11 MB 2025-09-01T19:48:45Z
llama-b6350-bin-macos-x64.zip

sha256:504d17c00504106cf7eedb7fcd98de57022d876c26d2b5610a9e208b3dcd9a8c

28.4 MB 2025-09-01T19:48:46Z
llama-b6350-bin-ubuntu-vulkan-x64.zip

sha256:4e6d025496d02e834c3c6399493b45b9356986b26878e54591924f1b4c29a70f

25.8 MB 2025-09-01T19:48:47Z
llama-b6350-bin-ubuntu-x64.zip

sha256:c7d93f68da334d25247464c819379d3124ab4add3df786ec28bbe09ec0f8f458

13 MB 2025-09-01T19:48:49Z
llama-b6350-bin-win-cpu-arm64.zip

sha256:0a34b50fe61e6b305d5d06bdc7fa1cf7654a005f810f88058b87fd4dfd337236

11.2 MB 2025-09-01T19:48:50Z
llama-b6350-bin-win-cpu-x64.zip

sha256:02991ca515782a10a39328788107b060b7a47834868c1b79d3397271e6e77236

14.2 MB 2025-09-01T19:48:51Z
llama-b6350-bin-win-cuda-12.4-x64.zip

sha256:13dbfe03bcf299874c629214610f7db944b3733b934a7e0547c8d532e4611d6f

138 MB 2025-09-01T19:48:52Z
llama-b6350-bin-win-hip-radeon-x64.zip

sha256:303bbe5fede445213ce414171e17c7e38d647a7ec300ad08cfcc1365fef57778

287 MB 2025-09-01T19:48:55Z
llama-b6350-bin-win-opencl-adreno-arm64.zip

sha256:cd9521a6473c95f3c1651f7c0ac77b203b811d757f06025dd2da2e50f82d548d

11.6 MB 2025-09-01T19:49:02Z
Source code (zip)

2025-09-01T19:17:42Z
Source code (tar.gz)

2025-09-01T19:17:42Z
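Each asset above is listed with a `sha256:` digest. A minimal Python sketch for checking a downloaded file against that digest follows. It uses only `hashlib` and `pathlib` from the standard library. The helper names `sha256_of` and `verify_asset` are illustrative, not part of the release tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_asset(path: Path, expected: str) -> bool:
    """Compare against a digest written as on the release page, e.g. 'sha256:<hex>'."""
    digest = expected.strip().lower().removeprefix("sha256:")
    return sha256_of(path) == digest
```

For example, `verify_asset(Path("llama-b6350-bin-ubuntu-x64.zip"), "sha256:c7d93f68da334d25247464c819379d3124ab4add3df786ec28bbe09ec0f8f458")` should return `True` for an intact download of that asset.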

01 Sep 19:42

github-actions

b6349

35a42ed

vulkan: add missing clamps in new mul_mat_id paths (#15702)

This is a missing interaction between #15546 and #15652

Assets 15

01 Sep 19:19

github-actions

b6348

fec7911

vulkan: disable large mmv subgroups on older Nvidia GPUs (#15717)

Assets 15

01 Sep 18:30

github-actions

b6347

078ce23

ggml: SVE support for exponential functions (#15145)

* SVE support for exponential functions

Add const notation to variable pg

* Update ggml/src/ggml-cpu/vec.cpp

Co-authored-by: Georgi Gerganov

* Add const

---------

Co-authored-by: Georgi Gerganov

Assets 15

01 Sep 14:45

github-actions

b6344

02c1813

Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants …

Assets 15

01 Sep 12:47

github-actions

b6343

77dee9d

ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695)

* ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops

This commit adds support for the TRANSPOSE and RESHAPE operations in the
ggml webgpu backend.

Co-authored-by: Diego Devesa
Co-authored-by: Sigbjørn Skjæret

Assets 15

01 Sep 01:20

github-actions

b6340

b9382c3

CANN: Optimize MUL_MAT_ID (#15658)

Assets 15

31 Aug 17:28

github-actions

b6337

0d161f0

server : enable /slots by default and make it secure (#15630)

* server : enable /slots by default and make it secure

ggml-ci

* server : fix tests to pass `--no-slots` when necessary

* server : extend /props with info about enabled endpoints
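With `/slots` now enabled by default (and switched off with `--no-slots`), a client can query it directly. The Python sketch below uses only `urllib` from the standard library. Several details are assumptions rather than facts from this release. The base URL `http://localhost:8080` is assumed to be the server's default address. "Secure" is taken to mean that the endpoint honours the server's API key, sent here as an `Authorization: Bearer` header. The response is assumed to be JSON. `slots_request` and `fetch_slots` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional


def slots_request(base_url: str = "http://localhost:8080",
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /slots; the Bearer header is an assumption."""
    req = urllib.request.Request(base_url.rstrip("/") + "/slots", method="GET")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


def fetch_slots(base_url: str = "http://localhost:8080",
                api_key: Optional[str] = None,
                timeout: float = 5.0) -> Any:
    """Send the request and decode the (assumed JSON) response body."""
    with urllib.request.urlopen(slots_request(base_url, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the URL and header logic testable without a running server.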

Assets 15

31 Aug 16:14

github-actions

b6335

2749662

llama : fix fattn reserve call n_seqs parameter (#15699)

ggml-ci

Assets 15

31 Aug 14:39

github-actions

b6334

9777032

llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.

Assets 15
