Releases: ngxson/llama.cpp

b6350

01 Sep 19:48
d4d8dbe
vulkan: use memory budget extension to read memory usage (#15545)

* vulkan: use memory budget extension to read memory usage

* fix: formatting and names

* formatting

* fix: detect and cache memory budget extension availability on init

* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available

* style: lints
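
The selection logic described in the notes above can be sketched roughly as follows. This is a simplified, self-contained model rather than the backend's actual code: `HeapInfo`, `MemoryReport`, and `query_device_memory` are hypothetical names, and the plain struct fields stand in for `VkMemoryHeap::size` and for the `heapBudget` / `heapUsage` arrays of `VkPhysicalDeviceMemoryBudgetPropertiesEXT`. Computing free memory as budget minus usage is an assumption made for illustration; the release notes only state that `heapBudget` is read instead of `heap.size` when the extension is available.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one Vulkan memory heap plus its budget data.
struct HeapInfo {
    uint64_t size;          // models VkMemoryHeap::size
    uint64_t budget;        // models heapBudget[i] from VK_EXT_memory_budget
    uint64_t usage;         // models heapUsage[i] from VK_EXT_memory_budget
    bool     device_local;  // models VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
};

// Hypothetical result type: total and estimated free bytes.
struct MemoryReport {
    uint64_t total;
    uint64_t free;
};

// Report memory for the first device-local heap.
// has_memory_budget models an availability flag detected once at init and cached.
MemoryReport query_device_memory(const std::vector<HeapInfo>& heaps,
                                 bool has_memory_budget) {
    for (const HeapInfo& heap : heaps) {
        if (!heap.device_local) {
            continue;  // skip host-visible-only heaps
        }
        if (has_memory_budget) {
            // Assumption: free = budget - usage, clamped at zero.
            const uint64_t free =
                heap.budget > heap.usage ? heap.budget - heap.usage : 0;
            return {heap.size, free};
        }
        // Without the extension, the heap size is the only available figure.
        return {heap.size, heap.size};
    }
    return {0, 0};  // no device-local heap found
}
```

With the extension flag set, a heap of 8192 bytes with a budget of 6000 and usage of 1500 would report 4500 bytes free in this model; without it, the full 8192 is reported, which is why a budget-based figure is a better guide when other processes share the GPU.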

b6349

01 Sep 19:42
35a42ed
vulkan: add missing clamps in new mul_mat_id paths (#15702)

This is a missing interaction between #15546 and #15652

b6348

01 Sep 19:19
fec7911
vulkan: disable large mmv subgroups on older Nvidia GPUs (#15717)

b6347

01 Sep 18:30
078ce23
ggml: SVE support for exponential functions (#15145)

* SVE support for exponential functions

Add const notation to variable pg

* Update ggml/src/ggml-cpu/vec.cpp

Co-authored-by: Georgi Gerganov

* Add const

---------

Co-authored-by: Georgi Gerganov

b6344

01 Sep 14:45
02c1813
Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants …

b6343

01 Sep 12:47
77dee9d
ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695)

* ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops

This commit adds support for the TRANSPOSE and RESHAPE operations in the
ggml webgpu backend.

Co-authored-by: Diego Devesa
Co-authored-by: Sigbjørn Skjæret

b6340

01 Sep 01:20
b9382c3
CANN: Optimize MUL_MAT_ID (#15658)

b6337

31 Aug 17:28
0d161f0
server : enable /slots by default and make it secure (#15630)

* server : enable /slots by default and make it secure

ggml-ci

* server : fix tests to pass `--no-slots` when necessary

* server : extend /props with info about enabled endpoints

b6335

31 Aug 16:14
2749662
llama : fix fattn reserve call n_seqs parameter (#15699)

ggml-ci

b6334

31 Aug 14:39
9777032
llama : separate compute buffer reserve from fattn check (#15696)

Exposes `ggml_backend_sched_split_graph()` so that the graph can be split without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.