Releases: ngxson/llama.cpp
b6350
vulkan: use memory budget extension to read memory usage (#15545)
* vulkan: use memory budget extension to read memory usage
* fix: formatting and names
* formatting
* fix: detect and cache memory budget extension availability on init
* fix: read `budgetprops.heapBudget` instead of `heap.size` when memory budget extension is available
* style: lints
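The selection logic in this change can be illustrated with a small C++ sketch, assuming simplified stand-ins for the Vulkan structures: `HeapInfo` and `DeviceMemoryInfo` are hypothetical names, not ggml code. In real Vulkan code the heap size comes from `VkMemoryHeap::size`, and with `VK_EXT_memory_budget` the budget comes from `VkPhysicalDeviceMemoryBudgetPropertiesEXT::heapBudget`, chained into `vkGetPhysicalDeviceMemoryProperties2`. The sketch only models the two points named in the commit: extension availability is detected once and cached, and the budget is read instead of the heap size when the extension is present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-heap values Vulkan reports.
struct HeapInfo {
    uint64_t size;   // plays the role of VkMemoryHeap::size
    uint64_t budget; // plays the role of heapBudget[i] (valid only with VK_EXT_memory_budget)
};

// Hypothetical device wrapper: extension support is detected once at init
// and cached in a member, instead of being re-checked on every query.
class DeviceMemoryInfo {
public:
    DeviceMemoryInfo(std::vector<HeapInfo> heaps, bool has_memory_budget_ext)
        : heaps_(std::move(heaps)), has_memory_budget_ext_(has_memory_budget_ext) {}

    // Report the budget when the extension is available, else the raw heap size.
    uint64_t heap_memory(size_t heap_index) const {
        assert(heap_index < heaps_.size());
        const HeapInfo & heap = heaps_[heap_index];
        return has_memory_budget_ext_ ? heap.budget : heap.size;
    }

private:
    std::vector<HeapInfo> heaps_;
    bool has_memory_budget_ext_;
};
```

Caching the flag keeps the per-query path to a single branch; the budget is generally the more useful figure because it reflects what the driver estimates this process can use, while the heap size is a fixed capacity.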
b6349
vulkan: add missing clamps in new mul_mat_id paths (#15702) This is a missing interaction between #15546 and #15652
b6348
vulkan: disable large mmv subgroups on older Nvidia GPUs (#15717)
b6347
ggml: SVE support for exponential functions (#15145)
* SVE support for exponential functions
  Add const notation to variable pg
* Update ggml/src/ggml-cpu/vec.cpp
  Co-authored-by: Georgi Gerganov
* Add const
---------
Co-authored-by: Georgi Gerganov
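Vectorized `exp` kernels commonly follow a scalar scheme like the one below: reduce the argument as exp(x) = 2^n · exp(r) with n = round(x / ln 2) and |r| ≤ ln(2)/2, evaluate a short polynomial for exp(r), then scale by 2^n. This is a generic sketch of that textbook technique, not the polynomial or code ggml uses. An SVE version would run the same steps on every lane at once under a governing predicate, which is presumably what the variable `pg` in the commit holds.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar range-reduced exp: exp(x) = 2^n * exp(r), n = round(x / ln 2),
// r = x - n * ln 2, so |r| <= ln(2) / 2 and a short polynomial suffices.
float exp_range_reduced(float x) {
    assert(!std::isnan(x)); // NaN handling is out of scope for this sketch
    const double ln2 = 0.69314718055994530942;
    // Clamp so n stays small; float results outside [-150, 150] underflow to 0 or overflow anyway.
    const double xc = std::fmin(std::fmax(static_cast<double>(x), -150.0), 150.0);
    const double n = std::nearbyint(xc / ln2);
    const double r = xc - n * ln2;
    // Degree-6 Taylor polynomial of exp(r) in Horner form; the truncation
    // error is on the order of |r|^7 / 7! (about 1.2e-7 at |r| = ln(2) / 2).
    const double p = 1.0 + r * (1.0 + r * (1.0 / 2 + r * (1.0 / 6 + r * (1.0 / 24 +
                     r * (1.0 / 120 + r * (1.0 / 720))))));
    const double y = std::ldexp(p, static_cast<int>(n)); // multiply by 2^n exactly
    if (y > std::numeric_limits<float>::max()) {
        return std::numeric_limits<float>::infinity();
    }
    return static_cast<float>(y);
}
```

Range reduction is what makes a low-degree polynomial accurate enough: the polynomial only ever sees |r| ≤ 0.35, and the 2^n factor is applied exactly by adjusting the exponent.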
b6344
Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants …
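The arithmetic that an integer dot-product path exploits can be sketched on the CPU for a Q8_0-style block (32 signed 8-bit values sharing one scale): products of the quantized values are summed exactly in a 32-bit integer inside each block, and the block scales are applied once per block. The struct is a simplified assumption (ggml stores the scale as fp16; a float is used here), and this is not the shader itself. Formats such as Q4_0 would need an unpacking step first, which is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlockSize = 32; // values per Q8_0-style block

// Simplified Q8_0-style block: value i is approximately d * qs[i].
struct BlockQ8 {
    float  d;              // per-block scale (fp16 in ggml, float here)
    int8_t qs[kBlockSize]; // quantized values
};

// Dot product of two rows of quantized blocks: exact int32 multiply-accumulate
// inside each block, then one float multiply by both scales per block.
float dot_q8(const std::vector<BlockQ8> & a, const std::vector<BlockQ8> & b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (size_t blk = 0; blk < a.size(); ++blk) {
        int32_t isum = 0;
        for (int i = 0; i < kBlockSize; ++i) {
            isum += static_cast<int32_t>(a[blk].qs[i]) * static_cast<int32_t>(b[blk].qs[i]);
        }
        sum += a[blk].d * b[blk].d * static_cast<float>(isum);
    }
    return sum;
}
```

The inner loop is the part integer dot-product hardware accelerates, for example by multiplying and accumulating four packed int8 pairs per instruction; the floating-point work shrinks to one multiply per block.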
b6343
ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695)
* ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops
  This commit adds support for the TRANSPOSE and RESHAPE operations in the ggml webgpu backend.
Co-authored-by: Diego Devesa
Co-authored-by: Sigbjørn Skjæret
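In ggml, TRANSPOSE and RESHAPE produce views: they change the element counts (`ne`) and byte strides (`nb`) recorded for a tensor while pointing at the same data, so a backend can typically support them without a dedicated kernel. The sketch below models that with a hypothetical `TensorView` struct (not ggml's `ggml_tensor`): transposing swaps the first two dimensions and their strides, and reshaping reinterprets a contiguous buffer with new contiguous strides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical, simplified ggml-style view: ne = elements per dim, nb = byte stride per dim.
struct TensorView {
    std::array<int64_t, 4> ne;
    std::array<size_t, 4>  nb;
    const void * data;
};

// Contiguous byte strides for a shape (dim 0 varies fastest).
TensorView make_contiguous(std::array<int64_t, 4> ne, size_t elem_size, const void * data) {
    TensorView t{ne, {}, data};
    t.nb[0] = elem_size;
    for (int i = 1; i < 4; ++i) t.nb[i] = t.nb[i - 1] * static_cast<size_t>(ne[i - 1]);
    return t;
}

bool is_contiguous(const TensorView & t, size_t elem_size) {
    size_t expected = elem_size;
    for (int i = 0; i < 4; ++i) {
        if (t.nb[i] != expected) return false;
        expected *= static_cast<size_t>(t.ne[i]);
    }
    return true;
}

// TRANSPOSE: swap dims 0 and 1 together with their strides; no data moves.
TensorView transpose(const TensorView & t) {
    TensorView r = t;
    std::swap(r.ne[0], r.ne[1]);
    std::swap(r.nb[0], r.nb[1]);
    return r;
}

// RESHAPE: same buffer, new shape with the same element count; source must be contiguous.
TensorView reshape(const TensorView & t, std::array<int64_t, 4> ne, size_t elem_size) {
    assert(is_contiguous(t, elem_size));
    int64_t n_old = 1, n_new = 1;
    for (int i = 0; i < 4; ++i) { n_old *= t.ne[i]; n_new *= ne[i]; }
    assert(n_old == n_new);
    return make_contiguous(ne, elem_size, t.data);
}

// Read element (i0, i1) of a 2D float view by following the byte strides.
float get_f32(const TensorView & t, int64_t i0, int64_t i1) {
    const char * base = static_cast<const char *>(t.data);
    const size_t offset = static_cast<size_t>(i0) * t.nb[0] + static_cast<size_t>(i1) * t.nb[1];
    return *reinterpret_cast<const float *>(base + offset);
}
```

Because only metadata changes, any kernel that reads its inputs through `nb` strides sees the transposed or reshaped layout automatically.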
b6340
CANN: Optimize MUL_MAT_ID (#15658)
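MUL_MAT_ID is the mixture-of-experts form of matrix multiplication: an id tensor selects, per token, which expert weight matrices multiply that token's activations. A naive reference of those semantics is sketched below with plain nested vectors; the layout is a simplifying assumption that does not match ggml's tensor shapes, and it says nothing about how the CANN change optimizes the op.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Row-major rows x cols weight matrix for one expert.
using Matrix = std::vector<std::vector<float>>;

// Reference semantics: out[t][s] = experts[ids[t][s]] * x[t],
// where x[t] is token t's activation vector and ids[t] lists its selected experts.
std::vector<std::vector<std::vector<float>>> mul_mat_id_ref(
        const std::vector<Matrix> & experts,
        const std::vector<std::vector<float>> & x,
        const std::vector<std::vector<int32_t>> & ids) {
    assert(x.size() == ids.size());
    std::vector<std::vector<std::vector<float>>> out(x.size());
    for (size_t t = 0; t < x.size(); ++t) {
        for (int32_t e : ids[t]) {
            assert(e >= 0 && static_cast<size_t>(e) < experts.size());
            const Matrix & w = experts[static_cast<size_t>(e)];
            std::vector<float> y(w.size(), 0.0f);
            for (size_t r = 0; r < w.size(); ++r) {
                assert(w[r].size() == x[t].size());
                for (size_t c = 0; c < x[t].size(); ++c) y[r] += w[r][c] * x[t][c];
            }
            out[t].push_back(std::move(y));
        }
    }
    return out;
}
```

A common way to speed this up is to group tokens by selected expert so each expert matrix is applied to a batch in one matmul instead of one vector at a time; whether the CANN change does this is not stated in the release note.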
b6337
server : enable /slots by default and make it secure (#15630)
* server : enable /slots by default and make it secure
  ggml-ci
* server : fix tests to pass `--no-slots` when necessary
* server : extend /props with info about enabled endpoints
b6335
llama : fix fattn reserve call n_seqs parameter (#15699) ggml-ci
b6334
llama : separate compute buffer reserve from fattn check (#15696) Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
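Splitting a graph without allocating compute buffers can be sketched as follows, assuming each node has already been assigned a backend: a split is a maximal run of consecutive nodes on the same backend, and computing splits needs only those assignments, not memory. `Node`, `Split`, `split_graph`, and `op_stays_on_backend` are hypothetical and much simpler than `ggml_backend_sched_split_graph()`. The final check only illustrates the kind of question an automatic Flash Attention decision might ask (for example, whether every `FLASH_ATTN_EXT` node stayed on the expected device); that is an assumption about how the check works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node: an op name and the backend id the scheduler assigned.
// No buffers appear anywhere in this sketch.
struct Node {
    std::string op;
    int backend;
};

// A split is a maximal run of consecutive nodes on one backend: [begin, end).
struct Split {
    int backend;
    size_t begin;
    size_t end;
};

std::vector<Split> split_graph(const std::vector<Node> & nodes) {
    std::vector<Split> splits;
    for (size_t i = 0; i < nodes.size(); ++i) {
        if (splits.empty() || splits.back().backend != nodes[i].backend) {
            splits.push_back({nodes[i].backend, i, i + 1}); // backend changed: start a new split
        } else {
            splits.back().end = i + 1; // same backend: extend the current split
        }
    }
    return splits;
}

// True only if every node with the given op was assigned to the expected backend.
bool op_stays_on_backend(const std::vector<Node> & nodes, const std::string & op, int expected) {
    for (const Node & n : nodes) {
        if (n.op == op && n.backend != expected) return false;
    }
    return true;
}
```

Separating the split step from allocation means a check like this can run on the split result cheaply, and buffer reservation can happen afterwards once the final configuration is known.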