Releases: ggml-org/llama.cpp

b6109

07 Aug 10:00
1d72c84
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131)

* CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16

b6106

06 Aug 22:40
5fd160b
ggml: Add basic SET_ROWS support in WebGPU (#15137)

* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

b6105

06 Aug 21:33
756cfea
fix profiling crash (#15072)

b6104

06 Aug 19:39
e725a1a
opencl: add `swiglu_oai` and `add_id` (#15121)

* opencl: add `swiglu-oai`

* opencl: add `add_id`

* opencl: add missing `add_id.cl`

b6103

06 Aug 18:44
3db4da5
chat : support Granite model reasoning and tool call (#14864)

b6102

06 Aug 17:02
476aa3f
Fixed name `-override-tensors` to `-override-tensor` (#15129)

b6101

06 Aug 12:56
0d88315
ggml : fix fallback to CPU for unsupported ops (#15118)

b6100

06 Aug 11:46
65c797c
chat : fix yandex chat template (#15116)

b6099

06 Aug 10:10
2572689
chat : fix hunyuan auto-detection (#15114)

Signed-off-by: stevenkuang <[email protected]>

b6098

06 Aug 06:32
2241453
CANN: add support for ACL Graph (#15065)

* feat(cann): add optional support for ACL Graph execution

This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:

    -DUSE_CANN_GRAPH=ON

By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.

Key additions:
- CMake option USE_CANN_GRAPH to toggle graph mode
- Graph capture and execution logic using
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
  is unset or invalid

This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.

Signed-off-by: noemotiovon <[email protected]>

* Fix review comments

Signed-off-by: noemotiovon <[email protected]>

* rename USE_CANN_GRAPH to USE_ACL_GRAPH

Signed-off-by: noemotiovon <[email protected]>

* fix typo

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>
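
For readers who want to try the ACL graph path from b6098, the following is a minimal configure-and-build sketch, not an official recipe. It assumes the option name after the rename noted above (USE_ACL_GRAPH rather than USE_CANN_GRAPH), that the CANN backend itself is enabled with GGML_CANN, and that the Ascend CANN toolkit is already installed and on the environment; the `build` directory name is arbitrary.

```shell
# Configure with the CANN backend and opt in to ACL graph execution.
# USE_ACL_GRAPH is off by default, so omitting it keeps node-by-node execution.
cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=release -DUSE_ACL_GRAPH=ON

# Build the project with the configuration above.
cmake --build build --config release
```

Because the option is evaluated at compile time, switching between graph mode and node-by-node execution requires reconfiguring and rebuilding.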