Releases · stevenkuang-tencent/llama.cpp

06 Aug 08:05

2241453

b6098 Latest

CANN: add support for ACL Graph (#15065)

* feat(cann): add optional support for ACL Graph execution

This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:

    -DUSE_CANN_GRAPH=ON

By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.

Key additions:
- CMake option USE_CANN_GRAPH to toggle graph mode
- Graph capture and execution logic using ACL graph mode
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
  is unset or invalid

This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.
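
The tensor property matching listed above can be pictured roughly as follows. This is a minimal sketch, not the CANN backend's code: `NodeProps`, `CapturedGraph`, `needs_update`, and `record` are hypothetical names, and the set of compared properties (buffer address, op, shape, strides) is an assumption about what such a check would typically cover.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical snapshot of the tensor properties that matter for graph reuse.
struct NodeProps {
    const void *           data;  // address of the node's output buffer
    int                    op;    // operation id
    std::array<int64_t, 4> ne;    // number of elements per dimension
    std::array<size_t, 4>  nb;    // stride in bytes per dimension
};

static bool same_props(const NodeProps & a, const NodeProps & b) {
    return a.data == b.data && a.op == b.op && a.ne == b.ne && a.nb == b.nb;
}

// Hypothetical cache: the properties recorded when the graph was last captured.
struct CapturedGraph {
    std::vector<NodeProps> props;
    bool                   valid = false;
};

// Returns true when the captured graph cannot be replayed as-is
// and would have to be captured again.
static bool needs_update(const CapturedGraph & captured,
                         const std::vector<NodeProps> & current) {
    if (!captured.valid || captured.props.size() != current.size()) {
        return true;
    }
    for (size_t i = 0; i < current.size(); ++i) {
        if (!same_props(captured.props[i], current[i])) {
            return true;
        }
    }
    return false;
}

// Remember the properties of the graph that was just captured.
static void record(CapturedGraph & captured, const std::vector<NodeProps> & current) {
    captured.props = current;
    captured.valid = true;
}
```

In this sketch a caller would check `needs_update` before each execution, re-capture and call `record` when it returns true, and replay the existing capture otherwise. Repetitive workloads benefit because most iterations take the replay path.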

Signed-off-by: noemotiovon <[email protected]>

* Fix review comments

Signed-off-by: noemotiovon <[email protected]>

* rename USE_CANN_GRAPH to USE_ACL_GRAPH

Signed-off-by: noemotiovon <[email protected]>

* fix typo

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>

Assets 15

| Asset | SHA-256 | Size | Uploaded (UTC) |
| --- | --- | --- | --- |
| cudart-llama-bin-win-cuda-12.4-x64.zip | 8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6 | 373 MB | 2025-08-06T08:05:09Z |
| llama-b6098-bin-macos-arm64.zip | fbb29587573e245b0e9190ce355a087ce818d3c57ac5eabec8973d15e6665ccd | 10.7 MB | 2025-08-06T08:05:17Z |
| llama-b6098-bin-macos-x64.zip | 33e98ba6f8bceed75a77ee7baf56f2db7ba9bce67a456e12701f16e37523e12b | 27.4 MB | 2025-08-06T08:05:17Z |
| llama-b6098-bin-ubuntu-vulkan-x64.zip | f4c062be19c734342118da975fe93bff6b8a59b94455bd3adf770eabcadfc5e8 | 21.4 MB | 2025-08-06T08:05:19Z |
| llama-b6098-bin-ubuntu-x64.zip | 84cf8e29e27ee7c2958857e87604cc41cf79be1f2b80f1d99f8a342686420bd6 | 12.7 MB | 2025-08-06T08:05:20Z |
| llama-b6098-bin-win-cpu-arm64.zip | a58837f0df6c52f825f53c41c0bb962c648e1e8007f5cbf9ade2a4c941720a41 | 10.9 MB | 2025-08-06T08:05:21Z |
| llama-b6098-bin-win-cpu-x64.zip | 273b0a7c3e34289557fd293d1fdfc07c3bdd4fe813df4353588b4ee476c4fb06 | 13.8 MB | 2025-08-06T08:05:22Z |
| llama-b6098-bin-win-cuda-12.4-x64.zip | c9d0ebca47b9a9db18b873db84681cc41451cadfc2cbe22ca406d30a97b37fbb | 135 MB | 2025-08-06T08:05:23Z |
| llama-b6098-bin-win-hip-radeon-x64.zip | f8442b588260e23043c598856464ee07073ab44c68af6e69f05fdead6cbe5190 | 286 MB | 2025-08-06T08:05:27Z |
| llama-b6098-bin-win-opencl-adreno-arm64.zip | 2a13bfeb22db77174007167b42cec4b736c53596f42b3a8856c3212e8c333273 | 11.3 MB | 2025-08-06T08:05:37Z |
| Source code (zip) | | | 2025-08-06T06:12:42Z |
| Source code (tar.gz) | | | 2025-08-06T06:12:42Z |

25 Jul 11:31

github-actions

b5988

749e0d2

mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)

* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: use space to replace tab
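
The commit text does not say which conversions were involved, so the following is only a general illustration of a 32-bit narrowing hazard and one way to guard against it. `checked_narrow_i32` is a hypothetical helper and is not taken from export-lora.cpp or clip.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert a size_t to int32_t, refusing values that
// do not fit instead of silently truncating them.
static int32_t checked_narrow_i32(size_t value) {
    if (value > static_cast<size_t>(std::numeric_limits<int32_t>::max())) {
        throw std::overflow_error("value does not fit in int32_t");
    }
    return static_cast<int32_t>(value);
}
```

An explicit, range-checked conversion like this makes the intent visible at the call site. Brace initialization with a non-constant wider value, such as `int32_t n{some_size_t};`, is ill-formed in C++11 and later because it is a narrowing conversion, and compilers diagnose it.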

Assets 15

24 Jul 10:55

github-actions

b5977

39cffdf

docs: add libcurl-dev install hint for Linux distros (#14801)

* docs: add libcurl-dev install hint for Linux distros

Signed-off-by: PouyaGhahramanian <[email protected]>

* Update docs/build.md

---------

Signed-off-by: PouyaGhahramanian <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>

Assets 15

21 Jul 16:23

github-actions

b5952

9220426

kleidiai: add support for get_rows (#14676)

* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review
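
For readers unfamiliar with the operation, get_rows gathers selected rows of a matrix by index. The sketch below shows that idea on plain float data. It is an illustration only: `get_rows_f32` is a hypothetical name, and the KleidiAI code path is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gather rows from a row-major matrix `src` with `n_cols` columns.
// `rows` holds the indices of the rows to copy, in output order.
static std::vector<float> get_rows_f32(const std::vector<float> & src,
                                       size_t n_cols,
                                       const std::vector<int32_t> & rows) {
    assert(n_cols > 0 && src.size() % n_cols == 0);
    const size_t n_rows = src.size() / n_cols;

    std::vector<float> dst;
    dst.reserve(rows.size() * n_cols);
    for (int32_t r : rows) {
        // Every index must refer to an existing row.
        assert(r >= 0 && static_cast<size_t>(r) < n_rows);
        const auto first = src.begin() +
            static_cast<std::ptrdiff_t>(static_cast<size_t>(r) * n_cols);
        dst.insert(dst.end(), first, first + static_cast<std::ptrdiff_t>(n_cols));
    }
    return dst;
}
```

Calling `get_rows_f32(m, 3, {2, 0})` on a 3x3 matrix returns row 2 followed by row 0. An embedding lookup is a typical use of this pattern: token ids select rows of the embedding table.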

Assets 15

18 Jul 06:32

github-actions

b5929

8f974bc

graph : refactor context to not pass gf explicitly (#14629)

ggml-ci

Assets 15

14 Jul 15:18

github-actions

b5896

55c509d

ggml : refactor llamafile_sgemm PPC code (#14673)

- Remove unnecessary templates from the class definition and packing functions
- Reduce deeply nested conditionals and if-else switching in the mnpack function
- Replace repetitive code with inline functions in the packing functions

Reported gains:
- 2% to 7% improvement with the Q8 model
- 15% to 50% improvement with the Q4 model

Signed-off-by: Shalini Salomi Bodapati <[email protected]>

Assets 15
