Releases · ggml-org/llama.cpp
b6109
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131)
b6106
ggml: Add basic SET_ROWS support in WebGPU (#15137)
* Begin work on set_rows
* Work on set_rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
b6105
fix profiling crash (#15072)
b6104
opencl: add `swiglu_oai` and `add_id` (#15121)
* opencl: add `swiglu_oai`
* opencl: add `add_id`
* opencl: add missing `add_id.cl`
b6103
chat : support Granite model reasoning and tool call (#14864)
b6102
Fixed flag name from `-override-tensors` to `-override-tensor` (#15129)
b6101
ggml : fix fallback to CPU for unsupported ops (#15118)
b6100
chat : fix yandex chat template (#15116)
b6099
chat : fix hunyuan auto-detection (#15114) Signed-off-by: stevenkuang <[email protected]>
b6098
CANN: add support for ACL Graph (#15065)

feat(cann): add optional support for ACL Graph execution

This commit adds support for executing ggml computational graphs using Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be enabled at compile time with the CMake option -DUSE_CANN_GRAPH=ON. By default, ACL graph execution is **disabled**, and the fallback path uses node-by-node execution.

Key additions:
* CMake option to toggle graph mode
* Graph capture and execution logic
* Tensor property matching to determine whether a graph update is required
* Safe fallback and logging if the environment variable LLAMA_SET_ROWS is unset or invalid

This prepares the backend for performance improvements in repetitive graph execution scenarios on Ascend devices.

Follow-up commits in the same PR:
* Fix review comments
* Rename USE_CANN_GRAPH to USE_ACL_GRAPH
* Fix typo

Signed-off-by: noemotiovon <[email protected]>
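The "tensor property matching" mentioned above is the usual trick for deciding when a previously captured graph can simply be replayed: snapshot a few properties of every graph node and re-capture only when they change. Below is a minimal, hypothetical C++ sketch of that idea using the public ggml graph accessors; it is not the actual CANN backend code, and `cached_node_props`, `snapshot_graph`, and `graph_matches_cache` are illustrative names.

```cpp
#include <cstring>
#include <vector>

#include "ggml.h"

// Hypothetical per-node snapshot of the properties that must stay constant
// for a previously captured graph to remain valid.
struct cached_node_props {
    enum ggml_op   op;
    enum ggml_type type;
    int64_t        ne[GGML_MAX_DIMS]; // shape
    size_t         nb[GGML_MAX_DIMS]; // strides
};

// Snapshot the current graph so later runs can be compared against it.
static std::vector<cached_node_props> snapshot_graph(ggml_cgraph * gf) {
    std::vector<cached_node_props> cache(ggml_graph_n_nodes(gf));
    for (int i = 0; i < ggml_graph_n_nodes(gf); ++i) {
        const ggml_tensor * node = ggml_graph_node(gf, i);
        cache[i].op   = node->op;
        cache[i].type = node->type;
        std::memcpy(cache[i].ne, node->ne, sizeof(cache[i].ne));
        std::memcpy(cache[i].nb, node->nb, sizeof(cache[i].nb));
    }
    return cache;
}

// Returns true if the captured graph can be replayed as-is; false means the
// node properties changed, so the graph must be re-captured (or the backend
// should fall back to node-by-node execution).
static bool graph_matches_cache(ggml_cgraph * gf,
                                const std::vector<cached_node_props> & cache) {
    if ((size_t) ggml_graph_n_nodes(gf) != cache.size()) {
        return false;
    }
    for (size_t i = 0; i < cache.size(); ++i) {
        const ggml_tensor * node = ggml_graph_node(gf, (int) i);
        const cached_node_props & p = cache[i];
        if (node->op != p.op || node->type != p.type ||
            std::memcmp(node->ne, p.ne, sizeof(p.ne)) != 0 ||
            std::memcmp(node->nb, p.nb, sizeof(p.nb)) != 0) {
            return false;
        }
    }
    return true;
}
```

Per the commit message, the whole path is opt-in at build time via -DUSE_CANN_GRAPH=ON (renamed to USE_ACL_GRAPH later in the same PR), with node-by-node execution as the default.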