Releases · ggml-org/llama.cpp
b6109
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (#15131)
b6106
ggml: Add basic SET_ROWS support in WebGPU (#15137)
* Begin work on set_rows
* Work on set_rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
b6105
fix profiling crash (#15072)
b6104
opencl: add `swiglu_oai` and `add_id` (#15121)
* opencl: add `swiglu_oai`
* opencl: add `add_id`
* opencl: add missing `add_id.cl`
b6103
chat : support Granite model reasoning and tool call (#14864)
b6102
Fixed flag name from `-override-tensors` to `-override-tensor` (#15129)
b6101
ggml : fix fallback to CPU for unsupported ops (#15118)
b6100
chat : fix yandex chat template (#15116)
b6099
chat : fix hunyuan auto-detection (#15114) Signed-off-by: stevenkuang <[email protected]>
b6098
CANN: add support for ACL Graph (#15065)

feat(cann): add optional support for ACL Graph execution

This commit adds support for executing ggml computational graphs using Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be enabled at compile time with the CMake option -DUSE_CANN_GRAPH=ON. By default, ACL graph execution is **disabled**, and the fallback path uses node-by-node execution.

Key additions:
* CMake option to toggle graph mode
* Graph capture and execution logic
* Tensor property matching to determine whether a graph update is required
* Safe fallback and logging if the environment variable LLAMA_SET_ROWS is unset or invalid

This prepares the backend for performance improvements in repetitive graph execution scenarios on Ascend devices.

Follow-up commits in the same PR:
* Fix review comments
* Rename USE_CANN_GRAPH to USE_ACL_GRAPH
* Fix typo

Signed-off-by: noemotiovon <[email protected]>
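The "tensor property matching" mentioned above is the usual trick for deciding when a previously captured graph can simply be replayed: snapshot a few properties of every graph node and re-capture only when they change. Below is a minimal, hypothetical C++ sketch of that idea using the public ggml graph accessors; it is not the actual CANN backend code, and `cached_node_props`, `snapshot_graph`, and `graph_matches_cache` are illustrative names.

```cpp
#include <cstring>
#include <vector>

#include "ggml.h"

// Hypothetical per-node snapshot of the properties that must stay constant
// for a previously captured graph to remain valid.
struct cached_node_props {
    enum ggml_op   op;
    enum ggml_type type;
    int64_t        ne[GGML_MAX_DIMS]; // shape
    size_t         nb[GGML_MAX_DIMS]; // strides
};

// Snapshot the current graph so later runs can be compared against it.
static std::vector<cached_node_props> snapshot_graph(ggml_cgraph * gf) {
    std::vector<cached_node_props> cache(ggml_graph_n_nodes(gf));
    for (int i = 0; i < ggml_graph_n_nodes(gf); ++i) {
        const ggml_tensor * node = ggml_graph_node(gf, i);
        cache[i].op   = node->op;
        cache[i].type = node->type;
        std::memcpy(cache[i].ne, node->ne, sizeof(cache[i].ne));
        std::memcpy(cache[i].nb, node->nb, sizeof(cache[i].nb));
    }
    return cache;
}

// Returns true if the captured graph can be replayed as-is; false means the
// node properties changed, so the graph must be re-captured (or the backend
// should fall back to node-by-node execution).
static bool graph_matches_cache(ggml_cgraph * gf,
                                const std::vector<cached_node_props> & cache) {
    if ((size_t) ggml_graph_n_nodes(gf) != cache.size()) {
        return false;
    }
    for (size_t i = 0; i < cache.size(); ++i) {
        const ggml_tensor * node = ggml_graph_node(gf, (int) i);
        const cached_node_props & p = cache[i];
        if (node->op != p.op || node->type != p.type ||
            std::memcmp(node->ne, p.ne, sizeof(p.ne)) != 0 ||
            std::memcmp(node->nb, p.nb, sizeof(p.nb)) != 0) {
            return false;
        }
    }
    return true;
}
```

Per the commit message, the whole path is opt-in at build time via -DUSE_CANN_GRAPH=ON (renamed to USE_ACL_GRAPH later in the same PR), with node-by-node execution as the default.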