Releases: ggml-org/llama.cpp

b6183

16 Aug 21:48
65349f2
model : support vision LiquidAI LFM2-VL family (#15347)

* wip lfm2 vision model

* Fix conv weight

* Implement dynamic resolution

* Fix cuda

* support LFM2-VL-450M

* happy CI

* Remove extra `ggml_conv` and put others into the right place

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

b6182

16 Aug 17:20
1fe0029
vulkan: fuse adds (#15252)

* vulkan: fuse adds

Fuse adds that have the same shape, which are common in MoE models.
It will currently fuse up to 6 adds, because we assume no more than
8 descriptors per dispatch. But this could be changed.

* check runtimeDescriptorArray feature

* disable multi_add for Intel due to likely driver bug
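
The 6-add limit mentioned in the fuse-adds note above can be read as simple descriptor arithmetic. The sketch below is a CPU-side illustration only, not the Vulkan backend's code: it assumes a fused chain of k adds binds k + 1 input buffers plus 1 output buffer, and the names `max_fused_adds` and `fused_add` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a fused chain of k adds binds k + 1 inputs and 1 output,
// i.e. k + 2 descriptors. With a budget of 8 descriptors this gives k = 6.
constexpr std::size_t max_fused_adds(std::size_t max_descriptors) {
    return max_descriptors < 3 ? 0 : max_descriptors - 2;
}

// CPU reference for the fused operation: sum all same-shaped inputs in one
// pass instead of materializing an intermediate tensor after every add.
std::vector<float> fused_add(const std::vector<std::vector<float>>& inputs) {
    assert(inputs.size() >= 2);            // at least one add
    const std::size_t n = inputs[0].size();
    for (const auto& in : inputs) {
        assert(in.size() == n);            // fusion requires identical shapes
    }

    std::vector<float> dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = inputs[0][i];
        for (std::size_t s = 1; s < inputs.size(); ++s) {
            acc += inputs[s][i];           // one add per extra input
        }
        dst[i] = acc;
    }
    return dst;
}
```

Under this assumption, `max_fused_adds(8)` evaluates to 6, matching the limit stated in the commit message. A larger descriptor budget would raise the limit accordingly.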

b6181

16 Aug 09:32
de21927
vulkan: Support mul_mat_id with f32 accumulators (#15337)

* vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id

* vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up

- There's no explicit way to request f32 precision for mul_mat_id, but there
probably should be, and this gets the code in place for that.
- A couple fixes to check_results.
- Remove casts to fp16 in coopmat1 FA shader (found by inspection).

b6180

16 Aug 09:31
2e2b22b
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#1…

b6179

16 Aug 08:29
912ff8c
OpenCL: add initial FA support (#14987)

* add F16/F16 fa support

* fix kernel init

* use mad instead of fma

* use inline function

* mark FA with sinks as unsupported for now

* add pragma unroll to loops

b6178

15 Aug 18:47
5e6229a
common : fix double bos, use common_chat_templates for add_bos and ad…

b6177

15 Aug 17:17
e2c1bff
opencl: add initial mxfp4 support via mv (#15270)

* opencl: add reference `mul_mv_mxfp4_f32`

* opencl: add reference `mul_mv_id` for mxfp4

* Q4_0 transpose fix for Adreno

---------

Co-authored-by: shawngu-quic <[email protected]>

b6176

15 Aug 14:43
5edf159
vulkan : fix out-of-bounds access in argmax kernel (#15342)

ggml-ci

b6175

15 Aug 14:05
db3010b
vulkan : fix compile warnings on macos (#15340)

ggml-ci

b6174

15 Aug 13:45
ff27f80
ggml: initial IBM zDNN backend (#14975)

* ggml-zdnn: initial backend impl

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: temp change z17 to arch15

ggml-zdnn: fix build bugs

* ggml-zdnn: tensor->extra logging check

ggml-zdnn: add layout name mapping, ztensor information

ggml-zdnn: separate logging into its own line

ggml-zdnn: add shape comparison

ggml-zdnn: add ggml_tensor shape log

ggml-zdnn: fix incorrect shape logging

* ggml-zdnn: add output buffer check

* ggml-zdnn: run compute and store into tensor->extra

* ggml-zdnn: add set_tensor

* ggml-zdnn: add more loggers

* ggml-zdnn: update set_tensor logging to check only for matmul

* ggml-zdnn: last working matmul version

* ggml-zdnn: add comments to prevent accidentally deleting lines

* ggml-zdnn: support op out_prod

* ggml-zdnn: update op out_prod to use tensor->extra

* ggml-zdnn: rewrite the backend implementation

* ggml-zdnn: bugfix new impl

* ggml-zdnn: fix compiler warnings and bugfixes

* ggml-zdnn: test ztensor finding in init_tensor

* ggml-zdnn: implement at least 1 op to test

* ggml-zdnn: assign tensor->extra to buffer

* ggml-zdnn: add check for view tensors to prevent init_tensor

* ggml-zdnn: rework init_tensor to create new buffers

* ggml-zdnn: switch to std vector instead of array

* ggml-zdnn: switch buffers back and set to arbitrary number

* ggml-zdnn: impl init_tensor

* ggml-zdnn: update supports_op matmul matrix

* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding

* ggml-zdnn: code clean up

* ggml-zdnn: impl matmul

* ggml-zdnn: fix compiler error missing type

* ggml-zdnn: fix missing data transform call

* ggml-zdnn: add bias init_tensor

* ggml-zdnn: tighten memory usage, change string allocation

* ggml-zdnn: add bias ztensor and data free

* ggml-zdnn: add bias data transform

* ggml-zdnn: add more debug info for extra buffer transform

* ggml-zdnn: add logger to check if mat mul ops go through set_tensor

* ggml-zdnn: activate bias transform in matmul

* ggml-zdnn: move weights transform into mulmat

* ggml-zdnn: add more safeguards in matmul

* ggml-zdnn: fix sequencing of transforms

* ggml-zdnn: bugfix transform ztensor vs origtensor

* ggml-zdnn: figure out why sigtrap is happening

* ggml-zdnn: fix sigsegv

* ggml-zdnn: move everything back to local declaration

* ggml-zdnn: move bias data to local also

* ggml-zdnn: bring back working matmul

* ggml-zdnn: rewrite into mre

* ggml-zdnn: fix missing vector import

* ggml-zdnn: fix missing vector import in header

* ggml-zdnn: attempt to fix sigsegv

* ggml-zdnn: fix missing load tensor

* ggml-zdnn: fix invalid ztensor buffer release

* ggml-zdnn: add logging to debug free buffer

* ggml-zdnn: remove free_buffer debug info

* ggml-zdnn: add parmblkformat detections

* ggml-zdnn: add nnpa installed detection

* ggml-zdnn: add zdnn_init call for static libs

* ggml-zdnn: add init_tensor

* ggml-zdnn: attempt at fixing invalid buffer

* ggml-zdnn: switch to using deque to fix pointer deref problem

* ggml-zdnn: add weights logging to check

* ggml-zdnn: attempt to use unique ptr

* ggml-zdnn: add tensor to pre_tfm_desc logging

* ggml-zdnn: add inputs logging

* ggml-zdnn: disable op_none initialisation for testing

* ggml-zdnn: fix missing return from init_tensor

* ggml-zdnn: load ztensors in cgraph exec

* ggml-zdnn: work on moving output ztensor as well

* ggml-zdnn: disable logging and breakpoints for full test

* ggml-zdnn: attempt at manually changing the layout

* ggml-zdnn: attempt at using default NHWC format instead

* ggml-zdnn: disable global load ztensor for now

* ggml-zdnn: fix erroneous output load tensor

* ggml-zdnn: add guards to prevent loading ztensor if transformed

* ggml-zdnn: code cleanup

* ggml-zdnn: bring load ztensor back to init routine

* ggml-zdnn: code clean up

* ggml-zdnn: fix ztensor deallocation abort

stabilise ggml <-> zdnn api

* ggml-zdnn: clean up matmul selection

* ggml-zdnn: clean up project structure

* ggml-zdnn: update documentation, prepare for upstream

* chore: add codeowners

* ggml-zdnn: disable batched matmul

* ggml-zdnn: attempt at fixing tensor views during matmul

* ggml-zdnn: deny all view tensors directly

* ggml-zdnn: fix pr comments

* docs: update ops docs for zdnn

* ggml-zdnn: redo test-backend-ops for ops.md

* ggml-zdnn: fix typo in build-s390x.md

* codeowners: remove taronaeo for now

* Revert "codeowners: remove taronaeo for now"

This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.

* ggml-zdnn: remove unused ggml_zdnn macro

---------

Signed-off-by: Aaron Teo <[email protected]>