Releases · ggml-org/llama.cpp

16 Aug 21:48

65349f2

b6183

model : support vision LiquidAI LFM2-VL family (#15347)

* wip lfm2 vision model

* Fix conv weight

* Implement dynamic resolution

* Fix cuda

* support LFM2-VL-450M

* happy CI

* Remove extra `ggml_conv` and put others into the right place

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

16 Aug 17:20

github-actions

b6182

1fe0029

vulkan: fuse adds (#15252)

* vulkan: fuse adds

Fuse adds that have the same shape, which are common in MoE models.
It will currently fuse up to 6 adds, because we assume no more than
8 descriptors per dispatch. But this could be changed.

* check runtimeDescriptorArray feature

* disable multi_add for Intel due to likely driver bug
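
The limit of 6 fused adds follows from the descriptor budget if each fused dispatch binds one descriptor per source tensor plus one for the destination: a chain of n adds reads n + 1 sources and writes 1 result, so it needs n + 2 descriptors. The sketch below illustrates that arithmetic only. It is not the Vulkan backend's code; `kMaxDescriptors`, `max_fused_adds`, and `plan_fused_adds` are hypothetical names, and the one-descriptor-per-tensor binding model is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Assumed limit, taken from the commit message: at most 8 descriptors per dispatch.
constexpr std::size_t kMaxDescriptors = 8;

// A chain of n adds reads n + 1 source tensors and writes 1 destination,
// so it needs n + 2 descriptors. Returns the largest n that fits.
constexpr std::size_t max_fused_adds(std::size_t max_descriptors) {
    return max_descriptors > 2 ? max_descriptors - 2 : 0;
}

// Hypothetical planner: split a chain of `num_adds` same-shape adds into
// per-dispatch groups. Each later group reuses the previous result as one
// of its sources, so the n + 2 rule holds for every group.
std::vector<std::size_t> plan_fused_adds(std::size_t num_adds,
                                         std::size_t max_descriptors = kMaxDescriptors) {
    std::vector<std::size_t> groups;
    const std::size_t per_dispatch = max_fused_adds(max_descriptors);
    if (per_dispatch == 0) {
        return groups;  // not enough descriptors for even a single add
    }
    while (num_adds > 0) {
        const std::size_t n = num_adds < per_dispatch ? num_adds : per_dispatch;
        groups.push_back(n);
        num_adds -= n;
    }
    return groups;
}
```

Under these assumptions, `max_fused_adds(8)` is 6, and a chain of 10 adds would be planned as one dispatch of 6 adds followed by one of 4.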

16 Aug 09:32

github-actions

b6181

de21927

vulkan: Support mul_mat_id with f32 accumulators (#15337)

* vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id

* vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up

- There's no explicit way to request f32 precision for mul_mat_id, but there
probably should be, and this gets the code in place for that.
- A couple fixes to check_results.
- Remove casts to fp16 in coopmat1 FA shader (found by inspection).

16 Aug 09:31

github-actions

b6180

2e2b22b

vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#1…

16 Aug 08:29

github-actions

b6179

912ff8c

OpenCL: add initial FA support (#14987)

* add F16/F16 fa support

* fix kernel init

* use mad instead of fma

* use inline function

* mark FA with sinks as unsupported for now

* add pragma unroll to loops

15 Aug 18:47

github-actions

b6178

5e6229a

common : fix double bos, use common_chat_templates for add_bos and ad…

15 Aug 17:17

github-actions

b6177

e2c1bff

opencl: add initial mxfp4 support via mv (#15270)

* opencl: add reference `mul_mv_mxfp4_f32`

* opencl: add reference `mul_mv_id` for mxfp4

* Q4_0 transpose fix for Adreno

---------

Co-authored-by: shawngu-quic <[email protected]>

15 Aug 14:43

github-actions

b6176

5edf159

vulkan : fix out-of-bounds access in argmax kernel (#15342)

ggml-ci

15 Aug 14:05

github-actions

b6175

db3010b

vulkan : fix compile warnings on macos (#15340)

ggml-ci

15 Aug 13:45

github-actions

b6174

ff27f80

ggml: initial IBM zDNN backend (#14975)

* ggml-zdnn: initial backend impl

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: temp change z17 to arch15

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: fix build bugs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: tensor->extra logging check

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: add layout name mapping, ztensor information

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: separate logging into its own line

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: add shape comparison

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: add ggml_tensor shape log

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: fix incorrect shape logging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add output buffer check

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: run compute and store into tensor->extra

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add set_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add more loggers

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update set_tensor logging to check only for matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: last working matmul version

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add comments to prevent accidentally deleting lines

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: support op out_prod

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update op out_prod to use tensor->extra

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rewrite the backend implementation

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bugfix new impl

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix compiler warnings and bugfixes

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: test ztensor finding in init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: implement at least 1 op to test

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: assign tensor->extra to buffer

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add check for view tensors to prevent init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rework init_tensor to create new buffers

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to std vector instead of array

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch buffers back and set to arbitrary number

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: impl init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update supports_op matmul matrix

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code clean up

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: impl matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix compiler error missing type

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing data transform call

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add bias init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: tighten memory usage, change string allocation

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add bias ztensor and data free

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add bias data transform

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add more debug info for extra buffer transform

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add logger to check if mat mul ops go through set_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: activate bias transform in matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move weights transform into mulmat

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add more safeguards in matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix sequencing of transforms

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bugfix transform ztensor vs origtensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: figure out why sigtrap is happening

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix sigsegv

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move everything back to local declaration

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move bias data to local also

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bring back working matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rewrite into mre

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing vector import

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing vector import in header

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt to fix sigsegv

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing load tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix invalid ztensor buffer release

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add logging to debug free buffer

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: remove free_buffer debug info

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add parmblkformat detections

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add nnpa installed detection

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add zdnn_init call for static libs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at fixing invalid buffer

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to using deque to fix pointer deref problem

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add weights logging to check

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt to use unique ptr

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add tensor to pre_tfm_desc logging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add inputs logging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable op_none initialisation for testing

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing return from init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: load ztensors in cgraph exec

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: work on moving output ztensor as well

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable logging and breakpoints for full test

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at manually changing the layout

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at using default nhwc format instead

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable global load ztensor for now

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix erroneous output load tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add guards to prevent loading ztensor if transformed

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bring load ztensor back to init routine

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code clean up

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix ztensor deallocation abort

stabilise ggml <-> zdnn api

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: clean up matmul selection

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: clean up project structure

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update documentation, prepare for upstream

Signed-off-by: Aaron Teo <[email protected]>

* chore: add codeowners

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable batched matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at fixing tensor views during matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: deny all view tensors directly

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix pr comments

Signed-off-by: Aaron Teo <[email protected]>

* docs: update ops docs for zdnn

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: redo test-backend-ops for ops.md

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix typo in build-s390x.md

Signed-off-by: Aaron Teo <[email protected]>

* codeowners: remove taronaeo for now

Signed-off-by: Aaron Teo <[email protected]>

* Revert "codeowners: remove taronaeo for now"

This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.

* ggml-zdnn: remove unused ggml_zdnn macro

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>
