Releases: ggml-org/llama.cpp
b6183
model : support vision LiquidAI LFM2-VL family (#15347)
* wip lfm2 vision model
* Fix conv weight
* Implement dynamic resolution
* Fix cuda
* support LFM2-VL-450M
* happy CI
* Remove extra `ggml_conv` and put others into the right place
  Co-authored-by: Sigbjørn Skjæret
---------
Co-authored-by: Xuan Son Nguyen
Co-authored-by: Sigbjørn Skjæret
b6182
vulkan: fuse adds (#15252)
* vulkan: fuse adds
  Fuse adds that have the same shape, which are common in MoE models. It currently fuses up to 6 adds because we assume no more than 8 descriptors per dispatch, but this could be changed.
* check runtimeDescriptorArray feature
* disable multi_add for Intel due to likely driver bug
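The limit of 6 fused adds under an 8-descriptor budget follows from simple counting. The Python sketch below rests on one assumption: a fused chain of n adds (t0 + t1 + ... + tn) reads n + 1 input tensors and writes one output, and each of those tensors needs its own descriptor. The names `max_fused_adds` and `group_adds` are hypothetical, and so is the greedy grouping of consecutive same-shape adds. This is an illustration, not the ggml-vulkan implementation.

```python
# Hypothetical sketch: a per-dispatch descriptor budget bounds how many adds can be fused.
# Assumption: each tensor touched by the fused dispatch needs its own descriptor.
MAX_DESCRIPTORS_PER_DISPATCH = 8  # value taken from the release note


def max_fused_adds(max_descriptors: int) -> int:
    """A chain of n adds (t0 + t1 + ... + tn) reads n + 1 inputs and writes 1 output,
    so it needs n + 2 descriptors; solve n + 2 <= max_descriptors for n."""
    return max_descriptors - 2


def group_adds(shapes, max_descriptors: int = MAX_DESCRIPTORS_PER_DISPATCH):
    """Greedily fuse consecutive same-shape adds in a chain.

    `shapes` holds the shape of each add's result, in chain order.
    Returns the size of each fused group.
    """
    limit = max_fused_adds(max_descriptors)
    groups = []
    prev_shape = None
    for shape in shapes:
        if groups and shape == prev_shape and groups[-1] < limit:
            groups[-1] += 1  # extend the current fused group
        else:
            groups.append(1)  # shape changed or budget exhausted: start a new group
        prev_shape = shape
    return groups


print(max_fused_adds(MAX_DESCRIPTORS_PER_DISPATCH))  # 6
print(group_adds([(4096,)] * 8))                      # [6, 2]
```

Under these assumptions, a chain of 8 same-shape adds splits into one fused group of 6 and one of 2. Raising the descriptor budget, as the note says could be done, raises the per-group limit by the same amount.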
b6181
vulkan: Support mul_mat_id with f32 accumulators (#15337)
* vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id
* vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up
  - There's no explicit way to request f32 precision for mul_mat_id, but there probably should be, and this gets the code in place for that.
  - A couple of fixes to check_results.
  - Remove casts to fp16 in coopmat1 FA shader (found by inspection).
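A small CPU-side Python sketch shows why f32 accumulators can matter. It is not the Vulkan shader code, only the general IEEE 754 effect. It rounds a running dot-product sum to half or single precision after every step, using the `e` and `f` formats of Python's `struct` module. The helper names `to_f16`, `to_f32` and `dot` are hypothetical. Summing 4096 products of 1.0 stalls at 2048 in half precision, because 2049 is not representable in fp16 and rounds back to 2048. Single precision reaches 4096 exactly.

```python
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product whose running sum is rounded to the accumulator type after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc


ones = [1.0] * 4096
print(dot(ones, ones, to_f16))  # 2048.0: fp16 cannot represent 2049, so the sum stalls
print(dot(ones, ones, to_f32))  # 4096.0
```

Long reductions, such as the inner dimension of a large matrix multiply, are where accumulator precision becomes visible.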
b6180
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#1…
b6179
OpenCL: add initial FA support (#14987)
* add F16/F16 fa support
* fix kernel init
* use mad instead of fma
* use inline function
* mark FA with sinks as unsupported for now
* add pragma unroll to loops
b6178
common : fix double bos, use common_chat_templates for add_bos and ad…
b6177
opencl: add initial mxfp4 support via mv (#15270)
* opencl: add reference `mul_mv_mxfp4_f32`
* opencl: add reference `mul_mv_id` for mxfp4
* Q4_0 transpose fix for Adreno
---------
Co-authored-by: shawngu-quic
b6176
vulkan : fix out-of-bounds access in argmax kernel (#15342) ggml-ci
b6175
vulkan : fix compile warnings on macos (#15340) ggml-ci
b6174
ggml: initial IBM zDNN backend (#14975)
* ggml-zdnn: initial backend impl
  ggml-zdnn: temp change z17 to arch15
  ggml-zdnn: fix build bugs
* ggml-zdnn: tensor->extra logging check
  ggml-zdnn: add layout name mapping, ztensor information
  ggml-zdnn: separate logging into its own line
  ggml-zdnn: add shape comparison
  ggml-zdnn: add ggml_tensor shape log
  ggml-zdnn: fix incorrect shape logging
* ggml-zdnn: add output buffer check
* ggml-zdnn: run compute and store into tensor->extra
* ggml-zdnn: add set_tensor
* ggml-zdnn: add more loggers
* ggml-zdnn: update set_tensor logging to check only for matmul
* ggml-zdnn: last working matmul version
* ggml-zdnn: add comments to prevent accidentally deleting lines
* ggml-zdnn: support op out_prod
* ggml-zdnn: update op out_prod to use tensor->extra
* ggml-zdnn: rewrite the backend implementation
* ggml-zdnn: bugfix new impl
* ggml-zdnn: fix compiler warnings and bugfixes
* ggml-zdnn: test ztensor finding in init_tensor
* ggml-zdnn: implement at least 1 op to test
* ggml-zdnn: assign tensor->extra to buffer
* ggml-zdnn: add check for view tensors to prevent init_tensor
* ggml-zdnn: rework init_tensor to create new buffers
* ggml-zdnn: switch to std vector instead of array
* ggml-zdnn: switch buffers back and set to arbitrary number
* ggml-zdnn: impl init_tensor
* ggml-zdnn: update supports_op matmul matrix
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
* ggml-zdnn: code clean up
* ggml-zdnn: impl matmul
* ggml-zdnn: fix compiler error missing type
* ggml-zdnn: fix missing data transform call
* ggml-zdnn: add bias init_tensor
* ggml-zdnn: tighten memory usage, change string allocation
* ggml-zdnn: add bias ztensor and data free
* ggml-zdnn: add bias data transform
* ggml-zdnn: add more debug info for extra buffer transform
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
* ggml-zdnn: activate bias transform in matmul
* ggml-zdnn: move weights transform into mulmat
* ggml-zdnn: add more safeguards in matmul
* ggml-zdnn: fix sequencing of transforms
* ggml-zdnn: bugfix transform ztensor vs origtensor
* ggml-zdnn: figure out why sigtrap is happening
* ggml-zdnn: fix sigsegv
* ggml-zdnn: move everything back to local declaration
* ggml-zdnn: move bias data to local also
* ggml-zdnn: bring back working matmul
* ggml-zdnn: rewrite into mre
* ggml-zdnn: fix missing vector import
* ggml-zdnn: fix missing vector import in header
* ggml-zdnn: attempt to fix sigsegv
* ggml-zdnn: fix missing load tensor
* ggml-zdnn: fix invalid ztensor buffer release
* ggml-zdnn: add logging to debug free buffer
* ggml-zdnn: remove free_buffer debug info
* ggml-zdnn: add parmblkformat detections
* ggml-zdnn: add nnpa installed detection
* ggml-zdnn: add zdnn_init call for static libs
* ggml-zdnn: add init_tensor
* ggml-zdnn: attempt at fixing invalid buffer
* ggml-zdnn: switch to using deque to fix pointer deref problem
* ggml-zdnn: add weights logging to check
* ggml-zdnn: attempt to use unique ptr
* ggml-zdnn: add tensor to pre_tfm_desc logging
* ggml-zdnn: add inputs logging
* ggml-zdnn: disable op_none initialisation for testing
* ggml-zdnn: fix missing return from init_tensor
* ggml-zdnn: load ztensors in cgraph exec
* ggml-zdnn: work on moving output ztensor as well
* ggml-zdnn: disable logging and breakpoints for full test
* ggml-zdnn: attempt at manually changing the layout
* ggml-zdnn: attempt at using default nwhc format instead
* ggml-zdnn: disable global load ztensor for now
* ggml-zdnn: fix erroneous output load tensor
* ggml-zdnn: add guards to prevent loading ztensor if transformed
* ggml-zdnn: code cleanup
* ggml-zdnn: bring load ztensor back to init routine
* ggml-zdnn: code clean up
* ggml-zdnn: fix ztensor deallocation abort, stabilise ggml <-> zdnn api
* ggml-zdnn: clean up matmul selection
* ggml-zdnn: clean up project structure
* ggml-zdnn: update documentation, prepare for upstream
* chore: add codeowners
* ggml-zdnn: disable batched matmul
* ggml-zdnn: attempt at fixing tensor views during matmul
* ggml-zdnn: deny all view tensors directly
* ggml-zdnn: fix pr comments
* docs: update ops docs for zdnn
* ggml-zdnn: redo test-backend-ops for ops.md
* ggml-zdnn: fix typo in build-s390x.md
* codeowners: remove taronaeo for now
* Revert "codeowners: remove taronaeo for now"
  This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.
* ggml-zdnn: remove unused ggml_zdnn macro
---------
Signed-off-by: Aaron Teo