Releases: ggml-org/llama.cpp
b6177
opencl: add initial mxfp4 support via mv (#15270)
* opencl: add reference `mul_mv_mxfp4_f32`
* opencl: add reference `mul_mv_id` for mxfp4
* Q4_0 transpose fix for Adreno
---------
Co-authored-by: shawngu-quic <[email protected]>
b6176
vulkan : fix out-of-bounds access in argmax kernel (#15342) ggml-ci
b6175
vulkan : fix compile warnings on macos (#15340) ggml-ci
b6174
ggml: initial IBM zDNN backend (#14975)
* ggml-zdnn: initial backend impl
* ggml-zdnn: temp change z17 to arch15
* ggml-zdnn: fix build bugs
* ggml-zdnn: tensor->extra logging check
* ggml-zdnn: add layout name mapping, ztensor information
* ggml-zdnn: separate logging into its own line
* ggml-zdnn: add shape comparison
* ggml-zdnn: add ggml_tensor shape log
* ggml-zdnn: fix incorrect shape logging
* ggml-zdnn: add output buffer check
* ggml-zdnn: run compute and store into tensor->extra
* ggml-zdnn: add set_tensor
* ggml-zdnn: add more loggers
* ggml-zdnn: update set_tensor logging to check only for matmul
* ggml-zdnn: last working matmul version
* ggml-zdnn: add comments to prevent accidentally deleting lines
* ggml-zdnn: support op out_prod
* ggml-zdnn: update op out_prod to use tensor->extra
* ggml-zdnn: rewrite the backend implementation
* ggml-zdnn: bugfix new impl
* ggml-zdnn: fix compiler warnings and bugfixes
* ggml-zdnn: test ztensor finding in init_tensor
* ggml-zdnn: implement at least 1 op to test
* ggml-zdnn: assign tensor->extra to buffer
* ggml-zdnn: add check for view tensors to prevent init_tensor
* ggml-zdnn: rework init_tensor to create new buffers
* ggml-zdnn: switch to std vector instead of array
* ggml-zdnn: switch buffers back and set to arbitrary number
* ggml-zdnn: impl init_tensor
* ggml-zdnn: update supports_op matmul matrix
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
* ggml-zdnn: code clean up
* ggml-zdnn: impl matmul
* ggml-zdnn: fix compiler error missing type
* ggml-zdnn: fix missing data transform call
* ggml-zdnn: add bias init_tensor
* ggml-zdnn: tighten memory usage, change string allocation
* ggml-zdnn: add bias ztensor and data free
* ggml-zdnn: add bias data transform
* ggml-zdnn: add more debug info for extra buffer transform
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
* ggml-zdnn: activate bias transform in matmul
* ggml-zdnn: move weights transform into mulmat
* ggml-zdnn: add more safeguards in matmul
* ggml-zdnn: fix sequencing of transforms
* ggml-zdnn: bugfix transform ztensor vs origtensor
* ggml-zdnn: figure out why sigtrap is happening
* ggml-zdnn: fix sigsegv
* ggml-zdnn: move everything back to local declaration
* ggml-zdnn: move bias data to local also
* ggml-zdnn: bring back working matmul
* ggml-zdnn: rewrite into mre
* ggml-zdnn: fix missing vector import
* ggml-zdnn: fix missing vector import in header
* ggml-zdnn: attempt to fix sigsegv
* ggml-zdnn: fix missing load tensor
* ggml-zdnn: fix invalid ztensor buffer release
* ggml-zdnn: add logging to debug free buffer
* ggml-zdnn: remove free_buffer debug info
* ggml-zdnn: add parmblkformat detections
* ggml-zdnn: add nnpa installed detection
* ggml-zdnn: add zdnn_init call for static libs
* ggml-zdnn: add init_tensor
* ggml-zdnn: attempt at fixing invalid buffer
* ggml-zdnn: switch to using deque to fix pointer deref problem
* ggml-zdnn: add weights logging to check
* ggml-zdnn: attempt to use unique ptr
* ggml-zdnn: add tensor to pre_tfm_desc logging
* ggml-zdnn: add inputs logging
* ggml-zdnn: disable op_none initialisation for testing
* ggml-zdnn: fix missing return from init_tensor
* ggml-zdnn: load ztensors in cgraph exec
* ggml-zdnn: work on moving output ztensor as well
* ggml-zdnn: disable logging and breakpoints for full test
* ggml-zdnn: attempt at manually changing the layout
* ggml-zdnn: attempt at using default nwhc format instead
* ggml-zdnn: disable global load ztensor for now
* ggml-zdnn: fix erroneous output load tensor
* ggml-zdnn: add guards to prevent loading ztensor if transformed
* ggml-zdnn: code cleanup
* ggml-zdnn: bring load ztensor back to init routine
* ggml-zdnn: code clean up
* ggml-zdnn: fix ztensor deallocation abort stabilise ggml <-> zdnn api
* ggml-zdnn: clean up matmul selection
* ggml-zdnn: clean up project structure
* ggml-zdnn: update documentation, prepare for upstream
* chore: add codeowners
* ggml-zdnn: disable batched matmul
* ggml-zdnn: attempt at fixing tensor views during matmul
* ggml-zdnn: deny all view tensors directly
* ggml-zdnn: fix pr comments
* docs: update ops docs for zdnn
* ggml-zdnn: redo test-backend-ops for ops.md
* ggml-zdnn: fix typo in build-s390x.md
* codeowners: remove taronaeo for now
* Revert "codeowners: remove taronaeo for now" This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.
* ggml-zdnn: remove unused ggml_zdnn macro
---------
Signed-off-by: Aaron Teo <[email protected]>
b6173
ci : fix ios-xcode-build (#15324)
* fix ios-xcode-build
* use xcode-select with fixed version
* switch to macos-15 to get xcode 16.4
b6153
perplexity: give more information about constraints on failure (#15303)
* perplexity: give more information about constraints on failure
  This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.
* log formatting
* log error and return instead of storing max_seq_exceeded int
* check if s0 is zero for -np check
b6152
HIP: bump requirement to rocm 6.1 (#15296)
b6150
server : enable -td and -tbd parameters (#15172)
b6149
ggml : update `ggml_rope_multi` (#12665)
* update `rope_multi`:
  1. add `ggml_rope_multi_inplace`;
  2. use `GGML_MROPE_SECTIONS` instead of 4.
* Apply suggestions from code review
---------
Co-authored-by: Georgi Gerganov <[email protected]>
b6148
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-mo…