Releases: ngxson/llama.cpp

b6278

26 Aug 05:03
34bdbbd
vulkan: Remove splitting for mul_mat_id (#15568)

row_ids only needs to hold the BN rows for the current tile.

b6277

25 Aug 22:14
74f52f7
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)

* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation

---------

Co-authored-by: xix <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
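
As background for the change above: `__byte_perm(x, y, s)` builds a 32-bit value by picking four bytes out of the eight-byte pair {y, x}, one per selector nibble in `s`, so a small byte table held in two registers can be gathered in a single instruction. A minimal sketch of that idea follows; the packing and the helper name are illustrative assumptions, not the actual `get_int_from_table_16` code:

```cuda
// Illustrative sketch only, not the llama.cpp kernel.
// tab_lo holds table entries 0..3 as packed bytes, tab_hi holds entries 4..7.
// Each nibble of `sel` is an index in 0..7 choosing one of those eight bytes.
__device__ unsigned int gather4_from_table8(unsigned int tab_lo,
                                            unsigned int tab_hi,
                                            unsigned int sel) {
    // One __byte_perm replaces four separate table loads and shifts.
    return __byte_perm(tab_lo, tab_hi, sel);
}
```

On AMD GPUs the second commit swaps in the equivalent `v_perm_b32` instruction.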

b6276

25 Aug 21:38
f7207b0
opencl: fix support ops condition for `rms_norm` (#15560)

b6275

25 Aug 16:50
4d917cd
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …

b6269

25 Aug 11:25
6b64f74
batched-bench : fix unified KV cache handling + pp timing (#15562)

* batched-bench : fix unified KV cache handling + pp timing

* cont : run dummy token only with split KV cache

b6267

25 Aug 07:43
b0ba31f
metal : add FA kernels for HS=40 (#15559)

ggml-ci

b6265

25 Aug 02:49
c247d06
CANN: ROPE cache sin/cos repeat (#15501)

Signed-off-by: noemotiovon <[email protected]>
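
As background for the change above (reading intent from the title, since the commit body is only a sign-off): RoPE rotates each pair of elements by a position-dependent angle, so each cos/sin value is reused (repeated) for both elements of its pair, and caching those repeated values avoids recomputing them. For position p, pair index i, head dimension d, and frequency base b:

$$
\theta_i = p \, b^{-2i/d}, \qquad
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix}
$$

(The exact pairing of elements differs between RoPE variants, but the repeated use of each cached cos/sin value is the same.)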

b6264

24 Aug 17:57
043fb27
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices…

b6262

24 Aug 09:47
c9a24fb
vulkan: Support FA with any multiple of 8 head sizes (#15537)

The scalar FA shader already handled multiples of 8. The coopmat1 FA shader
assumed 16x16x16, so its shared memory allocations need the HSK dimension
padded to a multiple of 16. NVIDIA's coopmat2 implementation requires
multiples of 16 for N and K, and needs the matrix dimensions padded and
loads clamped.

Store the FA pipelines in a map, indexed by the pipeline state.
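
A minimal host-side sketch of the two ideas described above: pad head sizes up to a multiple of 16 for the coopmat paths, and cache one FA pipeline per pipeline state in a map. The type and field names here are illustrative assumptions, not the actual ggml-vulkan code:

```cpp
// Illustrative sketch only; names and fields are assumptions, not ggml-vulkan's.
#include <cstdint>
#include <map>
#include <tuple>

// Round a head size up to the next multiple of 16, as the coopmat paths require.
static uint32_t pad16(uint32_t x) { return (x + 15u) & ~15u; }

struct FAPipelineState {
    uint32_t hsk_padded;   // K head size after padding
    uint32_t hsv_padded;   // V head size after padding
    bool operator<(const FAPipelineState &o) const {
        return std::tie(hsk_padded, hsv_padded) < std::tie(o.hsk_padded, o.hsv_padded);
    }
};

struct FAPipeline { /* compiled shader, layout, etc. */ };

static std::map<FAPipelineState, FAPipeline> fa_pipelines;  // one pipeline per state in use

FAPipeline &get_fa_pipeline(uint32_t hsk, uint32_t hsv) {
    // Shared memory is sized from the padded head size; loads beyond the real
    // head size are clamped inside the shader (per the note above).
    FAPipelineState key{pad16(hsk), pad16(hsv)};
    return fa_pipelines[key];  // default-constructs on first use
}
```

Keying the cache by pipeline state keeps the number of compiled pipelines proportional to the head-size combinations actually encountered, rather than precompiling one per supported size.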

b6261

24 Aug 09:06
a9c6ffc
vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (#15526)