Releases: ngxson/llama.cpp

b6287

26 Aug 14:59
8f5afa9
CUDA: return -1 for nonexistent compiled arch (#15587)

b6286

26 Aug 11:39
b3964c1
metal : optimize FA vec for large sequences and BS <= 8 (#15566)

* metal : optimize FA vec for large heads and sequences

* metal : adjust small-batch mul mv kernels

ggml-ci

* batched-bench : fix total speed computation

ggml-ci

* cont : add comments

ggml-ci

b6284

26 Aug 10:03
85cc1ae
context : print graph stats for memory-less contexts (#15586)

ggml-ci

b6282

26 Aug 08:22
c4e9239
model : support MiniCPM-V 4.5 (#15575)

b6280

26 Aug 07:15
0fd90db
metal : remove contiguous assertion for src0 in IM2COL (#15577)

* remove contiguous assertion for src0 in IM2COL

* add contiguous check in supports_op

b6279

26 Aug 06:37
4c37636
Add a warning for special devices (#15563)

* Add warning

* Print the devices names

* Add newlines

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <[email protected]>

* Fix vector names

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b6278

26 Aug 05:03
34bdbbd
vulkan: Remove splitting for mul_mat_id (#15568)

row_ids only needs to hold the BN rows for the current tile.

b6277

25 Aug 22:14
74f52f7
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)

* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation

---------

Co-authored-by: xix <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
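
The table-lookup idea named in the b6277 title can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not the code from #15451: `byte_perm_emulated` and `lookup4` are hypothetical names, the emulation assumes only the low 3 bits of each selector nibble are used, and the table layout (16 one-byte entries packed into four 32-bit words) is assumed for the example.

```cpp
#include <cstdint>

// Host-side emulation of a byte-permute in the style of CUDA's __byte_perm.
// Assumption: x supplies bytes 0..3, y supplies bytes 4..7, and each selector
// nibble picks one of those 8 bytes using its low 3 bits only.
static uint32_t byte_perm_emulated(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t pool = (static_cast<uint64_t>(y) << 32) | x;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const uint32_t sel  = (s >> (4 * i)) & 0x7u;
        const uint32_t byte = static_cast<uint32_t>((pool >> (8 * sel)) & 0xFFu);
        result |= byte << (8 * i);
    }
    return result;
}

// Hypothetical helper: look up four 4-bit indices (one per nibble in the low
// 16 bits of idx) in a 16-byte table stored as four 32-bit words.
static uint32_t lookup4(const uint32_t table[4], uint32_t idx) {
    // The low 3 bits of each index select within an 8-byte half of the table.
    const uint32_t sel = idx & 0x7777u;
    const uint32_t lo  = byte_perm_emulated(table[0], table[1], sel); // entries 0..7
    const uint32_t hi  = byte_perm_emulated(table[2], table[3], sel); // entries 8..15

    // Bit 3 of each index decides which half the output byte comes from.
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const bool     upper = ((idx >> (4 * i + 3)) & 1u) != 0;
        const uint32_t src   = upper ? hi : lo;
        result |= src & (0xFFu << (8 * i));
    }
    return result;
}
```

With a table whose entries are the bytes 0x10 through 0x1F, `lookup4(table, 0xF810)` selects entries 0, 1, 8 and 15 and returns `0x1F181110`. Two permutes plus a per-byte select replace four separate indexed loads, which is the general appeal of this kind of lookup.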

b6276

25 Aug 21:38
f7207b0
opencl: fix support ops condition for `rms_norm` (#15560)

b6275

25 Aug 16:50
4d917cd
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …