Releases: ngxson/llama.cpp
b6287
CUDA: return -1 for nonexistent compiled arch (#15587)
b6286
metal : optimize FA vec for large sequences and BS <= 8 (#15566)
* metal : optimize FA vec for large heads and sequences
* metal : adjust small-batch mul mv kernels ggml-ci
* batched-bench : fix total speed computation ggml-ci
* cont : add comments ggml-ci
b6284
context : print graph stats for memory-less contexts (#15586) ggml-ci
b6282
model : support MiniCPM-V 4.5 (#15575)
b6280
metal : remove contiguous assertion for src0 in IM2COL (#15577)
* remove contiguous assertion for src0 in IM2COL
* add contiguous check in supports_op
b6279
Add a warning for special devices (#15563)
* Add warning
* Print the devices names
* Add newlines
* Apply suggestions from code review
* Fix vector names
Co-authored-by: Johannes Gäßler
b6278
vulkan: Remove splitting for mul_mat_id (#15568)
row_ids only needs to hold the BN rows for the current tile.
b6277
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)
* CUDA: optimize get_int_from_table_16
* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs
* revise documentation
Co-authored-by: xix
Co-authored-by: Johannes Gäßler
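For readers unfamiliar with the byte-permute trick named in this release, here is a minimal host-side sketch of how a 16-entry int8 table can be looked up for four 4-bit indices at once. It is not the code from #15451: `byte_perm_emul` is a hypothetical C++ stand-in that mimics the documented selector semantics of the CUDA intrinsic `__byte_perm`, and `lookup4` is an illustrative name, not `get_int_from_table_16`.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's __byte_perm(x, y, s):
// x supplies bytes 0-3, y supplies bytes 4-7, and the low 3 bits of each
// selector nibble pick the source byte for the matching result byte.
static uint32_t byte_perm_emul(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t pool = (static_cast<uint64_t>(y) << 32) | x;
    uint32_t out = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const uint32_t idx  = (s >> (4 * lane)) & 0x7;
        const uint32_t byte = static_cast<uint32_t>((pool >> (8 * idx)) & 0xFF);
        out |= byte << (8 * lane);
    }
    return out;
}

// Illustrative lookup: idx4 holds four 4-bit indices in its low 16 bits
// (one per nibble). Returns the four table bytes packed into 32 bits,
// with the result for nibble i in byte i.
static uint32_t lookup4(const int8_t table[16], uint32_t idx4) {
    // Pack the table into four 32-bit words, byte 0 in the lowest bits.
    uint32_t t[4] = {0, 0, 0, 0};
    for (int i = 0; i < 16; ++i) {
        t[i / 4] |= static_cast<uint32_t>(static_cast<uint8_t>(table[i])) << (8 * (i % 4));
    }

    const uint32_t low3 = idx4 & 0x7777;                   // index bits 0-2 per nibble
    const uint32_t lo = byte_perm_emul(t[0], t[1], low3);  // table[idx & 7]
    const uint32_t hi = byte_perm_emul(t[2], t[3], low3);  // table[8 + (idx & 7)]

    // Index bit 3 decides, per byte, between lo (bytes 0-3) and hi (bytes 4-7).
    const uint32_t pick = 0x3210 | ((idx4 & 0x8888) >> 1);
    return byte_perm_emul(lo, hi, pick);
}
```

The design point is that one permute can only address 8 source bytes, so a 16-entry table takes two permutes over the table halves plus a third that merges them using the top index bit. On a GPU each `byte_perm_emul` call would correspond to a single permute instruction rather than a loop.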
b6276
opencl: fix support ops condition for `rms_norm` (#15560)
b6275
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …