Releases: ggml-org/llama.cpp
b6153
perplexity: give more information about constraints on failure (#15303)
* perplexity: give more information about constraints on failure
  This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.
* log formatting
* log error and return instead of storing max_seq_exceeded int
* check if s0 is zero for -np check
b6152
HIP: bump requirement to ROCm 6.1 (#15296)
b6150
server : enable -td and -tbd parameters (#15172)
b6149
ggml : update `ggml_rope_multi` (#12665)
* update `rope_multi`:
  1. add `ggml_rope_multi_inplace`;
  2. use `GGML_MROPE_SECTIONS` instead of 4.
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov
b6148
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-mo…
b6144
ggml : repack block_iq4_nlx8 (#14904) ggml-ci
b6143
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf impro…
b6141
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …
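The b6141 entry describes splitting very large socket transfers into bounded chunks. The sketch below illustrates that general technique in Python. It is not the ggml-rpc code, which lives in the C/C++ sources, and the per-call cap `MAX_CHUNK` and the helper names `send_all_chunked` / `recv_exact_chunked` are hypothetical. Each call passes at most `max_chunk` bytes to the OS and loops on partial transfers, so no single `send()`/`recv()` ever sees an oversized length.

```python
import socket

# Hypothetical per-call cap; the limit actually used by ggml-rpc is not stated in the release note.
MAX_CHUNK = 1 << 30  # 1 GiB


def send_all_chunked(sock: socket.socket, data: bytes, max_chunk: int = MAX_CHUNK) -> None:
    """Send all of `data`, never handing more than `max_chunk` bytes to one send() call."""
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        end = min(offset + max_chunk, len(view))
        sent = sock.send(view[offset:end])  # send() may transmit fewer bytes than requested
        if sent == 0:
            raise ConnectionError("socket connection broken")
        offset += sent


def recv_exact_chunked(sock: socket.socket, n: int, max_chunk: int = MAX_CHUNK) -> bytes:
    """Receive exactly `n` bytes, asking recv_into() for at most `max_chunk` bytes per call."""
    buf = bytearray(n)
    view = memoryview(buf)
    offset = 0
    while offset < n:
        want = min(max_chunk, n - offset)
        got = sock.recv_into(view[offset:offset + want], want)
        if got == 0:
            raise ConnectionError("connection closed before all bytes were received")
        offset += got
    return bytes(buf)
```

For example, over a local `socket.socketpair()`, calling `send_all_chunked(a, payload, max_chunk=7)` and then `recv_exact_chunked(b, len(payload), max_chunk=5)` returns the original payload. Real tensor-sized transfers would need the sender and receiver running concurrently so that neither side blocks on a full socket buffer.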
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6139
sycl: Fix and disable more configurations of mul_mat (#15151)
* sycl: Fix and disable more configurations of mul_mat
* Disable more configurations