Releases: ngxson/llama.cpp

b4893 (15 Mar 17:07, commit f4c3dd5)

llama-tts : add '-o' option (#12398)

* Added the -o option to specify an output file name

* llama-tts now returns ENOENT if writing the output file fails

Note: PR #12042 has been closed as superseded by this one.
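A usage sketch for the new option; only -o comes from this release, while the model files, the -m/-mv/-p flags, and the prompt follow llama-tts' existing conventions and are placeholders here:

```sh
# Synthesize speech and write it to hello.wav instead of the default output file.
llama-tts -m outetts-0.2-500m-q8_0.gguf -mv wavtokenizer-q8_0.gguf \
    -p "Hello from llama.cpp" -o hello.wav
```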

b4892 (15 Mar 15:36, commit 3d35d87)

SYCL: Delete redundant plus sign and space (#12391)

b4891 (15 Mar 15:08, commit b19bd06)

SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (…

b4890 (15 Mar 02:21, commit 92a3913)

[CANN] MUL_MAT optimization (#12382)

b4889 (14 Mar 17:24, commit 9f2250b)

Add CLI arg to llama-run to adjust the number of threads used (#12370)

The default is 4, but sometimes we want to adjust it manually.

Signed-off-by: Eric Curtin <[email protected]>
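A hypothetical invocation, assuming the new argument is spelled -t/--threads (the release note does not show the exact flag name; the model path and prompt are placeholders):

```sh
# Run with 8 threads instead of the default 4.
llama-run --threads 8 my-model.gguf "Summarize the GGUF format in two sentences"
```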

b4888 (14 Mar 16:40, commit 774973b)

main : add -sysf / --system-prompt-file (#12249) (#12250)

* Add system_prompt_file

* Add -sysf / --system-prompt-file

* Remove system_prompt_file
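A minimal sketch of the flag that survived the PR, assuming the usual llama-cli binary name for the main example; the model path and file contents are placeholders:

```sh
# Load the system prompt from a file instead of passing it inline.
echo "You are a concise assistant." > system.txt
llama-cli -m model.gguf -sysf system.txt -p "What is a KV cache?"
```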

b4887 (14 Mar 13:35, commit 8fcb563)

Load all MoE experts during warmup (#11571)

* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup

* common : use new API to enable warmup mode during model warmup

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
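A minimal C sketch of the new call; everything besides llama_set_warmup() is ordinary llama.h setup assumed from the API of this series, not taken from this release note:

```c
#include "llama.h"

int main(void) {
    llama_backend_init();

    // Placeholder model path; MoE models are where warmup mode matters.
    struct llama_model * model = llama_model_load_from_file(
        "moe-model.gguf", llama_model_default_params());
    struct llama_context * ctx = llama_init_from_model(
        model, llama_context_default_params());

    // While warmup mode is enabled, decoding exercises all MoE experts
    // rather than only the routed top-k, so every expert's weights are
    // loaded up front (the point of PR #11571).
    llama_set_warmup(ctx, true);
    // ... run a short dummy decode here, as the common warmup path does ...
    llama_set_warmup(ctx, false);

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```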

b4886 (14 Mar 11:17, commit add2a3a)

server: fix "--grammar-file" parameter (#12285)
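A usage sketch for the repaired parameter; the model path is a placeholder, and the grammar path points at one of the sample GBNF grammars shipped in the repository:

```sh
# Constrain completions to valid JSON using a grammar file.
llama-server -m model.gguf --grammar-file grammars/json.gbnf
```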

b4885 (14 Mar 09:35, commit c522ce4)

graph : simplify attn input build for unified KV cache (#12381)

ggml-ci

b4884 (14 Mar 07:53, commit 081bee8)

hparams : add SWA rope parameters (#12374)

ggml-ci