
Releases: bssrdf/llama.cpp

b6445

11 Sep 01:40
00681df


CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…

b6419

08 Sep 16:11
8802156


chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style…

b6356

02 Sep 14:00
9961d24


CANN: Resolve soft_max precision issue (#15730)

Previously, the slope tensor was set to fp16 to improve efficiency.
While this worked correctly in flash attention (FA), it caused precision issues in soft_max.
This change applies different data types for different operators
to balance both accuracy and performance.

b6351

02 Sep 01:52
5d804a4


ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722)

b6286

26 Aug 13:08
b3964c1


metal : optimize FA vec for large sequences and BS <= 8 (#15566)

* metal : optimize FA vec for large heads and sequences

* metal : adjust small-batch mul mv kernels

ggml-ci

* batched-bench : fix total speed computation

ggml-ci

* cont : add comments

ggml-ci

b4681

10 Feb 14:00
d7b31a9


sync: minja (https://github.com/google/minja/commit/a72057e5190de2c61…

b4524

22 Jan 03:48
6171c9d


Add Jinja template support (#11016)

* Copy minja from https://github.com/google/minja/commit/58f0ca6dd74bcbfbd4e71229736640322b31c7f9

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to https://github.com/google/minja/commit/b8437df626ac6cd0ce3b333b3c74ed1129c19f25

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b2703

21 Apr 12:58
89b0bf0


llava : use logger in llava-cli (#6797)

This change removes printf() logging so that llava-cli's stdout carries only model output, making it shell-scriptable.

b2699

20 Apr 01:21
0e4802b


ci: add ubuntu latest release and fix missing build number (mac & ubu…

b2251

23 Feb 23:23
fd43d66


server : add KV cache quantization options (#5684)