Releases: ggml-org/llama.cpp

b5010

31 Mar 13:25
a8a1f33
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)

* Vulkan: Add DP4A MMQ and Q8_1 quantization shader

* Add q4_0 x q8_1 matrix matrix multiplication support

* Vulkan: Add int8 coopmat MMQ support

* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code

* Add GL_EXT_integer_dot_product check

* Remove ggml changes, fix mmq pipeline picker

* Remove ggml changes, restore Intel coopmat behaviour

* Fix glsl compile attempt when integer vec dot is not supported

* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq

* Remove redundant comment

* Fix integer dot check

* Fix compile issue with unsupported int dot glslc

* Update Windows build Vulkan SDK version
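The commits above add a quantized matmul path built on integer dot products: activations are quantized to a Q8_1-style block format (per-block scale plus int8 values), and the inner product is accumulated 4 lanes at a time the way a DP4A instruction does. As a rough CPU-side sketch of that idea (names and the 32-value block layout are illustrative, not the actual shader or ggml code):

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Q8_1-style block: one float scale for 32 int8 values.
struct BlockQ8 {
    float d;                   // per-block scale (amax / 127)
    std::array<int8_t, 32> qs; // quantized values
};

BlockQ8 quantize_block(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(x[i]));
    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    BlockQ8 b{d, {}};
    for (int i = 0; i < 32; ++i) b.qs[i] = (int8_t) std::lround(x[i] * id);
    return b;
}

// DP4A-style step: multiply-accumulate 4 int8 pairs into an int32.
int32_t dp4a(const int8_t * a, const int8_t * b, int32_t acc) {
    for (int i = 0; i < 4; ++i) acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}

float dot_q8(const BlockQ8 & a, const BlockQ8 & b) {
    int32_t acc = 0;
    for (int i = 0; i < 32; i += 4) acc = dp4a(&a.qs[i], &b.qs[i], acc);
    return a.d * b.d * float(acc); // dequantize the integer sum once per block
}
```

Keeping the accumulation in int32 and applying the two scales only once per block is what makes the integer-dot path cheaper than dequantizing every element before the multiply.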

b5009

31 Mar 13:01
cmake : fix whitespace (#0)

b5006

31 Mar 11:36
1a85949
llava : proper description fix (#12668)

b5005

31 Mar 11:08
6c02a03
SYCL: Remove misleading ggml_sycl_op_flatten function (#12387)

* SYCL: Remove misleading ggml_sycl_op_flatten function

* remove trailing whitespace

* Fix L2 norm from rebase

* remove try catch block from element_wise.cpp

* remove comment from common.hpp

* ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward

* norm.cpp: remove try catch exception block
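The pattern in these commits is consolidating exception handling: instead of every element-wise op carrying its own try/catch, exceptions propagate up and are caught once in `compute_forward`. A minimal sketch of that shape, with `backend_error` standing in for `sycl::exception` (the names here are illustrative, not the actual SYCL backend code):

```cpp
#include <cstdio>
#include <stdexcept>

// Stand-in for sycl::exception in this sketch.
struct backend_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Individual ops no longer wrap their bodies in try/catch;
// they simply throw on failure.
void op_l2_norm() { throw backend_error("device lost"); }

// One catch block at the dispatch level handles every op.
bool compute_forward() {
    try {
        op_l2_norm(); // any op in the graph may throw
        return true;
    } catch (const backend_error & e) {
        std::fprintf(stderr, "backend exception: %s\n", e.what());
        return false;
    }
}
```

This keeps per-op code free of boilerplate while still reporting device failures from a single place.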

b5004

31 Mar 10:51
f52d59d
llava : fix clip loading GGUFs with missing description (#12660)

b5003

31 Mar 10:18
52de2e5
tts : remove printfs (#12640)

* tts.cpp : llama tokens console output now uses LOG_INF instead of printf(), so the '--log-disable' and '--log-file' options have a uniform effect on all output.

b5002

30 Mar 21:11
2c3f8b8
llama : support BailingMoE (Ling) (#12634)

b5001

30 Mar 20:40
4663bd3
metal : use constexpr in FA kernels + fix typedef (#12659)

* metal : use constexpr in FA kernels

ggml-ci

* cont

ggml-ci

* cont : fix typedef

ggml-ci
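Using `constexpr` in a kernel means per-configuration decisions fold away at compile time instead of branching at runtime. A small illustration of the technique in templated C++ (the function and the numbers are hypothetical, not the actual Metal flash-attention kernel):

```cpp
// Choose a launch parameter from a compile-time head dimension D.
// With `if constexpr`, only the taken branch is compiled into each
// instantiation; there is no runtime check.
template <int D>
constexpr int simd_groups() {
    if constexpr (D <= 64) {
        return 4; // smaller head dim: fewer simdgroups (illustrative)
    } else {
        return 8; // larger head dim: more simdgroups (illustrative)
    }
}

static_assert(simd_groups<64>()  == 4);
static_assert(simd_groups<128>() == 8);
```

The `static_assert`s show the values are known at compile time, which is what lets the compiler size shared buffers and unroll loops accordingly.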

b4999

30 Mar 19:20
7242dd9
llama-chat : Add Yandex instruct model template support (#12621)

* add yandex template

* update yandex chat template

* fix tests

* adjust chat template

* fix style

* fix tool macro in template

* add clarifying comment

---------

Co-authored-by: Sergei Vorobev <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
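A chat template like the one added here maps a list of role/content messages onto a single prompt string with model-specific markers. A minimal sketch of the mechanism, using placeholder markers (NOT Yandex's actual special tokens, which live in llama-chat.cpp):

```cpp
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

// Wrap each message in role markers and append a generation prompt
// so the model continues as the assistant. Marker syntax is invented
// for illustration only.
std::string apply_template(const std::vector<chat_msg> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<" + m.role + ">" + m.content + "</" + m.role + ">\n";
    }
    out += "<assistant>"; // model generates from here
    return out;
}
```

Adding a model family's template support means registering its detection and implementing exactly this kind of formatting with the family's real tokens, plus tests that pin the expected output string.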

b4998

30 Mar 10:20
492d7f1
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci a…