Releases · EAddario/llama.cpp

27 Jul 14:28

bf78f54

b6005

vulkan: add ops docs (#14900)

26 Jul 13:16

github-actions

b5996

11dd5a4

CANN: Implement GLU ops (#14884)

Implement REGLU, GEGLU, SWIGLU ops according to #14158
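
For readers unfamiliar with the three gated linear unit (GLU) variants named above, here is a minimal reference sketch in plain Python. It is not the CANN kernel and does not reproduce the ggml API. It assumes each input row is split into two equal halves, with the first half passed through the activation and the second half used as the multiplier; which half acts as the gate is an assumption here, so check #14158 for the actual convention. The function names `glu`, `reglu`, `geglu`, and `swiglu` are illustrative only.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def gelu(x: float) -> float:
    # Exact erf-based GELU; a backend may use a tanh approximation instead.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


def glu(row, act):
    """Split a row in half and return act(first_half) * second_half.

    Assumption: the first half is the activated (gate) part.
    """
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    half = len(row) // 2
    gate, value = row[:half], row[half:]
    return [act(g) * v for g, v in zip(gate, value)]


def reglu(row):
    return glu(row, relu)


def geglu(row):
    return glu(row, gelu)


def swiglu(row):
    return glu(row, silu)
```

The three variants differ only in the activation applied to the gate half, which is why a single helper parameterized by the activation covers all of them in this sketch.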

26 Jul 06:52

github-actions

b5995

9b8f3c6

musa: fix build warnings (unused variable) (#14869)

Signed-off-by: Xiaodong Ye <[email protected]>

25 Jul 22:12

github-actions

b5994

c7f3169

ggml-cpu : disable GGML_NNPA by default due to instability (#14880)

* docs: update s390x document for sentencepiece

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d)

* docs: update huggingface links + reword

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108)

* ggml-cpu: disable ggml-nnpa compile flag by default

fixes #14877

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09)

* docs: update s390x build docs to reflect nnpa disable

Signed-off-by: Aaron Teo <[email protected]>
(cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d)

---------

Signed-off-by: Aaron Teo <[email protected]>

23 Jul 14:07

github-actions

b5971

221c0e0

ci : correct label refactor->refactoring (#14832)

22 Jul 22:27

github-actions

b5964

acd6cb1

ggml : model card yaml tab->2xspace (#14819)

19 Jul 17:16

github-actions

b5942

9008328

imatrix : use GGUF to store importance matrices (#9400)

* imatrix : allow processing multiple chunks per batch

* perplexity : simplify filling the batch

* imatrix : fix segfault when using a single chunk per batch

* imatrix : use GGUF to store imatrix data

* imatrix : fix conversion problems

* imatrix : use FMA and sort tensor names

* py : add requirements for legacy imatrix convert script

* perplexity : revert changes

* py : include imatrix converter requirements in toplevel requirements

* imatrix : avoid using designated initializers in C++

* imatrix : remove unused n_entries

* imatrix : allow loading mis-ordered tensors

Sums and counts tensors no longer need to be consecutive.

* imatrix : more sanity checks when loading multiple imatrix files

* imatrix : use ggml_format_name instead of std::string concatenation

Co-authored-by: Xuan Son Nguyen <[email protected]>

* quantize : use unused imatrix chunk_size with LLAMA_TRACE

* common : use GGUF for imatrix output by default

* imatrix : two-way conversion between old format and GGUF

* convert : remove imatrix to gguf python script

* imatrix : use the function name in more error messages

* imatrix : don't use FMA explicitly

This should make comparisons between the formats easier
because this matches the behavior of the previous version.

* imatrix : avoid returning from void function save_imatrix

* imatrix : support 3d tensors with MUL_MAT

* quantize : fix dataset name loading from gguf imatrix

* common : move string_remove_suffix from quantize and imatrix

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* imatrix : add warning when legacy format is written

* imatrix : warn when writing partial data, to help guess dataset coverage

Also make the legacy format store partial data
by using neutral values for missing data.
This matches what is done at read-time for the new format,
and so should get the same quality in case the old format is still used.

* imatrix : avoid loading model to convert or combine imatrix

* imatrix : avoid using imatrix.dat in README

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
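
The note above that sums and counts tensors no longer need to be consecutive can be illustrated with a small sketch. It is not the llama.cpp loader. It assumes each weight has two entries in the imatrix GGUF file, distinguished by the suffixes `.in_sum2` and `.counts`; treat those suffixes, and the helper name `pair_imatrix_tensors`, as assumptions made for illustration. The idea is to match entries by base name rather than by position in the file.

```python
SUMS_SUFFIX = ".in_sum2"   # assumed suffix for the per-weight sums tensor
COUNTS_SUFFIX = ".counts"  # assumed suffix for the per-weight counts tensor


def pair_imatrix_tensors(names):
    """Group tensor names by base name, ignoring their order in the file.

    Returns {base_name: {"sums": full_name, "counts": full_name}} and raises
    ValueError if any base name is missing one of its two tensors.
    """
    pairs = {}
    for name in names:
        if name.endswith(SUMS_SUFFIX):
            base, kind = name[: -len(SUMS_SUFFIX)], "sums"
        elif name.endswith(COUNTS_SUFFIX):
            base, kind = name[: -len(COUNTS_SUFFIX)], "counts"
        else:
            continue  # not an imatrix tensor under the assumed naming
        pairs.setdefault(base, {})[kind] = name

    incomplete = sorted(base for base, found in pairs.items() if len(found) != 2)
    if incomplete:
        raise ValueError(f"missing sums or counts for: {incomplete}")
    return pairs
```

Keying on the base name means interleaved or reordered entries load the same way as consecutive ones, and an unmatched entry is reported instead of being silently paired with its neighbor.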

19 Jul 12:23

github-actions

b5939

f0d4d17

Documentation: Update build.md's Vulkan section (#14736)

* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.

* Documentation: Reorganize build.md's Vulkan section.

17 Jul 11:15

github-actions

b5921

086cf81

llama : fix parallel processing for lfm2 (#14705)

16 Jul 08:07

github-actions

b5906

4b91d6f

convert : only check for tokenizer folder if we need it (#14704)
