Releases: srogmann/llama.cpp

b6791

17 Oct 21:21
66b0dbc

llama-model: fix inconsistent ctxs <-> bufs order (#16581)

b6745

12 Oct 22:35
a31cf36

metal : add opt_step_adamw and op_sum (#16529)

* scaffold to support opt step adamw on metal (not written so far)

* add opt-step-adamw kernel for metal

* pass op->src[4] as a separate buffer to the pipeline

* add bounds check to opt-step-adamw kernel

* complete scaffold for GGML_OP_SUM

* naive GGML_OP_SUM kernel

* remove unwanted comment

* change OP_SUM capability gate

* Add has_simdgroup_reduction to both ops to pass CI
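
For orientation, the arithmetic behind the two ops can be written as a scalar CPU reference. This is only a sketch of the textbook AdamW update and a sequential sum, not the Metal kernels from the PR. The names `adamw_params`, `adamw_step_at`, and `naive_sum` are hypothetical, and the parameter layout is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hyperparameters for one AdamW step. Field names are illustrative and do
// not claim to match the layout of the parameter buffer used by ggml.
struct adamw_params {
    float alpha; // learning rate
    float beta1; // first-moment decay
    float beta2; // second-moment decay
    float eps;   // numerical-stability term
    float wd;    // decoupled weight decay
    int   t;     // 1-based step count, used for bias correction
};

// Update element i only. Indices past the end are ignored, mirroring the
// guard a GPU thread needs when the launch grid is larger than the tensor.
inline void adamw_step_at(std::size_t i, std::size_t n,
                          float * x, const float * g, float * m, float * v,
                          const adamw_params & p) {
    assert(p.t >= 1);
    if (i >= n) {
        return; // bounds check
    }
    m[i] = p.beta1 * m[i] + (1.0f - p.beta1) * g[i];
    v[i] = p.beta2 * v[i] + (1.0f - p.beta2) * g[i] * g[i];
    const float m_hat = m[i] / (1.0f - std::pow(p.beta1, (float) p.t));
    const float v_hat = v[i] / (1.0f - std::pow(p.beta2, (float) p.t));
    x[i] = x[i] * (1.0f - p.alpha * p.wd)
         - p.alpha * m_hat / (std::sqrt(v_hat) + p.eps);
}

// Naive sum: a single sequential pass over the data.
inline float naive_sum(const float * x, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}
```

A GPU version would typically run one thread per element for the optimizer step, which is why the early return on `i >= n` matters.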

b6729

10 Oct 19:08
81086cd

vocab : mark EOT token for Granite models (#16499)

* vocab : mark EOT token for Granite models

* sampling : fallback to EOS when EOT is not found

b6692

05 Oct 22:47
ca71fb9

model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)

* feat: Add granite-docling conversion using trillion pretokenizer

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add granite-docling vocab pre enum

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use granite-docling pre

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add clip_is_idefics3

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Allow multi-token boundary sequences for image templating

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add tiling support for idefics3 in clip.cpp

This should likely be moved into llava_uhd::get_slice_instructions, but for
now this avoids disrupting the logic there.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Partial support for full templating for idefics3 in mtmd

There are still errors encoding some of the image chunks, but the token
sequence now matches transformers _almost_ perfectly, except for the double
newline before the global image which shows up as two consecutive newline
tokens instead of a single double-newline token. I think this is happening
because the blocks are tokenized separately then concatenated.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Fully working image preprocessing for idefics3 w/ resize and slicing

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Parse the preprocessor config's longest side and add it to the mmproj hparams

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use the longest side instead of size * scale_factor

For Granite Docling, these come out to the same value, but that was just a
coincidence.

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Allow batch encoding and remove clip_is_idefics3

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Remove unnecessary conditionals for empty token vectors

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Use image_manipulation util

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* add test model

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>

b6686

04 Oct 09:33
128d522

chat : support Magistral thinking (#16413)

* feat: added a dedicated Magistral chat format that preserves [THINK] spans, parses reasoning before tool calls

* feat: new flow in the chat template test suite for Magistral
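
As a rough illustration of what preserving `[THINK]` spans involves, the sketch below separates the reasoning text from the visible content. It is not the parser added in the PR: `parsed_message` and `split_think` are hypothetical names, the closing marker `[/THINK]` is an assumption, and tool-call handling is left out.

```cpp
#include <cassert>
#include <string>

struct parsed_message {
    std::string reasoning; // text inside the first [THINK] span, if any
    std::string content;   // everything outside that span
};

inline parsed_message split_think(const std::string & text) {
    static const std::string open_tag  = "[THINK]";
    static const std::string close_tag = "[/THINK]"; // assumed closing marker

    parsed_message out;
    const std::size_t start = text.find(open_tag);
    if (start == std::string::npos) {
        out.content = text; // no reasoning span at all
        return out;
    }

    const std::size_t body = start + open_tag.size();
    const std::size_t end  = text.find(close_tag, body);
    if (end == std::string::npos) {
        // Unterminated span: treat the remainder as reasoning.
        out.reasoning = text.substr(body);
        out.content   = text.substr(0, start);
        return out;
    }

    assert(end >= body);
    out.reasoning = text.substr(body, end - body);
    out.content   = text.substr(0, start) + text.substr(end + close_tag.size());
    return out;
}
```

For example, `split_think("[THINK]plan[/THINK]answer")` yields `reasoning == "plan"` and `content == "answer"`.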

b6651

30 Sep 19:53
bf6f3b3

common : disable progress bar without a tty (#16352)

* common : disable progress bar without a tty

Signed-off-by: Adrien Gallouët <[email protected]>

* Add missing headers

Signed-off-by: Adrien Gallouët <[email protected]>

---------

Signed-off-by: Adrien Gallouët <[email protected]>
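
The idea of suppressing a progress bar when output is not a terminal can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `stream_is_tty` and `should_show_progress` are hypothetical helpers built on the standard `isatty` call (`_isatty` on Windows).

```cpp
#include <cassert>
#include <cstdio>
#if defined(_WIN32)
#include <io.h>
#else
#include <unistd.h>
#endif

// True when the given stdio stream is attached to a terminal.
inline bool stream_is_tty(std::FILE * stream) {
    assert(stream != nullptr);
#if defined(_WIN32)
    return _isatty(_fileno(stream)) != 0;
#else
    return isatty(fileno(stream)) != 0;
#endif
}

// Pure decision helper: draw the bar only if the user asked for it
// and the output is an interactive terminal.
inline bool should_show_progress(bool requested, bool is_tty) {
    return requested && is_tty;
}
```

A caller could combine the two as `should_show_progress(true, stream_is_tty(stderr))`, which keeps redirected logs and CI output free of carriage-return noise.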

b6601

26 Sep 22:20
624207e

devops: add s390x & ppc64le CI (#15925)

* devops: move s390x and ppc64le ci build

we have access to ubuntu-24.04-s390x and ppc64le images now

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le for now since they have compiler errors

Signed-off-by: Aaron Teo <[email protected]>

* devops: stop warnings as errors

Signed-off-by: Aaron Teo <[email protected]>

* devops: switch to non-macro flag

Signed-off-by: Aaron Teo <[email protected]>

* devops: going the llama macro route

Signed-off-by: Aaron Teo <[email protected]>

* devops: add big-endian gguf test models

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le to test s390x, check test build

Signed-off-by: Aaron Teo <[email protected]>

* devops: dup .gguf.inp files for big-endian tests

Signed-off-by: Aaron Teo <[email protected]>

* devops: dup .gguf.out files for big-endian too

Signed-off-by: Aaron Teo <[email protected]>

* devops: add python setup and endian byteswap

Signed-off-by: Aaron Teo <[email protected]>

* devops: poor thing does not have s390x python3

Signed-off-by: Aaron Teo <[email protected]>

* devops: add missing rust compiler for s390x

Signed-off-by: Aaron Teo <[email protected]>

* devops: try rust actions runner

Signed-off-by: Aaron Teo <[email protected]>

* Revert "devops: try rust actions runner"

This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c.

Signed-off-by: Aaron Teo <[email protected]>

* devops: try a different path for rust

Signed-off-by: Aaron Teo <[email protected]>

* devops: dump home directory and user info

Signed-off-by: Aaron Teo <[email protected]>

* devops: install gguf-py only

Signed-off-by: Aaron Teo <[email protected]>

* devops: missed relative path

Signed-off-by: Aaron Teo <[email protected]>

* devops: remove big-endian files since local swapping is working

Signed-off-by: Aaron Teo <[email protected]>

* devops: revert test-tokenizer-0 cmakelists

Signed-off-by: Aaron Teo <[email protected]>

* Fix unicode flags conversion from and to uint16_t

Bitfields are allocated in different order on s390x

Signed-off-by: Aaron Teo <[email protected]>
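
Because the layout of bitfields within a storage unit is implementation-defined, reinterpreting a bitfield struct as a `uint16_t` is not portable. One portable alternative, shown here only as a sketch, is to convert with explicit masks. The struct `cpt_flags` and its three flags are hypothetical and much smaller than the real flag set.

```cpp
#include <cstdint>

// Illustrative subset of per-codepoint flags; names are hypothetical.
struct cpt_flags {
    bool is_letter     = false;
    bool is_number     = false;
    bool is_whitespace = false;

    // Bit positions are fixed here, independent of compiler or endianness.
    enum : std::uint16_t {
        LETTER     = 1u << 0,
        NUMBER     = 1u << 1,
        WHITESPACE = 1u << 2,
    };

    std::uint16_t to_uint16() const {
        std::uint16_t bits = 0;
        if (is_letter)     bits |= LETTER;
        if (is_number)     bits |= NUMBER;
        if (is_whitespace) bits |= WHITESPACE;
        return bits;
    }

    static cpt_flags from_uint16(std::uint16_t bits) {
        cpt_flags f;
        f.is_letter     = (bits & LETTER)     != 0;
        f.is_number     = (bits & NUMBER)     != 0;
        f.is_whitespace = (bits & WHITESPACE) != 0;
        return f;
    }
};
```

With this approach the integer value for a given set of flags is the same on every platform, at the cost of a few extra instructions per conversion.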

* Simplify byteswap command

Signed-off-by: Aaron Teo <[email protected]>

* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs

Signed-off-by: Aaron Teo <[email protected]>

* Fix endianness detection in vocab loader

Signed-off-by: Aaron Teo <[email protected]>
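
A minimal sketch of runtime endianness detection and byteswapping follows, assuming the on-disk value is little-endian. The helper names `host_is_little_endian`, `byteswap32`, and `load_le32` are hypothetical and are not taken from the vocab loader.

```cpp
#include <cstdint>
#include <cstring>

// Detect host byte order by inspecting the first byte of a known value.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byteswap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

// Read 4 bytes stored in little-endian order and return the host value.
inline std::uint32_t load_le32(const unsigned char * p) {
    std::uint32_t v = 0;
    std::memcpy(&v, p, sizeof(v));
    return host_is_little_endian() ? v : byteswap32(v);
}
```

On a big-endian host such as s390x, `load_le32` swaps the bytes; on a little-endian host it returns the value unchanged.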

* Disable test-thread-safety on s390x

In this test a model is downloaded,
then immediately loaded to check if more downloads are needed,
and then used for test.

There is no clean way to separate all those steps
to add byteswapping between them, so just skip this test.

Signed-off-by: Aaron Teo <[email protected]>

* Fix q8_0 test in test-quantize-fns

vec_signed uses an unexpected rounding mode.
Explicitly use a different rounding function.

Signed-off-by: Aaron Teo <[email protected]>
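
The commit does not state which rounding mode is involved, so the following sketch only illustrates how two common float-to-integer conversions can disagree during quantization. The helpers `quantize_trunc` and `quantize_round` are hypothetical, and the pairing of truncation against round-to-nearest is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert by truncation toward zero (what a plain cast does).
inline std::int8_t quantize_trunc(float x, float inv_scale) {
    const float scaled = x * inv_scale;
    assert(std::fabs(scaled) <= 127.0f);
    return static_cast<std::int8_t>(scaled);
}

// Convert by rounding to nearest, with ties away from zero (std::lround).
inline std::int8_t quantize_round(float x, float inv_scale) {
    const float scaled = x * inv_scale;
    assert(std::fabs(scaled) <= 127.0f);
    return static_cast<std::int8_t>(std::lround(scaled));
}
```

For an input of `2.7f` with `inv_scale` of `1.0f`, truncation gives 2 while rounding gives 3, which is enough to make a quantization test fail against a reference.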

* devops: add big-endian stories260K

Signed-off-by: Aaron Teo <[email protected]>

* devops: add s390x test-eval-callback

Signed-off-by: Aaron Teo <[email protected]>

* devops: fix test does not exist

Signed-off-by: Aaron Teo <[email protected]>

* devops: fix model not found llama-eval-callback

Signed-off-by: Aaron Teo <[email protected]>

* Fix q3_K dot product error in test-quantize-fns on s390x

Array q8bytes had only 4 elements allocated, but 8 elements were accessed.
This led to an out-of-bounds write, a later out-of-bounds read of the
overwritten values, and an incorrect result.

Signed-off-by: Aaron Teo <[email protected]>

* devops: re-enable ppc64le for testing

Signed-off-by: Aaron Teo <[email protected]>

* devops: activate test-thread-safety for s390x

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le tests

For some reason it keeps failing the test-thread-safety tests, and I do not
have a machine that can reproduce the failures.

Signed-off-by: Aaron Teo <[email protected]>

* devops: LLAMA_FATAL_WARNINGS=ON

Signed-off-by: Aaron Teo <[email protected]>

* Correct repository URL for s390x for test-thread-safety model

Signed-off-by: Aaron Teo <[email protected]>

* Fix fs_get_cache_directory

Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.

Signed-off-by: Aaron Teo <[email protected]>
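
One way to make cache-directory resolution robust when both variables are unset is a chain of fallbacks. The sketch below is an assumption about the general shape of such logic, not the actual `fs_get_cache_directory`. The function `resolve_cache_dir` and its `fallback` parameter are hypothetical.

```cpp
#include <cassert>
#include <string>

// xdg_cache_home and home are the values of the corresponding environment
// variables, or nullptr when unset. fallback is a hypothetical last resort
// chosen by the caller.
inline std::string resolve_cache_dir(const char * xdg_cache_home,
                                     const char * home,
                                     const std::string & fallback) {
    assert(!fallback.empty());
    std::string dir;
    if (xdg_cache_home != nullptr && *xdg_cache_home != '\0') {
        dir = xdg_cache_home;
    } else if (home != nullptr && *home != '\0') {
        dir = std::string(home) + "/.cache";
    } else {
        dir = fallback; // neither variable is usable, e.g. in a container
    }
    if (dir.back() != '/') {
        dir += '/';
    }
    return dir;
}
```

A caller might pass `std::getenv("XDG_CACHE_HOME")`, `std::getenv("HOME")`, and a directory such as `/tmp`. Taking the values as parameters keeps the function testable without touching the process environment.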

* Re-enable CI for ppc64le

Signed-off-by: Aaron Teo <[email protected]>

* Fortify ggml_rope_impl

Only memcpy data from sections argument if it's non-NULL.

Signed-off-by: Aaron Teo <[email protected]>
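
Passing a null source pointer to `memcpy` is undefined behavior, so an optional array argument needs a guard. The sketch below shows the pattern in isolation. `store_sections` and `N_SECTIONS` are hypothetical, and zero-filling in the null case is an assumption rather than a description of `ggml_rope_impl`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int N_SECTIONS = 4; // illustrative count

// Copy the optional sections array into dst, or zero dst when it is absent.
inline void store_sections(std::int32_t * dst, const std::int32_t * sections) {
    assert(dst != nullptr);
    if (sections != nullptr) {
        std::memcpy(dst, sections, sizeof(std::int32_t) * N_SECTIONS);
    } else {
        std::memset(dst, 0, sizeof(std::int32_t) * N_SECTIONS);
    }
}
```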

* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way

* Update URL for big-endian model

* Update .github/workflows/build.yml

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update remaining mentions of BE models to ggml-org/models repo

---------

Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: Aleksei Nikiforov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>