Releases: ngxson/llama.cpp
b6235
vulkan: Reuse conversion results in prealloc_y (#15410)

* vulkan: Reuse conversion results in prealloc_y

  Cache the pipeline and tensor that were most recently used to fill prealloc_y, and skip the conversion if the current pipeline/tensor match.

* don't use shared pointer for prealloc_y_last_pipeline_used
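For illustration, a minimal sketch of that caching idea follows; the types and the conversion helper are stand-ins, not the actual ggml Vulkan backend API.

```cpp
// Illustrative sketch only: remember which pipeline and source tensor last filled
// prealloc_y, and skip the conversion dispatch when both match. All types and the
// run_conversion() helper are stand-ins for the real ggml Vulkan backend code.
struct vk_pipeline { /* stand-in for a compiled conversion pipeline */ };
struct ggml_tensor { /* stand-in for a source tensor */ };

struct vk_context_sketch {
    // Raw pointer (not a shared_ptr, per the commit) plus the last source tensor.
    const vk_pipeline *prealloc_y_last_pipeline_used = nullptr;
    const ggml_tensor *prealloc_y_last_tensor_used   = nullptr;
};

static void run_conversion(const vk_pipeline *, const ggml_tensor *) {
    // stand-in for dispatching the conversion shader that writes prealloc_y
}

void fill_prealloc_y(vk_context_sketch &ctx, const vk_pipeline *pipeline, const ggml_tensor *src) {
    // Skip the conversion when the same pipeline and tensor filled prealloc_y last time.
    if (ctx.prealloc_y_last_pipeline_used == pipeline &&
        ctx.prealloc_y_last_tensor_used   == src) {
        return; // prealloc_y already holds the converted data
    }
    run_conversion(pipeline, src);
    ctx.prealloc_y_last_pipeline_used = pipeline;
    ctx.prealloc_y_last_tensor_used   = src;
}
```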
b6229
examples : add model conversion tool/example (#15455)

* examples : add model conversion tool/example

  This commit adds an "example/tool" that is intended to help in the process of converting models to GGUF. Currently it supports normal causal models and embedding models. The readme contains instructions and commands to guide through the process.

  The motivation is to have a structured and repeatable process for model conversions, and hopefully to improve it over time and make the process easier and more reliable. We have started to use this for new model conversions internally and will continue doing so, improving it as we go along. Perhaps with time this should be placed in a different directory than the examples directory, but for now it seems like a good place to keep it while we are still developing it.

* squash! examples : add model conversion tool/example

  Remove dependency on scikit-learn in model conversion example.

* squash! examples : add model conversion tool/example

  Update the transformers dependency to use a non-dev version. Also import `AutoModelForCausalLM` instead of `AutoModel` to ensure compatibility with the latest version.

* squash! examples : add model conversion tool/example

  Remove the logits requirements file from the all-requirements file.
b6228
ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without…
b6225
common : fix incorrect print of non-ascii characters in the logging (…
b6224
ggml : fix condition of im2col on Metal backend (#15460)
b6221
musa: add GGML_UNUSED_VARS (#15446)

Signed-off-by: Xiaodong Ye <[email protected]>
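As a rough, non-authoritative illustration of what a variadic "mark these variables as used" helper can look like in C++ (the actual GGML_UNUSED_VARS definition in ggml may differ):

```cpp
// Hypothetical sketch only; not the real GGML_UNUSED_VARS definition.
// A variadic helper that "uses" every argument, silencing unused-variable warnings.
template <typename... Args>
inline void unused_vars_sketch(Args &&...) {}

#define UNUSED_VARS_SKETCH(...) unused_vars_sketch(__VA_ARGS__)

void example(int rows, float scale) {
    // Both names are referenced once, so unused-variable/unused-parameter warnings
    // (and -Werror builds) stay quiet even when the code that needs them is #ifdef'd out.
    UNUSED_VARS_SKETCH(rows, scale);
}
```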
b6220
sched : copy only the used experts when offloading prompt processing …
b6217
CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433)
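As a sketch of the pattern the title refers to (illustrative, not the actual llama.cpp CUDA sources): gate the FP16 path on `__CUDA_ARCH__`, which nvcc defines per target architecture during device compilation, instead of on a global build flag.

```cpp
// Illustrative pattern only: decide per compiled architecture whether a
// half-precision code path is available, rather than via a repo-wide
// GGML_CUDA_F16 option. __CUDA_ARCH__ is defined by nvcc only in device code.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
#define FAST_FP16_AVAILABLE_SKETCH 1 // Volta (sm_70) and newer
#else
#define FAST_FP16_AVAILABLE_SKETCH 0 // older GPUs and host-side compilation
#endif
```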
b6216
vulkan: shorten pipeline name strings (#15431)

These detailed strings were causing increased build time on gcc.
b6215
chat: handle gpt-oss return/end token inconsistency (#15421)

This commit addresses an inconsistency during inference by adding a new member to the `templates_params` struct to indicate whether the chat is in inference mode. This allows the gpt-oss specific function `common_chat_params_init_gpt_oss` to check this flag and the `add_generation_prompt` flag to determine if it should replace the `<|return|>` token with the `<|end|>` token in the prompt.

The motivation for this change is to ensure that the formatted prompt of past messages in `common_chat_format_single` matches the output of the formatted new message. The issue is that the gpt-oss template returns different end tags: `<|return|>` when `add_generation_prompt` is false, and `<|end|>` when it is true. This causes the substring function to start at an incorrect position, resulting in tokenization starting with 'tart|>' instead of '<|start|>'.

Resolves: https://github.com/ggml-org/llama.cpp/issues/15417
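A hedged sketch of the end-tag normalization described above; the function name and signature are illustrative, and the real logic in the gpt-oss chat handling differs in detail.

```cpp
// Illustrative only: rewrite the template's trailing <|return|> to <|end|> so that
// the formatted prefix of past messages lines up with the formatted prompt that
// includes the new message. Names are hypothetical, not the llama.cpp API.
#include <string>

std::string normalize_gpt_oss_end_tag(std::string prompt, bool is_inference, bool add_generation_prompt) {
    // The gpt-oss template ends with <|return|> when add_generation_prompt is false,
    // but with <|end|> when it is true; align the two during inference.
    if (is_inference && !add_generation_prompt) {
        const std::string ret = "<|return|>";
        const std::string end = "<|end|>";
        const size_t pos = prompt.rfind(ret);
        if (pos != std::string::npos) {
            prompt.replace(pos, ret.size(), end);
        }
    }
    return prompt;
}
```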