Releases: ddwkim/llama.cpp

b6215

20 Aug 14:53
657b8a7

chat: handle gpt-oss return/end token inconsistency (#15421)

This commit addresses an inconsistency during inference by adding a new
member to the `templates_params` struct to indicate whether the chat is
in inference mode. This allows the gpt-oss-specific function
`common_chat_params_init_gpt_oss` to check this flag together with the
`add_generation_prompt` flag to decide whether to replace the
`<|return|>` token with the `<|end|>` token in the prompt.
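
A minimal sketch of the shape of this check, with assumed names for the new member and helper (the actual code in common/chat.cpp differs):

```cpp
#include <string>

// Sketch only: mirrors the check described above, not the exact
// implementation. `templates_params` gains a member indicating whether
// the chat is in inference mode (member name assumed here).
struct templates_params {
    // ... existing members ...
    bool add_generation_prompt = true;
    bool is_inference          = true;
};

// During inference, a past message formatted on its own ends with
// <|return|>, while the same message inside a longer prompt ends with
// <|end|>. Rewriting the former keeps both renderings identical.
static void normalize_gpt_oss_end_token(std::string & prompt,
                                        const templates_params & params) {
    if (!params.is_inference || params.add_generation_prompt) {
        return;
    }
    const std::string ret = "<|return|>";
    const std::string end = "<|end|>";
    const auto pos = prompt.rfind(ret);
    if (pos != std::string::npos) {
        prompt.replace(pos, ret.size(), end);
    }
}
```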

The motivation for this change is to ensure that the formatted prompt of
past messages in `common_chat_format_single` matches the output of the
formatted new message. The issue is that the gpt-oss template emits
different end tags: `<|return|>` when `add_generation_prompt` is false,
and `<|end|>` when it is true. Because `<|return|>` (10 characters) is
three characters longer than `<|end|>` (7 characters), the substring
function starts three characters past the correct position, so
tokenization begins with 'tart|>' instead of '<|start|>'.
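
A simplified model of the suffix computation, with made-up message strings (an assumption for illustration, not the actual `common_chat_format_single` code), shows the off-by-three result:

```cpp
#include <cassert>
#include <string>

// Simplified model of the diff-based formatting: render the past messages
// alone, render past + new together, and return the suffix that belongs
// to the new message.
static std::string format_single(const std::string & fmt_past,
                                 const std::string & fmt_past_and_new) {
    return fmt_past_and_new.substr(fmt_past.size());
}

int main() {
    // Past messages rendered alone end with <|return|> (10 chars), but the
    // same messages inside the longer prompt end with <|end|> (7 chars):
    const std::string past_alone  = "<|start|>user<|message|>hi<|return|>";
    const std::string past_in_ctx = "<|start|>user<|message|>hi<|end|>";
    const std::string full = past_in_ctx + "<|start|>assistant<|message|>yo<|end|>";

    // The prefix length is off by 3, so the suffix starts inside the new
    // message's <|start|> tag: "tart|>..." instead of "<|start|>...".
    assert(format_single(past_alone, full).rfind("tart|>", 0) == 0);
    return 0;
}
```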

Resolves: https://github.com/ggml-org/llama.cpp/issues/15417

b6210

20 Aug 03:25
a094f38

musa: fix build warnings (#15258)

* musa: fix build warnings

Signed-off-by: Xiaodong Ye <[email protected]>

* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare] (see the sketch after this message)

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
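
For context, a minimal illustration of this warning class with hypothetical code (not the actual MUSA backend source):

```cpp
#include <cstdio>

// Hypothetical function showing the warning: a signed bound compared
// against an unsigned loop index makes the compiler convert the signed
// operand, which misbehaves if it is ever negative.
static void print_rows(const int n_rows) {
    for (unsigned int i = 0; i < n_rows; ++i) { // -Wsign-compare fires here
        std::printf("row %u\n", i);
    }
}

// Typical fix: give both operands the same signedness; casting the signed
// bound is safe once it is known to be non-negative.
static void print_rows_fixed(const int n_rows) {
    for (unsigned int i = 0; i < static_cast<unsigned int>(n_rows); ++i) {
        std::printf("row %u\n", i);
    }
}

int main() {
    print_rows(2);
    print_rows_fixed(2);
    return 0;
}
```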

b6182

16 Aug 19:06
1fe0029

vulkan: fuse adds (#15252)

* vulkan: fuse adds

Fuse adds that have the same shape, which are common in MoE models.
It will currently fuse up to 6 adds, because we assume no more than
8 descriptors per dispatch, but this cap could be raised (a rough
sketch of the walk follows this list).

* check runtimeDescriptorArray feature

* disable multi_add for Intel due to likely driver bug
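
A rough sketch of the fusion-eligibility idea under the stated descriptor budget, using hypothetical types and constants rather than the actual ggml-vulkan code:

```cpp
#include <cstdint>

// Stand-in for the graph node type; only the fields the sketch needs.
struct node {
    int     op;       // stand-in for the ggml op enum
    int64_t ne[4];    // tensor shape
    node *  src0;     // accumulating operand
    node *  src1;     // operand being added in
};

constexpr int OP_ADD          = 1;
constexpr int MAX_DESCRIPTORS = 8; // assumed per-dispatch descriptor budget
constexpr int MAX_FUSED_ADDS  = 6; // 6 adds -> 7 inputs + 1 output
static_assert(MAX_FUSED_ADDS + 2 <= MAX_DESCRIPTORS, "descriptor budget");

static bool same_shape(const node * a, const node * b) {
    for (int i = 0; i < 4; ++i) {
        if (a->ne[i] != b->ne[i]) {
            return false;
        }
    }
    return true;
}

// Walk a chain of ADD nodes feeding each other (e.g. summing expert
// outputs in a MoE model) and count how many fit in one fused dispatch.
static int count_fusable_adds(const node * n) {
    int fused = 0;
    while (n != nullptr && n->op == OP_ADD && fused < MAX_FUSED_ADDS &&
           same_shape(n, n->src0) && same_shape(n, n->src1)) {
        ++fused;
        n = n->src0;
    }
    return fused;
}

int main() {
    // Two chained same-shape adds: (b + c) + d.
    node b{0, {4, 1, 1, 1}, nullptr, nullptr};
    node c = b, d = b;
    node add1{OP_ADD, {4, 1, 1, 1}, &b, &c};
    node add2{OP_ADD, {4, 1, 1, 1}, &add1, &d};
    return count_fusable_adds(&add2) == 2 ? 0 : 1;
}
```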