Releases: ngxson/llama.cpp
b6215
chat: handle gpt-oss return/end token inconsistency (#15421)

This commit addresses an inconsistency during inference by adding a new member to the `templates_params` struct to indicate whether the chat is in inference mode. This allows the gpt-oss specific function `common_chat_params_init_gpt_oss` to check this flag together with the `add_generation_prompt` flag to determine whether it should replace the `<|return|>` token with the `<|end|>` token in the prompt.

The motivation for this change is to ensure that the formatted prompt of past messages in `common_chat_format_single` matches the output of the formatted new message. The issue is that the gpt-oss template returns different end tags: `<|return|>` when `add_generation_prompt` is false, and `<|end|>` when `add_generation_prompt` is true. This causes the substring function to start at an incorrect position, resulting in tokenization starting with 'tart|>' instead of '<|start|>'.

Resolves: https://github.com/ggml-org/llama.cpp/issues/15417
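A minimal C++ sketch of the end-tag normalization described above; the struct layout, the member name `is_inference`, and the standalone helper are illustrative assumptions, not the actual code in llama.cpp:

```cpp
#include <string>

// Sketch only: the flag name `is_inference` and this helper function are
// assumptions for illustration, not the actual llama.cpp implementation.
struct templates_params_sketch {
    bool add_generation_prompt = true;
    bool is_inference          = true; // assumed name for the new member
};

// When formatting past messages during inference (add_generation_prompt is
// false), the gpt-oss template ends the prompt with <|return|>; replace it
// with <|end|> so it matches the tag produced for a new message.
static std::string normalize_gpt_oss_end_tag(std::string prompt,
                                             const templates_params_sketch & params) {
    const std::string ret_tag = "<|return|>";
    const std::string end_tag = "<|end|>";
    if (params.is_inference && !params.add_generation_prompt) {
        const size_t pos = prompt.rfind(ret_tag);
        if (pos != std::string::npos && pos + ret_tag.size() == prompt.size()) {
            prompt.replace(pos, ret_tag.size(), end_tag);
        }
    }
    return prompt;
}
```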
b6213
cmake : fix target include directories (#15450)

* Update docker.yml

  Modify docker.yml so that the workflow no longer runs periodically; if you want to run it, the workflow can be started manually.

* feat: Modify the header file include path

  1. There is no llava directory in the tools directory.
  2. Because the command `target_include_directories(mtmd PUBLIC .)` is used in the `mtmd` CMakeLists.txt file, other targets that link against `mtmd` automatically include the `mtmd` directory as a search path for header files. Therefore, you can remove `target_include_directories(${TARGET} PRIVATE ../llava)`, or use `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`.

* Restore the docker.yml file
b6210
musa: fix build warnings (#15258)

* musa: fix build warnings

  Signed-off-by: Xiaodong Ye <[email protected]>

* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]

  Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
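As a generic illustration of the `-Wsign-compare` warning class mentioned here (not the actual MUSA code touched by this commit), a signed loop index compared against an unsigned bound, and the usual fix of making the types agree:

```cpp
#include <vector>

// Illustrative only; not the MUSA kernel code from this commit.
void halve(std::vector<float> & data, unsigned int limit) {
    // `for (int i = 0; i < limit; ++i)` would warn: comparison of integers
    // of different signs ('int' vs 'unsigned int'). Using an unsigned index
    // (or casting the bound) keeps the comparison well-defined and silent.
    for (unsigned int i = 0; i < limit && i < data.size(); ++i) {
        data[i] *= 0.5f;
    }
}
```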
b6209
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…
b6208
model : add gpt-oss type strings (#15424)
b6206
server : disable context shift by default (#15416)

* server : disable context shift by default

  ggml-ci

* server : make scope of test parameters local
b6203
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#1…
b6202
chat : clarify the meaning of reasoning_format (#15408)

* chat : clarify the meaning of reasoning_format
* add link to this PR
b6201
server : remove swa_full warning (#15399)
b6199
mtmd : clean up clip_n_output_tokens (#15391)