Sync master with upstream release b6240 #211

jan-service-account · 2025-08-22T00:11:37Z

Updates dev branch with latest release (b6240) from ggml-org/llama.cpp

* musa: fix build warnings
* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]

Signed-off-by: Xiaodong Ye <[email protected]>

* lookahead : add sample command to readme
* cont : build-agnostic command

* Update docker.yml

  Modify docker.yml so that the workflow no longer runs periodically; it can still be started manually when needed.
* feat: Modify the header file include path
  1. There's no llava directory in the tools directory.
  2. Because the `mtmd` CMakeLists.txt file uses `target_include_directories(mtmd PUBLIC .)`, other targets that link against `mtmd` automatically get the `mtmd` directory as a search path for header files. Therefore, you can remove `target_include_directories(${TARGET} PRIVATE ../llava)` or use `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`.
* Restore the docker.yml file

Signed-off-by: Jie Fu <[email protected]>

This commit addresses an inconsistency during inference by adding a new member to the `templates_params` struct to indicate whether the chat is in inference mode. This allows the gpt-oss specific function `common_chat_params_init_gpt_oss` to check this flag and the `add_generation_prompt` flag to determine if it should replace the `<|return|>` token with the `<|end|>` token in the prompt.

The motivation for this change is to ensure that the formatted prompt of past messages in `common_chat_format_single` matches the output of the formatted new message. The issue is that the gpt-oss template returns different end tags: `<|return|>` when `add_generation_prompt` is false, and `<|end|>` when `add_generation_prompt` is true. This causes the substring function to start at an incorrect position, resulting in tokenization starting with 'tart|>' instead of '<|start|>'.

Resolves: ggml-org#15417
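The off-by-three behaviour described in this commit message can be reproduced with a toy model. The sketch below is only an illustration in Python, not the llama.cpp implementation: `render`, `format_single` and the `is_inference` parameter are hypothetical stand-ins for the gpt-oss template, `common_chat_format_single` and the new `templates_params` member, and the exact condition used in the real code is an assumption.

```python
RETURN, END, START = "<|return|>", "<|end|>", "<|start|>"

def render(messages, add_generation_prompt, is_inference=False):
    """Toy stand-in for the gpt-oss chat template (hypothetical, simplified)."""
    out = ""
    for role, text in messages:
        out += f"{START}{role}<|message|>{text}{END}"
    # A trailing assistant turn is closed with <|return|> when no generation
    # prompt follows; this is the asymmetry the commit message describes.
    ends_with_assistant = bool(messages) and messages[-1][0] == "assistant"
    if ends_with_assistant and not add_generation_prompt:
        out = out[: -len(END)] + RETURN
        if is_inference:
            # Assumed fix: during inference, swap <|return|> back to <|end|>.
            out = out[: -len(RETURN)] + END
    if add_generation_prompt:
        out += f"{START}assistant"
    return out

def format_single(past, new_message, is_inference):
    """Return only the newly added part of the prompt via a length-based substring."""
    fmt_past = render(past, add_generation_prompt=False, is_inference=is_inference)
    fmt_full = render(past + [new_message], add_generation_prompt=True,
                      is_inference=is_inference)
    return fmt_full[len(fmt_past):]

past = [("user", "hi"), ("assistant", "hello")]
new_message = ("user", "bye")

broken = format_single(past, new_message, is_inference=False)  # starts with "tart|>"
fixed = format_single(past, new_message, is_inference=True)    # starts with "<|start|>"
```

`<|return|>` is three characters longer than `<|end|>`, so without the flag the substring offset lands three characters into `<|start|>`, which is where 'tart|>' comes from.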

These detailed strings were causing increased build time on gcc.

…eams (ggml-org#15444)

…gml-org#15346)

Signed-off-by: Xiaodong Ye <[email protected]>

…15457)

This commit removes references to `make` in the examples, as the build system has been updated to use CMake directly and using `make` will now generate an error since commit 37f10f9 ("make : remove make in favor of CMake (ggml-org#15449)").

* Fix webui crash after streaming
* build webui

…gml-org#15466) Signed-off-by: Jie Fu <[email protected]>

ggml-org#15420)

* Make Mistral community chat templates optional
* Change the flag arg to disable instead of enable community chat templates
* Improve error message
* Improve help message
* Tone down the logger messages

* Initial plan
* Initialize copilot instructions exploration
* Add comprehensive .github/copilot-instructions.md file
* Update Python environment and tools directory documentation
  - Add instructions for using .venv Python environment
  - Include flake8 and pyright linting tools from virtual environment
  - Add tools/ as core directory in project layout
  - Reference existing configuration files (.flake8, pyrightconfig.json)
* add more python dependencies to .venv
* Update copilot instructions: add backend hardware note and server testing
* Apply suggestions from code review
* Apply suggestions from code review
* Replace clang-format with git clang-format to format only changed code
* Minor formatting improvements: remove extra blank line and add trailing newline
* try installing git-clang-format
* try just clang-format
* Remove --binary flag from git clang-format and add git-clang-format installation to CI
* download 18.x release
* typo--
* remove --binary flag

Co-authored-by: Sigbjørn Skjæret <[email protected]>

… issue (ggml-org#15221)

* Fix -Werror=return-type so ci/run.sh can run
* Update tools/mtmd/clip.cpp
* Remove false now that we have abort

Co-authored-by: Diego Devesa <[email protected]>

* examples : add model conversion tool/example

  This commit adds an "example/tool" that is intended to help in the process of converting models to GGUF. Currently it supports normal causal models and embedding models. The readme contains instructions and commands to guide users through the process.

  The motivation for this is to have a structured and repeatable process for model conversions and, hopefully, to improve upon it over time to make the process easier and more reliable. We have started to use this for new model conversions internally and will continue doing so and improve it as we go along.

  Perhaps with time this should be placed in a different directory than the examples directory, but for now it seems like a good place to keep it while we are still developing it.
* squash! examples : add model conversion tool/example

  Remove dependency on scikit-learn in model conversion example.
* squash! examples : add model conversion tool/example

  Update transformer dep to use non-dev version. Also import `AutoModelForCausalLM` instead of `AutoModel` to ensure compatibility with the latest version.
* squash! examples : add model conversion tool/example

  Remove the logits requirements file from the all requirements file.

ggml-ci

* Changed the CI file to hw
* Changed the CI file to hw
* Added to sudoers for apt
* Removed the clone command and used checkout
* Added libcurl
* Added gcc-14
* Checking gcc --version
* added gcc-14 symlink
* added CC and C++ variables
* Added the gguf weight
* Changed the weights path
* Added system specification
* Removed white spaces
* ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow

  Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions.
* removed trailing whitespaces
* Added the trigger at PR creation
* Corrected OS name
* Added ccache as setup package
* Added ccache for self-hosted runner
* Added directory for ccache size storage
* Changed the build command and added ccache debug log
* Added the base dir for the ccache
* Re-trigger CI
* Cleanup and refactored ccache steps
* Cleanup and refactored ccache steps

Co-authored-by: Akif Ejaz <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

…org#15475) Signed-off-by: Jie Fu <[email protected]>

* kv-cache : drop the "unified" prefix

  ggml-ci
* cont : fix comment [no ci]

…l-org#15477) Signed-off-by: Jie Fu <[email protected]>

* vulkan: Reuse conversion results in prealloc_y

  Cache the pipeline and tensor that were most recently used to fill prealloc_y, and skip the conversion if the current pipeline/tensor match.
* don't use shared pointer for prealloc_y_last_pipeline_used
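The caching idea in the prealloc_y commit can be sketched as a one-entry "last used" cache. The Python below is a minimal illustration and not the Vulkan backend code: the `PreallocY` class, its `fill` method and the `to_f16` helper are hypothetical names, and the sketch assumes that nothing else writes to the buffer and that the tensor's contents do not change between calls.

```python
class PreallocY:
    """Toy scratch buffer holding the converted copy of one tensor (hypothetical)."""

    def __init__(self):
        self.data = None
        self.last_pipeline = None  # pipeline that last filled the buffer
        self.last_tensor = None    # tensor that was last converted into it
        self.conversions = 0       # counts how often a conversion actually ran

    def fill(self, pipeline, tensor):
        # Skip the conversion when the same pipeline and tensor filled the buffer last.
        if self.last_pipeline is pipeline and self.last_tensor is tensor:
            return self.data
        self.data = pipeline(tensor)
        self.conversions += 1
        self.last_pipeline = pipeline
        self.last_tensor = tensor
        return self.data


def to_f16(values):
    """Stand-in for a conversion pipeline; here it only rounds the values."""
    return [round(v, 3) for v in values]


weights = [0.12345, 1.98765]
buf = PreallocY()
buf.fill(to_f16, weights)  # runs the conversion
buf.fill(to_f16, weights)  # same pipeline and tensor: conversion skipped
buf.fill(to_f16, [3.0])    # different tensor: conversion runs again
```

Comparing by identity rather than by value keeps the check cheap, at the cost of relying on the assumption that the cached tensor is not modified in place.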

Co-authored-by: aeseulgi <[email protected]>

ggml-ci

…pt processing (ggml-org#15488)

yeahdongcn and others added 30 commits August 22, 2025 11:37

lookahead : add sample command to readme (ggml-org#15447)

c82d593

* lookahead : add sample command to readme
* cont : build-agnostic command

common : fix context shift help message (ggml-org#15448)

a884477

Signed-off-by: Jie Fu <[email protected]>

vulkan: shorten pipeline name strings (ggml-org#15431)

6420449

These detailed strings were causing increased build time on gcc.

CUDA: replace GGML_CUDA_F16 with CUDA arch checks (ggml-org#15433)

2b76bf5

CUDA: refactor FA support/selection code (ggml-org#15454)

dfdbd58

server: fix OpenAI API compatibility for usage statistics in chat streams (ggml-org#15444)

d051f99

sched : copy only the used experts when offloading prompt processing (ggml-org#15346)

b161e6a

musa: add GGML_UNUSED_VARS (ggml-org#15446)

a79a154

Signed-off-by: Xiaodong Ye <[email protected]>

server : fix webui (ggml-org#15462)

4d96a6e

* Fix webui crash after streaming
* build webui

ggml : fix condition of im2col on Metal backend (ggml-org#15460)

7b366f3

common : fix incorrect print of non-ascii characters in the logging (ggml-org#15466)

39e03e4

Signed-off-by: Jie Fu <[email protected]>

ci : continue file download with wget (ggml-org#15471)

78e0011

ggml-ci

examples : install torch-cpu for model conversion tool/example (ggml-org#15475)

bf30afd

Signed-off-by: Jie Fu <[email protected]>

kv-cache : drop the "unified" prefix (ggml-org#15467)

bfcbd9f

* kv-cache : drop the "unified" prefix

  ggml-ci
* cont : fix comment [no ci]

examples : fix some typos in examples/model-conversion/README.md (ggml-org#15477)

e335571

Signed-off-by: Jie Fu <[email protected]>

vulkan: add exp operation (ggml-org#15456)

be33148

Co-authored-by: aeseulgi <[email protected]>

vulkan : support conv_2d_dw with f16 weights (ggml-org#15392)

ea94286

graph : remove build_attn_with_sinks overload (ggml-org#15469)

d31ebcb

ggml-ci

llama : remove deprecated llama_kv_self API (ggml-org#15472)

b9dce05

ggml-ci

sched : fix possible use of wrong ids tensor when offloading moe prompt processing (ggml-org#15488)

d0a2a10

qnixsynapse force-pushed the update-dev-from-master-2025-08-22-00-11 branch from 54a241f to d0a2a10 Compare August 22, 2025 06:09

Minh141120 merged commit 6ef69ba into dev Aug 22, 2025
13 of 14 checks passed

Minh141120 deleted the update-dev-from-master-2025-08-22-00-11 branch August 22, 2025 06:23

19 participants
