forked from ggml-org/llama.cpp
Sync master with upstream release b6240 #211
Merged
* musa: fix build warnings Signed-off-by: Xiaodong Ye <[email protected]> * fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare] Signed-off-by: Xiaodong Ye <[email protected]> --------- Signed-off-by: Xiaodong Ye <[email protected]>
* lookahead : add sample command to readme * cont : build-agnostic command
* Update docker.yml
Modify docker.yml so that the workflow no longer runs on a periodic schedule; it can still be started manually when needed.
* feat:Modify the header file include path
1. There's no llava directory in the tools directory.
2. Because `target_include_directories(mtmd PUBLIC .)` is used in the `mtmd` CMakeLists.txt file, other targets that link against `mtmd` automatically get the `mtmd` directory as a header search path. Therefore, you can either remove `target_include_directories(${TARGET} PRIVATE ../llava)` or replace it with `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`.
* Restore the docker.yml file
Signed-off-by: Jie Fu <[email protected]>
This commit addresses an inconsistency during inference by adding a new member to the `templates_params` struct to indicate whether the chat is in inference mode. This allows the gpt-oss specific function `common_chat_params_init_gpt_oss` to check this flag and the `add_generation_prompt` flag to determine if it should replace the `<|return|>` token with the `<|end|>` token in the prompt. The motivation for this change is to ensure that the formatted prompt of past messages in `common_chat_format_single` matches the output of the formatted new message. The issue is that the gpt-oss template returns different end tags: `<|return|>` when `add_generation_prompt` is false, and `<|end|>` when `add_generation_prompt` is true. This causes the substring function to start at an incorrect position, resulting in tokenization starting with 'tart|>' instead of '<|start|>'. Resolves: ggml-org#15417
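The off-by-three effect described above can be reproduced with a small sketch. Everything here is a toy stand-in: `render`, `format_single`, and the `normalize_end_tag` parameter are hypothetical names, and the template is a simplified imitation of the gpt-oss end-tag behaviour, not the real Jinja template or the actual `common_chat_format_single` code. It assumes the new message is cut out of the full prompt by taking a substring that starts at the length of the formatted past messages.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Msg {
    std::string role;
    std::string content;
};

// Toy template (assumption): a trailing assistant message is closed with
// <|return|> when no generation prompt is requested, otherwise with <|end|>.
static std::string render(const std::vector<Msg> & msgs, bool add_generation_prompt) {
    std::string out;
    for (size_t i = 0; i < msgs.size(); ++i) {
        const bool is_last    = (i + 1 == msgs.size());
        const bool use_return = is_last && !add_generation_prompt && msgs[i].role == "assistant";
        out += "<|start|>" + msgs[i].role + "<|message|>" + msgs[i].content;
        out += use_return ? "<|return|>" : "<|end|>";
    }
    if (add_generation_prompt) {
        out += "<|start|>assistant";
    }
    return out;
}

// Hypothetical helper: returns only the text contributed by new_msg, using the
// "format everything, then cut off the past prefix" approach.
static std::string format_single(const std::vector<Msg> & past, const Msg & new_msg, bool normalize_end_tag) {
    std::string fmt_past = render(past, /*add_generation_prompt=*/false);

    // Sketch of the fix: make the past prefix end with the same tag that the
    // full prompt will contain at that position.
    const std::string ret = "<|return|>";
    if (normalize_end_tag && fmt_past.size() >= ret.size() &&
        fmt_past.compare(fmt_past.size() - ret.size(), ret.size(), ret) == 0) {
        fmt_past.replace(fmt_past.size() - ret.size(), ret.size(), "<|end|>");
    }

    std::vector<Msg> all = past;
    all.push_back(new_msg);
    const std::string fmt_all = render(all, /*add_generation_prompt=*/true);

    // If fmt_past is 3 characters longer than the matching prefix of fmt_all,
    // this cut lands inside the next "<|start|>" token.
    return fmt_all.substr(fmt_past.size());
}
```

In this sketch `<|return|>` is three characters longer than `<|end|>`, so without normalization the cut skips `<|s` and the result begins with `tart|>`, which matches the symptom reported in the commit message.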
These detailed strings were causing increased build time on gcc.
Signed-off-by: Xiaodong Ye <[email protected]>
…15457) This commit removes references to `make` in the examples, as the build system has been updated to use CMake directly and using `make` will now generate an error since Commit 37f10f9 ("make : remove make in favor of CMake (ggml-org#15449)").
* Fix webui crash after streaming * build webui
…gml-org#15466) Signed-off-by: Jie Fu <[email protected]>
ggml-org#15420) * Make Mistral community chat templates optional * Change the flag arg to disable instead of enable community chat templates * Improve error message * Improve help message * Tone down the logger messages
* Initial plan * Initialize copilot instructions exploration * Add comprehensive .github/copilot-instructions.md file * Update Python environment and tools directory documentation - Add instructions for using .venv Python environment - Include flake8 and pyright linting tools from virtual environment - Add tools/ as core directory in project layout - Reference existing configuration files (.flake8, pyrightconfig.json) * add more python dependencies to .venv * Update copilot instructions: add backend hardware note and server testing * Apply suggestions from code review * Apply suggestions from code review * Replace clang-format with git clang-format to format only changed code * Minor formatting improvements: remove extra blank line and add trailing newline * try installing git-clang-format * try just clang-format * Remove --binary flag from git clang-format and add git-clang-format installation to CI * download 18.x release * typo-- * remove --binary flag --------- Co-authored-by: Sigbjørn Skjæret <[email protected]>
… issue (ggml-org#15221) * Fix -Werror=return-type so ci/run.sh can run * Update tools/mtmd/clip.cpp Co-authored-by: Diego Devesa <[email protected]> * Remove false now that we have abort --------- Co-authored-by: Diego Devesa <[email protected]>
* examples : add model conversion tool/example This commit adds an "example/tool" that is intended to help in the process of converting models to GGUF. Currently it supports normal causal models and embedding models. The readme contains instructions and commands to guide through the process. The motivation for this is to have a structured and repeatable process for model conversions and hopefully with time improve upon it to make the process easier and more reliable. We have started to use this for new model conversions internally and will continue doing so and improve it as we go along. Perhaps with time this should be placed in a different directory than the examples directory, but for now it seems like a good place to keep it while we are still developing it. * squash! examples : add model conversion tool/example Remove dependency on scikit-learn in model conversion example. * squash! examples : add model conversion tool/example Update transformer dep to use non-dev version. And also import `AutoModelForCausalLM` instead of `AutoModel` to ensure compatibility with the latest version. * squash! examples : add model conversion tool/example Remove the logits requirements file from the all requirements file.
* Changed the CI file to hw * Changed the CI file to hw * Added to sudoers for apt * Removed the clone command and used checkout * Added libcurl * Added gcc-14 * Checking gcc --version * added gcc-14 symlink * added CC and C++ variables * Added the gguf weight * Changed the weights path * Added system specification * Removed white spaces * ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions. * removed trailing whitespaces * Added the trigger at PR creation * Corrected OS name * Added ccache as setup package * Added ccache for self-hosted runner * Added directory for ccache size storage Co-authored-by: Sigbjørn Skjæret <[email protected]> * Changed the build command and added ccache debug log * Added the base dir for the ccache * Re-trigger CI * Cleanup and refactored ccache steps * Cleanup and refactored ccache steps --------- Co-authored-by: Akif Ejaz <[email protected]> Co-authored-by: Sigbjørn Skjæret <[email protected]>
…org#15475) Signed-off-by: Jie Fu <[email protected]>
* kv-cache : drop the "unified" prefix ggml-ci * cont : fix comment [no ci]
…l-org#15477) Signed-off-by: Jie Fu <[email protected]>
* vulkan: Reuse conversion results in prealloc_y Cache the pipeline and tensor that were most recently used to fill prealloc_y, and skip the conversion if the current pipeline/tensor match. * don't use shared pointer for prealloc_y_last_pipeline_used
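The caching idea in the Vulkan commit can be illustrated with a minimal sketch. The types `Pipeline`, `Tensor`, and `PreallocYCache`, and the method `fill`, are hypothetical placeholders and not the Vulkan backend's real data structures; the sketch only shows the "remember the last pipeline/tensor pair and skip if unchanged" logic, with plain non-owning pointers in line with the second bullet of the commit.

```cpp
#include <cassert>

// Placeholder types (assumptions), standing in for backend objects.
struct Pipeline { int id; };
struct Tensor   { int id; };

struct PreallocYCache {
    // Non-owning raw pointers: the cache only needs identity comparison.
    const Pipeline * last_pipeline = nullptr;
    const Tensor   * last_tensor   = nullptr;
    int conversions_run = 0; // counts how often the conversion was actually performed

    // Returns true if the conversion had to run, false if the previous result was reused.
    bool fill(const Pipeline * pipeline, const Tensor * tensor) {
        if (pipeline == last_pipeline && tensor == last_tensor) {
            return false; // scratch buffer already holds this conversion result
        }
        ++conversions_run; // stand-in for dispatching the conversion into the scratch buffer
        last_pipeline = pipeline;
        last_tensor   = tensor;
        return true;
    }
};
```

A real implementation would presumably also need to invalidate the cached pair whenever the scratch buffer is overwritten for another purpose or the source tensor's data changes; the sketch leaves that out.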
Co-authored-by: aeseulgi <[email protected]>
54a241f to d0a2a10
Updates dev branch with latest release (b6240) from ggml-org/llama.cpp