forked from ggml-org/llama.cpp
Sync master with upstream release b6811 #299
Merged: jan-service-account merged 11 commits into `dev` from `update-dev-from-master-2025-10-21-00-34` on Oct 21, 2025.
Conversation
…g#16674)

The unexpected pooling_type warning was incorrectly shown when users did not specify the --pooling-type parameter. In this case, the parameter defaults to `LLAMA_POOLING_TYPE_UNSPECIFIED` (-1), and the code automatically applies the model's default pooling type.

Example of the spurious warning:

```
$ llama-embedding -hf ggml-org/bge-m3-Q8_0-GGUF -p "hello"
...
llama_init_from_model: model default pooling_type is [2], but [-1] was specified
...
```

This fix ensures the warning only appears when users explicitly specify a pooling type that differs from the model's default (e.g., using --pooling-type mean on a model that expects CLS pooling).
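A minimal self-contained sketch of the corrected check: the enum values mirror `llama.h`, but `check_pooling` and the plain `fprintf` logging are stand-ins for the actual `llama_init_from_model` logic, not the exact patch.

```cpp
#include <cstdio>

// Stand-ins for llama.cpp's types; the real enum lives in llama.h.
enum llama_pooling_type {
    LLAMA_POOLING_TYPE_UNSPECIFIED = -1,
    LLAMA_POOLING_TYPE_NONE        =  0,
    LLAMA_POOLING_TYPE_MEAN        =  1,
    LLAMA_POOLING_TYPE_CLS         =  2,
};

// Warn only when the user explicitly asked for a pooling type that
// conflicts with the model's default; -1 means "no user preference"
// and must fall back to the model default silently.
static void check_pooling(llama_pooling_type requested, llama_pooling_type model_default) {
    if (requested != LLAMA_POOLING_TYPE_UNSPECIFIED && requested != model_default) {
        std::fprintf(stderr,
            "model default pooling_type is [%d], but [%d] was specified\n",
            model_default, requested);
    }
}

int main() {
    check_pooling(LLAMA_POOLING_TYPE_UNSPECIFIED, LLAMA_POOLING_TYPE_CLS); // silent after the fix
    check_pooling(LLAMA_POOLING_TYPE_MEAN,        LLAMA_POOLING_TYPE_CLS); // still warns
}
```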
…l-org#16613)

* SYCL: Add support for FLOOR, CEIL, ROUND and TRUNC unary operators (illustrated below); clean up unrelated changes from previous commit
* Chore: remove empty lines and fix indentation
* Clean up: remove leftover blank lines and fix spacing
* chore: fix trailing whitespace and ensure final newline
* Cleanup: remove redundant declarations already defined in header
* Sync docs/ops.md with updated backend operation support
* docs: update ops.md after rebase
* docs: update ops.md - Vulkan supports SSM_CONV and SSM_SCAN
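For context, the four new operators are plain elementwise maps onto the standard C math rounding functions. The sketch below is illustrative C++ rather than the actual SYCL kernels (which would typically dispatch one work-item per element); the enum and function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Each op applies one rounding function independently to every element.
enum unary_op { OP_FLOOR, OP_CEIL, OP_ROUND, OP_TRUNC };

static void apply_unary(unary_op op, const float * src, float * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        switch (op) {
            case OP_FLOOR: dst[i] = std::floor(src[i]); break; // toward -inf
            case OP_CEIL:  dst[i] = std::ceil (src[i]); break; // toward +inf
            case OP_ROUND: dst[i] = std::round(src[i]); break; // nearest, ties away from 0
            case OP_TRUNC: dst[i] = std::trunc(src[i]); break; // toward 0
        }
    }
}
```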
Signed-off-by: deadprogram <[email protected]>
…16614)

## Why it failed

When compiling with strict compiler flags (-Wmissing-braces -Werror=missing-braces), the build fails with the following error:

```
cmake \
  -S . \
  -B ../llama.cpp.build \
  --preset=x64-linux-gcc-debug \
  -DCMAKE_INSTALL_PREFIX=/tmp/local \
  -DCMAKE_CXX_FLAGS="-Wmissing-braces -Werror=missing-braces" && \
cmake --build ../llama.cpp.build/
...
In file included from /home/otegami/work/cpp/llama.cpp/src/llama-graph.h:4,
                 from /home/otegami/work/cpp/llama.cpp/src/llama-model.h:5,
                 from /home/otegami/work/cpp/llama.cpp/src/llama.cpp:8:
/home/otegami/work/cpp/llama.cpp/src/llama-batch.h:126:48: error: missing braces around initializer for 'std::__array_traits<int, 1>::_Type' {aka 'int [1]'} [-Werror=missing-braces]
  126 |     std::array<llama_seq_id, 1> seq_id_0 = { 0 }; // default sequence id
      |                                                ^
cc1plus: some warnings being treated as errors
```

The issue is that std::array's aggregate initialization formally requires double braces, and -Wmissing-braces flags the single-brace form.

## How to fix

This PR changes `{ 0 }` to `{{ 0 }}` for the std::array initialization. It is part of a series of commits fixing missing-braces warnings across the codebase:

- src/llama-batch.h <- This PR is here.
- src/llama-context.cpp
- tests/test-backend-ops.cpp
- tests/test-gguf.cpp
- tools/mtmd/clip.cpp

Benefits:

- std::array is a struct containing a C-style array, requiring nested braces
- Enables stricter compiler warnings to catch potential issues
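A self-contained illustration of the fix (using `int` in place of the `llama_seq_id` typedef):

```cpp
#include <array>

// std::array<T, N> is a struct that wraps a C-style array, so its aggregate
// initialization formally has two levels of braces: outer for the struct,
// inner for the array. Brace elision makes the single-brace form legal C++,
// but -Wmissing-braces flags it (and -Werror=missing-braces rejects it).
std::array<int, 1> seq_id_warns = { 0 };   // warns under -Wmissing-braces
std::array<int, 1> seq_id_clean = {{ 0 }}; // double braces: no warning
```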
…rsations (ggml-org#16327)

* feat: Per-conversation loading states and tracking streaming stats
* chore: update webui build output
* refactor: Chat state management

  Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states. This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of the statistics displayed. Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.

* feat: Adds loading indicator to conversation items
* chore: update webui build output
* fix: Fix aborting chat streaming

  Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent. This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.

* refactor: Remove redundant comments
* chore: build webui static output
* refactor: Cleanup
* chore: update webui build output
* chore: update webui build output
* fix: Conversation loading indicator for regenerating messages
* chore: update webui static build
* feat: Improve configuration
* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI
* webui : added download action (ggml-org#13552)
* webui : import and export (for all conversations)
* webui : fixed download-format, import of one conversation
* webui : add ExportedConversations type for chat import/export
* feat: Update naming & order
* chore: Linting
* feat: Import/Export UX improvements
* chore: update webui build output
* feat: Update UI placement of Import/Export tab in Chat Settings Dialog
* refactor: Cleanup
* chore: update webui build output
* feat: Enable shift-click multiple conversation items selection
* chore: update webui static build
* chore: update webui static build

Co-authored-by: Sascha Rogmann <[email protected]>
* fix: Prevent premature submission on IME input
* chore: update webui static build
* refactor: Put IME completion checker in a helper function and add checking for `KeyboardEvent.eventKey === 229`
* chore: update webui static build
* chore: update webui static build
* chore: update webui static build
* add BailingMoeV2 support
* update llm types
* undo
* undo
* update llm types
* add model collection link
* update
* almost working
* correct group selection and rename n_group_exp
* avoid large top_k and use argmax instead for now (see the sketch after this list); if we had something like argmax2 that would be equivalent, but this works fine until then
* poke
* skip group selection when there are no tokens
* fix 1T conversion
* hopefully fixed expert group selection; third time's the charm?
* make expert group selection generally available

  The new LLaDA2Moe model uses this method too; make it generally available regardless of architecture.

* allow n_expert_groups to be 1 (Kimi K2)
* address review suggestions
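A hedged sketch of the grouped routing described above, assuming the common pattern for expert-group selection: score each group, keep the best `n_group_used` groups via repeated argmax in place of a large top-k, and mask out experts in the remaining groups before the usual per-expert top-k. The function name, parameters, and per-group scoring choice here are illustrative, not the exact ggml graph.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// logits: router scores for n_group * group_size experts, laid out group-major.
static void mask_unselected_groups(std::vector<float> & logits,
                                   size_t n_group, size_t n_group_used) {
    const size_t group_size = logits.size() / n_group;

    // Score each group by its single best expert (illustrative choice).
    std::vector<float> group_score(n_group, -std::numeric_limits<float>::infinity());
    for (size_t g = 0; g < n_group; ++g) {
        for (size_t i = 0; i < group_size; ++i) {
            group_score[g] = std::max(group_score[g], logits[g*group_size + i]);
        }
    }

    // Pick n_group_used groups by repeated argmax (stand-in for a top-k op).
    std::vector<bool> keep(n_group, false);
    for (size_t k = 0; k < n_group_used; ++k) {
        size_t best = 0;
        for (size_t g = 1; g < n_group; ++g) {
            if (!keep[g] && (keep[best] || group_score[g] > group_score[best])) {
                best = g;
            }
        }
        keep[best] = true;
    }

    // Mask experts in unselected groups so per-expert top-k ignores them.
    for (size_t g = 0; g < n_group; ++g) {
        if (keep[g]) continue;
        for (size_t i = 0; i < group_size; ++i) {
            logits[g*group_size + i] = -std::numeric_limits<float>::infinity();
        }
    }
}
```

With `n_group == 1` (the Kimi K2 case noted above), every expert is in the single kept group and the masking is a no-op.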
* sycl: add PAD_REFLECT_D1 operator support (illustrated below)
* docs(ops): regenerate docs/ops.md
* remove trailing whitespaces
* style: fix editorconfig issues - trim trailing spaces and normalize EOLs
* fix: move PAD_REFLECT_1D case outside of fall-through block
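Reflect padding mirrors a signal at its edges without repeating the border element: for input [a, b, c, d] and two elements of padding on each side, the output is [c, b, a, b, c, d, c, b]. A plain C++ reference of the index math (assuming padding smaller than the input length; the real operator is a SYCL kernel over ggml tensors, and `pad_reflect_1d` here is a hypothetical helper):

```cpp
#include <vector>

// p0/p1: number of padding elements on the left/right edge.
static std::vector<float> pad_reflect_1d(const std::vector<float> & src,
                                         int p0, int p1) {
    const int n = (int) src.size();
    std::vector<float> dst(n + p0 + p1);
    for (int i = 0; i < (int) dst.size(); ++i) {
        int j = i - p0;                     // position in the source
        if (j < 0)  j = -j;                 // reflect off the left edge
        if (j >= n) j = 2*(n - 1) - j;      // reflect off the right edge
        dst[i] = src[j];
    }
    return dst;
}
```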
Updates dev branch with latest release (b6811) from ggml-org/llama.cpp