
Commit d894998

Authored by firecoperana, allozaur, Bramas, isaac-mcfadyen, and ServeurpersoCom
Add --webui arg to launch llama.cpp new webui (ikawrakow#786)
* Add new webui from llama.cpp
* Add new webui
* feat: Improve mobile UI for Settings Dialog (#16084)
* feat: Improve mobile UI for Settings Dialog
* chore: update webui build output
* fix: Linting errors
* chore: update webui build output
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatSettings/ChatSettingsFields.svelte
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatSettings/ChatSettingsSection.svelte
#   tools/server/public/index.html.gz
* webui : fix handling incomplete chunks (#16107)
* Always show message actions for mobile UI + improvements for user message sizing (#16076)
# Conflicts:
#   .gitignore
#   examples/server/webui_llamacpp/package.json
#   examples/server/webui_llamacpp/scripts/dev.sh
#   tools/server/webui/scripts/post-build.sh
* webui: switch to hash-based routing (alternative of #16079) (#16157)
* Switched web UI to hash-based routing
* Added hash to missed goto function call
* Removed outdated SPA handling code
* Fixed broken sidebar home link
# Conflicts:
#   examples/server/webui_llamacpp/src/routes/+layout.ts
#   tools/server/server.cpp
* Allow viewing conversations even when llama server is down (#16255)
* webui: allow viewing conversations and sending messages even if llama-server is down
  - Cached llama.cpp server properties in browser localStorage on startup, persisting successful fetches and reloading them when refresh attempts fail so the chat UI continues to render while the backend is unavailable.
  - Cleared the stored server properties when resetting the store to prevent stale capability data after cache-backed operation.
  - Kept the original error-splash behavior when no cached props exist so fresh installs still surface a clear failure state instead of rendering stale data.
* feat: Add UI for `props` endpoint unavailable + cleanup logic
* webui: extend cached props fallback to offline errors
  Treat connection failures (refused, DNS, timeout, fetch) the same way as server 5xx so the warning banner shows up when cache is available, instead of falling back to a full error screen.
* webui: Left the chat form enabled when a server warning is present so operators can keep sending messages, e.g. to restart the backend over llama-swap, even while cached /props data is in use
* chore: update webui build output
---------
Co-authored-by: Pascal <[email protected]>
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatScreen/ChatScreenWarning.svelte
#   examples/server/webui_llamacpp/src/lib/constants/localstorage-keys.ts
* Enhance text file detection logic for file attachments (#16199)
* feat: Enhances text file detection logic
* chore: Build static `webui` output
* chore: update webui build output
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/constants/binary-detection.ts
* Show message actions by default (#16289)
* fix: preserved zero values in chat settings inputs and textareas by switching to nullish coalescing for field values and default placeholders (#16312)
* Improve Mobile UI for dialogs and action dropdowns (#16222)
* fix: Always show conversation item actions
* feat: Improve Alert Dialog and Dialog mobile UI
* feat: Add settings reset to default confirmation
* fix: Close Edit dialog on save
* chore: update webui build output
* webui: implement proper z-index system and scroll management
  - Add CSS variable for centralized z-index control
  - Fix dropdown positioning with Settings dialog conflicts
  - Prevent external scroll interference with proper event handling
  - Clean up hardcoded z-index values for maintainable architecture
* webui: ensured the settings dialog enforces dynamic viewport height on mobile while retaining existing desktop sizing overrides
* feat: Use `dvh` instead of computed px height for dialogs max height on mobile
* chore: update webui build output
* feat: Improve Settings fields UI
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Pascal <[email protected]>
* Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` blocks (#16326)
* fix: prevent reasoning blocks with quotes from being truncated
* chore: update webui build output
* feat: Improve thinking content parsing
* test: Adds ChatMessage component stories for different thinking blocks
* chore: update webui build output
* fix: ChatMessage story fix
---------
Co-authored-by: Aleksander Grygier <[email protected]>
* Chatapi ignore empty sampling (#16330)
* fix: skip empty sampling fields instead of coercing to 0 in chat API options
* chore: update webui build output
* webui: Remove running `llama-server` within WebUI `dev.sh` script (#16363)
* Add optional setting for showing "Model used:" information (#16337)
* feat: Add a setting to include model name used to generate the message
* feat: UI improvements
* feat: Save model info along with the database message entry creation
* chore: Build webui static output
* Improve code block color theming (#16325)
* feat: Improve code block theming
* chore: update webui build output
* chore: Update webui static build
* Conversation action dialogs as singletons from Chat Sidebar + apply conditional rendering for Actions Dropdown for Chat Conversation Items (#16369)
* fix: Render Conversation action dialogs as singletons from Chat Sidebar level
* chore: update webui build output
* fix: Render Actions Dropdown conditionally only when user hovers conversation item + remove unused markup
* chore: Update webui static build
* fix: Always truncate conversation names
* chore: Update webui static build
* fix: track viewportHeight via window.innerHeight to avoid unwanted scrolling (#16356)
  Use <svelte:window bind:innerHeight> instead of manual resize listener
Co-authored-by: Aleksander Grygier <[email protected]>
* webui : Fix messages payload sent to chat completions (#16402)
* fix: Include just the currently active message branches instead of all in chat completions request
* chore: Build webui static output
* chore: Formatting
* chore: update webui build output
* Capture model name only after first token (streaming) or completed request (#16405)
* feat: Capture model name only after first token (streaming) or completed request (non-streaming)
* chore: update webui build output
* chore: update webui build output
* Fix missing messages on sibling navigation (#16408)
* fix: resolve message disappearing issue when navigating between regenerated siblings by using current leaf nodes instead of cached sibling IDs
* chore: update webui build output
* chore: update webui build output
* webui : added download action (#13552) (#16282)
* webui : added download action (#13552)
* webui : import and export (for all conversations)
* webui : fixed download-format, import of one conversation
* webui : add ExportedConversations type for chat import/export
* feat: Update naming & order
* chore: Linting
* webui : Updated static build output
---------
Co-authored-by: Aleksander Grygier <[email protected]>
* refactor: centralize CoT parsing in backend for streaming mode (#16394)
* refactor: unify reasoning handling via backend reasoning_content, drop frontend tag parsing
  - Updated the chat message component to surface backend-supplied reasoning via message.thinking while showing the raw assistant content without inline tag scrubbing
  - Simplified chat streaming to append content chunks directly, stream reasoning into the message model, and persist any partial reasoning when generation stops
  - Refactored the chat service SSE handler to rely on server-provided reasoning_content, removing legacy <think> parsing logic
  - Refreshed Storybook data and streaming flows to populate the thinking field explicitly for static and streaming assistant messages
* refactor: implement streaming-aware universal reasoning parser
  Remove the streaming mode limitation from --reasoning-format by refactoring try_parse_reasoning() to handle incremental parsing of <think> tags across all formats.
  - Rework try_parse_reasoning() to track whitespace, partial tags, and multiple reasoning segments, allowing proper separation of reasoning_content and content in streaming mode
  - Parse reasoning tags before tool call handling in content-only and Llama 3.x formats to ensure inline <think> blocks are captured correctly
  - Change default reasoning_format from 'auto' to 'deepseek' for consistent behavior
  - Add 'deepseek-legacy' option to preserve old inline behavior when needed
  - Update CLI help and documentation to reflect streaming support
  - Add parser tests for inline <think>...</think> segments
  The parser now continues processing content after </think> closes instead of stopping, enabling proper message.reasoning_content and message.content separation in both streaming and non-streaming modes. Fixes the issue where streaming responses would dump everything (including post-thinking content) into reasoning_content while leaving content empty.
* refactor: address review feedback from allozaur
  - Passed the assistant message content directly to ChatMessageAssistant to drop the redundant derived state in the chat message component
  - Simplified chat streaming updates by removing unused partial-thinking handling and persisting partial responses straight from currentResponse
  - Refreshed the ChatMessage stories to cover standard and reasoning scenarios without the old THINK-tag parsing examples
Co-authored-by: Aleksander Grygier <[email protected]>
* refactor: restore forced reasoning prefix to pass test-chat ([chat] All tests passed)
  - store the exact sequence seen on input when 'thinking_forced_open' enforces a reasoning block
  - inject this prefix before the first accumulated segment in 'reasoning_content', then clear it to avoid duplication
  - repeat the capture on every new 'start_think' detection to properly handle partial/streaming flows
* refactor: address review feedback from ngxson
* debug: say goodbye to curl -N, hello one-click raw stream
  - adds a new checkbox in the WebUI to display raw LLM output without backend parsing or frontend Markdown rendering
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessage.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* webui: add Storybook example for raw LLM output and scope reasoning format toggle per story
  - Added a Storybook example that showcases the chat message component in raw LLM output mode with the provided trace sample
  - Updated every ChatMessage story to toggle the disableReasoningFormat setting so the raw-output rendering remains scoped to its own example
* npm run format
* chat-parser: address review feedback from ngxson
Co-authored-by: Xuan Son Nguyen <[email protected]>
---------
Co-authored-by: Aleksander Grygier <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
# Conflicts:
#   common/arg.cpp
#   examples/server/webui_llamacpp/src/lib/utils/thinking.ts
#   tools/server/README.md
* No markdown in cot (#16483)
* fix: let the model think in plaintext
* chore: npm run format + npm run build
* webui: updated the chat service to only include max_tokens in the req… (#16489)
* webui: updated the chat service to only include max_tokens in the request payload when the setting is explicitly provided, while still mapping explicit zero or null values to the infinite-token sentinel
* chore: update webui build output
* feat: render user content as markdown option (#16358)
* feat: render user content as markdown option
  - Add a persisted 'renderUserContentAsMarkdown' preference to the settings defaults and info metadata so the choice survives reloads like other options
  - Surface the new 'Render user content as Markdown' checkbox in the General section of the chat settings dialog, beneath the PDF toggle
  - Render user chat messages with 'MarkdownContent' when the new setting is enabled, matching assistant formatting while preserving the existing card styling otherwise
  - chore: update webui build output
* chore: update webui build output
* webui: remove client-side context pre-check and rely on backend for limits (#16506)
* fix: make SSE client robust to premature [DONE] in agentic proxy chains
* webui: remove client-side context pre-check and rely on backend for limits
  Removed the client-side context window pre-check and now simply sends messages while keeping the dialog imports limited to core components, eliminating the maximum context alert path.
  Simplified streaming and non-streaming chat error handling to surface a generic 'No response received from server' error whenever the backend returns no content.
  Removed the obsolete maxContextError plumbing from the chat store so state management now focuses on the core message flow without special context-limit cases.
* webui: cosmetic rename of error messages
* Update tools/server/webui/src/lib/stores/chat.svelte.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/stores/chat.svelte.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <[email protected]>
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/components/app/dialogs/ChatErrorDialog.svelte
#   examples/server/webui_llamacpp/src/lib/components/app/dialogs/MaximumContextAlertDialog.svelte
#   examples/server/webui_llamacpp/src/lib/services/context.ts
* fix: add remark plugin to render raw HTML as literal text (#16505)
* fix: add remark plugin to render raw HTML as literal text
  Implemented a missing MDAST stage to neutralize raw HTML like major LLM WebUIs do, ensuring consistent and safe Markdown rendering.
  Introduced 'remarkLiteralHtml', a plugin that converts raw HTML nodes in the Markdown AST into plain-text equivalents while preserving indentation and line breaks. This ensures consistent rendering and prevents unintended HTML execution, without altering valid Markdown structure.
  Kept 'remarkRehype' in the pipeline since it performs the required conversion from MDAST to HAST for KaTeX, syntax highlighting, and HTML serialization.
  Refined the link-enhancement logic to skip unnecessary DOM rewrites, fixing a subtle bug where extra paragraphs were injected after the first line due to full innerHTML reconstruction, and ensuring links open in new tabs only when required.
  Final pipeline: remarkGfm -> remarkMath -> remarkBreaks -> remarkLiteralHtml -> remarkRehype -> rehypeKatex -> rehypeHighlight -> rehypeStringify
* fix: address review feedback from allozaur
* chore: update webui build output
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/constants/literal-html.ts
* Add server-driven parameter defaults and syncing (#16515)
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatSettings/ParameterSourceIndicator.svelte
#   examples/server/webui_llamacpp/src/lib/constants/precision.ts
#   examples/server/webui_llamacpp/src/lib/services/parameter-sync.spec.ts
#   examples/server/webui_llamacpp/src/lib/services/parameter-sync.ts
#   examples/server/webui_llamacpp/src/lib/utils/config-helpers.ts
#   examples/server/webui_llamacpp/src/lib/utils/precision.ts
* fix: added a normalization step for MathJax-style \[\] and \(\) delimiters (#16599)
* fix: added a normalization step for MathJax-style \[\] and \(\) delimiters, so inline and block equations are converted before KaTeX rendering, enabling proper display of model-generated LaTeX in the WebUI
* chore: update webui build output
* webui: reorganize settings layout (#16607)
* webui: reorganize settings layout
* chore: update webui build output
* fix: remove unused variable
* chore: update webui build output
* Enable per-conversation loading states to allow having parallel conversations (#16327)
* feat: Per-conversation loading states and tracking streaming stats
* chore: update webui build output
* refactor: Chat state management
  Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states. This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed.
  Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.
* feat: Adds loading indicator to conversation items
* chore: update webui build output
* fix: Fix aborting chat streaming
  Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent. This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved. Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.
* refactor: Remove redundant comments
* chore: build webui static output
* refactor: Cleanup
* chore: update webui build output
* chore: update webui build output
* fix: Conversation loading indicator for regenerating messages
* chore: update webui static build
* feat: Improve configuration
* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI
* Import/Export UX improvements (#16619)
* webui : added download action (#13552)
* webui : import and export (for all conversations)
* webui : fixed download-format, import of one conversation
* webui : add ExportedConversations type for chat import/export
* feat: Update naming & order
* chore: Linting
* feat: Import/Export UX improvements
* chore: update webui build output
* feat: Update UI placement of Import/Export tab in Chat Settings Dialog
* refactor: Cleanup
* chore: update webui build output
* feat: Enable shift-click multiple conversation items selection
* chore: update webui static build
* chore: update webui static build
---------
Co-authored-by: Sascha Rogmann <[email protected]>
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatSettings/ConversationSelectionDialog.svelte
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatSettings/ImportExportTab.svelte
#   examples/server/webui_llamacpp/src/lib/utils/conversation-utils.ts
* Prevent premature submission on IME input (#16673)
* fix: Prevent premature submission on IME input
* chore: update webui static build
* refactor: Put IME completion checker in a helper function and add checking for `KeyboardEvent.eventKey === 229`
* chore: update webui static build
* chore: update webui static build
* chore: update webui static build
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/utils/is-ime-composing.ts
* Handle legacy 'context' attachments (#16687)
* webui: introduce OpenAI-compatible model selector in JSON payload (#16562)
* webui: introduce OpenAI-compatible model selector in JSON payload
* webui: restore OpenAI-Compatible model source of truth and unify metadata capture
  This change re-establishes a single, reliable source of truth for the active model, fully aligned with the OpenAI-Compat API behavior.
  It introduces a unified metadata flow that captures the model field from both streaming and non-streaming responses, wiring a new onModel callback through ChatService.
  The model name is now resolved directly from the API payload rather than relying on server /props or UI assumptions.
  ChatStore records and persists the resolved model for each assistant message during streaming, ensuring consistency across the UI and database.
  Type definitions for API and settings were also extended to include model metadata and the onModel callback, completing the alignment with OpenAI-Compat semantics.
* webui: address review feedback from allozaur
* webui: move model selector into ChatForm (idea by @allozaur)
* webui: make model selector more subtle and integrated into ChatForm
* webui: replaced the Flowbite selector with a native Svelte dropdown
* webui: add developer setting to toggle the chat model selector
* webui: address review feedback from allozaur
  Normalized streamed model names during chat updates by trimming input and removing directory components before saving or persisting them, so the conversation UI shows only the filename.
  Forced model names within the chat form selector dropdown to render as a single-line, truncated entry with a tooltip revealing the full name.
* webui: toggle displayed model source for legacy vs OpenAI-Compat modes
  When the selector is disabled, it falls back to the active server model name from /props. When the model selector is enabled, the displayed model comes from the message metadata (the one explicitly selected and sent in the request).
* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormActions.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/constants/localstorage-keys.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/services/chat.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/services/chat.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* webui: refactor model selector and persistence helpers
  - Replace inline portal and event listeners with proper Svelte bindings
  - Introduce 'persisted' store helper for localStorage sync without runes
  - Extract 'normalizeModelName' utils + Vitest coverage
  - Simplify ChatFormModelSelector structure and cleanup logic
  Replaced the persisted store helper's use of '$state/$effect' runes with a plain TS implementation to prevent orphaned effect runtime errors outside component context.
Co-authored-by: Aleksander Grygier <[email protected]>
* webui: document normalizeModelName usage with inline examples
* Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/stores/models.svelte.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* Update tools/server/webui/src/lib/stores/models.svelte.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* webui: extract ModelOption type into dedicated models.d.ts
Co-authored-by: Aleksander Grygier <[email protected]>
* webui: refine ChatMessageAssistant displayedModel source logic
* webui: stabilize dropdown, simplify model extraction, and init assistant model field
* chore: update webui static build
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <[email protected]>
* chore: npm format, update webui static build
* webui: align sidebar trigger position, remove z-index glitch
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <[email protected]>
# Conflicts:
#   examples/server/webui_llamacpp/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte
#   examples/server/webui_llamacpp/src/lib/services/models.ts
#   examples/server/webui_llamacpp/src/lib/stores/models.svelte.ts
#   examples/server/webui_llamacpp/src/lib/stores/persisted.svelte.ts
#   examples/server/webui_llamacpp/src/lib/types/models.d.ts
#   examples/server/webui_llamacpp/src/lib/utils/model-names.test.ts
#   examples/server/webui_llamacpp/src/lib/utils/model-names.ts
#   examples/server/webui_llamacpp/src/lib/utils/portal-to-body.ts
* webui: support q URL parameter (#16728)
* webui: support q URL parameter
  Fixes #16722. I've checked that it works with Firefox's AI tools.
* webui: apply suggestions from code review
Co-authored-by: Aleksander Grygier <[email protected]>
* chore: update webui static build
---------
Co-authored-by: Aleksander Grygier <[email protected]>
* build fix
---------
Co-authored-by: firecoperana <firecoperana>
Co-authored-by: Aleksander Grygier <[email protected]>
Co-authored-by: Quentin Bramas <[email protected]>
Co-authored-by: Isaac McFadyen <[email protected]>
Co-authored-by: Pascal <[email protected]>
Co-authored-by: Sascha Rogmann <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Sascha Rogmann <[email protected]>
Co-authored-by: Florian Badie <[email protected]>
1 parent 6848a0a commit d894998

File tree: 282 files changed (+30961, -460 lines)

.gitignore

Lines changed: 13 additions & 0 deletions
@@ -130,3 +130,16 @@ poetry.toml
 
 # Scripts
 !/scripts/install-oneapi.bat
+/examples/server/webui_llamacpp/.gitignore
+
+# Test models for lora adapters
+/lora-tests
+
+# Local scripts
+/run-vim.sh
+/run-chat.sh
+.ccache/
+
+# IDE
+*.code-workspace
+.windsurf/

common/chat-parser.cpp

Lines changed: 125 additions & 13 deletions
@@ -3,9 +3,12 @@
 #include "log.h"
 #include "regex-partial.h"
 
+#include <algorithm>
+#include <cctype>
 #include <optional>
 #include <stdexcept>
 #include <string>
+#include <string_view>
 #include <vector>
 
 using json = nlohmann::ordered_json;
@@ -137,6 +140,27 @@ void common_chat_msg_parser::consume_literal(const std::string & literal) {
 }
 
 bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think, const std::string & end_think) {
+    std::string pending_reasoning_prefix;
+
+    if (syntax_.reasoning_format == COMMON_REASONING_FORMAT_NONE) {
+        return false;
+    }
+
+    auto set_reasoning_prefix = [&](size_t prefix_pos) {
+        if (!syntax_.thinking_forced_open || syntax_.reasoning_in_content) {
+            return;
+        }
+        if (prefix_pos + start_think.size() > input_.size()) {
+            pending_reasoning_prefix.clear();
+            return;
+        }
+        // Capture the exact literal that opened the reasoning section so we can
+        // surface it back to callers. This ensures formats that force the
+        // reasoning tag open (e.g. DeepSeek R1) retain their original prefix
+        // instead of dropping it during parsing.
+        pending_reasoning_prefix = input_.substr(prefix_pos, start_think.size());
+    };
+
     auto handle_reasoning = [&](const std::string & reasoning, bool closed) {
         auto stripped_reasoning = string_strip(reasoning);
         if (stripped_reasoning.empty()) {
@@ -149,28 +173,116 @@ bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think
                 add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "</think>" : end_think);
             }
         } else {
+            if (!pending_reasoning_prefix.empty()) {
+                add_reasoning_content(pending_reasoning_prefix);
+                pending_reasoning_prefix.clear();
+            }
             add_reasoning_content(stripped_reasoning);
         }
     };
-    if (syntax_.reasoning_format != COMMON_REASONING_FORMAT_NONE) {
-        if (syntax_.thinking_forced_open || try_consume_literal(start_think)) {
-            if (auto res = try_find_literal(end_think)) {
-                handle_reasoning(res->prelude, /* closed */ true);
-                consume_spaces();
-                return true;
-            }
-            auto rest = consume_rest();
+
+    const size_t saved_pos = pos_;
+    const size_t saved_content_size = result_.content.size();
+    const size_t saved_reasoning_size = result_.reasoning_content.size();
+
+    auto restore_state = [&]() {
+        move_to(saved_pos);
+        result_.content.resize(saved_content_size);
+        result_.reasoning_content.resize(saved_reasoning_size);
+    };
+
+    // Allow leading whitespace to be preserved as content when reasoning is present at the start
+    size_t cursor = pos_;
+    size_t whitespace_end = cursor;
+    while (whitespace_end < input_.size() && std::isspace(static_cast<unsigned char>(input_[whitespace_end]))) {
+        ++whitespace_end;
+    }
+
+    if (whitespace_end >= input_.size()) {
+        restore_state();
+        if (syntax_.thinking_forced_open) {
+            auto rest = input_.substr(saved_pos);
             if (!rest.empty()) {
                 handle_reasoning(rest, /* closed */ !is_partial());
             }
-            // Allow unclosed thinking tags, for now (https://github.com/ggml-org/llama.cpp/issues/13812, https://github.com/ggml-org/llama.cpp/issues/13877)
-            // if (!syntax_.thinking_forced_open) {
-            //     throw common_chat_msg_partial_exception(end_think);
-            // }
+            move_to(input_.size());
             return true;
         }
+        return false;
+    }
+
+    cursor = whitespace_end;
+    const size_t remaining = input_.size() - cursor;
+    const size_t start_prefix = std::min(start_think.size(), remaining);
+    const bool has_start_tag = input_.compare(cursor, start_prefix, start_think, 0, start_prefix) == 0;
+
+    if (has_start_tag && start_prefix < start_think.size()) {
+        move_to(input_.size());
+        return true;
+    }
+
+    if (has_start_tag) {
+        if (whitespace_end > pos_) {
+            add_content(input_.substr(pos_, whitespace_end - pos_));
+        }
+        set_reasoning_prefix(cursor);
+        cursor += start_think.size();
+    } else if (syntax_.thinking_forced_open) {
+        cursor = whitespace_end;
+    } else {
+        restore_state();
+        return false;
+    }
+    while (true) {
+        if (cursor >= input_.size()) {
+            move_to(input_.size());
+            return true;
+        }
+
+        size_t end_pos = input_.find(end_think, cursor);
+        if (end_pos == std::string::npos) {
+            std::string_view remaining_view(input_.data() + cursor, input_.size() - cursor);
+            size_t partial_off = string_find_partial_stop(remaining_view, end_think);
+            size_t reasoning_end = partial_off == std::string::npos ? input_.size() : cursor + partial_off;
+            if (reasoning_end > cursor) {
+                handle_reasoning(input_.substr(cursor, reasoning_end - cursor), /* closed */ partial_off == std::string::npos && !is_partial());
+            }
+            move_to(input_.size());
+            return true;
+        }
+
+        if (end_pos > cursor) {
+            handle_reasoning(input_.substr(cursor, end_pos - cursor), /* closed */ true);
+        } else {
+            handle_reasoning("", /* closed */ true);
+        }
+
+        cursor = end_pos + end_think.size();
+
+        while (cursor < input_.size() && std::isspace(static_cast<unsigned char>(input_[cursor]))) {
+            ++cursor;
+        }
+
+        const size_t next_remaining = input_.size() - cursor;
+        if (next_remaining == 0) {
+            move_to(cursor);
+            return true;
+        }
+
+        const size_t next_prefix = std::min(start_think.size(), next_remaining);
+        if (input_.compare(cursor, next_prefix, start_think, 0, next_prefix) == 0) {
+            if (next_prefix < start_think.size()) {
+                move_to(input_.size());
+                return true;
+            }
+            set_reasoning_prefix(cursor);
+            cursor += start_think.size();
+            continue;
+        }
+
+        move_to(cursor);
+        return true;
     }
-    return false;
 }
 
 std::string common_chat_msg_parser::consume_rest() {

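In short, the rewritten try_parse_reasoning() now walks the input once: text before a <think> tag stays in content, the tag body is routed to reasoning_content (prefixed with the captured opening literal when thinking is forced open), a partially received closing tag is tolerated, and content parsing resumes after </think>. The sketch below is not the commit's code; it is a minimal standalone C++ illustration of that content/reasoning split, and the names ParsedChunk and split_reasoning are invented for the example.

// Standalone sketch: splits a (possibly truncated) streamed string into
// visible content and <think> reasoning, tolerating an unclosed tag.
#include <cstdio>
#include <string>

struct ParsedChunk {
    std::string content;
    std::string reasoning;
    bool        reasoning_closed = false;
};

static ParsedChunk split_reasoning(const std::string & text,
                                   const std::string & start_tag = "<think>",
                                   const std::string & end_tag   = "</think>") {
    ParsedChunk out;
    size_t start = text.find(start_tag);
    if (start == std::string::npos) {
        out.content = text;                      // no reasoning block at all
        return out;
    }
    out.content = text.substr(0, start);         // anything before the tag stays content
    size_t body  = start + start_tag.size();
    size_t close = text.find(end_tag, body);
    if (close == std::string::npos) {
        out.reasoning = text.substr(body);       // unclosed: keep as (partial) reasoning
        return out;
    }
    out.reasoning        = text.substr(body, close - body);
    out.reasoning_closed = true;
    out.content         += text.substr(close + end_tag.size());  // post-think text is content
    return out;
}

int main() {
    ParsedChunk full    = split_reasoning("<think>check units</think>The answer is 42.");
    ParsedChunk partial = split_reasoning("<think>still thi");   // stream cut mid-thought
    std::printf("content: '%s' | reasoning: '%s' (closed=%d)\n",
                full.content.c_str(), full.reasoning.c_str(), full.reasoning_closed);
    std::printf("content: '%s' | reasoning: '%s' (closed=%d)\n",
                partial.content.c_str(), partial.reasoning.c_str(), partial.reasoning_closed);
    return 0;
}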
common/chat.cpp

Lines changed: 3 additions & 0 deletions
@@ -1207,6 +1207,8 @@ static common_chat_params common_chat_params_init_llama_3_x(const common_chat_te
     return data;
 }
 static void common_chat_parse_llama_3_1(common_chat_msg_parser & builder, bool with_builtin_tools = false) {
+    builder.try_parse_reasoning("<think>", "</think>");
+
     if (!builder.syntax().parse_tool_calls) {
         builder.add_content(builder.consume_rest());
         return;
@@ -2411,6 +2413,7 @@ common_chat_params common_chat_templates_apply(
 }
 
 static void common_chat_parse_content_only(common_chat_msg_parser & builder) {
+    builder.try_parse_reasoning("<think>", "</think>");
     builder.add_content(builder.consume_rest());
 }
 
common/common.cpp

Lines changed: 26 additions & 0 deletions
@@ -200,6 +200,20 @@ int32_t cpu_get_num_math() {
     return cpu_get_num_physical_cores();
 }
 
+common_webui common_webui_from_name(const std::string& format) {
+    if (format == "none") {
+        return COMMON_WEBUI_NONE;
+    }
+    else if (format == "auto") {
+        return COMMON_WEBUI_AUTO;
+    }
+    else if (format == "llamacpp") {
+        return COMMON_WEBUI_LLAMACPP;
+    }
+    else {
+        return COMMON_WEBUI_AUTO;
+    }
+}
 
 static std::string read_file(const std::string& fname) {
     std::ifstream file(fname);
@@ -1417,6 +1431,11 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
         params.public_path = argv[i];
         return true;
     }
+    if (arg == "--webui") {
+        CHECK_ARG
+        params.webui = common_webui_from_name(std::string(argv[i]));
+        return true;
+    }
     if (arg == "--api-key") {
         CHECK_ARG
         params.api_keys.push_back(argv[i]);
@@ -1888,6 +1907,7 @@ void gpt_params_print_usage(int /*argc*/, char ** argv, const gpt_params & param
         "controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:\n"
         "- none: leaves thoughts unparsed in `message.content`\n"
         "- deepseek: puts thoughts in `message.reasoning_content` (except in streaming mode, which behaves as `none`)\n"
+        "- deepseek-legacy: keeps `<think>` tags in `message.content` while also populating `message.reasoning_content`\n"
         "(default: none)", });
     options.push_back({ "main", " --chat-template-kwargs JSON", "sets additional params for the json template parser"});
     options.push_back({ "main", " --reasoning-budget N", "controls the amount of thinking allowed; currently only one of: -1 for unrestricted thinking budget, or 0 to disable thinking (default: -1)" });
@@ -2046,6 +2066,12 @@ void gpt_params_print_usage(int /*argc*/, char ** argv, const gpt_params & param
     options.push_back({ "server", " --port PORT", "port to listen (default: %d)", params.port });
     options.push_back({ "server", " --path PATH", "path to serve static files from (default: %s)", params.public_path.c_str() });
     options.push_back({ "server", " --embedding(s)", "restrict to only support embedding use case; use only with dedicated embedding models (default: %s)", params.embedding ? "enabled" : "disabled" });
+    options.push_back({ "server", " --webui NAME",
+        "controls which webui to server:\n"
+        "- none: disable webui\n"
+        "- auto: default webui \n"
+        "- llamacpp: llamacpp webui \n"
+        "(default: auto)", });
    options.push_back({ "server", " --api-key KEY", "API key to use for authentication (default: none)" });
    options.push_back({ "server", " --api-key-file FNAME", "path to file containing API keys (default: none)" });
    options.push_back({ "server", " --ssl-key-file FNAME", "path to file a PEM-encoded SSL private key" });

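The hunks above add the --webui NAME flag (for example --webui llamacpp) and map the name to the new common_webui enum, with unknown names falling back to the auto default. As a quick, self-contained restatement of that mapping (the enum and function are re-declared locally here instead of including common/common.h):

// Minimal check of the name -> enum mapping used by --webui; local copies only.
#include <cassert>
#include <string>

enum common_webui { COMMON_WEBUI_NONE, COMMON_WEBUI_AUTO, COMMON_WEBUI_LLAMACPP };

static common_webui common_webui_from_name(const std::string & format) {
    if (format == "none")     { return COMMON_WEBUI_NONE; }
    if (format == "auto")     { return COMMON_WEBUI_AUTO; }
    if (format == "llamacpp") { return COMMON_WEBUI_LLAMACPP; }
    return COMMON_WEBUI_AUTO; // unknown names fall back to the default UI
}

int main() {
    assert(common_webui_from_name("none")     == COMMON_WEBUI_NONE);
    assert(common_webui_from_name("llamacpp") == COMMON_WEBUI_LLAMACPP);
    assert(common_webui_from_name("whatever") == COMMON_WEBUI_AUTO); // fallback
    return 0;
}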
common/common.h

Lines changed: 11 additions & 3 deletions
@@ -109,6 +109,14 @@ enum common_reasoning_format {
     COMMON_REASONING_FORMAT_DEEPSEEK, // Extract thinking tag contents and return as `message.reasoning_content`, including in streaming deltas.
 };
 
+enum common_webui {
+    COMMON_WEBUI_NONE,
+    COMMON_WEBUI_AUTO,
+    COMMON_WEBUI_LLAMACPP,
+};
+
+common_webui common_webui_from_name(const std::string& format);
+
 struct model_paths {
     std::string path = ""; // model local path // NOLINT
     std::string url = ""; // model url to download // NOLINT
@@ -288,7 +296,7 @@ struct gpt_params {
     bool use_jinja = false; // NOLINT
     std::string system_prompt = "";
     bool enable_chat_template = true;
-    common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_AUTO;
+    common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
     int reasoning_budget = -1;
     bool prefill_assistant = true;
 
@@ -300,8 +308,8 @@ struct gpt_params {
     std::map<std::string, std::string> default_template_kwargs;
 
     // "advanced" endpoints are disabled by default for better security
-    bool webui = true;
-    bool endpoint_slots = false;
+    common_webui webui = COMMON_WEBUI_AUTO;
+    bool endpoint_slots = true;
     bool endpoint_props = false; // only control POST requests, not GET
     bool endpoint_metrics = false;
 

examples/server/CMakeLists.txt

Lines changed: 23 additions & 1 deletion
@@ -17,7 +17,7 @@ set(TARGET_SRCS
 )
 set(PUBLIC_ASSETS
     index.html.gz
-    loading.html
+
 )
 
 foreach(asset ${PUBLIC_ASSETS})
@@ -29,10 +29,32 @@ foreach(asset ${PUBLIC_ASSETS})
         OUTPUT "${output}"
         COMMAND "${CMAKE_COMMAND}" "-DINPUT=${input}" "-DOUTPUT=${output}" -P "${PROJECT_SOURCE_DIR}/scripts/xxd.cmake"
     )
+    message("TARGET_SRCS contains: ${input}")
     set_source_files_properties(${output} PROPERTIES GENERATED TRUE)
 
 endforeach()
 
+# include new llamacpp webui
+set(ALT_PUBLIC_ASSETS
+    index_llamacpp.html.gz
+    loading.html
+)
+
+foreach(asset ${ALT_PUBLIC_ASSETS})
+    set(input "${CMAKE_CURRENT_SOURCE_DIR}/public_llamacpp/${asset}")
+    set(output "${CMAKE_CURRENT_BINARY_DIR}/${asset}.hpp")
+    list(APPEND TARGET_SRCS ${output})
+    add_custom_command(
+        DEPENDS "${input}"
+        OUTPUT "${output}"
+        COMMAND "${CMAKE_COMMAND}" "-DINPUT=${input}" "-DOUTPUT=${output}" -P "${PROJECT_SOURCE_DIR}/scripts/xxd.cmake"
+    )
+    message("TARGET_SRCS contains: ${input}")
+    set_source_files_properties(${output} PROPERTIES GENERATED TRUE)
+
+endforeach()
+
+
 add_executable(${TARGET} ${TARGET_SRCS})
 install(TARGETS ${TARGET} RUNTIME)
 target_compile_definitions(${TARGET} PRIVATE

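The CMake change embeds a second, gzip-compressed index page (index_llamacpp.html.gz from public_llamacpp/) into the server binary via scripts/xxd.cmake, alongside the existing index.html.gz. How server.cpp chooses between the two pages is not part of this excerpt; the sketch below only illustrates one plausible selection path keyed on params.webui, and every identifier in it (index_html_gz, index_llamacpp_html_gz, embedded_asset, serve_static_index) is illustrative rather than the real generated symbol names.

// Hypothetical selection sketch, assuming xxd.cmake-style byte-array assets.
#include <cstddef>

enum common_webui { COMMON_WEBUI_NONE, COMMON_WEBUI_AUTO, COMMON_WEBUI_LLAMACPP };

// Stand-in payloads; the real generated headers hold the full gzip'd pages.
static const unsigned char index_html_gz[]          = { 0x1f, 0x8b };
static const unsigned char index_llamacpp_html_gz[] = { 0x1f, 0x8b };

struct embedded_asset {
    const unsigned char * data;
    std::size_t           size;
};

// Pick which embedded page to serve for GET / based on the --webui setting.
static embedded_asset serve_static_index(common_webui choice) {
    if (choice == COMMON_WEBUI_LLAMACPP) {
        return { index_llamacpp_html_gz, sizeof(index_llamacpp_html_gz) };
    }
    // "auto" keeps the existing default page; "none" would be handled earlier
    // by not registering the static route at all.
    return { index_html_gz, sizeof(index_html_gz) };
}

int main() {
    embedded_asset page = serve_static_index(COMMON_WEBUI_LLAMACPP);
    return page.size == 2 ? 0 : 1;
}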