
Conversation

ServeurpersoCom (Collaborator) commented Oct 26, 2025

  • Add no-cache headers to /props and /slots

  • Throttle slot checks to 30s

  • Prevent concurrent fetches with a promise guard (see the fetch sketch below)

  • Trigger refresh from chat streaming, for both legacy mode and the ModelSelector

  • Show a dynamic serverWarning when using cached data

  • Updated assistant message bubbles to show each message's stored model when available,
    falling back to the current server model only when the per-message value is missing

  • When the model selector is disabled, the webui now fetches /props and prioritizes that
    model name over chunk metadata, then persists it with the streamed message so legacy
    mode properly reflects the backend configuration
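A minimal sketch of the fetch pattern behind the first three bullets, assuming the TypeScript webui; `fetchJsonNoCache`, `refreshSlots`, and the constant names are illustrative, not the actual identifiers in this PR:

```ts
// Hypothetical names; a sketch of the pattern only, not the PR's implementation.
const SLOTS_THROTTLE_MS = 30_000; // "throttle slot checks to 30s"

let lastSlotsCheck = 0;
let inflight: Promise<unknown> | null = null; // promise guard: at most one request in flight

// No-cache headers so /props and /slots always reflect the live server state
async function fetchJsonNoCache(path: string): Promise<unknown> {
  const res = await fetch(path, {
    headers: { 'Cache-Control': 'no-cache', 'Pragma': 'no-cache' },
    cache: 'no-store'
  });
  if (!res.ok) throw new Error(`${path} returned HTTP ${res.status}`);
  return res.json();
}

// Throttled, guarded slot check: reuses an in-flight request instead of racing it,
// so it can be called freely from chat-streaming code on stream start.
export async function refreshSlots(): Promise<unknown | null> {
  if (inflight) return inflight;
  const now = Date.now();
  if (now - lastSlotsCheck < SLOTS_THROTTLE_MS) return null;
  lastSlotsCheck = now;
  inflight = fetchJsonNoCache('/slots').finally(() => {
    inflight = null;
  });
  return inflight;
}
```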

Command lines used on legacy (Raspberry Pi 5); all three servers run on the same port (8081), so the models are swapped in place:

/root/llama.cpp/build/bin/llama-server \
 -m /root/ia/models/mradermacher/OLMoE-1B-7B-0125-Instruct-i1-GGUF/OLMoE-1B-7B-0125-Instruct.i1-Q6_K.gguf \
 -ctk q8_0 -ctv q8_0 -fa on \
 --jinja --ctx-size 8192 --mlock --port 8081

/root/llama.cpp/build/bin/llama-server \
 -m /root/ia/models/mradermacher/OLMoE-1B-7B-0125-SFT-i1-GGUF/OLMoE-1B-7B-0125-SFT.i1-Q6_K.gguf \
 -ctk q8_0 -ctv q8_0 -fa on \
 --jinja --ctx-size 8192 --mlock --port 8081

/root/llama.cpp/build/bin/llama-server \
 -m /root/ia/models/mradermacher/Qwen3-30B-A3B-Instruct-2507-i1-GGUF/Qwen3-30B-A3B-Instruct-2507.i1-Q4_K_M.gguf \
 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0 \
 -ctk q8_0 -ctv q8_0 -fa on \
 --jinja --ctx-size 4096 --port 8081

Fixes #16771

EDIT: I've recorded a video that specifically targets the original issue.
Testing video, Raspberry Pi 5 + master branch + this PR:

CmdLineSwap-RaspberryPi5.mp4

And another one to show there's no regression when the model selector is enabled,
also demonstrating the multimodal function updates:

NonReg-FullSetup.mp4

ServeurpersoCom (Collaborator, Author) commented Oct 27, 2025

The final fix is done. Now that the /props refresh is unified, legacy mode (without the model selector) behaves correctly, and the message history properly reflects the backend model used.

ServeurpersoCom marked this pull request as ready for review on October 27, 2025 07:08
ServeurpersoCom (Collaborator, Author) commented:
LEGACY MODE (modelSelectorEnabled = false)

  • Do not send "model" in requests
  • Ignore "model" from server responses
  • Read /props.model as the single source of truth
  • Refresh /props periodically or on stream start
  • UI and history reflect the backend model

MODEL SELECTOR MODE (modelSelectorEnabled = true)

  • Always send "model" in requests
  • Use per-message "model" from chunks or user selection
  • /props used only for backend capabilities (context, multimodal, slots)
  • UI reflects the user-selected model

Summary (see the resolution sketch below):
Legacy -> backend decides the model via /props
ModelSelector -> frontend decides the model via the request field
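A hedged sketch of this resolution rule in TypeScript; `displayModel` and its parameters are illustrative names, not the webui's actual identifiers:

```ts
interface ChatMessage {
  content: string;
  model?: string; // per-message model persisted with the streamed message
}

// Decides which model name a message bubble should display.
function displayModel(
  msg: ChatMessage,
  modelSelectorEnabled: boolean,
  serverPropsModel?: string, // from GET /props (single source of truth in legacy mode)
  selectedModel?: string // the user's choice in the ModelSelector
): string | undefined {
  if (!modelSelectorEnabled) {
    // Legacy: "model" is never sent in requests and ignored in responses;
    // the stored per-message value (persisted from /props at stream time)
    // wins, with the current /props model as the fallback.
    return msg.model ?? serverPropsModel;
  }
  // Selector mode: per-message "model" from chunks, else the user's selection.
  return msg.model ?? selectedModel;
}
```

In either mode the stored per-message value takes precedence, which is what keeps the message history accurate after the backend model is swapped.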

allozaur (Collaborator) left a comment:


Yes, @ServeurpersoCom, that is much better than the initial idea for this PR 😄

ServeurpersoCom (Collaborator, Author) commented Oct 31, 2025

> Yes, @ServeurpersoCom, that is much better than the initial idea for this PR 😄

Good thing the merge squashes everything; those intermediate commits were an emotional rollercoaster 😆 Glad we made it out alive 😆

allozaur (Collaborator) commented:

> those intermediate commits were an emotional rollercoaster 😆

🤣

ServeurpersoCom force-pushed the webui-props-auto-refresh branch from ab8bbf0 to 1ee8d6b on October 31, 2025 23:57
ServeurpersoCom (Collaborator, Author) commented:

rebased to retry CI

ServeurpersoCom force-pushed the webui-props-auto-refresh branch from 1ee8d6b to b9747cc on November 1, 2025 17:56
ServeurpersoCom (Collaborator, Author) commented:

Rebased + static build

allozaur merged commit 2f68ce7 into ggml-org:master on Nov 1, 2025
14 checks passed
Linked issue: Misc. bug: WebUI incorrectly displays local model names (#16771)