
Conversation

ServeurpersoCom (Collaborator) commented Oct 26, 2025

  • Add no-cache headers to /props and /slots

  • Throttle slot checks to 30s

  • Prevent concurrent fetches with a promise guard (see the fetch sketch below)

  • Trigger refresh from chat streaming, for both legacy mode and the ModelSelector

  • Show a dynamic serverWarning when using cached data

  • Updated assistant message bubbles to show each message's stored model when available,
    falling back to the current server model only when the per-message value is missing

  • When the model selector is disabled, the webui now fetches /props and prioritizes that
    model name over chunk metadata, then persists it with the streamed message so legacy
    mode properly reflects the backend configuration
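A minimal sketch of the fetch pattern behind the first three bullets, assuming the TypeScript webui; `fetchJsonNoCache`, `refreshSlots`, and the constant names are illustrative, not the actual identifiers in this PR:

```ts
// Hypothetical names; a sketch of the pattern only, not the PR's implementation.
const SLOTS_THROTTLE_MS = 30_000; // "throttle slot checks to 30s"

let lastSlotsCheck = 0;
let inflight: Promise<unknown> | null = null; // promise guard: at most one request in flight

// No-cache headers so /props and /slots always reflect the live server state
async function fetchJsonNoCache(path: string): Promise<unknown> {
  const res = await fetch(path, {
    headers: { 'Cache-Control': 'no-cache', 'Pragma': 'no-cache' },
    cache: 'no-store'
  });
  if (!res.ok) throw new Error(`${path} returned HTTP ${res.status}`);
  return res.json();
}

// Throttled, guarded slot check: reuses an in-flight request instead of racing it,
// so it can be called freely from chat-streaming code on stream start.
export async function refreshSlots(): Promise<unknown | null> {
  if (inflight) return inflight;
  const now = Date.now();
  if (now - lastSlotsCheck < SLOTS_THROTTLE_MS) return null;
  lastSlotsCheck = now;
  inflight = fetchJsonNoCache('/slots').finally(() => {
    inflight = null;
  });
  return inflight;
}
```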

Command lines used on legacy (Raspberry Pi 5); all three servers run on the same port (8081), so the models are swapped in place:

/root/llama.cpp/build/bin/llama-server \
 -m /root/ia/models/mradermacher/OLMoE-1B-7B-0125-Instruct-i1-GGUF/OLMoE-1B-7B-0125-Instruct.i1-Q6_K.gguf \
 -ctk q8_0 -ctv q8_0 -fa on \
 --jinja --ctx-size 8192 --mlock --port 8081

/root/llama.cpp/build/bin/llama-server \
 -m /root/ia/models/mradermacher/OLMoE-1B-7B-0125-SFT-i1-GGUF/OLMoE-1B-7B-0125-SFT.i1-Q6_K.gguf \
 -ctk q8_0 -ctv q8_0 -fa on \
 --jinja --ctx-size 8192 --mlock --port 8081

/root/llama.cpp/build/bin/llama-server \
 -m /root/ia/models/mradermacher/Qwen3-30B-A3B-Instruct-2507-i1-GGUF/Qwen3-30B-A3B-Instruct-2507.i1-Q4_K_M.gguf \
 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0 \
 -ctk q8_0 -ctv q8_0 -fa on \
 --jinja --ctx-size 4096 --port 8081

Fixes #16771

EDIT: I've recorded a video that specifically targets the original issue.
Testing video, Raspberry Pi 5 + master branch + this PR:

CmdLineSwap-RaspberryPi5.mp4

And another one to show there's no regression when the model selector is enabled,
also demonstrating the multimodal function updates:

NonReg-FullSetup.mp4

ServeurpersoCom (Collaborator, Author) commented Oct 27, 2025

The final fix is done. Now that the /props refresh is unified, legacy mode (without the model selector) behaves correctly, and the message history properly reflects the backend model used.

ServeurpersoCom marked this pull request as ready for review on October 27, 2025 07:08
ServeurpersoCom (Collaborator, Author) commented:
LEGACY MODE (modelSelectorEnabled = false)

  • Do not send "model" in requests
  • Ignore "model" from server responses
  • Read /props.model as the single source of truth
  • Refresh /props periodically or on stream start
  • UI and history reflect the backend model

MODEL SELECTOR MODE (modelSelectorEnabled = true)

  • Always send "model" in requests
  • Use per-message "model" from chunks or user selection
  • /props used only for backend capabilities (context, multimodal, slots)
  • UI reflects the user-selected model

Summary (see the resolution sketch below):
Legacy -> backend decides the model via /props
ModelSelector -> frontend decides the model via the request field
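A hedged sketch of this resolution rule in TypeScript; `displayModel` and its parameters are illustrative names, not the webui's actual identifiers:

```ts
interface ChatMessage {
  content: string;
  model?: string; // per-message model persisted with the streamed message
}

// Decides which model name a message bubble should display.
function displayModel(
  msg: ChatMessage,
  modelSelectorEnabled: boolean,
  serverPropsModel?: string, // from GET /props (single source of truth in legacy mode)
  selectedModel?: string // the user's choice in the ModelSelector
): string | undefined {
  if (!modelSelectorEnabled) {
    // Legacy: "model" is never sent in requests and ignored in responses;
    // the stored per-message value (persisted from /props at stream time)
    // wins, with the current /props model as the fallback.
    return msg.model ?? serverPropsModel;
  }
  // Selector mode: per-message "model" from chunks, else the user's selection.
  return msg.model ?? selectedModel;
}
```

In either mode the stored per-message value takes precedence, which is what keeps the message history accurate after the backend model is swapped.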

allozaur (Collaborator) left a comment:


Yes, @ServeurpersoCom, that is much better than the initial idea for this PR 😄

ServeurpersoCom (Collaborator, Author) commented Oct 31, 2025

> Yes, @ServeurpersoCom, that is much better than the initial idea for this PR 😄

Good thing the merge squashes everything; those intermediate commits were an emotional rollercoaster 😆 Glad we made it out alive 😆

allozaur (Collaborator) commented:

> those intermediate commits were an emotional rollercoaster 😆

🤣

ServeurpersoCom force-pushed the webui-props-auto-refresh branch from ab8bbf0 to 1ee8d6b on October 31, 2025 23:57
ServeurpersoCom (Collaborator, Author) commented:

rebased to retry CI

ServeurpersoCom force-pushed the webui-props-auto-refresh branch from 1ee8d6b to b9747cc on November 1, 2025 17:56
ServeurpersoCom (Collaborator, Author) commented:

Rebased + static build

allozaur merged commit 2f68ce7 into ggml-org:master on Nov 1, 2025
14 checks passed
Linked issue: Misc. bug: WebUI incorrectly displays local model names (#16771)