Conversation

@allozaur (Collaborator) commented Oct 11, 2025

Close #16227

Demo

Running

build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 --jinja --temp 1.0 --min-p 0.01 --top-p 0.95 --top-k 64
demo.mp4

Three-Tier Priority System

Server defaults (/props) → User overrides (localStorage) → Webui fallbacks

How It Works

On Startup

  1. Fetch /props endpoint for server configuration
  2. Load saved settings from localStorage
  3. Merge: Use server defaults unless the user explicitly overrode them (see the sketch after this list)
  4. User overrides persist across sessions
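
A minimal sketch of how this startup merge could look, in TypeScript. The names `resolveParams`, `serverDefaults`, `userOverrides` and `webuiFallbacks` are hypothetical and only illustrate the priority order described above, not the actual webui code:

```ts
// Hypothetical sketch of the three-tier merge performed on startup.
type ParamValue = number | string | boolean;

interface ParamSources {
  serverDefaults: Record<string, ParamValue>; // fetched from GET /props
  userOverrides: Record<string, ParamValue>;  // loaded from localStorage
  webuiFallbacks: Record<string, ParamValue>; // built-in webui defaults
}

function resolveParams({ serverDefaults, userOverrides, webuiFallbacks }: ParamSources) {
  const resolved: Record<string, ParamValue> = {};
  for (const key of Object.keys(webuiFallbacks)) {
    if (key in userOverrides) {
      resolved[key] = userOverrides[key];   // 1st priority: explicit user override
    } else if (key in serverDefaults) {
      resolved[key] = serverDefaults[key];  // 2nd priority: server default from /props
    } else {
      resolved[key] = webuiFallbacks[key];  // 3rd priority: webui fallback
    }
  }
  return resolved;
}
```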

When User Changes Settings

  1. Compare new value vs server default (with float precision handling, as sketched below)
  2. Differs? → Mark as user override + save to localStorage
  3. Matches? → Remove override flag (use server default)
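
Roughly, the override detection could be sketched like this. `EPSILON`, `valuesMatch` and the localStorage key are assumptions for illustration, not the actual implementation:

```ts
// Hypothetical sketch of override detection when a setting changes.
const EPSILON = 1e-6; // assumed tolerance for float comparison

function valuesMatch(a: unknown, b: unknown): boolean {
  if (typeof a === 'number' && typeof b === 'number') {
    return Math.abs(a - b) < EPSILON; // normalize float precision to avoid false "custom"
  }
  return a === b;
}

function onSettingChanged(key: string, newValue: unknown, serverDefault: unknown) {
  const overrides = JSON.parse(localStorage.getItem('userOverrides') ?? '{}');
  if (valuesMatch(newValue, serverDefault)) {
    delete overrides[key];     // matches the server default -> drop the override flag
  } else {
    overrides[key] = newValue; // differs -> mark as a user override and persist it
  }
  localStorage.setItem('userOverrides', JSON.stringify(overrides));
}
```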

Visual Indicators

  • Orange "Custom" badge: Value differs from server default
  • Reset icon: Click to restore server default
  • Updates in real-time as you type

Reset to Default Button

  • Resets ALL parameters to /props values
  • Falls back to webui defaults if not in /props
  • Clears all user overrides (see the sketch below)
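
A rough sketch of what the reset action could do, under the same assumptions as the snippets above (hypothetical names, not the actual webui code):

```ts
// Hypothetical sketch of the "Reset to Default" action.
function resetAllToDefaults(
  serverDefaults: Record<string, unknown>, // values reported by /props
  webuiFallbacks: Record<string, unknown>  // built-in webui defaults
): Record<string, unknown> {
  localStorage.removeItem('userOverrides'); // clear every user override
  const restored: Record<string, unknown> = {};
  for (const key of Object.keys(webuiFallbacks)) {
    // prefer the /props value, otherwise fall back to the webui default
    restored[key] = key in serverDefaults ? serverDefaults[key] : webuiFallbacks[key];
  }
  return restored;
}
```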

Key Features

  • Server-first: Always uses /props values when available
  • Preserves customization: User overrides never auto-reset
  • Transparent: Clear visual feedback on what's customized
  • Precision handling: Normalizes floats to avoid false "custom" detection

@allozaur (Collaborator, Author) commented:

@ngxson @ggerganov

I could use some feedback from you on this one, guys, before marking this as ready for review :)

@ggerganov (Member) commented:

I don't think I have the latest version:

(screenshot)

There is no reset button here, like in the video.

@allozaur (Collaborator, Author) commented Oct 13, 2025

@ggerganov sorry for that! For some reason I hadn't managed to push my latest commits. You can re-check whenever you're ready.

@ggerganov (Member) left a comment:

This seems like a very useful feature, and from some testing it appears to work as expected.

@allozaur allozaur marked this pull request as ready for review October 14, 2025 22:25
@allozaur allozaur requested a review from ngxson October 14, 2025 22:25
@allozaur (Collaborator, Author) commented:

@ngxson I'm also waiting for your feedback and review before merging.

@allozaur (Collaborator, Author) commented:

Note

Just dropping a note about what this PR does and how it connects to the previous threads:

@mashdragon and @woof-dog let me know if that helps you and addresses your use cases!

@ngxson (Collaborator) left a comment:

Very useful feature 🚀

@ddh0 (Contributor) commented Oct 15, 2025

I'm very glad to see this; I've been wanting it for a long time. Thank you!

@allozaur allozaur merged commit f9fb33f into ggml-org:master Oct 15, 2025
14 checks passed
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Oct 15, 2025
* origin/master:
Add server-driven parameter defaults and syncing (ggml-org#16515)
metal: optimise `GGML_OP_SUM` (ggml-org#16559)
server : fix img token logs (ggml-org#16595)
llama-quant: add support for mmproj (ggml-org#16592)
CUDA: Changing the CUDA scheduling strategy to spin (ggml-org#16585)
server : fix mtmd checkpoints (ggml-org#16591)
metal : avoid using Metal's gpuAddress property (ggml-org#16576)
vulkan: Add ACC_TYPE_VEC2 implementation (ggml-org#16203)
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (ggml-org#16577)
vulkan: Support FA with K/V in F32 (ggml-org#16543)
vulkan: Improve build time for MSVC (ggml-org#16545)
CUDA: enable FA for FP32 KV cache (ggml-org#16546)
CUDA: use fastdiv + ggml_cuda_mad for mmvf (ggml-org#16557)
CUDA: add fp kernel for larger batch size MoE (ggml-org#16512)
cuda : remove legacy copy-op pointer indirection code (ggml-org#16485)
server : dynamic token limit for prompt cache (ggml-org#16560)

Development

Successfully merging this pull request may close these issues.

Misc. bug: UI doesn't sync with llama-server command-line parameters
