Add server-driven parameter defaults and syncing #16515
I could use some feedback from you on this one, guys, before marking this as ready for review :)
@ggerganov sorry for that! For some reason my latest commits hadn't been pushed. You can re-check whenever you're ready.
This seems like a very useful feature, and from some testing it appears to work as expected.
@ngxson also waiting for your feedback and review before merging
Note Just dropping a note about what this PR does and how it connects to the previous threads:
@mashdragon and @woof-dog let me know if that helps you and addresses your use cases!
Very useful feature 🚀
I'm very glad to see this, I've been wanting this for a long time. Thank you!
* origin/master:
  Add server-driven parameter defaults and syncing (ggml-org#16515)
  metal: optimise `GGML_OP_SUM` (ggml-org#16559)
  server : fix img token logs (ggml-org#16595)
  llama-quant: add support for mmproj (ggml-org#16592)
  CUDA: Changing the CUDA scheduling strategy to spin (ggml-org#16585)
  server : fix mtmd checkpoints (ggml-org#16591)
  metal : avoid using Metal's gpuAddress property (ggml-org#16576)
  vulkan: Add ACC_TYPE_VEC2 implementation (ggml-org#16203)
  CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (ggml-org#16577)
  vulkan: Support FA with K/V in F32 (ggml-org#16543)
  vulkan: Improve build time for MSVC (ggml-org#16545)
  CUDA: enable FA for FP32 KV cache (ggml-org#16546)
  CUDA: use fastdiv + ggml_cuda_mad for mmvf (ggml-org#16557)
  CUDA: add fp kernel for larger batch size MoE (ggml-org#16512)
  cuda : remove legacy copy-op pointer indirection code (ggml-org#16485)
  server : dynamic token limit for prompt cache (ggml-org#16560)
Close #16227
Demo
Running
demo.mp4
Three-Tier Priority System
Server defaults (`/props`) → User overrides (localStorage) → Webui fallbacks

How It Works
On Startup
`/props` endpoint for server configuration
When User Changes Settings
Visual Indicators
Reset to Default Button
`/props` values
`/props`
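
The three tiers listed above (server defaults from `/props`, user overrides in localStorage, webui fallbacks) could be resolved roughly as in the sketch below. This is not the PR's actual code: `ParamMap`, `WEBUI_FALLBACKS`, `resolveParams`, and `resetToDefault` are hypothetical names, the fallback values are placeholders, and the sketch assumes that a user override wins when one exists, the server default applies otherwise, and the webui fallback is used only when neither is present.

```typescript
// Hypothetical types and names for illustration; not taken from the PR.
type ParamValue = number | string | boolean;
type ParamMap = Record<string, ParamValue>;

// Placeholder webui fallbacks, used only when neither the server nor the user supplies a value.
const WEBUI_FALLBACKS: ParamMap = { temperature: 0.8, top_k: 40, top_p: 0.95 };

// Merge the three tiers. Later spreads win, so the assumed order of precedence is:
// user override > server default > webui fallback.
function resolveParams(
  serverDefaults: ParamMap,
  userOverrides: ParamMap,
  fallbacks: ParamMap = WEBUI_FALLBACKS
): ParamMap {
  return { ...fallbacks, ...serverDefaults, ...userOverrides };
}

// "Reset to Default": drop the user's override for one key so the
// server default (or the fallback) applies again on the next resolve.
function resetToDefault(key: string, userOverrides: ParamMap): ParamMap {
  const next: ParamMap = { ...userOverrides };
  delete next[key];
  return next;
}
```

Under these assumptions, calling `resolveParams({ temperature: 0.6 }, { temperature: 1.0 })` keeps the user's `1.0`, and resetting that key makes the server's `0.6` apply again.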
Key Features
✅ Server-first: Always uses `/props` values when available
✅ Preserves customization: User overrides never auto-reset
✅ Transparent: Clear visual feedback on what's customized
✅ Precision handling: Normalizes floats to avoid false "custom" detection
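
To illustrate the precision-handling point, here is a minimal sketch of how a "custom" check might normalize floats before comparing a user value with the server default. The names `normalizeFloat` and `isCustom` and the choice of 6 decimal places are assumptions made for this example, not details confirmed by the PR.

```typescript
type ParamValue = number | string | boolean;

// Round to a fixed number of decimals so tiny representation differences
// (e.g. 0.30000000000000004 vs 0.3) do not count as a change.
// The default of 6 digits is an assumption for this sketch.
function normalizeFloat(value: number, digits: number = 6): number {
  return Number(value.toFixed(digits));
}

// A value is "custom" when it differs from the server default after normalization.
// Non-numeric values are compared directly.
function isCustom(userValue: ParamValue, serverDefault: ParamValue): boolean {
  if (typeof userValue === "number" && typeof serverDefault === "number") {
    return normalizeFloat(userValue) !== normalizeFloat(serverDefault);
  }
  return userValue !== serverDefault;
}
```

With this check, a value such as `0.1 + 0.2` would not be flagged as custom against a server default of `0.3`, while a genuine change such as `0.7` against `0.8` would be.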