Support runner configuration #76
Conversation
xenoscopic left a comment:
LGTM overall; it's hard to fully assess since we think this is sort of temporary. I think my only comment of consequence would be to avoid baking in any APIs or code refactors that would be hard to undo if they don't serve the later design.
fiam left a comment:
I've tested this locally and it works perfectly, great job.
I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that changing the configuration of an already loaded model will silently ignore the change until the model is restarted. Is there something we could do to make this more apparent?
That's a fair point. We are shooting for a minimal solution until we can get a proper design for handling configs dialed in. Keeping that in mind, we could, for example, return a
        return errRunnerAlreadyActive
    }

    l.log.Infof("Configuring %s runner for %s", backendName, model)
Check failure (Code scanning / CodeQL): Log entries created from user input (High), flagged on a user-provided value.
Copilot Autofix (AI, 7 months ago):
To fix the issue, the user-provided `model` value should be sanitized before being logged. Since the log entries are plain text, we can remove newline characters (`\n` and `\r`) from the `model` string using `strings.ReplaceAll`. This ensures that malicious input cannot introduce new log entries or otherwise manipulate the log format.
The fix involves modifying the `setRunnerConfig` method in `loader.go` to sanitize the `model` parameter before logging it. Specifically:
- Use `strings.ReplaceAll` to remove `\n` and `\r` characters from the `model` string.
- Log the sanitized version of the `model` string.
    @@ -517,3 +517,5 @@

    -    l.log.Infof("Configuring %s runner for %s", backendName, model)
    +    sanitizedModel := strings.ReplaceAll(model, "\n", "")
    +    sanitizedModel = strings.ReplaceAll(sanitizedModel, "\r", "")
    +    l.log.Infof("Configuring %s runner for %s", backendName, sanitizedModel)
         l.runnerConfigs[runnerId] = runnerConfig
Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.
Yes :) I pushed it a couple of minutes ago :)
    if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
        if err == errRunnerAlreadyActive {
            w.WriteHeader(http.StatusForbidden)
tiny nit: log error in case we have to debug?
fiam left a comment:
LGTM
Excellent, thank you!
    runnerConfig.RawFlags = rawFlags

    if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
        s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
Check failure (Code scanning / CodeQL): Log entries created from user input (High), flagged on a user-provided value.
Copilot Autofix (AI, 7 months ago): Copilot could not generate an autofix suggestion for this alert.
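Since no autofix was generated for this alert, here is a minimal sketch of how the same sanitization pattern from the earlier fix could be applied manually; the helper name, package placement, and call-site rewrite are assumptions for illustration, not code from this PR.

```go
package logutil // hypothetical location for a shared helper, not part of this PR

import "strings"

// sanitizeForLog strips CR/LF characters so that user-provided values cannot
// forge additional log entries when interpolated into plain-text logs.
// This mirrors the strings.ReplaceAll-based autofix applied earlier in the PR.
func sanitizeForLog(s string) string {
	return strings.NewReplacer("\n", "", "\r", "").Replace(s)
}
```

The flagged `Warnf` call could then log `sanitizeForLog(configureRequest.Model)` instead of the raw model name.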
    if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
        s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
        if errors.Is(err, errRunnerAlreadyActive) {
            http.Error(w, err.Error(), http.StatusForbidden)
Suggested change:

    -    http.Error(w, err.Error(), http.StatusForbidden)
    +    http.Error(w, err.Error(), http.StatusConflict)
Putting on my "fun at parties" hat, 409 might technically be more appropriate: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/409
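Putting the quoted snippet and the suggestion together, a sketch of how the error branch could read with the 409 mapping applied; the early returns and the 500 fallback are assumptions for illustration, not code taken from the PR.

```go
if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model,
	inference.BackendModeCompletion, runnerConfig); err != nil {
	s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
	if errors.Is(err, errRunnerAlreadyActive) {
		// 409 Conflict: the request clashes with the runner's current state
		// rather than being a permissions problem, so it fits better than 403.
		http.Error(w, err.Error(), http.StatusConflict)
		return
	}
	// Assumed fallback for any other configuration error.
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}
```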
We need to allow users to configure the model runtime, whether to control inference settings or low-level llama.cpp-specific settings. In the interest of unblocking users quickly, this patch adds a very simple mechanism to configure the runtime settings. A `_configure` endpoint is added per-engine, and accepts POST requests to set context-size and raw runtime CLI flags. Those settings will be applied to any run of a given model, until unload is called for that model or model-runner is terminated.

This is a temporary solution and therefore subject to change once a design for specifying runtime settings is finalised.

Signed-off-by: Piotr Stankiewicz <[email protected]>
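For illustration, a rough sketch of what calling such an endpoint could look like; the host/port, URL path, and JSON field names below are assumptions made for the example, not the confirmed request schema from this patch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Assumed request shape: a model name, a context size, and raw llama.cpp CLI flags.
	payload, err := json.Marshal(map[string]any{
		"model":        "ai/example-model",
		"context-size": 8192,
		"raw-flags":    []string{"--threads", "4"},
	})
	if err != nil {
		panic(err)
	}

	// Assumed path: a per-engine _configure endpoint on a locally running model-runner.
	resp, err := http.Post("http://localhost:12434/engines/llama.cpp/_configure",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Per the review discussion, a 403/409 here would indicate the runner is already active.
	fmt.Println("status:", resp.Status)
}
```

The configured settings would then apply to subsequent runs of that model until it is unloaded or model-runner stops.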