
Conversation


@p1-0tr p1-0tr commented Jun 11, 2025

We need to allow users to configure the model runtime, whether to
control inference settings or low-level llama.cpp-specific settings.

In the interest of unblocking users quickly, this patch adds a very simple
mechanism to configure the runtime settings. A `_configure` endpoint is
added per-engine, and accepts POST requests to set context size and raw
runtime CLI flags. Those settings will be applied to any run of a given
model, until unload is called for that model or model-runner is
terminated.

This is a temporary solution and therefore subject to change once a
design for specifying runtime settings is finalised.
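
For illustration, a minimal Go sketch of what a client of the `_configure` endpoint might look like. The endpoint path, port, JSON field names, and model name below are assumptions based on the description above rather than details confirmed by the patch, so adjust them to your model-runner setup:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// configureRequest mirrors the request shape implied by the PR description;
// the JSON field names are assumptions, not taken from the patch.
type configureRequest struct {
	Model        string   `json:"model"`
	ContextSize  int64    `json:"context-size,omitempty"`
	RuntimeFlags []string `json:"runtime-flags,omitempty"`
}

func main() {
	body, err := json.Marshal(configureRequest{
		Model:        "ai/smollm2",               // hypothetical model reference
		ContextSize:  8192,                       // llama.cpp context window size
		RuntimeFlags: []string{"--threads", "4"}, // raw runtime CLI flags
	})
	if err != nil {
		panic(err)
	}
	// The per-engine path and port are assumed; only the "_configure"
	// suffix comes from the description above.
	resp, err := http.Post("http://localhost:12434/engines/llama.cpp/_configure",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}

Per the behaviour described above, a successful call applies to every subsequent run of that model until the model is unloaded or model-runner terminates.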

@p1-0tr p1-0tr requested review from doringeman and xenoscopic June 11, 2025 13:20
@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch 3 times, most recently from db7c07a to 5aa6d7e Compare June 11, 2025 14:04
@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch 2 times, most recently from 5bcf3e5 to 6cf3f98 Compare June 11, 2025 14:10
Contributor

@xenoscopic xenoscopic left a comment

LGTM overall; it's hard to fully assess since we think this is sort of temporary. I think my only comment of consequence would be to avoid baking in any APIs or code refactors that would be hard to undo if they don't serve the later design.

@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from 6cf3f98 to 5a4e9a7 Compare June 12, 2025 10:24
@p1-0tr p1-0tr changed the title from "WiP: Support runner configuration" to "Support runner configuration" Jun 12, 2025
@p1-0tr p1-0tr marked this pull request as ready for review June 12, 2025 10:25
@p1-0tr p1-0tr requested a review from xenoscopic June 12, 2025 10:27
Contributor

@fiam fiam left a comment

I've tested this locally and it works perfectly, great job.

I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that a configuration change for an already loaded model will be silently ignored until the model is restarted. Is there something we could do to make this more apparent?

Author

p1-0tr commented Jun 12, 2025

> I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that a configuration change for an already loaded model will be silently ignored until the model is restarted. Is there something we could do to make this more apparent?

That's a fair point. We are shooting for a minimal solution until we can get a proper design for handling configs dialed in. Keeping that in mind, we could, for example, return a 403 in case someone tries to configure an already running model.

@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from 5a4e9a7 to 5270e2d Compare June 12, 2025 10:50
return errRunnerAlreadyActive
}

l.log.Infof("Configuring %s runner for %s", backendName, model)

Check failure

Code scanning / CodeQL: Log entries created from user input (High)

This log entry depends on a user-provided value.

Copilot Autofix (AI, 7 months ago)

To fix the issue, the user-provided model value should be sanitized before being logged. Since the log entries are plain text, we can remove newline characters (\n and \r) from the model string using strings.ReplaceAll. This ensures that malicious input cannot introduce new log entries or otherwise manipulate the log format.

The fix involves modifying the setRunnerConfig method in loader.go to sanitize the model parameter before logging it. Specifically:

  1. Use strings.ReplaceAll to remove \n and \r characters from the model string.
  2. Log the sanitized version of the model string.

Suggested changeset 1: pkg/inference/scheduling/loader.go

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/pkg/inference/scheduling/loader.go b/pkg/inference/scheduling/loader.go
--- a/pkg/inference/scheduling/loader.go
+++ b/pkg/inference/scheduling/loader.go
@@ -517,3 +517,5 @@
 
-	l.log.Infof("Configuring %s runner for %s", backendName, model)
+	sanitizedModel := strings.ReplaceAll(model, "\n", "")
+	sanitizedModel = strings.ReplaceAll(sanitizedModel, "\r", "")
+	l.log.Infof("Configuring %s runner for %s", backendName, sanitizedModel)
 	l.runnerConfigs[runnerId] = runnerConfig
EOF
Contributor

fiam commented Jun 12, 2025

> I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that a configuration change for an already loaded model will be silently ignored until the model is restarted. Is there something we could do to make this more apparent?

> That's a fair point. We are shooting for a minimal solution until we can get a proper design for handling configs dialed in. Keeping that in mind, we could, for example, return a 403 in case someone tries to configure an already running model.

Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.

Author

p1-0tr commented Jun 12, 2025

> Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.

Yes :) I pushed it a couple of minutes ago :)


if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
if err == errRunnerAlreadyActive {
w.WriteHeader(http.StatusForbidden)
Contributor

tiny nit: log error in case we have to debug?

Contributor

@fiam fiam left a comment

LGTM

Contributor

fiam commented Jun 12, 2025

> Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.

> Yes :) I pushed it a couple of minutes ago :)

Excellent, thank you!

@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from 5270e2d to bddf60d Compare June 12, 2025 11:34
runnerConfig.RawFlags = rawFlags

if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)

Check failure

Code scanning / CodeQL: Log entries created from user input (High)

This log entry depends on a user-provided value.

Copilot Autofix (AI, 7 months ago)

Copilot could not generate an autofix suggestion for this alert.

if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
if errors.Is(err, errRunnerAlreadyActive) {
http.Error(w, err.Error(), http.StatusForbidden)
Contributor

Suggested change:
-	http.Error(w, err.Error(), http.StatusForbidden)
+	http.Error(w, err.Error(), http.StatusConflict)

Putting on my "fun at parties" hat, 409 might technically be more appropriate: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/409
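
Applied to the handler shown above, the suggestion would read roughly as follows (a sketch only; the non-conflict fallback branch is assumed rather than taken from the patch):

if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
	s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
	if errors.Is(err, errRunnerAlreadyActive) {
		// 409 Conflict: the request conflicts with the current state of
		// the resource, i.e. a runner for this model is already active.
		http.Error(w, err.Error(), http.StatusConflict)
		return
	}
	// Fallback for other configuration errors (assumed, not from the patch).
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}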

We need to allow users to configure the model runtime, whether to
control inference settings or low-level llama.cpp-specific settings.

In the interest of unblocking users quickly, this patch adds a very simple
mechanism to configure the runtime settings. A `_configure` endpoint is
added per-engine, and accepts POST requests to set context size and raw
runtime CLI flags. Those settings will be applied to any run of a given
model, until unload is called for that model or model-runner is
terminated.

This is a temporary solution and therefore subject to change once a
design for specifying runtime settings is finalised.

Signed-off-by: Piotr Stankiewicz <[email protected]>
@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from c320860 to 0e365a1 Compare June 13, 2025 06:50
@p1-0tr p1-0tr merged commit 6b8c3b8 into main Jun 13, 2025
3 of 4 checks passed
@p1-0tr p1-0tr deleted the ps-runner-configuration branch June 13, 2025 08:36
ericcurtin referenced this pull request in ericcurtin/model-runner Sep 21, 2025
* Support .upper() and .lower() string methods

* Add syntax tests for upper and lower methods

* Add llava-hf/llava-1.5-7b-hf to supported models
doringeman added a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025
install-runner: Document both default ports for Moby and Cloud