
Conversation


@p1-0tr p1-0tr commented Jun 11, 2025

We need to allow users to configure the model runtime, whether to
control inference settings or low-level llama.cpp-specific settings.

In the interest of unblocking users quickly, this patch adds a very simple
mechanism to configure the runtime settings. A `_configure` endpoint is
added per-engine, and accepts POST requests to set context size and raw
runtime CLI flags. Those settings will be applied to any run of a given
model, until unload is called for that model or model-runner is
terminated.

This is a temporary solution and therefore subject to change once a
design for specifying runtime settings is finalised.
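
For illustration, a minimal Go sketch of what a client of the `_configure` endpoint might look like. The endpoint path, port, JSON field names, and model name below are assumptions based on the description above rather than details confirmed by the patch, so adjust them to your model-runner setup:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// configureRequest mirrors the request shape implied by the PR description;
// the JSON field names are assumptions, not taken from the patch.
type configureRequest struct {
	Model        string   `json:"model"`
	ContextSize  int64    `json:"context-size,omitempty"`
	RuntimeFlags []string `json:"runtime-flags,omitempty"`
}

func main() {
	body, err := json.Marshal(configureRequest{
		Model:        "ai/smollm2",               // hypothetical model reference
		ContextSize:  8192,                       // llama.cpp context window size
		RuntimeFlags: []string{"--threads", "4"}, // raw runtime CLI flags
	})
	if err != nil {
		panic(err)
	}
	// The per-engine path and port are assumed; only the "_configure"
	// suffix comes from the description above.
	resp, err := http.Post("http://localhost:12434/engines/llama.cpp/_configure",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}

Per the behaviour described above, a successful call applies to every subsequent run of that model until the model is unloaded or model-runner terminates.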

@p1-0tr p1-0tr requested review from doringeman and xenoscopic June 11, 2025 13:20
@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch 3 times, most recently from db7c07a to 5aa6d7e Compare June 11, 2025 14:04
@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch 2 times, most recently from 5bcf3e5 to 6cf3f98 Compare June 11, 2025 14:10
Contributor

@xenoscopic xenoscopic left a comment

LGTM overall; it's hard to fully assess since we think this is sort of temporary. I think my only comment of consequence would be to avoid baking in any APIs or code refactors that would be hard to undo if they don't serve the later design.

@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from 6cf3f98 to 5a4e9a7 Compare June 12, 2025 10:24
@p1-0tr p1-0tr changed the title from "WiP: Support runner configuration" to "Support runner configuration" Jun 12, 2025
@p1-0tr p1-0tr marked this pull request as ready for review June 12, 2025 10:25
@p1-0tr p1-0tr requested a review from xenoscopic June 12, 2025 10:27
Contributor

@fiam fiam left a comment

I've tested this locally and it works perfectly, great job.

I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that a configuration change for an already loaded model will be silently ignored until the model is restarted. Is there something we could do to make this more apparent?

Author

p1-0tr commented Jun 12, 2025

> I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that a configuration change for an already loaded model will be silently ignored until the model is restarted. Is there something we could do to make this more apparent?

That's a fair point. We are shooting for a minimal solution until we can get a proper design for handling configs dialed in. Keeping that in mind, we could, for example, return a 403 in case someone tries to configure an already running model.

@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from 5a4e9a7 to 5270e2d Compare June 12, 2025 10:50
return errRunnerAlreadyActive
}

l.log.Infof("Configuring %s runner for %s", backendName, model)

Check failure

Code scanning / CodeQL: Log entries created from user input (High)

This log entry depends on a user-provided value.

Copilot Autofix (AI, 7 months ago)

To fix the issue, the user-provided model value should be sanitized before being logged. Since the log entries are plain text, we can remove newline characters (\n and \r) from the model string using strings.ReplaceAll. This ensures that malicious input cannot introduce new log entries or otherwise manipulate the log format.

The fix involves modifying the setRunnerConfig method in loader.go to sanitize the model parameter before logging it. Specifically:

  1. Use strings.ReplaceAll to remove \n and \r characters from the model string.
  2. Log the sanitized version of the model string.

Suggested changeset 1: pkg/inference/scheduling/loader.go

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/pkg/inference/scheduling/loader.go b/pkg/inference/scheduling/loader.go
--- a/pkg/inference/scheduling/loader.go
+++ b/pkg/inference/scheduling/loader.go
@@ -517,3 +517,5 @@
 
-	l.log.Infof("Configuring %s runner for %s", backendName, model)
+	sanitizedModel := strings.ReplaceAll(model, "\n", "")
+	sanitizedModel = strings.ReplaceAll(sanitizedModel, "\r", "")
+	l.log.Infof("Configuring %s runner for %s", backendName, sanitizedModel)
 	l.runnerConfigs[runnerId] = runnerConfig
EOF
Contributor

fiam commented Jun 12, 2025

> I'm only a bit concerned about the UX when running multiple clients against the same model runner. It seems to me that a configuration change for an already loaded model will be silently ignored until the model is restarted. Is there something we could do to make this more apparent?

> That's a fair point. We are shooting for a minimal solution until we can get a proper design for handling configs dialed in. Keeping that in mind, we could, for example, return a 403 in case someone tries to configure an already running model.

Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.

Author

p1-0tr commented Jun 12, 2025

> Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.

Yes :) I pushed it a couple of minutes ago :)


if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
if err == errRunnerAlreadyActive {
w.WriteHeader(http.StatusForbidden)
Contributor

tiny nit: log error in case we have to debug?

Contributor

@fiam fiam left a comment

LGTM

Contributor

fiam commented Jun 12, 2025

> Should we include that change in this PR? It seems to me it would follow the principle of least surprise by failing loudly.

> Yes :) I pushed it a couple of minutes ago :)

Excellent, thank you!

@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from 5270e2d to bddf60d Compare June 12, 2025 11:34
runnerConfig.RawFlags = rawFlags

if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)

Check failure

Code scanning / CodeQL: Log entries created from user input (High)

This log entry depends on a user-provided value.

Copilot Autofix (AI, 7 months ago)

Copilot could not generate an autofix suggestion for this alert.

if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
if errors.Is(err, errRunnerAlreadyActive) {
http.Error(w, err.Error(), http.StatusForbidden)
Contributor

Suggested change:
-	http.Error(w, err.Error(), http.StatusForbidden)
+	http.Error(w, err.Error(), http.StatusConflict)

Putting on my "fun at parties" hat, 409 might technically be more appropriate: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/409
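
Applied to the handler shown above, the suggestion would read roughly as follows (a sketch only; the non-conflict fallback branch is assumed rather than taken from the patch):

if err := s.loader.setRunnerConfig(r.Context(), backend.Name(), configureRequest.Model, inference.BackendModeCompletion, runnerConfig); err != nil {
	s.log.Warnf("Failed to configure %s runner for %s: %s", backend.Name(), configureRequest.Model, err)
	if errors.Is(err, errRunnerAlreadyActive) {
		// 409 Conflict: the request conflicts with the current state of
		// the resource, i.e. a runner for this model is already active.
		http.Error(w, err.Error(), http.StatusConflict)
		return
	}
	// Fallback for other configuration errors (assumed, not from the patch).
	http.Error(w, err.Error(), http.StatusInternalServerError)
	return
}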

We need to allow users to configure the model runtime, whether to
control inference settings or low-level llama.cpp-specific settings.

In the interest of unblocking users quickly, this patch adds a very simple
mechanism to configure the runtime settings. A `_configure` endpoint is
added per-engine, and accepts POST requests to set context size and raw
runtime CLI flags. Those settings will be applied to any run of a given
model, until unload is called for that model or model-runner is
terminated.

This is a temporary solution and therefore subject to change once a
design for specifying runtime settings is finalised.

Signed-off-by: Piotr Stankiewicz <[email protected]>
@p1-0tr p1-0tr force-pushed the ps-runner-configuration branch from c320860 to 0e365a1 Compare June 13, 2025 06:50
@p1-0tr p1-0tr merged commit 6b8c3b8 into main Jun 13, 2025
3 of 4 checks passed
@p1-0tr p1-0tr deleted the ps-runner-configuration branch June 13, 2025 08:36
ericcurtin referenced this pull request in ericcurtin/model-runner Sep 21, 2025
* Support .upper() and .lower() string methods

* Add syntax tests for upper and lower methods

* Add llava-hf/llava-1.5-7b-hf to supported models
doringeman added a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025
install-runner: Document both default ports for Moby and Cloud