Memory estimation for remote models by p1-0tr · Pull Request #125 · docker/model-runner

p1-0tr · 2025-07-30T11:13:24Z

resolves AIR-237

pkg/inference/models/manager.go

ekcasey · 2025-08-03T19:48:49Z

pkg/inference/models/manager.go

+		http.Error(w, "Could not calculate runtime memory requirement for model", http.StatusInternalServerError)
+		return


If something goes wrong with the calculation maybe we want to log an error and continue, so that an issue with memory estimation doesn't make the runner unusable?

ekcasey · 2025-08-03T19:57:55Z

pkg/inference/models/manager.go

+	if !haveMem {
+		m.log.Warnf("Runtime memory requirement for model %q exceeds total system memory", request.From)
+		http.Error(w, "Runtime memory requirement for model exceeds total system memory", http.StatusInsufficientStorage)
+		return


I was considering whether there are any cases where a user might want to pull a model onto a system that has insufficient memory to run it.

Some edge cases I thought of:

- transferring between registries, where you might pull, tag, and then push without ever running the model

- maybe some future packaging use cases where you pull a model and then extend or modify it? In these cases, though, I would actually prefer that we accomplish this without needing to pull remote layers

If we think there are any potential uses for pulling a model to a system where it can't run, we may want to add a force param to explicitly ignore the check.

p1-0tr · 2025-08-05

Yup, definitely. I was thinking of adding a config option to enable/disable resource-related checks.

ekcasey · 2025-08-03T20:24:12Z

pkg/inference/models/manager.go

+func (m *Manager) GetRemoteModel(ctx context.Context, ref string) (types.ModelArtifact, error) {
+	model, err := m.registryClient.Model(ctx, ref)
+	if err != nil {
+		return nil, fmt.Errorf("error while getting remote model: %w", err)
+	}
+	return model, nil
+}
+
+// GetRemoteModelBlobURL returns the URL of a given model blob.
+func (m *Manager) GetRemoteModelBlobURL(ref string, digest v1.Hash) (string, error) {
+	blobURL, err := m.registryClient.BlobURL(ref, digest)
+	if err != nil {
+		return "", fmt.Errorf("error while getting remote model blob URL: %w", err)
+	}
+	return blobURL, nil
+}
+
+// BearerTokenForModel returns the bearer token needed to pull a given model.
+func (m *Manager) BearerTokenForModel(ctx context.Context, ref string) (string, error) {
+	tok, err := m.registryClient.BearerToken(ctx, ref)
+	if err != nil {
+		return "", fmt.Errorf("error while getting bearer token for model: %w", err)
+	}
+	return tok, nil
+}


thoughts for the future:

It seems like we are adding a lot of methods to the Manager that are direct pass-throughs to the distribution or registry client.

In the future, maybe we should use the clients from MD directly when a Go API is needed, and leave ModelManager with the single responsibility of exposing this functionality via an HTTP API.

ekcasey

I know this is still in draft, but I took a look since I'll be out next week. It looks good to me! But I think it might be safer to proceed in the case of an estimation error, and also to provide an escape hatch so we don't prevent users who know what they are doing from doing valid things we may not have anticipated.

I also left some thoughts for future refactorings that occurred to me while looking through. I'm not necessarily suggesting we tackle these now, but I do think some of the interfaces will need a cleanup eventually, so I thought I'd capture some thoughts as I went.

ekcasey · 2025-08-03T20:25:14Z

pkg/inference/backends/llamacpp/llamacpp.go

 	}, nil
 }

+func (l *llamaCpp) parseLocalModel(model string) (*parser.GGUFFile, types.Config, error) {


thoughts for the future: I am on board with doing what is convenient for now, especially when we only have one backend where estimating is really supported. But we may eventually want to move the part of this logic that is tied to the format (GGUF) out of here and into distribution (where we already do some GGUF parsing) and leave only the

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>


xenoscopic

LGTM


p1-0tr force-pushed the ps-pre-pull-memory-estimation branch from 451c9a2 to 6ca247a · July 30, 2025 11:15

github-advanced-security bot found potential problems Jul 30, 2025

pkg/inference/models/manager.go (fixed)

ekcasey reviewed Aug 3, 2025


Piotr Stankiewicz added 2 commits August 11, 2025 09:30

Bump docker/model-distribution

fc70f07

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

inference: Support memory estimation for remote models

01ea183

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

p1-0tr force-pushed the ps-pre-pull-memory-estimation branch 2 times, most recently from 1a63862 to 055a70b · August 11, 2025 08:58

github-advanced-security bot found potential problems Aug 11, 2025

pkg/inference/models/manager.go (fixed)

inference: Block pull if model requires too much memory to run

739146e

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

p1-0tr force-pushed the ps-pre-pull-memory-estimation branch from 055a70b to 739146e · August 12, 2025 13:24

p1-0tr marked this pull request as ready for review August 14, 2025 08:48

github-advanced-security bot found potential problems Aug 19, 2025

pkg/inference/models/manager.go (dismissed)

inference: Support disabling pre-pull memory checks

e761a77

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

p1-0tr force-pushed the ps-pre-pull-memory-estimation branch from b1b224c to e761a77 · August 19, 2025 14:25

p1-0tr requested a review from xenoscopic August 19, 2025 14:50

p1-0tr mentioned this pull request Aug 20, 2025

Add flags for disabling pre-pull memory estimation docker/model-cli#140

Merged

xenoscopic approved these changes Aug 20, 2025

go.sum (outdated, resolved)

pkg/inference/memory/system.go (outdated, resolved)

pkg/inference/memory/system.go (outdated, resolved)

pkg/inference/memory/system.go (outdated, resolved)

inference: Fix up review comments

44a1498

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

p1-0tr merged commit 933edd2 into main Aug 22, 2025
4 checks passed

p1-0tr deleted the ps-pre-pull-memory-estimation branch August 22, 2025 08:15

doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 23, 2025

Fix: do not parse arguments twice (docker#125)

f17f97a

doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 24, 2025

Fix: do not parse arguments twice (docker#125)

3e3f5d2

doringeman pushed a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025

Support multiline (docker#125)

f7ed9fb

p1-0tr merged 5 commits into main from ps-pre-pull-memory-estimation

p1-0tr commented Jul 30, 2025 (edited by kiview)

3 participants
