
Memory estimation for remote models #125

Merged
p1-0tr merged 5 commits into main from ps-pre-pull-memory-estimation
Aug 22, 2025
Conversation


@p1-0tr p1-0tr commented Jul 30, 2025

resolves AIR-237

@p1-0tr p1-0tr force-pushed the ps-pre-pull-memory-estimation branch from 451c9a2 to 6ca247a on July 30, 2025 11:15
Comment on lines +174 to +175
http.Error(w, "Could not calculate runtime memory requirement for model", http.StatusInternalServerError)
return
Contributor

If something goes wrong with the calculation maybe we want to log an error and continue, so that an issue with memory estimation doesn't make the runner unusable?
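The suggested degrade-gracefully behavior could look something like the sketch below. The function and message names here are illustrative stand-ins, not the PR's actual API: an estimation error is logged and the pull proceeds, so a broken estimator never makes the runner unusable.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// estimateMemory is a hypothetical stand-in for the runtime memory
// estimation; its name and behavior are illustrative only.
func estimateMemory(model string) (uint64, error) {
	if model == "unknown" {
		return 0, errors.New("no GGUF metadata")
	}
	return 8 << 30, nil // pretend the model needs 8 GiB
}

// pullModel degrades gracefully: an estimation failure is logged as a
// warning and the pull proceeds, rather than failing the request.
func pullModel(model string) string {
	required, err := estimateMemory(model)
	if err != nil {
		log.Printf("warn: could not estimate memory for %q: %v; continuing", model, err)
		return "pulled (estimate unavailable)"
	}
	return fmt.Sprintf("pulled (needs %d GiB)", required>>30)
}

func main() {
	fmt.Println(pullModel("unknown"))
	fmt.Println(pullModel("llama3"))
}
```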

if !haveMem {
m.log.Warnf("Runtime memory requirement for model %q exceeds total system memory", request.From)
http.Error(w, "Runtime memory requirement for model exceeds total system memory", http.StatusInsufficientStorage)
return
Contributor

I was considering whether there are any cases where a user may want to pull a model on a system with insufficient memory to run it.

Some edge cases I thought of:

  1. transferring between registries where you might pull, tag, and then push without ever running the model
  2. maybe some future packaging use cases where you are pulling a model and then extending/modifying it? But in these cases I would actually prefer that we accomplish this without the need to pull remote layers

If we think there are any potential uses for pulling a model to a system where it can't run we may want to add a force param to explicitly ignore the check.

Author

Yup, definitely. I was thinking of putting in a config option to enable/disable resource-related checks.

Comment on lines +596 to +620
// GetRemoteModel returns the model artifact for a given remote reference.
func (m *Manager) GetRemoteModel(ctx context.Context, ref string) (types.ModelArtifact, error) {
model, err := m.registryClient.Model(ctx, ref)
if err != nil {
return nil, fmt.Errorf("error while getting remote model: %w", err)
}
return model, nil
}

// GetRemoteModelBlobURL returns the URL of a given model blob.
func (m *Manager) GetRemoteModelBlobURL(ref string, digest v1.Hash) (string, error) {
blobURL, err := m.registryClient.BlobURL(ref, digest)
if err != nil {
return "", fmt.Errorf("error while getting remote model blob URL: %w", err)
}
return blobURL, nil
}

// BearerTokenForModel returns the bearer token needed to pull a given model.
func (m *Manager) BearerTokenForModel(ctx context.Context, ref string) (string, error) {
tok, err := m.registryClient.BearerToken(ctx, ref)
if err != nil {
return "", fmt.Errorf("error while getting bearer token for model: %w", err)
}
return tok, nil
}
Contributor

@ekcasey ekcasey Aug 3, 2025

thoughts for the future:

Seems like we are adding a lot of methods to the Manager that are direct pass-throughs to the distribution or registry client.

In the future maybe we should use the clients from MD directly when a Go API is needed, and leave ModelManager with the single responsibility of exposing this functionality via an HTTP API.

Contributor

@ekcasey ekcasey left a comment

I know this is still in draft but I took a look as I'll be out next week. It looks good to me! But I think it might be safer to proceed in the case of an estimation error, and also to provide an escape hatch so we don't prevent users who know what they are doing from doing valid things we may not have anticipated.

Also left some thoughts for future refactorings that occurred to me while looking through. Not necessarily suggesting we tackle these now, but I do think some of the interfaces will need a cleanup eventually, so I thought I'd capture some thoughts as I went.

}, nil
}

func (l *llamaCpp) parseLocalModel(model string) (*parser.GGUFFile, types.Config, error) {
Contributor

thoughts for the future: I am on board with doing what is convenient for now, especially when we only have one backend where estimating is really supported. But we may eventually want to move the part of this logic that is tied to the format (GGUF) out of here and into distribution (where we already do some GGUF parsing) and leave only the

Piotr Stankiewicz added 2 commits August 11, 2025 09:30
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
@p1-0tr p1-0tr force-pushed the ps-pre-pull-memory-estimation branch 2 times, most recently from 1a63862 to 055a70b on August 11, 2025 08:58
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
@p1-0tr p1-0tr force-pushed the ps-pre-pull-memory-estimation branch from 055a70b to 739146e on August 12, 2025 13:24
@p1-0tr p1-0tr marked this pull request as ready for review August 14, 2025 08:48
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Contributor

@xenoscopic xenoscopic left a comment

LGTM

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
@p1-0tr p1-0tr merged commit 933edd2 into main Aug 22, 2025
4 checks passed
@p1-0tr p1-0tr deleted the ps-pre-pull-memory-estimation branch August 22, 2025 08:15
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 23, 2025
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Sep 24, 2025
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025