Conversation
451c9a2 to
6ca247a
Compare
pkg/inference/models/manager.go
Outdated
| http.Error(w, "Could not calculate runtime memory requirement for model", http.StatusInternalServerError) | ||
| return |
There was a problem hiding this comment.
If something goes wrong with the calculation maybe we want to log an error and continue, so that an issue with memory estimation doesn't make the runner unusable?
pkg/inference/models/manager.go
Outdated
| if !haveMem { | ||
| m.log.Warnf("Runtime memory requirement for model %q exceeds total system memory", request.From) | ||
| http.Error(w, "Runtime memory requirement for model exceeds total system memory", http.StatusInsufficientStorage) | ||
| return |
There was a problem hiding this comment.
I was considering wether there were any cases where a user may want to pull a model on a system where there is insufficient memory to run it.
Some edge cases I thought of:
- transferring between registries where you might pull, tag, and then push without ever running the model
- maybe some future packaging use cases where you are pulling a model and then extending/modifying it?? but in these cases I would actually prefer if we accomplish this without the need to pull remote layers
If we think there are any potential uses for pulling a model to a system where it can't run we may want to add a force param to explicitly ignore the check.
There was a problem hiding this comment.
Yup, definitely. I was thinking of putting in a config option to enable/disable resource-related checks.
| func (m *Manager) GetRemoteModel(ctx context.Context, ref string) (types.ModelArtifact, error) { | ||
| model, err := m.registryClient.Model(ctx, ref) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("error while getting remote model: %w", err) | ||
| } | ||
| return model, nil | ||
| } | ||
|
|
||
| // GetRemoteModelBlobURL returns the URL of a given model blob. | ||
| func (m *Manager) GetRemoteModelBlobURL(ref string, digest v1.Hash) (string, error) { | ||
| blobURL, err := m.registryClient.BlobURL(ref, digest) | ||
| if err != nil { | ||
| return "", fmt.Errorf("error while getting remote model blob URL: %w", err) | ||
| } | ||
| return blobURL, nil | ||
| } | ||
|
|
||
| // BearerTokenForModel returns the bearer token needed to pull a given model. | ||
| func (m *Manager) BearerTokenForModel(ctx context.Context, ref string) (string, error) { | ||
| tok, err := m.registryClient.BearerToken(ctx, ref) | ||
| if err != nil { | ||
| return "", fmt.Errorf("error while getting bearer token for model: %w", err) | ||
| } | ||
| return tok, nil | ||
| } |
There was a problem hiding this comment.
thoughts for the future:
Seems like we are adding a lot of methods to the Manager that are direct pass throughs to distribution or registry client.
In the future maybe we should use the clients from MD directly when a go API is needed. And leave ModelManager with the single responsibility of exposing this functional via an HTTP API.
ekcasey
left a comment
There was a problem hiding this comment.
I know this is still in draft but I took as look as I'll be out next week. It looks good to me! But I think it might be safer to proceed in the case of an estimation error and also provide an escape hatch so we don't prevent users who know what they are doing from doing valid things we may not have anticipated.
Also left some thoughts for future refactorings that occurred to me while looking through. Not necessarily suggesting we tackle these now, but I do think some of the interfaces will need a cleanup eventually so I'd thought I'd capture some thoughts as I went
| }, nil | ||
| } | ||
|
|
||
| func (l *llamaCpp) parseLocalModel(model string) (*parser.GGUFFile, types.Config, error) { |
There was a problem hiding this comment.
thoughts for the future: I am on board with doing what is convenient for now, especially when we only have one backend where estimating is really supported. But we may eventually what to move the part of this logic that is tied to the format (GGUF) out of here and into distribution (where we already do some GGUF parsing) and leave only the
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
1a63862 to
055a70b
Compare
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
055a70b to
739146e
Compare
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
b1b224c to
e761a77
Compare
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
resolves AIR-237