
Commit 423e1d6

Author: Piotr Stankiewicz

Allow unloading multiple models at once

Once we enable running multiple models at once, it will be useful to be able to unload multiple at a time. In preparation for that, make the unload request accept multiple model tags, and evict those models.

Signed-off-by: Piotr Stankiewicz <[email protected]>
1 parent d73a30c commit 423e1d6

2 files changed: +7 additions, −4 deletions

pkg/inference/scheduling/api.go

Lines changed: 3 additions & 3 deletions

```diff
@@ -64,9 +64,9 @@ type DiskUsage struct {

 // UnloadRequest is used to specify which models to unload.
 type UnloadRequest struct {
-	All     bool   `json:"all"`
-	Backend string `json:"backend"`
-	Model   string `json:"model"`
+	All     bool     `json:"all"`
+	Backend string   `json:"backend"`
+	Models  []string `json:"models"`
 }

 // UnloadResponse is used to return the number of unloaded runners (backend, model).
```

pkg/inference/scheduling/loader.go

Lines changed: 4 additions & 1 deletion

```diff
@@ -209,7 +209,10 @@ func (l *loader) Unload(ctx context.Context, unload UnloadRequest) int {
 		if unload.All {
 			return l.evict(false)
 		} else {
-			return l.evictRunner(unload.Backend, unload.Model)
+			for _, model := range unload.Models {
+				l.evictRunner(unload.Backend, model)
+			}
+			return len(l.runners)
 		}
 	}()
 }
```
