Open
Labels
enhancement, good first issue, help wanted
Description
Summary
In hands-on lab scenarios, developers use the GPU only sporadically, but their programs hold GPU memory constantly. This leads to inefficient VRAM usage, so idle workloads should be cooled down to host memory or disk.
Motivation
Reduce VRAM consumption in low-frequency GPU usage cases, which increases the oversubscription ratio and lowers cost.
Tech Design
- TF Controller injects the hypervisor AutoFreezeAndResume config into an environment variable.
- Hypervisor deserializes the config and passes it to the multi-process scheduling module.
- Hypervisor maintains a timer for each active process. If a process meets the autoFreeze criteria, the hypervisor calls the TF worker's ABI to trigger a suspend and moves the process to a queue that will not be woken.
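The three steps above could be sketched roughly as follows. This is an illustrative Python model only, not the hypervisor's actual code: the environment variable name `TF_AUTO_FREEZE_CONFIG`, the JSON fields `enabled` and `idleSeconds`, and the `suspend` callback (standing in for the TF worker ABI call) are all assumptions made for the example.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping, Set

# Hypothetical env var name; the issue does not specify one.
ENV_VAR = "TF_AUTO_FREEZE_CONFIG"


@dataclass
class AutoFreezeConfig:
    enabled: bool = True
    idle_seconds: float = 300.0  # assumed criterion: idle time threshold

    @classmethod
    def from_env(cls, environ: Mapping[str, str]) -> "AutoFreezeConfig":
        """Deserialize the config injected by the controller (JSON assumed)."""
        raw = environ.get(ENV_VAR)
        if not raw:
            return cls(enabled=False)
        data = json.loads(raw)
        return cls(
            enabled=bool(data.get("enabled", True)),
            idle_seconds=float(data.get("idleSeconds", 300.0)),
        )


@dataclass
class FreezeScheduler:
    config: AutoFreezeConfig
    suspend: Callable[[int], None]  # stand-in for the TF worker ABI call
    clock: Callable[[], float] = time.monotonic
    last_active: Dict[int, float] = field(default_factory=dict)
    frozen: Set[int] = field(default_factory=set)

    def record_activity(self, pid: int) -> None:
        """Reset the idle timer of an active (non-frozen) process."""
        if pid not in self.frozen:
            self.last_active[pid] = self.clock()

    def tick(self) -> List[int]:
        """Suspend every process idle past the threshold; return their pids."""
        if not self.config.enabled:
            return []
        now = self.clock()
        idle = [
            pid
            for pid, seen in self.last_active.items()
            if now - seen >= self.config.idle_seconds
        ]
        for pid in idle:
            self.suspend(pid)          # trigger suspend in the worker
            del self.last_active[pid]  # stop tracking it as active
            self.frozen.add(pid)       # parked: the scheduler will not wake it
        return idle
```

In this sketch the hypervisor would call `record_activity` whenever a process issues GPU work and call `tick` periodically. Resume is left out, since the design steps above only describe the freeze path.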