Commit 6589134
KEP: Add support for mutable pod resources in suspended jobs
Introduce a new KEP to allow updating container resource
specifications (CPU, memory, GPU, extended resources) for suspended jobs.
Key features:
- Allow container resource specifications to be changed only while a job is suspended
- Support CPU, memory, and GPU resource mutations
- Include extended resources (nvidia.com/gpu, amd.com/gpu, tpu-v4, etc.)
- Allow queue controllers to optimize resource allocation based on
cluster conditions
- Feature gate: MutableJobPodResourcesForSuspendedJobs
- Focus on batch workload optimization scenarios
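The rule in the list above (resources are mutable only while the job is suspended, and only behind the feature gate) can be sketched as a small validation function. This is a minimal illustration, not the KEP's implementation: `JobSpec`, `ResourceList`, and `validateResourceUpdate` are hypothetical simplified stand-ins rather than Kubernetes API types, and the choice to check the suspend flag on the stored (old) object is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceList maps a resource name (e.g. "cpu", "nvidia.com/gpu") to a
// quantity string. Hypothetical simplification of the real API type.
type ResourceList map[string]string

// JobSpec is a hypothetical, trimmed-down stand-in for a Job spec.
type JobSpec struct {
	Suspend   bool
	Resources ResourceList
}

var (
	errGateDisabled = errors.New("pod resources are immutable: feature gate disabled")
	errNotSuspended = errors.New("pod resources may only change while the job is suspended")
)

// resourcesEqual reports whether two resource lists hold the same entries.
func resourcesEqual(a, b ResourceList) bool {
	if len(a) != len(b) {
		return false
	}
	for name, qty := range a {
		if other, ok := b[name]; !ok || other != qty {
			return false
		}
	}
	return true
}

// validateResourceUpdate rejects a resource change unless the gate is on
// and the stored job is suspended (assumption: the old object is checked).
func validateResourceUpdate(gateEnabled bool, oldSpec, newSpec JobSpec) error {
	if resourcesEqual(oldSpec.Resources, newSpec.Resources) {
		return nil // nothing changed, nothing to validate
	}
	if !gateEnabled {
		return errGateDisabled
	}
	if !oldSpec.Suspend {
		return errNotSuspended
	}
	return nil
}

func main() {
	suspended := JobSpec{Suspend: true, Resources: ResourceList{"nvidia.com/gpu": "4"}}
	running := JobSpec{Suspend: false, Resources: ResourceList{"nvidia.com/gpu": "4"}}
	shrunk := JobSpec{Suspend: true, Resources: ResourceList{"nvidia.com/gpu": "2"}}

	fmt.Println("suspended job:", validateResourceUpdate(true, suspended, shrunk))
	fmt.Println("running job:  ", validateResourceUpdate(true, running, shrunk))
	fmt.Println("gate disabled:", validateResourceUpdate(false, suspended, shrunk))
}
```

In this sketch a queue controller would be expected to patch the resources first and unsuspend the job afterwards, so that pods are only ever created with the adjusted values.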
This proposal enables better cluster utilization and cost optimization
by allowing queue controllers to adjust job resource requirements before
execution based on real-time cluster capacity and resource availability.
This is particularly valuable for expensive GPU and specialized hardware resources.
1 parent 7bf6ad0 commit 6589134
3 files changed, +588 -0 lines changed