Commit 0c47313

feat: add 625-node-resource-fit-plus-scoring kep
Signed-off-by: LY-today <[email protected]>
1 parent 02c34fc commit 0c47313

4 files changed

+121
-0
lines changed

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# Node Resource Fit Plus Scheduling

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Design Consideration](#design-consideration)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

The NodeResourcesFit plugin in upstream Kubernetes applies a single scoring strategy, such as MostAllocated or LeastAllocated (formerly MostRequestedPriority and LeastRequestedPriority), to all resources. In production, this design does not fit some scenarios. For example, in AI clusters, workloads that request GPUs should fill an entire GPU machine first to prevent GPU fragmentation, while workloads that request only CPU and memory should be spread across non-GPU machines first. Otherwise, CPU and memory on GPU machines are consumed excessively, and workloads that actually request GPUs stay Pending because the non-GPU resources on those machines are insufficient. This KEP therefore proposes two new plugins to solve this common problem.

## Motivation

Use cases:
- GPU tasks should fill an entire GPU machine first.
- CPU and memory tasks should be distributed to CPU-only machines first.

## Design Consideration

- The solution should be general: not limited to AI clusters or CPU clusters, and not limited to common resources such as CPU or extended resources such as GPU.

- Different resource policies can be configured for different cluster types and prioritized by weight.

- The solution should be easy to extend.

### Goals

- Allow different resource types to be configured with different scoring strategies, prioritized by weight.

- Prevent pods that do not request scarce resources from being scheduled onto nodes that have scarce resources.

### Non-Goals

- None.

## Proposal

Add two scheduler plugins to meet the needs above:

- NodeResourcesFitPlus
- ScarceResourceAvoidance

## Design Details

### NodeResourcesFitPlus:

config:
```
resources:
  nvidia.com/gpu:
    type: MostAllocated
    weight: 2
  cpu:
    type: LeastAllocated
    weight: 1
  memory:
    type: LeastAllocated
    weight: 1
```
config description:
<p align="center"><img src="images/img1.png" title="Key components" width="600" class="center"/></p>

node score:
```
finalScoreNode = [(weight1 * resource1) + (weight2 * resource2) + … + (weightN * resourceN)] / (weight1 + weight2 + … + weightN)
```

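The weighted average in the NodeResourcesFitPlus node score can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the `policy` and `usage` types and the `resourceScore` helper are hypothetical names, and each per-resource score is assumed to follow the usual MostAllocated (requested / allocatable) and LeastAllocated ((allocatable - requested) / allocatable) formulas scaled to 100, because the proposal describes those details only in the config description image.

```go
package main

import "fmt"

// maxNodeScore mirrors framework.MaxNodeScore (100) in the scheduler framework.
const maxNodeScore int64 = 100

// policy is a hypothetical type for one entry under `resources` in the config.
type policy struct {
	Type   string // "MostAllocated" or "LeastAllocated"
	Weight int64
}

// usage holds, for one resource on one node, the amount requested once the
// incoming pod is counted, and the node's allocatable amount.
type usage struct {
	Requested   int64
	Allocatable int64
}

// resourceScore scores a single resource in the range [0, maxNodeScore].
// Assumption: over-committed or empty resources simply score 0.
func resourceScore(strategy string, u usage) (int64, error) {
	if strategy != "MostAllocated" && strategy != "LeastAllocated" {
		return 0, fmt.Errorf("unknown strategy type %q", strategy)
	}
	if u.Allocatable <= 0 || u.Requested > u.Allocatable {
		return 0, nil
	}
	if strategy == "MostAllocated" {
		return u.Requested * maxNodeScore / u.Allocatable, nil
	}
	return (u.Allocatable - u.Requested) * maxNodeScore / u.Allocatable, nil
}

// finalScoreNode returns the weighted average of the per-resource scores.
// Assumption: resources missing from `node` are skipped entirely.
func finalScoreNode(policies map[string]policy, node map[string]usage) (int64, error) {
	var weighted, weightSum int64
	for name, p := range policies {
		u, ok := node[name]
		if !ok {
			continue
		}
		s, err := resourceScore(p.Type, u)
		if err != nil {
			return 0, fmt.Errorf("resource %s: %w", name, err)
		}
		weighted += p.Weight * s
		weightSum += p.Weight
	}
	if weightSum == 0 {
		return 0, nil
	}
	return weighted / weightSum, nil
}

func main() {
	policies := map[string]policy{
		"nvidia.com/gpu": {Type: "MostAllocated", Weight: 2},
		"cpu":            {Type: "LeastAllocated", Weight: 1},
		"memory":         {Type: "LeastAllocated", Weight: 1},
	}
	node := map[string]usage{
		"nvidia.com/gpu": {Requested: 8, Allocatable: 8},
		"cpu":            {Requested: 32, Allocatable: 64},
		"memory":         {Requested: 128, Allocatable: 256},
	}
	score, err := finalScoreNode(policies, node)
	if err != nil {
		panic(err)
	}
	fmt.Println(score) // (2*100 + 1*50 + 1*50) / 4 = 75
}
```

With the example config, a fully packed GPU contributes twice as much to the node score as CPU or memory, so GPU pods gravitate toward nodes whose GPUs are already in use.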
### ScarceResourceAvoidance:

config:
```
resources:
- nvidia.com/gpu
```
config description:
<p align="center"><img src="images/img2.png" title="Key components" width="600" class="center"/></p>

node score:
```
finalScoreNode = (allocatablesResourcesNum - requestsResourcesNum) * framework.MaxNodeScore / allocatablesResourcesNum
```

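The ScarceResourceAvoidance score can be sketched in Go under one possible reading of the formula. The interpretation here is an assumption, since the proposal defines the two counts only in the config description image: `allocatablesResourcesNum` is taken to be the number of resource types the node offers, and the subtracted term is taken to be the number of configured scarce resource types that the node offers but the pod does not request. That reading matches the stated goal, because a node loses score for every scarce resource the pod would leave unused. The function name and parameters are hypothetical, and resource names are assumed to be unique within each slice.

```go
package main

import "fmt"

// maxNodeScore mirrors framework.MaxNodeScore (100) in the scheduler framework.
const maxNodeScore int64 = 100

// scarceAvoidanceScore lowers a node's score for each configured scarce
// resource type that the node offers but the pod does not request.
// This interpretation of the formula is an assumption, not the plugin's code.
func scarceAvoidanceScore(nodeAllocatable, podRequests, scarce []string) int64 {
	total := int64(len(nodeAllocatable))
	if total == 0 {
		return 0
	}
	offered := make(map[string]bool, len(nodeAllocatable))
	for _, r := range nodeAllocatable {
		offered[r] = true
	}
	requested := make(map[string]bool, len(podRequests))
	for _, r := range podRequests {
		requested[r] = true
	}
	var unrequestedScarce int64
	for _, r := range scarce {
		if offered[r] && !requested[r] {
			unrequestedScarce++
		}
	}
	return (total - unrequestedScarce) * maxNodeScore / total
}

func main() {
	scarce := []string{"nvidia.com/gpu"}
	cpuPod := []string{"cpu", "memory"}
	gpuNode := []string{"cpu", "memory", "nvidia.com/gpu"}
	cpuNode := []string{"cpu", "memory"}

	fmt.Println(scarceAvoidanceScore(gpuNode, cpuPod, scarce)) // (3 - 1) * 100 / 3 = 66
	fmt.Println(scarceAvoidanceScore(cpuNode, cpuPod, scarce)) // (2 - 0) * 100 / 2 = 100
}
```

Under this reading, a pod that requests only CPU and memory scores the CPU-only node higher than the GPU node, while a pod that requests `nvidia.com/gpu` is not penalized on the GPU node.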
### Test Plan

Unit tests will be added to cover each piece of functionality in both plugins. Integration tests will verify that the NodeResourcesFitPlus and ScarceResourceAvoidance plugins work correctly within the scheduler.

Finally, a basic e2e test will be included to ensure that all components work together properly.

### Graduation Criteria

#### Alpha

- Implement the NodeResourcesFitPlus and ScarceResourceAvoidance scheduler plugins.
- Provide a reference implementation of NodeResourcesFitPlus and ScarceResourceAvoidance.
- Add the unit tests and integration tests from the [Test Plan](#test-plan).

#### Beta

- Add E2E tests.
- Provide beta-level documentation.

## Implementation History

- 2024-12-23: KEP created
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
title: Node Resource Fit plus Scheduling
kep-number: 625
authors:
  - "@LY-today"
owning-sig: sig-scheduling
creation-date: 2024-12-23
last-updated: 2024-12-23
