
Commit fe63656

add kep
Signed-off-by: KunWuLuan <[email protected]>
1 parent 68e40f8 commit fe63656

File tree

2 files changed: +183 -0 lines changed


kep/594-resourcepolicy/README.md

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
# Resource Policy

## Table of Contents

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Use Cases](#use-cases)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [CRD API](#crd-api)
  - [Implementation Details](#implementation-details)
    - [Scheduler Plugins](#scheduler-plugins)
      - [PreFilter](#prefilter)
      - [Filter](#filter)
      - [Score](#score)
    - [Resource Policy Controller](#resource-policy-controller)
- [Known limitations](#known-limitations)
- [Test plans](#test-plans)
- [Graduation criteria](#graduation-criteria)
- [Feature enablement and rollback](#feature-enablement-and-rollback)
<!-- /toc -->

## Summary

This proposal introduces a plugin that enables users to set priorities for different kinds of resources and to define maximum resource consumption for workloads on each kind of resource.

## Motivation

A Kubernetes cluster typically consists of heterogeneous machines, with varying SKUs on CPU, memory, GPU, and pricing. To efficiently utilize the different resources available in the cluster, users can set priorities for machines of different types and configure resource allocations for different workloads. Additionally, they may choose to delete pods running on low-priority nodes rather than on high-priority ones.

### Use Cases

1. As an administrator of a Kubernetes cluster, I have some static but expensive VM instances and some dynamic but cheaper Spot instances in my cluster. I want to restrict the resource consumption of different workloads on each kind of resource to limit cost. I also want important workloads to be deployed on the static VM instances first, so that they do not need to worry about being preempted. During business peak periods, the Pods that are scaled up should be deployed on the cheap Spot instances, and at the end of the peak the Pods on Spot instances should be scaled down first.

### Goals

1. Develop a filter plugin to restrict the resource consumption of different workloads on each kind of resource.
2. Develop a score plugin to favor nodes that belong to a higher-priority kind of resource.
3. Develop a controller that automatically sets deletion costs on Pods to control the scale-down order of workloads.

### Non-Goals

1. The scheduler will not delete pods.

## Proposal

### API

```yaml
apiVersion: scheduling.sigs.x-k8s.io/v1alpha1
kind: ResourcePolicy
metadata:
  name: xxx
  namespace: xxx
spec:
  matchLabelKeys:
    - pod-template-hash
  podSelector:
    key1: value1
  strategy: prefer
  units:
    - name: unit1
      max: 10
      maxResource:
        cpu: 10
      nodeSelector:
        key1: value1
```

```go
type ResourcePolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ResourcePolicySpec   `json:"spec"`
	Status ResourcePolicyStatus `json:"status,omitempty"`
}

type ResourcePolicySpec struct {
	// +optional
	// +nullable
	// +listType=atomic
	Units []Unit `json:"units,omitempty" protobuf:"bytes,1,rep,name=units"`

	Selector       map[string]string `json:"selector,omitempty" protobuf:"bytes,2,rep,name=selector"`
	MatchLabelKeys []string          `json:"matchLabelKeys,omitempty" protobuf:"bytes,3,rep,name=matchLabelKeys"`
}

type Unit struct {
	Max          *int32          `json:"max,omitempty" protobuf:"varint,1,opt,name=max"`
	MaxResources v1.ResourceList `json:"maxResources,omitempty" protobuf:"bytes,2,rep,name=maxResources"`

	NodeSelector map[string]string `json:"nodeSelector,omitempty" protobuf:"bytes,3,rep,name=nodeSelector"`

	PodLabelsToAdd      map[string]string `json:"podLabels,omitempty" protobuf:"bytes,4,rep,name=podLabels"`
	PodAnnotationsToAdd map[string]string `json:"podAnnotations,omitempty" protobuf:"bytes,5,rep,name=podAnnotations"`
}

type ResourcePolicyStatus struct {
	Pods           []int64      `json:"pods,omitempty"`
	LastUpdateTime *metav1.Time `json:"lastUpdateTime,omitempty"`
}
```

Pods are matched by a ResourcePolicy in the same namespace when they match its `.spec.podSelector`. A ResourcePolicy never matches pods in other namespaces, and a pod can be matched by at most one ResourcePolicy.

Pods can only be scheduled on the units defined in `.spec.units`. Each item in `.spec.units` describes a kind of resource in the cluster: the set of nodes that match its `nodeSelector`.

Pods are scheduled in the order in which the units appear in `.spec.units`.
`.spec.units[].max` defines the maximum number of pods that can be scheduled on each unit. If it is not set, pods can always be scheduled on the unit unless it runs out of resources.
`.spec.units[].maxResource` defines the maximum amount of resources that can be consumed on each unit. If it is not set, pods can always be scheduled on the unit unless it runs out of resources.

`.spec.matchLabelKeys` indicates how the pods matched by `podSelector` are grouped; it behaves like `matchLabelKeys` in `PodTopologySpread`.

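To illustrate the grouping described above, here is a minimal, self-contained sketch (not the plugin's actual implementation) of how the values of `matchLabelKeys` could be merged into the pod selector, following the `PodTopologySpread` semantics; the helper name `effectiveSelector` is hypothetical:

```go
// Illustrative only: merges the policy's podSelector with the values the pod
// carries for each key listed in matchLabelKeys, so quota counting is done
// per group (for example per pod-template-hash, i.e. per ReplicaSet).
package main

import "fmt"

func effectiveSelector(podSelector map[string]string, matchLabelKeys []string, podLabels map[string]string) map[string]string {
	sel := map[string]string{}
	for k, v := range podSelector {
		sel[k] = v
	}
	for _, key := range matchLabelKeys {
		if v, ok := podLabels[key]; ok {
			sel[key] = v
		}
	}
	return sel
}

func main() {
	sel := effectiveSelector(
		map[string]string{"key1": "value1"},
		[]string{"pod-template-hash"},
		map[string]string{"key1": "value1", "pod-template-hash": "abc123"},
	)
	fmt.Println(sel) // map[key1:value1 pod-template-hash:abc123]
}
```
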
### Implementation Details

#### PreFilter

PreFilter checks whether the current pod matches exactly one resource policy. If not, PreFilter rejects the pod.
If it does, PreFilter counts the pods already running on each unit to determine which units are available for the pod,
and writes this information into the CycleState.

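A minimal sketch of the bookkeeping PreFilter performs, using simplified stand-in types rather than the real CRD and `CycleState` APIs; `unitInfo`, `prefilterState`, and `computeAvailableUnits` are hypothetical names:

```go
// Illustrative only: unitInfo and prefilterState are simplified stand-ins for
// the CRD types and the data the plugin would store in the CycleState.
package main

import "fmt"

type unitInfo struct {
	Name string
	Max  *int32 // nil means no limit on the number of pods
}

// prefilterState holds what PreFilter writes into the CycleState: the units,
// in priority order, that can still accept the incoming pod.
type prefilterState struct {
	availableUnits []string
}

// computeAvailableUnits skips every unit whose pod count has already reached
// its max; the remaining units are the candidates for Filter and Score.
func computeAvailableUnits(units []unitInfo, podsPerUnit map[string]int32) *prefilterState {
	s := &prefilterState{}
	for _, u := range units {
		if u.Max != nil && podsPerUnit[u.Name] >= *u.Max {
			continue
		}
		s.availableUnits = append(s.availableUnits, u.Name)
	}
	return s
}

func main() {
	max := int32(2)
	units := []unitInfo{{Name: "static-vm", Max: &max}, {Name: "spot"}}
	counts := map[string]int32{"static-vm": 2, "spot": 5}
	fmt.Println(computeAvailableUnits(units, counts).availableUnits) // [spot]
}
```
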
#### Filter

Filter checks whether the node belongs to an available unit. If the node does not belong to any unit, the plugin returns unschedulable.
Filter also checks whether the pods already scheduled on the unit violate the quantity constraint: if the number of pods has reached `.spec.units[].max`, all nodes in the unit are marked unschedulable.

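A simplified sketch of the Filter check, reusing the idea of the available units computed in PreFilter; the types and helpers below are illustrative only, not the plugin's real code:

```go
// Illustrative only: a node passes Filter if its labels match the nodeSelector
// of one of the units that PreFilter left available.
package main

import "fmt"

type filterUnit struct {
	Name         string
	NodeSelector map[string]string
}

func nodeMatches(nodeLabels, selector map[string]string) bool {
	for k, v := range selector {
		if nodeLabels[k] != v {
			return false
		}
	}
	return true
}

// unitForNode returns the first available unit whose nodeSelector matches the
// node, or "" if the node belongs to no available unit (=> unschedulable).
func unitForNode(nodeLabels map[string]string, available []filterUnit) string {
	for _, u := range available {
		if nodeMatches(nodeLabels, u.NodeSelector) {
			return u.Name
		}
	}
	return ""
}

func main() {
	available := []filterUnit{{Name: "spot", NodeSelector: map[string]string{"node-type": "spot"}}}
	fmt.Println(unitForNode(map[string]string{"node-type": "spot"}, available))      // spot
	fmt.Println(unitForNode(map[string]string{"node-type": "on-demand"}, available)) // "" -> unschedulable
}
```
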
#### Score

Node score is `100 - (index of the unit)`, so nodes in earlier units are preferred.

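As a small illustration of the rule above (assuming the unit index is taken from the information PreFilter stored in the CycleState):

```go
// Illustrative only: index 0 is the most preferred unit, so its nodes get the
// highest score.
package main

import "fmt"

func scoreNode(unitIndex int) int64 {
	return int64(100 - unitIndex)
}

func main() {
	fmt.Println(scoreNode(0)) // 100: node belongs to the first unit
	fmt.Println(scoreNode(3)) // 97: node belongs to the fourth unit
}
```
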
#### PreBind

PreBind adds annotations and labels to pods to ensure they can be scaled down in the order of the units.

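A sketch of what PreBind could record on the pod so that a controller can later set deletion costs per unit; the annotation key below is hypothetical and not part of the proposed API:

```go
// Illustrative only: records which unit the chosen node belongs to, so a
// controller can set deletion costs per unit later on.
package main

import (
	"fmt"
	"strconv"
)

const unitIndexAnnotation = "resourcepolicy.scheduling.sigs.x-k8s.io/unit-index" // hypothetical key

// annotationsForBinding returns the annotations PreBind would apply to the pod.
func annotationsForBinding(existing map[string]string, unitIndex int) map[string]string {
	out := make(map[string]string, len(existing)+1)
	for k, v := range existing {
		out[k] = v
	}
	out[unitIndexAnnotation] = strconv.Itoa(unitIndex)
	return out
}

func main() {
	fmt.Println(annotationsForBinding(map[string]string{"example": "value"}, 1))
}
```
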
## Known limitations

- Currently, deletion costs only take effect on Deployment workloads.

## Test plans

1. Add detailed unit and integration tests for the plugin and controller.
2. Add basic E2E tests to ensure all components work together.

## Graduation criteria

This plugin is only active when users enable it in the scheduler framework configuration and create a ResourcePolicy for their pods, so it is safe to graduate it to beta.

* Beta
  - [ ] Add node E2E tests.
  - [ ] Provide beta-level documentation.

## Feature enablement and rollback

To enable this plugin, enable resourcepolicy under multiPoint in the KubeSchedulerConfiguration, for example:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
profiles:
  - schedulerName: default-scheduler
    plugins:
      multiPoint:
        enabled:
          - name: resourcepolicy
```
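
For out-of-tree deployment, the plugin also has to be compiled into a scheduler binary. A sketch in the style of the scheduler-plugins project follows; the `resourcepolicy` package, its import path, and its `Name`/`New` symbols are assumptions, while `app.NewSchedulerCommand` and `app.WithPlugin` are the usual kube-scheduler entry points for registering out-of-tree plugins:

```go
// Illustrative only: wires the plugin into an out-of-tree scheduler binary.
package main

import (
	"os"

	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	// Hypothetical import path for the plugin proposed in this KEP.
	"sigs.k8s.io/scheduler-plugins/pkg/resourcepolicy"
)

func main() {
	// Register the plugin under the name used in the KubeSchedulerConfiguration
	// above ("resourcepolicy"), then run the scheduler as usual.
	cmd := app.NewSchedulerCommand(
		app.WithPlugin(resourcepolicy.Name, resourcepolicy.New),
	)
	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```
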

kep/594-resourcepolicy/kep.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
title: Resourcepolicy
kep-number: 594
authors:
  - "@KunWuLuan"
  - "@fjding"
