Commit c75b7f4

Simple drop of Phase 1 contents. Keep Phase 2 contents
We are splitting Phase 2 into its own KEP here so that the two phases can move at different paces. We plan to graduate Phase 1 to Beta while Phase 2 is still being developed and stays in Alpha.
1 parent e33661f commit c75b7f4

File tree

1 file changed

+10
-105
lines changed
  • keps/sig-node/5394-psi-node-conditions

keps/sig-node/5394-psi-node-conditions/README.md

Lines changed: 10 additions & 105 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
# KEP-4205: PSI Based Node Conditions
1+
# KEP-5394: PSI Based Node Conditions
22
<!-- toc -->
33
- [Release Signoff Checklist](#release-signoff-checklist)
44
- [Summary](#summary)
@@ -8,22 +8,15 @@
88
- [Proposal](#proposal)
99
- [User Stories (Optional)](#user-stories-optional)
1010
- [Story 1](#story-1)
11-
- [Story 2](#story-2)
1211
- [Risks and Mitigations](#risks-and-mitigations)
1312
- [Design Details](#design-details)
14-
- [Phase 1](#phase-1)
15-
- [CPU](#cpu)
16-
- [Memory](#memory)
17-
- [IO](#io)
18-
- [Phase 2 to add PSI based actions.](#phase-2-to-add-psi-based-actions)
1913
- [Test Plan](#test-plan)
2014
- [Prerequisite testing updates](#prerequisite-testing-updates)
2115
- [Unit tests](#unit-tests)
2216
- [Integration tests](#integration-tests)
2317
- [e2e tests](#e2e-tests)
2418
- [Graduation Criteria](#graduation-criteria)
25-
- [Phase 1: Alpha](#phase-1-alpha)
26-
- [Phase 2: Alpha](#phase-2-alpha)
19+
- [Alpha](#alpha)
2720
- [Beta](#beta)
2821
- [GA](#ga)
2922
- [Deprecation](#deprecation)
@@ -85,7 +78,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
8578

8679
## Summary
8780

88-
This KEP proposes adding support in kubelet to read Pressure Stall Information (PSI) metric pertaining to CPU, Memory and IO resources exposed from cAdvisor and runc. This will enable kubelet to report node conditions which will be utilized to prevent scheduling of pods on nodes experiencing significant resource constraints.
81+
This KEP proposes enabling kubelet to report node conditions which will be utilized to prevent scheduling of pods on nodes experiencing significant resource constraints.
8982

9083
## Motivation
9184

@@ -96,13 +89,7 @@ In short, PSI metric are like barometers that provide fair warning of impending
9689
### Goals
9790

9891
This proposal aims to:
99-
1. Enable the kubelet to have the PSI metric of cgroupv2 exposed from cAdvisor and Runc.
100-
2. Enable the pod level PSI metric and expose it in the Summary API.
101-
3. Utilize the node level PSI metric to set node condition and node taints.
102-
103-
It will have two phases:
104-
Phase 1: includes goal 1, 2
105-
Phase 2: includes goal 3
92+
1. Utilize the node level PSI metric to set node condition and node taints.
10693

10794
### Non-Goals
10895

@@ -115,86 +102,17 @@ userspace OOM kills, and so on, for future KEPs.
115102

116103
#### Story 1
117104

118-
Today, to identify disruptions caused by resource crunches, Kubernetes users need to
119-
install node exporter to read PSI metric. With the feature proposed in this enhancement,
120-
PSI metric will be available for users in the Kubernetes metrics API.
121-
122-
#### Story 2
123-
124105
Kubernetes users want to prevent new pods from being scheduled on nodes that are starved of resources. Using the PSI metric, the kubelet will set a Node Condition so that pods are not scheduled on nodes under high resource pressure. The node controller could then set a [taint on the node based on these new Node Conditions](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-nodes-by-condition).
125106

126107
### Risks and Mitigations
127108

128-
There are no significant risks associated with Phase 1 implementation that involves integrating
129-
the PSI metric in kubelet from either the cAdvisor runc libcontainer library or kubelet's CRI runc libcontainer implementation, which doesn't involve any shelled binary operations.
130-
131-
Phase 2 involves utilizing the PSI metric to report node conditions. There is a potential
109+
There is a potential
132110
risk of early reporting for nodes under pressure. We intend to address this concern
133111
by conducting careful experimentation with PSI threshold values to identify the optimal
134112
default threshold to be used for reporting the nodes under heavy resource pressure.
135113

136114
## Design Details
137115

138-
#### Phase 1
139-
1. Add new Data structures PSIData and PSIStats corresponding to the PSI metric output format as following:
140-
141-
```
142-
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
143-
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
144-
```
145-
146-
```go
147-
type PSIData struct {
148-
Avg10 *float64 `json:"avg10"`
149-
Avg60 *float64 `json:"avg60"`
150-
Avg300 *float64 `json:"avg300"`
151-
Total *float64 `json:"total"`
152-
}
153-
154-
type PSIStats struct {
155-
Some *PSIData `json:"some,omitempty"`
156-
Full *PSIData `json:"full,omitempty"`
157-
}
158-
```
159-
160-
2. Summary API includes stats for both system and kubepods level cgroups. Extend the Summary API to include PSI metric data for each resource obtained from cadvisor.
161-
Note: if cadvisor-less is implemented prior to the implementation of this enhancement, the PSI
162-
metric data will be available through CRI instead.
163-
164-
##### CPU
165-
```go
166-
type CPUStats struct {
167-
// PSI stats of the overall node
168-
PSI cadvisorapi.PSIStats `json:"psi,omitempty"`
169-
}
170-
```
171-
172-
##### Memory
173-
```go
174-
type MemoryStats struct {
175-
// PSI stats of the overall node
176-
PSI cadvisorapi.PSIStats `json:"psi,omitempty"`
177-
}
178-
```
179-
180-
##### IO
181-
```go
182-
// IOStats contains data about IO usage.
183-
type IOStats struct {
184-
// The time at which these stats were updated.
185-
Time metav1.Time `json:"time"`
186-
187-
// PSI stats of the overall node
188-
PSI cadvisorapi.PSIStats `json:"psi,omitempty"`
189-
}
190-
191-
type NodeStats struct {
192-
// Stats about the IO pressure of the node
193-
IO *IOStats `json:"io,omitempty"`
194-
}
195-
```
196-
197-
#### Phase 2 to add PSI based actions.
198116
**Note:** These actions are tentative and will depend on the outcome of testing and discussions with sig-node members, users, and other folks.
199117

200118
1. Introduce a new kubelet config parameter, pressure threshold, to let users specify the pressure percentage beyond which the kubelet reports the node condition that prevents workloads from being scheduled on the node.
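
As a rough illustration of the threshold check described in item 1, the Go sketch below parses one line of a PSI pressure file (the `some avg10=... avg60=... avg300=... total=...` format) and compares it against a configured percentage. The names `psiLine`, `parsePSILine`, and `exceedsThreshold` are hypothetical, and the choice of the `avg10` window is an assumption; the KEP leaves the actual config parameter and averaging window to be decided.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine holds one parsed line of a PSI pressure file, e.g.
// "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
// Hypothetical type for illustration; not the kubelet's data structure.
type psiLine struct {
	Kind   string  // "some" or "full"
	Avg10  float64 // percent of time stalled over the last 10 seconds
	Avg60  float64 // percent of time stalled over the last 60 seconds
	Avg300 float64 // percent of time stalled over the last 300 seconds
	Total  uint64  // cumulative stall time in microseconds
}

// parsePSILine parses a single "some ..." or "full ..." line.
func parsePSILine(line string) (psiLine, error) {
	fields := strings.Fields(line)
	if len(fields) != 5 {
		return psiLine{}, fmt.Errorf("unexpected PSI line: %q", line)
	}
	out := psiLine{Kind: fields[0]}
	for _, kv := range fields[1:] {
		key, val, ok := strings.Cut(kv, "=")
		if !ok {
			return psiLine{}, fmt.Errorf("malformed field %q", kv)
		}
		var err error
		switch key {
		case "avg10":
			out.Avg10, err = strconv.ParseFloat(val, 64)
		case "avg60":
			out.Avg60, err = strconv.ParseFloat(val, 64)
		case "avg300":
			out.Avg300, err = strconv.ParseFloat(val, 64)
		case "total":
			out.Total, err = strconv.ParseUint(val, 10, 64)
		default:
			err = fmt.Errorf("unknown key %q", key)
		}
		if err != nil {
			return psiLine{}, err
		}
	}
	return out, nil
}

// exceedsThreshold reports whether the 10-second average is above the
// configured pressure percentage. Using avg10 is an assumption made here.
func exceedsThreshold(p psiLine, thresholdPercent float64) bool {
	return p.Avg10 > thresholdPercent
}

func main() {
	p, err := parsePSILine("some avg10=12.50 avg60=3.00 avg300=1.00 total=12345")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// With a hypothetical threshold of 10%, this node would be reported
	// as under pressure.
	fmt.Println("under pressure:", exceedsThreshold(p, 10.0))
}
```

A shorter window such as `avg10` reacts quickly but is more prone to the early-reporting risk noted under Risks and Mitigations; a longer window trades responsiveness for stability.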
@@ -318,15 +236,9 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
318236

319237
### Graduation Criteria
320238

321-
#### Phase 1: Alpha
322-
323-
- PSI integrated in kubelet behind a feature flag.
324-
- Unit tests to check the fields are populated in the
325-
Summary API response.
239+
#### Alpha
326240

327-
#### Phase 2: Alpha
328-
329-
- Implement Phase 2 of the enhancement which enables kubelet to
241+
- Enables kubelet to
330242
report node conditions based off PSI values.
331243
- Initial e2e tests completed and enabled if CRI implementation supports
332244
it.
@@ -407,7 +319,7 @@ well as the [existing list] of feature gates.
407319

408320
- [X] Feature gate (also fill in values in `kep.yaml`)
409321
- Feature gate name: PSINodeCondition
410-
- Components depending on the feature gate: kubelet
322+
- Components depending on the feature gate: kubelet, kube-controller-manager, kube-scheduler
411323
- [ ] Other
412324
- Describe the mechanism:
413325
- Will enabling / disabling the feature require downtime of the control
@@ -421,7 +333,7 @@ well as the [existing list] of feature gates.
421333
Any change of default behavior may be surprising to users or break existing
422334
automations, so be extremely careful here.
423335
-->
424-
Not in Phase 1. Phase 2 is TBD in K8s 1.31.
336+
TBD
425337

426338
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
427339

@@ -513,11 +425,6 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
513425
checking if there are objects with field X set) may be a last resort. Avoid
514426
logs or events for this purpose.
515427
-->
516-
For Phase 1:
517-
Use `kubectl get --raw "/api/v1/nodes/{$nodeName}/proxy/stats/summary"` to call Summary API. If the PSIStats field is seen in the API response,
518-
the feature is available to be used by workloads.
519-
520-
For Phase 2:
521428
TBD
522429

523430
###### How can someone using this feature know that it is working for their instance?
@@ -658,12 +565,10 @@ NA
658565
## Implementation History
659566

660567
- 2023/09/13: Initial proposal
568+
- 2025/06/11: Only keep Phase 2 contents in this new KEP. Phase 1 contents are kept in the original KEP.
661569

662570
## Drawbacks
663571

664-
No drawbacks in Phase 1 identified. There's no reason the enhancement should not be
665-
implemented. This enhancement now makes it possible to read PSI metric without installing
666-
additional dependencies
667572

668573
## Infrastructure Needed (Optional)
669574
