Skip to content

Commit 69fca40

Browse files
committed
Update PDB KEP with latest status
1 parent 841863e commit 69fca40

File tree

1 file changed

+38
-35
lines changed
  • keps/sig-apps/85-Graduate-PDB-to-Stable

1 file changed

+38
-35
lines changed

keps/sig-apps/85-Graduate-PDB-to-Stable/README.md

Lines changed: 38 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -8,10 +8,10 @@
88
- [Goals](#goals)
99
- [Proposal](#proposal)
1010
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
11+
- [Promote Eviction to policy/v1 without breaking pods/eviction support for policy/v1beta1](#promote-eviction-to-policyv1-without-breaking-podseviction-support-for-policyv1beta1)
1112
- [Mutable PDBs](#mutable-pdbs)
1213
- [Eviction of non-ready pods](#eviction-of-non-ready-pods)
1314
- [Make the disruption controller more lenient for pods belonging to non-scale controllers](#make-the-disruption-controller-more-lenient-for-pods-belonging-to-non-scale-controllers)
14-
- [Proposed solution](#proposed-solution)
1515
- [Address scalability issues with the disruption controller](#address-scalability-issues-with-the-disruption-controller)
1616
- [Fix handling of empty selector in disruption controller](#fix-handling-of-empty-selector-in-disruption-controller)
1717
- [API changes](#api-changes)
@@ -81,6 +81,26 @@ long term.
8181

8282
### Implementation Details/Notes/Constraints
8383

84+
#### Promote Eviction to policy/v1 without breaking pods/eviction support for policy/v1beta1
85+
86+
Eviction is part of policy/v1beta1, but because it is a subresource of the v1 Pod API,
87+
support for accepting policy/v1beta1 requests should not be dropped.
88+
89+
Luckily, the endpoint only supports Create and returns v1 Status,
90+
so it is possible to let the current endpoint accept both policy/v1 and policy/v1beta1 Evictions.
91+
92+
The following changes will be made:
93+
94+
* The decoding stack will be adjusted to allow a REST handler to accept multiple GroupVersionKinds
95+
* Discovery documents will indicate that policy/v1 Eviction kinds are accepted
96+
* client-go will add Eviction v1 and v1beta1 methods
97+
* `kubectl drain` will use v1 Eviction if available and fall back to v1beta1 Eviction
98+
* The Eviction subresource handler will accept policy/v1 and policy/v1beta1 Eviction objects
99+
* Integration tests will be added to ensure:
100+
* Get requests continue to be unsupported for this endpoint
101+
* Patch requests continue to be unsupported for this endpoint
102+
* Create requests continue to accept policy/v1 and policy/v1beta1 requests and return Status objects
103+
84104
#### Mutable PDBs
85105

86106
A mutable PDB object allows its `MinAvailable`, `MaxUnavailable`, and `Selector`
@@ -126,43 +146,23 @@ The current behavior of the disruption controller for the different types of
126146
input and the different types of pods that might be encountered are documented in:
127147
https://docs.google.com/spreadsheets/d/12HUundBS-slA6axfQYZPRCeIu_Au_wsGD0Vu_oKAnM8/edit?usp=sharing
128148

129-
##### Proposed solution
130-
131-
To fix this, we will make two adjustments:
132-
133-
Loosen the rules somewhat, so instead of just returning an error and block all
134-
evictions when the controller encounters invalid pods, it will ignore those pods
135-
when computing `AllowedDisruptions`. And similarly, the Eviction API will ignore
136-
the PDB when evicting those pods. This means that in a situation like was
137-
mentioned in the issue, where a PDB selector happen to match both pods belonging
138-
to a Deployment and pods belonging to a CronJob, the PDB will protect the
139-
Deployment pods as expected, but ignore the pods from the CronJob.
140-
141-
Combined with loosening the rules in the controller, we also want to improve
142-
feedback to the user in these situations. So while the pods from the CronJob
143-
described above will not be covered by the PDB, the controller will generate
144-
warning events for these situations that will provide accurate descriptions of
145-
what makes the pod ineligible for the PDB. Events should only be generated the
146-
first time the controller encounters the workload to avoid generating new events
147-
on every reconcile. In this case it would mean an event explaining that the pod
148-
has a controller that does not implement scale, and therefore can only be used
149-
if the PDB is using `minAvilable` as a number. We also want to introduce
150-
conditions on the PDB status object that can signal error situations to users
151-
and tools. In particular, whenever the failsafe functionality of the disruption
152-
controller forces `AllowedDisruptions`to be 0, there should be a condition that
153-
allow tools to distinguish this situation from `AllowedDisruptions` being 0
154-
simply because there are not enough ready pods.
155-
156-
Combined, these should make the behavior of the disruption controller more
157-
intuitive, but also provide signals to users whenever the configuration of the
158-
pdb and/or the targeted workloads are invalid.
149+
This has been addressed by improving the users' visibility into any issues
150+
encountered by the disruption controller, primarily through the addition of
151+
conditions to the PDB status: https://github.com/kubernetes/kubernetes/pull/98127.
152+
We also improve the error message provided to users in the case where a controller
153+
resource can not be found with the scale client: https://github.com/kubernetes/kubernetes/pull/98346
154+
155+
We considered changing the current behavior of setting `DisruptionsAllowed` to zero,
156+
but decided against it as it creates issues for the Eviction API and it makes it
157+
harder to understand the behavior of the disruption controller.
159158

160159
#### Address scalability issues with the disruption controller
161160

162161
The disruption controller has some performance issues as reported in
163162
https://github.com/kubernetes/kubernetes/issues/92826.
164163
https://github.com/kubernetes/kubernetes/pull/92827 was merged to remove the 30s
165-
resync period which should improve performance.
164+
resync period which should improve performance. The frequency at which the
165+
disruption controller creates events has also been reduced.
166166

167167
#### Fix handling of empty selector in disruption controller
168168

@@ -172,6 +172,8 @@ directly in the v1beta1 version of PDBs, but we should fix this as part of
172172
promoting PDBs to v1 using the approach described in
173173
https://github.com/kubernetes/kubernetes/issues/95083#issuecomment-703723763
174174

175+
Fixed as part of https://github.com/kubernetes/kubernetes/pull/99290
176+
175177
### API changes
176178

177179
* Add conditions to the status object for PodDisruptionBudget.
@@ -321,9 +323,9 @@ The following e2e tests will be included in the conformance tests:
321323

322324
Graduation to GA:
323325
- [x] Implement Mutable PDBs
324-
- [ ] Address performance issues
325-
- [ ] Pass conformance tests
326-
- [ ] Update documents to reflect the changes
326+
- [x] Address performance issues
327+
- [x] Pass conformance tests
328+
- [x] Update documents to reflect the changes
327329

328330
## Production Readiness Review Questionnaire
329331

@@ -452,3 +454,4 @@ details). For now, we leave it here.
452454

453455
- PodDisruptionBudget was introduced in Kubernetes 1.4 as an alpha version.
454456
- PodDisruptionBudget was graduated to beta in Kubernetes 1.5.
457+
- PodDisruptionBudget was graduated to GA in Kubernetes 1.21.

0 commit comments

Comments
 (0)