Commit dd5d81a

Merge pull request kubernetes#1922 from wojtek-t/efficient_watch_reboot_implementable
Mark efficient watch resumption KEP as implementable
2 parents 435a1c5 + 83ae0bd commit dd5d81a

2 files changed: 46 additions, 43 deletions

keps/sig-api-machinery/1904-efficient-watch-resumption/README.md

Lines changed: 40 additions & 36 deletions
@@ -267,48 +267,61 @@ delievered to a subset of kube-apiservers, this is not a problem, because:
 should still be stored there (unless there is heavy churn of those objects
 which is the case that doesn't suffer from this problem)

-
 The POC PR can be found in: https://github.com/kubernetes/kubernetes/pull/92472

 ### Risks and Mitigations

-<!--
-What are the risks of this proposal, and how do we mitigate? Think broadly.
-For example, consider both security and how this will impact the larger
-Kubernetes ecosystem.
-
-How will security be reviewed, and by whom?
-
-How will UX be reviewed, and by whom?
-
-Consider including folks who also work outside the SIG or subproject.
--->
+The biggest risk are bugs in the implementation. To mitigate this, the
+implementation will be hidden behind `EfficientWatchResumption` feature
+gate and necessary tests will be added and/or extended (details below).

 ## Design Details

 ### Test Plan

-TODO: Fill in before making `Implementable`.
+- unit tests for logic enhancing resource version tracking in reflector
+- unit tests for newly added watch cache logic
+- integration test for sending bookmark on kube-apiserver shutdown
+- integration test for proving that resource version that
+kube-apiserver can serve from cache progresses eventually when objects of
+other types are being added/updated/deleted;
+this test should store events (or other type) in a separate etcd cluster
+(to test split-etcd backend mode) and ensure no RV leak across etcd clusters

 ### Graduation Criteria

-TODO: Fill in before making `Implementable`.
+Alpha should provide basic functionality covered with tests described above.

 #### Alpha -> Beta Graduation

-TODO: Fill in before making `Implementable`.
+- Appropriate metrics are agreed on and implemented
+- Ad-hoc manual rolling-upgrade of kube-apiservers in 5k-node cluster
+is not resulting in required re-listing for watched resources from
+node components

 #### Beta -> GA Graduation

-TODO: Fill in before making `Implementable`.
+- Enabled in Beta for at least two releases without complaints
+- Rolling-upgrade of kube-apiservers in 5k-node cluster test is
+automated and running periodically.

 ### Upgrade / Downgrade Strategy

-TODO: Fill in before making `Implementable`.
+Kubernetes can be safely updated/downgraded, as the implementation
+is purely in memory:
+- if etcd doesn't support frequent enough progress notify events,
+we won't get expected benefits (problems may not be addressed),
+but also no unexpected consequences
+- enabling the feature may only result in additional watch bookmark
+events for clients, which they are explicitly opting-in anyway
+- disabling the feature reverts the behavior of watchcache being
+synced to values of objects of different types; however given
+the initialization is happening at "now" anyway, the time won't
+go back

 ### Version Skew Strategy

-TODO: Fill in before making `Implementable`.
+n/a - watch bookmarks don't have any frequency guarantees

 ## Production Readiness Review Questionnaire

@@ -319,33 +332,23 @@ TODO: Fill in before making `Implementable`.
 _This section must be completed when targeting alpha to a release._

 * **How can this feature be enabled / disabled in a live cluster?**
-- [ ] Feature gate (also fill in values in `kep.yaml`)
-- Feature gate name:
-- Components depending on the feature gate:
-- [ ] Other
-- Describe the mechanism:
-- Will enabling / disabling the feature require downtime of the control
-plane?
-- Will enabling / disabling the feature require downtime or reprovisioning
-of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
+- [x] Feature gate (also fill in values in `kep.yaml`)
+- Feature gate name: EfficientWatchResumption
+- Components depending on the feature gate: kube-apiserver

 * **Does enabling the feature change any default behavior?**
-Any change of default behavior may be surprising to users or break existing
-automations, so be extremely careful here.
+No.

 * **Can the feature be disabled once it has been enabled (i.e. can we roll back
 the enablement)?**
-Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-Describe the consequences on existing workloads (e.g., if this is a runtime
-feature, can it break the existing applications?).
+Yes, watchcache (and watch bookmark events) will not be propagated with
+resource versions of objects of other types.

 * **What happens if we reenable the feature if it was previously rolled back?**
+The expected behavior will be restored.

 * **Are there any tests for feature enablement/disablement?**
-The e2e framework does not currently support enabling or disabling feature
-gates. However, unit tests in each component dealing with managing data, created
-with and without the feature, are necessary. At the very least, think about
-conversion tests if API types are being modified.
+No.

 ### Rollout, Upgrade and Rollback Planning

@@ -498,6 +501,7 @@ _This section must be completed when targeting beta graduation to a release._
 ## Implementation History

 2020-06-30: KEP Proposed.
+2020-08-04: KEP marked as implementable.

 ## Drawbacks

keps/sig-api-machinery/1904-efficient-watch-resumption/kep.yaml

Lines changed: 6 additions & 7 deletions
@@ -5,7 +5,7 @@ authors:
 owning-sig: sig-api-machinery
 participating-sigs:
 - sig-scalability
-status: provisional
+status: implementable
 creation-date: 2020-07-23
 reviewers:
 - "@jpbetz"
@@ -33,12 +33,11 @@ milestone:

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
-#feature-gates:
-# - name: MyFeature
-# components:
-# - kube-apiserver
-# - kube-controller-manager
-#disable-supported: true
+feature-gates:
+  - name: EfficientWatchResumption
+    components:
+      - kube-apiserver
+disable-supported: true

 # The following PRR answers are required at beta release
 #metrics:
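
The test plan in the README diff refers to resource version tracking in the reflector and to watch bookmark events. As a rough illustration only, the following Python sketch models how a client might record the last resource version it has seen, including versions carried by bookmarks, so that a later watch can resume without a full re-list. Everything here is hypothetical: `WatchEvent`, `ResourceVersionTracker`, and `can_resume` are names invented for this sketch, not client-go or kube-apiserver code, and resource versions are modelled as integers purely for readability (real clients should treat them as opaque strings).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WatchEvent:
    """Hypothetical, simplified watch event (not the real API type)."""
    type: str                  # "ADDED", "MODIFIED", "DELETED" or "BOOKMARK"
    resource_version: int      # modelled as int for illustration only
    name: Optional[str] = None # object name; unused for bookmarks


class ResourceVersionTracker:
    """Tracks the newest resource version observed on a watch stream."""

    def __init__(self, initial_rv: int) -> None:
        self.last_rv = initial_rv
        self.store: Dict[str, int] = {}  # object name -> resource version

    def handle(self, event: WatchEvent) -> None:
        if event.type == "DELETED":
            self.store.pop(event.name, None)
        elif event.type != "BOOKMARK":
            self.store[event.name] = event.resource_version
        # A bookmark changes no object, but it still advances the version
        # from which the client may resume its watch.
        self.last_rv = event.resource_version


def can_resume(last_rv: int, oldest_cached_rv: int) -> bool:
    """Simplified model: a watch can resume from the server-side cache only
    if the client's version is not older than the oldest cached one."""
    return last_rv >= oldest_cached_rv


tracker = ResourceVersionTracker(initial_rv=100)
for ev in (WatchEvent("ADDED", 105, "pod-a"), WatchEvent("BOOKMARK", 250)):
    tracker.handle(ev)
```

In this toy model, a client that only saw the object event at version 105 could not resume against a cache whose oldest entry is 200, while the same client after the bookmark at 250 could, which is the kind of re-list the KEP aims to avoid.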
