
Commit 5028ead

Add risk
1 parent 5ab1e99 commit 5028ead

1 file changed

+19
-14
lines changed
  • keps/sig-architecture/4330-compatibility-versions

keps/sig-architecture/4330-compatibility-versions/README.md

Lines changed: 19 additions & 14 deletions
@@ -85,7 +85,7 @@ tags, and then generate with `hack/update-toc.sh`.
8585
- [Proposal](#proposal)
8686
- [Component Flags](#component-flags)
8787
- [Changes to Feature Gates](#changes-to-feature-gates)
88-
- [Feature Gate Lifespans](#feature-gate-lifespans)
88+
- [Feature Gate Lifecycles](#feature-gate-lifecycles)
8989
- [Feature gating changes](#feature-gating-changes)
9090
- [CEL Environment Compatibility Versioning](#cel-environment-compatibility-versioning)
9191
- [StorageVersion Compatibility Versioning](#storageversion-compatibility-versioning)
@@ -95,6 +95,8 @@ tags, and then generate with `hack/update-toc.sh`.
9595
- [Story 2](#story-2)
9696
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
9797
- [Risks and Mitigations](#risks-and-mitigations)
98+
- [Risk: Increased maintenance burden on Kubernetes maintainers](#risk-increased-maintenance-burden-on-kubernetes-maintainers)
99+
- [Risk: Unintended and out-of-allowance compatibility skew](#risk-unintended-and-out-of-allowance-compatibility-skew)
98100
- [Design Details](#design-details)
99101
- [Test Plan](#test-plan)
100102
- [Prerequisite testing updates](#prerequisite-testing-updates)
@@ -190,8 +192,8 @@ orchestration as we are taking smaller and more incremental steps forward,
190192
which means there is less to “undo” on a failure condition.
191193

192194
In beta, we intend to introduce support for greater compatibility version skew
193-
(specifically, N-3) so that it would be possible to skip-level upgrade a
194-
Kubernetes control-plane, by means of:
195+
(specifically, N-3) so that it would be possible to skip binary versions while
196+
still performing a stepwise upgrade of a Kubernetes control-plane. For example:
195197

196198
- (starting point) binary-version 1.28 (compat-version 1.28)
197199
- upgrade binary-version to 1.31 (compat-version stays at 1.28 - this is our skip-level binary upgrade)
@@ -253,7 +255,7 @@ compatibility version to determine which features to enable to match the set of
253255
features that were enabled for the Kubernetes version the compatibility version
254256
is set to.
255257
256-
#### Feature Gate Lifespans
258+
#### Feature Gate Lifecycles
257259
258260
`--feature-gates` must behave the same as it did for the Kubernetes
259261
version the compatibility version is set to. I.e. it must be possible to use
@@ -429,8 +431,7 @@ This might be a good place to talk about core concepts and how they relate.
429431
430432
### Risks and Mitigations
431433
432-
Risk: Introducing this change increases the maintenance burden on Kubernetes
433-
maintainers.
434+
#### Risk: Increased maintenance burden on Kubernetes maintainers
434435
435436
Why we think this is manageable:
436437
@@ -444,17 +445,21 @@ Why we think this is manageable:
444445
- Some maintenance becomes simpler as the additional version data about
445446
features makes them easier to reason about and keep track of.
446447
447-
<!--
448-
What are the risks of this proposal, and how do we mitigate? Think broadly.
449-
For example, consider both security and how this will impact the larger
450-
Kubernetes ecosystem.
448+
#### Risk: Unintended and out-of-allowance compatibility skew
451449
452-
How will security be reviewed, and by whom?
450+
From @deads2k: "I see an additional risk of unintended and out-of-allowance compatibility skew between binaries. A kube-apiserver and kube-controller-manager contract is still +/-1 (as far as I see here). This compatibility level, especially across three versions, makes it more likely for accidental mismatches.
453451
454-
How will UX be reviewed, and by whom?
452+
While a hard shutdown of a process is likely worse than the disease, exposing some sort of externally trackable signal for cluster-admins and describing how to use it could significantly mitigate the problem."
455453
456-
Consider including folks who also work outside the SIG or subproject.
457-
-->
454+
Possible mitigations:
455+
456+
- Clients send version numbers in request headers. Servers use this to detect
457+
out-of-allowance skew. Servers then surface this to cluster administrators.
458+
- Components register identity leases (apiserver already does this).
459+
https://github.com/kubernetes/enhancements/pull/4356 proposes doing it for
460+
controller managers. Components include their version information in the
461+
identity leases. A separate controller inspects all the leases for skew and
462+
surfaces it to cluster administrators.
458463
459464
## Design Details
460465
