14 | 14 | - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
15 | 15 | - [Risks and Mitigations](#risks-and-mitigations)
16 | 16 | - [Design Details](#design-details)
   | 17 | +  - [Why we are directly enabling this feature in beta](#why-we-are-directly-enabling-this-feature-in-beta)
17 | 18 | - [Test Plan](#test-plan)
18 | 19 | - [Graduation Criteria](#graduation-criteria)
19 | 20 | - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)

39 | 40 | Items marked with (R) are required *prior to targeting to a milestone / release*.
40 | 41 | 
41 | 42 | - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
42 |    | -- [ ] (R) KEP approvers have approved the KEP status as `implementable`
43 |    | -- [ ] (R) Design details are appropriately documented
44 |    | -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
45 |    | -- [ ] (R) Graduation criteria is in place
   | 43 | +- [x] (R) KEP approvers have approved the KEP status as `implementable`
   | 44 | +- [x] (R) Design details are appropriately documented
   | 45 | +- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
   | 46 | +- [x] (R) Graduation criteria is in place
46 | 47 | - [ ] (R) Production readiness review completed
47 | 48 | - [ ] Production readiness review approved
48 | 49 | - [ ] "Implementation History" section is up-to-date for milestone
@@ -154,37 +155,27 @@ For this reason, we can do one alpha release where we log/event if we observe us
154 | 155 | 
155 | 156 | ## Design Details
156 | 157 | 
157 |     | -This can be implemented by modifying the defaults.go and validation.go components for the namespace API, i.e., in the defaults.go file for api/core:
158 |     | -
159 |     | -<<[UNRESOLVED See https://github.com/kubernetes/kubernetes/pull/96968 where this evolves, but this example is functionally valid, modulo warnings ]>>
160 |     | -
161 |     | -This will change if **warnings** are to be a core requirement of this KEP. As that is not yet one of the goals of this KEP, however, we have a first implementation of this feature using defaults.go.
162 |     | -
    | 158 | +This can be implemented by modifying the defaults.go and validation.go components for the namespace API, i.e., in the defaults.go file for api/core. Note that we will support DISABLING this via a feature gate (named `NamespaceDefaultLabelName`), per the versioning specification in this KEP.
163 | 159 | 
164 | 160 | ```
165 |     | -func SetDefaults_Namespace(obj *v1.Namespace) {
166 |     | -	if obj.Labels == nil {
167 |     | -		obj.Labels = map[string]string{}
168 |     | -	}
169 |     | -
170 |     | -	if _, found := obj.Labels[“kubernetes.io/metadata.name”]; !found {
171 |     | -		obj.Labels[“kubernetes.io/metadata.name”] = obj.Name
172 |     | -	}
173 |     | -}
    | 161 | +	obj.Labels["kubernetes.io/metadata.name"] = obj.Name
174 | 162 | ```
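    |     | +
    |     | +For illustration, here is a minimal, self-contained sketch of how the gated defaulting could fit together. The gate check via `utilfeature.DefaultFeatureGate.Enabled` and the `features.NamespaceDefaultLabelName` constant follow the stock feature-gate pattern and are assumptions, not the final implementation:
    |     | +
    |     | +```
    |     | +import (
    |     | +	v1 "k8s.io/api/core/v1"
    |     | +	utilfeature "k8s.io/apiserver/pkg/util/feature"
    |     | +	"k8s.io/kubernetes/pkg/features"
    |     | +)
    |     | +
    |     | +// SetDefaults_Namespace stamps the reserved name label onto every namespace,
    |     | +// unless the (assumed) NamespaceDefaultLabelName gate has been disabled.
    |     | +func SetDefaults_Namespace(obj *v1.Namespace) {
    |     | +	if !utilfeature.DefaultFeatureGate.Enabled(features.NamespaceDefaultLabelName) {
    |     | +		return // feature disabled: leave labels untouched
    |     | +	}
    |     | +	if obj.Labels == nil {
    |     | +		obj.Labels = map[string]string{}
    |     | +	}
    |     | +	obj.Labels["kubernetes.io/metadata.name"] = obj.Name
    |     | +}
    |     | +```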
175 |     | -In validation.go, we'll go on to validate this namespace, like so:
176 | 163 | 
    | 164 | +In validation.go, we'll go on to validate this namespace, like so, again allowing the feature to be disabled per the versioning specification in this KEP.
177 | 165 | ```
178 |     | -func ValidateNamespace(namespace *core.Namespace) field.ErrorList {
179 |     | -	// Check if namespace.Labels[“kubernetes.io/metadata.name”] == namespace.Name
180 |     | -	// if not add an error
181 |     | -	// ...
182 |     | -	return allErrs
    | 166 | +	if namespace.Labels[v1.LabelNamespaceName] != namespace.Name {
    | 167 | +		allErrs = append(allErrs, field.Invalid(field.NewPath("metadata", "labels").Key(v1.LabelNamespaceName), namespace.Labels[v1.LabelNamespaceName], fmt.Sprintf("must be %s", namespace.Name)))
    | 168 | +	}
183 | 169 | ```
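    |     | +
    |     | +As a fuller sketch, the check above might sit inside `ValidateNamespace` roughly as follows. The function shape mirrors the pseudocode being replaced, and the `v1.LabelNamespaceName` constant is reused from the snippet above; whether the feature gate is consulted here or at a higher layer is an implementation detail left open:
    |     | +
    |     | +```
    |     | +import (
    |     | +	"fmt"
    |     | +
    |     | +	v1 "k8s.io/api/core/v1"
    |     | +	"k8s.io/apimachinery/pkg/util/validation/field"
    |     | +	"k8s.io/kubernetes/pkg/apis/core"
    |     | +)
    |     | +
    |     | +// ValidateNamespace rejects a namespace whose reserved name label
    |     | +// disagrees with the namespace's own name.
    |     | +func ValidateNamespace(namespace *core.Namespace) field.ErrorList {
    |     | +	allErrs := field.ErrorList{}
    |     | +	if namespace.Labels[v1.LabelNamespaceName] != namespace.Name {
    |     | +		allErrs = append(allErrs, field.Invalid(
    |     | +			field.NewPath("metadata", "labels").Key(v1.LabelNamespaceName),
    |     | +			namespace.Labels[v1.LabelNamespaceName],
    |     | +			fmt.Sprintf("must be %s", namespace.Name)))
    |     | +	}
    |     | +	return allErrs
    |     | +}
    |     | +```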
184 |     | -<<[/UNRESOLVED]>>
185 | 170 | 
186 |     | -This solves not just NetworkPolicy, but all such namespace selector APIs without
187 |     | -any disruption or version skew problems.
    | 171 | +
    | 172 | +This solves not just NetworkPolicy, but all such namespace selector APIs, without any disruption or version skew problems. Note that this feature will default to on in its Beta phase, since it is expected to be benign in all practical scenarios.
    | 173 | +
    | 174 | +### Why we are directly enabling this feature in beta
    | 175 | +
    | 176 | +This feature is generally thought of as non-disruptive, but reliance on it is likely to be taken up quickly, given its simplicity and its ability to solve a common user story for network security. Thus, if we had a complex process around disabling it initially, we could confuse applications that rely on this feature, which have no API-driven way of determining the status of the automatic labelling we are doing.
    | 177 | +
    | 178 | +Consider the scenario where one uses a `not in` selector to exclude namespaces with specific names. If the automatic label were absent, such a selector could no longer distinguish namespaces by name, and the exclusion would silently stop working. There is no clear way a user could reconcile this failure, because it is an apiserver default. Thus, it is simpler to default this feature to 'on' in any release after its introduction, as sketched below.
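    |     | +
    |     | +To make the scenario concrete, here is a hypothetical Go construction of such a policy peer; the function name and the excluded namespace are illustrative only:
    |     | +
    |     | +```
    |     | +import (
    |     | +	networkingv1 "k8s.io/api/networking/v1"
    |     | +	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    |     | +)
    |     | +
    |     | +// allowAllButKubeSystem admits traffic from every namespace except
    |     | +// "kube-system", identified purely by the automatic name label.
    |     | +func allowAllButKubeSystem() networkingv1.NetworkPolicyPeer {
    |     | +	return networkingv1.NetworkPolicyPeer{
    |     | +		NamespaceSelector: &metav1.LabelSelector{
    |     | +			MatchExpressions: []metav1.LabelSelectorRequirement{{
    |     | +				Key:      "kubernetes.io/metadata.name",
    |     | +				Operator: metav1.LabelSelectorOpNotIn,
    |     | +				Values:   []string{"kube-system"},
    |     | +			}},
    |     | +		},
    |     | +	}
    |     | +}
    |     | +```
    |     | +
    |     | +If the label were later absent, the `NotIn` requirement would match unlabeled namespaces indiscriminately, silently defeating the intended exclusion.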
188 | 179 | 
189 | 180 | ### Test Plan
190 | 181 | 
@@ -253,18 +244,30 @@ Since this has been reserved since 2017 (if not earlier), no backwards guarantee
253 | 244 | 
254 | 245 | 1.22:
255 | 246 | - gate enabled by default
256 |     | -- remove gate
257 | 247 | - release note
258 | 248 | 
    | 249 | +1.23:
    | 250 | +- remove the gate
    | 251 | +
259 | 252 | ### Rollout, Upgrade and Rollback Planning
260 | 253 | 
261 | 254 | _This section must be completed when targeting beta graduation to a release._
262 | 255 | 
263 | 256 | * **How can a rollout fail? Can it impact already running workloads?**
264 |     | -  Yes, network policies which are already using this label, may break, although such policies would be in violation of the standard that k8s.io labels are reserved.
    | 257 | +  Yes, workloads in namespaces which are already targeted via this label may break, although the policies doing so would be in violation of the standard that k8s.io labels are reserved.
265 | 258 | 
266 | 259 | * **What specific metrics should inform a rollback?**
267 |     | -  No e2e tests or metrics exist to inform this, but if a net loss of healthy pods occurred and a user is actively using networkpolicies that impersonate on the k8s.io label, that would be a hint.
    | 260 | +  No e2e tests or metrics exist to inform this specific action, but if a net loss of healthy pods occurred while a user was actively using networkpolicies that rely on the reserved k8s.io label, that would be a hint.
    | 261 | +
    | 262 | +  Indirectly, this could be discovered by inspection of APIServer rejection events associated with the namespace API resource.
    | 263 | +
    | 264 | +For example, the `apiserver_request_total` metric can be inspected for non-201 events. In rejection scenarios, the change in metrics is evident as follows:
    | 265 | +```
    | 266 | +apiserver_request_total{code="201",component="apiserver",contentType="application/json",dry_run="",group="",resource="namespaces",scope="resource",subresource="",verb="POST",version="v1"} 9
    | 267 | +apiserver_request_total{code="422",component="apiserver",contentType="application/json",dry_run="",group="",resource="namespaces",scope="resource",subresource="",verb="POST",version="v1"} 3
    | 268 | +```
    | 269 | +
    | 270 | +Thus, one can monitor for `code != 201` increases, infer a potential issue around namespace admission to the APIServer, and investigate or roll back as needed.
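    |     | +
    |     | +As a sketch of what such monitoring could look like programmatically (illustrative only; a real deployment would more likely alert through its metrics pipeline, and the helper name here is hypothetical):
    |     | +
    |     | +```
    |     | +import (
    |     | +	"strconv"
    |     | +	"strings"
    |     | +)
    |     | +
    |     | +// countRejectedNamespaceCreates sums apiserver_request_total samples for
    |     | +// namespace POSTs whose status code is anything other than 201, given the
    |     | +// raw /metrics exposition text scraped from the apiserver.
    |     | +func countRejectedNamespaceCreates(metrics string) int {
    |     | +	total := 0
    |     | +	for _, line := range strings.Split(metrics, "\n") {
    |     | +		if !strings.HasPrefix(line, "apiserver_request_total{") {
    |     | +			continue
    |     | +		}
    |     | +		if !strings.Contains(line, `resource="namespaces"`) ||
    |     | +			!strings.Contains(line, `verb="POST"`) ||
    |     | +			strings.Contains(line, `code="201"`) {
    |     | +			continue
    |     | +		}
    |     | +		fields := strings.Fields(line)
    |     | +		if n, err := strconv.Atoi(fields[len(fields)-1]); err == nil {
    |     | +			total += n
    |     | +		}
    |     | +	}
    |     | +	return total
    |     | +}
    |     | +```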
268 | 271 | 
269 | 272 | * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
270 | 273 |   Not yet. TBD.
@@ -340,16 +343,14 @@ _This section must be completed when targeting beta graduation to a release._
340 | 343 | * **How does this feature react if the API server and/or etcd is unavailable?**
341 | 344 | 
342 | 345 | * **What are other known failure modes?**
343 |     | -  For each of them, fill in the following information by copying the below template:
344 |     | -  - [Failure mode brief description]
345 |     | -    - Detection: How can it be detected via metrics? Stated another way:
346 |     | -      how can an operator troubleshoot without logging into a master or worker node?
347 |     | -    - Mitigations: What can be done to stop the bleeding, especially for already
348 |     | -      running user workloads?
349 |     | -    - Diagnostics: What are the useful log messages and their required logging
350 |     | -      levels that could help debug the issue?
351 |     | -      Not required until feature graduated to beta.
352 |     | -    - Testing: Are there any tests for failure mode? If not, describe why.
    | 346 | +There are no catastrophic failure scenarios of note, but there are some user-facing scenarios which are worth mentioning here.
    | 347 | +- If this label already existed on a namespace, a user may see unforeseen consequences once it is enforced.
    | 348 | +- If administrators don't allow any labels on namespaces, or have special policies about which labels are allowed, this introduction could violate those policies.
    | 349 | +
    | 350 | +* **What steps should be taken if SLOs are not being met to determine the problem?**
    | 351 | +
    | 352 | +  No SLOs are proposed.
353 | 354 | 
354 | 355 | * **What steps should be taken if SLOs are not being met to determine the problem?**
355 | 356 | 