keps/sig-node/3545-improved-multi-numa-alignment/README.md
Lines changed: 29 additions & 9 deletions
@@ -71,18 +71,18 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [] (R) Design details are appropriately documented
-- [] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [] (R) Production readiness review completed
+- [x] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
-- [] "Implementation History" section is up-to-date for milestone
-- [] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -252,6 +252,7 @@ to implement this enhancement.
@@ -302,6 +303,12 @@ When an option graduates, its visibility should be moved to be controlled by the
 The introduction of these feature gates gives us the ability to move the option to beta and later stable without implying that all available options are stable.
 This approach is similar to the graduation criteria for `CPUManagerPolicyOptions` introduced [here](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2625-cpumanager-policies-thread-placement#graduation-criteria-of-options).
 
+In 1.28 this feature is being promoted to Beta. We propose the following changes to the default visibility of TopologyManager policy options:
+
+- `TopologyManagerPolicyOptions` feature flag for enabling/disabling the entire feature will be enabled by default.
+- `TopologyManagerPolicyBetaOptions` feature flag for enabling/disabling beta options will be enabled by default.
+- `prefer-closest-numa-nodes` will be moved to Beta options.
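
The visibility rules proposed for 1.28 can be modelled roughly as follows. This is a minimal sketch, not the kubelet implementation: the function name, the option sets, and the error wording are hypothetical, and the `TopologyManagerPolicyAlphaOptions` gate (with its assumed default of off) does not appear in this excerpt.

```python
# Hypothetical model of Topology Manager policy option visibility.
# Only the gate names and the option name are taken from Kubernetes;
# everything else is illustrative.
ALPHA_OPTIONS: set[str] = set()
BETA_OPTIONS: set[str] = {"prefer-closest-numa-nodes"}  # moved to Beta in 1.28

# Gate defaults as proposed for 1.28 (the alpha default is an assumption).
DEFAULT_GATES: dict[str, bool] = {
    "TopologyManagerPolicyOptions": True,
    "TopologyManagerPolicyAlphaOptions": False,
    "TopologyManagerPolicyBetaOptions": True,
}


def check_option_available(option: str, gates: dict[str, bool]) -> None:
    """Raise ValueError if `option` cannot be used with the given gates."""
    if option in BETA_OPTIONS:
        required = "TopologyManagerPolicyBetaOptions"
    elif option in ALPHA_OPTIONS:
        required = "TopologyManagerPolicyAlphaOptions"
    else:
        raise ValueError(f"unknown Topology Manager policy option {option!r}")

    # The top-level gate controls the entire feature.
    if not gates.get("TopologyManagerPolicyOptions", False):
        raise ValueError("TopologyManagerPolicyOptions feature gate is disabled")
    # The maturity-level gate controls visibility of this group of options.
    if not gates.get(required, False):
        raise ValueError(f"option {option!r} requires the {required} feature gate")
```

With the proposed defaults, `check_option_available("prefer-closest-numa-nodes", DEFAULT_GATES)` returns without error, so no explicit feature gate setting is needed to use the option.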
+
 The graduation criteria of options are described below:
 
 #### Graduation of Options to `Beta-quality` (non-hidden)
@@ -378,7 +385,7 @@ No.
 
 ###### How can an operator determine if the feature is in use by workloads?
 
-Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option
+Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option.
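
One way to automate this check is sketched below. It assumes the kubelet configuration is already available as a parsed JSON object using the `featureGates` and `topologyManagerPolicyOptions` field names of `KubeletConfiguration`; the helper name, the `gate_default` parameter, and the handling of option values are assumptions of this sketch.

```python
def policy_option_in_use(
    config: dict,
    option: str = "prefer-closest-numa-nodes",
    gate: str = "TopologyManagerPolicyOptions",
    gate_default: bool = True,  # assumed: gate enabled by default from 1.28
) -> bool:
    """Return True if the gate is on and the option is set to true."""
    # Accept either a bare configuration or one wrapped in "kubeletconfig".
    cfg = config.get("kubeletconfig", config)
    gates = cfg.get("featureGates") or {}
    options = cfg.get("topologyManagerPolicyOptions") or {}

    # A gate missing from the config falls back to its default value.
    gate_on = bool(gates.get(gate, gate_default))
    # Option values are strings; this sketch only recognises "true".
    option_on = str(options.get(option, "false")).lower() == "true"
    return gate_on and option_on
```

The configuration can come from the node's kubelet config file or, where the node proxy is reachable, from the kubelet `configz` endpoint, whose response wraps the configuration in a `kubeletconfig` key.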
 
 ###### How can someone using this feature know that it is working for their instance?
 
@@ -434,14 +441,26 @@ No.
 
 No.
 
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
 ### Troubleshooting
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
+
 N/A.
 
 ###### What are other known failure modes?
 
-TBD.
+There are two scenarios in which Kubelet may fail to start when this feature is used:
+
+- An invalid policy option name, or a policy option used without enabling the appropriate feature flag. In this case
+  Kubelet fails to start and prints an error message describing the problem. To recover, fix the policy option name or disable/enable the relevant feature flags.
+
+- cAdvisor does not expose distances for NUMA domains. In this case Kubelet fails with the `error getting NUMA distances from cadvisor` message.
+  NUMA distances are only read when the `prefer-closest-numa-nodes` option is specified.
+  To recover, either disable the `TopologyManagerPolicyOptions` feature flag or stop using the `prefer-closest-numa-nodes` option.
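
The second failure mode can be illustrated with a small sketch. It is not the kubelet code: the function name and the shape of the `nodes` input (dicts with `id` and `distances` keys) are hypothetical, and only the error message is taken from the text above.

```python
def load_numa_distances(nodes: list[dict], prefer_closest: bool) -> dict[int, list[int]]:
    """Collect per-node NUMA distances, only when the option needs them."""
    if not prefer_closest:
        # Without prefer-closest-numa-nodes the distances are never read,
        # so missing data cannot cause a startup failure.
        return {}

    distances: dict[int, list[int]] = {}
    for node in nodes:
        row = node.get("distances")
        if not row:
            # Mirrors the startup failure described above.
            raise RuntimeError("error getting NUMA distances from cadvisor")
        distances[node["id"]] = list(row)
    return distances
```

Because the lookup is skipped when `prefer_closest` is false, dropping the option is enough to recover on a node where distances are unavailable.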
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?