Commit 954f2cd

Merge pull request kubernetes#4079 from PiotrProkop/multi-numa-topology
Promote Improved multi-numa alignment in Topology Manager to beta
2 parents 39d691c + af6d25f commit 954f2cd

3 files changed: +34 -12 lines changed

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
 kep-number: 3545
 alpha:
-  approver: "@johnbelamaric"
+  approver: "@johnbelamaric"
+beta:
+  approver: "@johnbelamaric"

keps/sig-node/3545-improved-multi-numa-alignment/README.md

Lines changed: 29 additions & 9 deletions
@@ -71,18 +71,18 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
+- [x] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
 **Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
@@ -252,6 +252,7 @@ to implement this enhancement.
 ##### Unit tests
 
 - `k8s.io/kubernetes/pkg/kubelet/cm/topologymanager`: `09-23-2022` - `92.4`
+- `k8s.io/kubernetes/pkg/kubelet/cm/topologymanager`: `06-12-2023` - `93.2`
 
 ##### Integration tests
 
@@ -302,6 +303,12 @@ When an option graduates, its visibility should be moved to be controlled by the
 The introduction of these feature gates gives us the ability to move the option to beta and later stable without implying that all available options are stable.
 This approach is similar to the graduation criteria for `CPUManagerPolicyOptions` introduced [here](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2625-cpumanager-policies-thread-placement#graduation-criteria-of-options).
 
+In 1.28 this feature is being promoted to Beta. We propose the following changes to the default visibility of TopologyManager policy options:
+
+- The `TopologyManagerPolicyOptions` feature flag, which enables or disables the entire feature, will be enabled by default.
+- The `TopologyManagerPolicyBetaOptions` feature flag, which enables or disables beta options, will be enabled by default.
+- `prefer-closest-numa-nodes` will be moved to Beta options.
+
 The graduation Criteria of options is described below:
 
 #### Graduation of Options to `Beta-quality` (non-hidden)
@@ -378,7 +385,7 @@ No.
 
 ###### How can an operator determine if the feature is in use by workloads?
 
-Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option
+Inspect the kubelet configuration of the nodes: check feature gate and usage of the new option.
 
 ###### How can someone using this feature know that it is working for their instance?
 
@@ -434,14 +441,26 @@ No.
 
 No.
 
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
 ### Troubleshooting
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
+
 N/A.
 
 ###### What are other known failure modes?
 
-TBD.
+There are two scenarios in which the kubelet may fail to start because of this feature:
+
+- A bad policy option name, or a policy option used without enabling the appropriate feature flag. An error message is emitted for this case:
+the kubelet fails to start and prints an error message describing what happened. To recover, fix the policy option name or disable/enable the relevant feature flags.
+
+- cAdvisor is not exposing distances for NUMA domains. In this case the kubelet fails with the `error getting NUMA distances from cadvisor` message.
+Reading NUMA distances is only performed when the `prefer-closest-numa-nodes` option is specified.
+To recover, either disable the `TopologyManagerPolicyOptions` feature flag or stop using the `prefer-closest-numa-nodes` option.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
@@ -450,3 +469,4 @@ N/A.
 ## Implementation History
 
 - 2021-09-26: KEP created
+- 2023-06-12: KEP updated for Beta release
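
Reviewer's illustration (not part of the commit): the first failure mode added to the README, a bad option name or an option whose feature flag is disabled, can be sketched roughly as below. This is a minimal sketch and not the kubelet's actual validation code; `checkPolicyOption`, the `optionGate` table, and the error strings are hypothetical names invented for illustration. Only the option name and the two feature flag names come from the diff.

```go
package main

import "fmt"

// optionGate maps each known policy option to the feature flag that controls
// its visibility. Hypothetical table for this sketch; per the README change,
// prefer-closest-numa-nodes is a Beta option in 1.28.
var optionGate = map[string]string{
	"prefer-closest-numa-nodes": "TopologyManagerPolicyBetaOptions",
}

// checkPolicyOption returns an error when the option name is unknown or when
// a feature flag it depends on is disabled. The error text is illustrative.
func checkPolicyOption(name string, gates map[string]bool) error {
	gate, known := optionGate[name]
	if !known {
		return fmt.Errorf("unknown Topology Manager policy option %q", name)
	}
	// The top-level flag enables or disables the entire feature.
	if !gates["TopologyManagerPolicyOptions"] {
		return fmt.Errorf("option %q requires feature flag TopologyManagerPolicyOptions", name)
	}
	// The maturity-level flag controls whether beta options are visible.
	if !gates[gate] {
		return fmt.Errorf("option %q requires feature flag %s", name, gate)
	}
	return nil
}

func main() {
	// Defaults proposed for 1.28 in the README change: both flags enabled.
	gates := map[string]bool{
		"TopologyManagerPolicyOptions":     true,
		"TopologyManagerPolicyBetaOptions": true,
	}
	// Includes the misspelling that appears in the original README text.
	for _, opt := range []string{"prefer-closest-numa-nodes", "prefer-clostest-numa-nodes"} {
		if err := checkPolicyOption(opt, gates); err != nil {
			fmt.Println("kubelet would fail to start:", err)
			continue
		}
		fmt.Println("option accepted:", opt)
	}
}
```

Under these assumptions, recovery matches the README text: correct the option name or flip the relevant feature flag, then restart the kubelet.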

keps/sig-node/3545-improved-multi-numa-alignment/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -14,12 +14,12 @@ see-also: []
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.26"
+latest-milestone: "v1.28"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
