
Commit 61e04a6

graduate multiple sizes huge pages to GA
- Renamed release checklist item: 1.19tbd -> 1.19, as the hugepages e2e tests were implemented in the 1.19 time frame
- Updated release checklist and implementation history for release 1.22
- Updated kep.yaml
- Added graduation criteria and test plan for HugePageStorageMediumSize
- Added PRR questionnaire for HugePageStorageMediumSize
1 parent ac183b5 commit 61e04a6


3 files changed: 104 additions, 4 deletions

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 1539
+stable:
+  approver: "@ehashman"

keps/sig-node/1539-hugepages/README.md

Lines changed: 91 additions & 3 deletions
@@ -22,13 +22,21 @@
 - [Huge pages as shared memory](#huge-pages-as-shared-memory)
 - [NUMA](#numa)
 - [Graduation Criteria](#graduation-criteria)
+- [Graduation Criteria for HugePageStorageMediumSize](#graduation-criteria-for-hugepagestoragemediumsize)
 - [Test Plan](#test-plan)
+- [Test Plan for HugePageStorageMediumSize](#test-plan-for-hugepagestoragemediumsize)
+- [Production Readiness Review Questionnaire for HugePageStorageMediumSize](#production-readiness-review-questionnaire-for-hugepagestoragemediumsize)
+- [Monitoring requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
+- [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 - [Version 1.8](#version-18)
 - [Version 1.9](#version-19)
 - [Version 1.14](#version-114)
 - [Version 1.18](#version-118)
-- [Version 1.19[TBD]](#version-119tbd)
+- [Version 1.19](#version-119)
+- [Version 1.22](#version-122)
 - [Release Signoff Checklist](#release-signoff-checklist)
 <!-- /toc -->

@@ -534,6 +542,12 @@ locality guarantees as a feature of QoS. In particular, pods in the
 - E2E testing validating its usage.
 -- https://k8s-testgrid.appspot.com/sig-node-kubelet#node-kubelet-serial&include-filter-by-regex=Feature%3AHugePages
 
+## Graduation Criteria for HugePageStorageMediumSize
+
+- Reports of successful usage of the hugepages-<size> resources
+- E2E testing validating its usage
+-- https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e-hugepages
+
 ## Test Plan
 
 - A test plan will consist of the following tests
@@ -546,6 +560,75 @@ locality guarantees as a feature of QoS. In particular, pods in the
 - Test case will be added to cri-tools to be used in CRI runtime' test(CI).
 - here: https://github.com/kubernetes-sigs/cri-tools
 
+## Test Plan for HugePageStorageMediumSize
+
+- Promote existing HugePages E2E tests to conformance
+
+## Production Readiness Review Questionnaire for HugePageStorageMediumSize
+### Monitoring requirements
+
+* **How can an operator determine if the feature is in use by workloads?**
+An operator can check for hugepages-<size> resource limits and emptyDir
+mounts with medium: HugePages-<size>, as described in the Kubernetes
+documentation at https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages
+
+* **What are the SLIs (Service Level Indicators) an operator can use to determine
+the health of the service?**
+- [ ] Metrics
+- Metric name:
+`kube_pod_resource_request` and `kube_pod_resource_limit` for hugepages-<size> resources indicate usage.
+- Components exposing the metric: kube-scheduler
+
+Workload performance can be measured with existing system metrics provided by Kubernetes components and by tools such as [node_exporter](https://github.com/prometheus/node_exporter).
+
+* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+
+These are set individually by application developers. This feature allows them to tune the performance of their workloads. See, for example, [Linux Huge Pages and virtual memory (VM) tuning](https://blog.yannickjaquier.com/linux/linux-hugepages-and-virtual-memory-vm-tuning.html).
+
+* **Are there any missing metrics that would be useful to have to improve observability
+of this feature?**
+No.
+
+### Dependencies
+
+* **Does this feature depend on any specific services running in the cluster?**
+No.
+
+### Scalability
+
+* **Will enabling / using this feature result in any new API calls?**
+No.
+
+* **Will enabling / using this feature result in introducing new API types?**
+No.
+
+* **Will enabling / using this feature result in any new calls to the cloud
+provider?**
+No.
+
+* **Will enabling / using this feature result in increasing size or count of
+the existing API objects?**
+No.
+
+* **Will enabling / using this feature result in increasing time taken by any
+operations covered by [existing SLIs/SLOs]?**
+No.
+
+* **Will enabling / using this feature result in non-negligible increase of
+resource usage (CPU, RAM, disk, IO, ...) in any components?**
+No.
+
+### Troubleshooting
+
+* **How does this feature react if the API server and/or etcd is unavailable?**
+No impact.
+
+* **What are other known failure modes?**
+Not applicable.
+
+* **What steps should be taken if SLOs are not being met to determine the problem?**
+A cluster admin can tune the HugePages requests allocated to a workload by changing the available sizes, fall back to the default HugePages configuration, or disable HugePages for the workload entirely.
+
 ## Implementation History
 
 ### Version 1.8
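The PRR answers in the hunk above name two signals an operator can use to tell whether HugePageStorageMediumSize is in use: hugepages-<size> resources and HugePages-<size> emptyDir media in pod specs, and the `kube_pod_resource_request` metric exposed by kube-scheduler. The Python sketch below checks both. It is a minimal illustration under stated assumptions: the pod is a dict decoded from `kubectl get pod <name> -o json`, the metric carries a `resource` label, and the helper names `sized_hugepage_usage` and `hugepage_requests` are hypothetical. The Prometheus text parsing is deliberately simplified and does not handle escaped quotes in label values.

```python
import re
from typing import Any, Dict, List

# Hugepage resource names look like "hugepages-2Mi"; size-specific emptyDir
# media look like "HugePages-2Mi" (plain "HugePages" is the unsized medium).
RESOURCE_PREFIX = "hugepages-"
MEDIUM_PREFIX = "HugePages-"

# Simplified Prometheus text-format parsing: `name{labels} value`.
# Assumption: label values contain no escaped quotes.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sized_hugepage_usage(pod: Dict[str, Any]) -> Dict[str, List[str]]:
    """List the hugepages-<size> resources and HugePages-<size> media a pod uses.

    Hypothetical helper; `pod` is assumed to be a Pod decoded from
    `kubectl get pod <name> -o json`.
    """
    spec = pod.get("spec", {})
    resources = set()
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        res = container.get("resources", {})
        # Check both sections; either one may name the hugepage resource.
        for section in ("limits", "requests"):
            for name in res.get(section, {}):
                if name.startswith(RESOURCE_PREFIX):
                    resources.add(name)
    media = set()
    for volume in spec.get("volumes", []):
        # Non-emptyDir volumes and emptyDirs without a medium yield "".
        medium = (volume.get("emptyDir") or {}).get("medium", "")
        if medium.startswith(MEDIUM_PREFIX):
            media.add(medium)
    return {"resources": sorted(resources), "media": sorted(media)}


def hugepage_requests(metrics_text: str) -> Dict[str, float]:
    """Sum kube_pod_resource_request samples per hugepages-<size> resource.

    Hypothetical helper; assumes the metric carries a `resource` label,
    as scraped from kube-scheduler.
    """
    totals: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match["name"] != "kube_pod_resource_request":
            continue  # skip comments, other metrics, and unlabeled samples
        labels = dict(LABEL_RE.findall(match["labels"]))
        resource = labels.get("resource", "")
        if resource.startswith(RESOURCE_PREFIX):
            totals[resource] = totals.get(resource, 0.0) + float(match["value"])
    return totals
```

Totals are summed per page size because several pods may request the same size. Note that `kube_pod_resource_request` reports requested amounts, not measured huge page consumption on the node.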
@@ -565,9 +648,14 @@ using the feature without issue.
 
 Extending of huge pages feature to support container isolation of huge pages and multiple sizes of huge pages.
 
-### Version 1.19[TBD]
+### Version 1.19
+
+Extension of the huge pages E2E test suite and cri-tools tests for enhancements after GA.
+
+### Version 1.22
 
-Extending of huge pages test suit of E2E tests and cri-tools for enhancements after GA.
+GA support for multiple huge page sizes proposed, based on feedback from
+the user community using the feature without issue.
 
 ## Release Signoff Checklist
 - \[x] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)

keps/sig-node/1539-hugepages/kep.yaml

Lines changed: 10 additions & 1 deletion
@@ -11,9 +11,18 @@ reviewers:
 - "@vishnu"
 approvers:
 - "@dawnchen"
+prr-approvers:
+- "@ehashman"
+stage: stable
+latest-milestone: "v1.22"
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v1.18"
+  beta: "v1.19"
+  stable: "v1.22"
 editor: Derek Carr
 creation-date: 2019-01-29
-last-updated: 2019-03-05
+last-updated: 2021-05-12
 status: implemented
 see-also:
 replaces:
