@@ -214,7 +215,7 @@ Some more examples to compare memory.high using Alpha v1.22 and Alpha v1.27 are

###### Quality of Service for Pods

In addition to the change in the formula for memory.high, we are also adding support for memory.high to be set according to the `Quality of Service (QoS) for Pod` classes. Based on user feedback in Alpha v1.22, some users would like to opt out of MemoryQoS on a per-pod basis to ensure there is no early memory throttling. By making their pods Guaranteed, they are able to do so. Guaranteed pods, by definition, are not overcommitted, so memory.high does not provide significant value for them.

Following are the different cases for setting memory.high as per QoS classes:

1. Guaranteed
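
The remaining cases are elided in this excerpt. As a rough illustration of the intent, here is a minimal Go sketch of the Alpha v1.27 formula applied per QoS class; the `memoryHigh` helper, its parameters, and the 4 KiB page size are assumptions for the example, not kubelet source:

```go
package main

import "fmt"

// pageSize is an assumed page size; real kernels report it at runtime.
const pageSize = 4096

// memoryHigh sketches the Alpha v1.27 formula:
//   memory.high = floor[(requests + throttlingFactor*(limit - requests)) / pageSize] * pageSize
// where limit falls back to node allocatable memory when no memory limit is set.
// A negative return value stands for the cgroup default "max" (no throttling).
func memoryHigh(qosClass string, requests, limit, nodeAllocatable int64, throttlingFactor float64) int64 {
	if qosClass == "Guaranteed" {
		// Case 1: Guaranteed pods are not overcommitted, so memory.high
		// is left at "max" and provides no additional value.
		return -1
	}
	ceiling := limit
	if ceiling == 0 {
		ceiling = nodeAllocatable
	}
	high := requests + int64(throttlingFactor*float64(ceiling-requests))
	return high / pageSize * pageSize // round down to a page boundary
}

func main() {
	// Burstable container: 1 GiB request, 2 GiB limit, throttling factor 0.8.
	fmt.Println(memoryHigh("Burstable", 1<<30, 2<<30, 8<<30, 0.8))
	// BestEffort container: no request/limit, throttled relative to allocatable.
	fmt.Println(memoryHigh("BestEffort", 0, 0, 8<<30, 0.8))
}
```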
@@ -271,6 +272,10 @@ Alternative solutions that were discussed (but not preferred) before finalizing
* It is simple to understand, as it requires setting only one kubelet configuration option for the memory throttling factor.
* It doesn't involve API changes, and doesn't expose low-level detail to customers.
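
For reference, that single knob is the kubelet's memory throttling factor. The excerpt below is an abbreviated sketch of the field as I understand it to appear in `k8s.io/kubelet/config/v1beta1`; it is not a complete type definition:

```go
package v1beta1

// KubeletConfiguration (abbreviated): only the field relevant to MemoryQoS
// is shown. MemoryThrottlingFactor scales the gap between a container's
// memory request and its limit (or node allocatable memory) when the kubelet
// computes the cgroup v2 memory.high value.
type KubeletConfiguration struct {
	MemoryThrottlingFactor *float64 `json:"memoryThrottlingFactor,omitempty"`
}
```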

#### Beta v1.28

The feature graduated to Beta in v1.28. Its implementation in Beta is the same as in Alpha v1.27.
### User Stories (Optional)
#### Memory Sensitive Workload

Some workloads are sensitive to memory allocation and availability; slight delays may cause a service outage. In such cases, a mechanism is needed to ensure the quality of memory.
@@ -485,6 +490,9 @@ The test will reside in `test/e2e_node`.

- Metrics and graphs to show the amount of reclaim done on a cgroup as it moves from below-request to above-request to throttling
- Memory QoS is covered by unit and e2e-node tests
- Memory QoS supports containerd, cri-o and dockershim
- Expose memory events, e.g. the memory.high field of memory.events, which can inform how many times memory.high was breached and the cgroup was throttled (see the sketch after this list)
- [cgroup_v2](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2) is in `GA`
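
As a minimal sketch of how such a counter could be read, assuming the kernel's cgroup v2 memory.events layout (lines like `low 0`, `high 12`, `max 3`, `oom 0`, `oom_kill 0`, where the `high` counter tracks memory.high breaches); the function name and path are illustrative, not kubelet code:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// memoryHighEvents returns how many times memory.high was breached for the
// cgroup rooted at cgroupPath, by scanning its memory.events file.
func memoryHighEvents(cgroupPath string) (uint64, error) {
	f, err := os.Open(cgroupPath + "/memory.events")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) == 2 && fields[0] == "high" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, s.Err()
}

func main() {
	// Path is illustrative; real pod cgroup paths depend on the cgroup driver.
	n, err := memoryHighEvents("/sys/fs/cgroup/kubepods.slice")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("memory.high breaches:", n)
}
```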
@@ -538,7 +546,7 @@ Pick one of these and delete the rest.

Any change of default behavior may be surprising to users or break existing
automations, so be extremely careful here.
-->
Yes, the kubelet will set `memory.min` for the Guaranteed and Burstable pod/container level cgroups. It will also set `memory.high` for Burstable and BestEffort containers, which may cause memory allocation to be slowed down if memory usage in a container reaches its `memory.high` level. `memory.min` for the QoS or node level cgroup will be set when `--cgroups-per-qos` or `--enforce-node-allocatable` is satisfied.
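
As a concrete picture of what these settings amount to on disk, here is a rough Go sketch of the cgroup v2 writes described above; the function name, path, and values are illustrative assumptions, not kubelet code:

```go
package main

import (
	"os"
	"strconv"
)

// applyMemoryQoS writes memory.min and, when high >= 0, memory.high for a
// container-level cgroup. Passing high < 0 leaves memory.high at "max".
func applyMemoryQoS(cgroupPath string, min, high int64) error {
	if err := os.WriteFile(cgroupPath+"/memory.min",
		[]byte(strconv.FormatInt(min, 10)), 0o644); err != nil {
		return err
	}
	if high >= 0 {
		return os.WriteFile(cgroupPath+"/memory.high",
			[]byte(strconv.FormatInt(high, 10)), 0o644)
	}
	return nil
}

func main() {
	// Illustrative only: set a 256 MiB floor and a 1 GiB throttle threshold.
	_ = applyMemoryQoS("/sys/fs/cgroup/kubepods.slice/example.scope", 256<<20, 1<<30)
}
```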
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -568,6 +576,11 @@ Yes, some unit tests are exercised with the feature both enabled and disabled to

<!--
This section must be completed when targeting beta to a release.
-->
N/A. There is no API change involved. MemoryQoS is a kubelet-level flag that will be enabled by default in Beta; it doesn't require any special opt-in by the user in their PodSpec.

The kubelet will reconcile `memory.min`/`memory.high` with the related cgroups depending on whether the feature gate is enabled or not, separately for each node.
###### How can a rollout or rollback fail? Can it impact already running workloads?
@@ -580,6 +593,10 @@ feature flags will be enabled on some API servers and not others during the

rollout. Similarly, consider large clusters and how enablement/disablement
will roll out across nodes.
-->
Already running workloads will not have `memory.min`/`memory.high` set at the Pod level. Only `memory.min` will be set at the Node-level cgroup when the kubelet restarts. Existing workloads will be impacted only if the kernel is unable to maintain at least the `memory.min` level of memory for the non-guaranteed workloads within the Node-level cgroup.
###### What specific metrics should inform a rollback?
@@ -601,6 +618,7 @@ are missing a bunch of machinery and tooling and can't do that now.

<!--
Even if applying deprecation policies, they may still surprise some users.
-->
No.

### Monitoring Requirements
@@ -619,6 +637,8 @@ checking if there are objects with field X set) may be a last resort. Avoid

logs or events for this purpose.
-->
An operator could run `ls /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<SOME_ID>.slice` on a node with cgroup v2 enabled to confirm the presence of the `memory.min` file, which indicates that the feature is in use by the workloads.

###### How can someone using this feature know that it is working for their instance?
<!--
@@ -630,13 +650,15 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

- [ ] Events
  - Event Reason:
- [ ] API .status
  - Condition name:
  - Other field:
- [X] Other (treat as last resort)
  - Details: Kernel memory events will be available in kubelet logs via cadvisor. These events will inform about the number of times `memory.min` and `memory.high` levels were breached.
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
@@ -654,6 +676,7 @@ high level (needs more precise definitions) those may be things like:
These goals will help you determine what you need to measure (SLIs) in the next
question.
-->
N/A. Same as when running without this feature.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
@@ -665,15 +688,16 @@ Pick one more of these and delete the rest.
  - Metric name:
  - [Optional] Aggregation method:
  - Components exposing the metric:
- [X] Other (treat as last resort)
  - Details: Not a service

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
No.

### Dependencies
@@ -697,6 +721,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
- Impact of its outage on the feature:
- Impact of its degraded performance or high-error rates on the feature:
-->
The container runtime must also support cgroup v2.
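
One way an operator might verify this precondition on a node: on the cgroup v2 unified hierarchy, the root cgroup exposes a `cgroup.controllers` file. The check below is my illustrative sketch, not part of any Kubernetes component:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// On cgroup v2 (unified hierarchy), /sys/fs/cgroup/cgroup.controllers
	// exists at the root; on cgroup v1 it does not.
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		fmt.Println("cgroup v2 detected")
	} else {
		fmt.Println("cgroup v1 (or no cgroup filesystem mounted)")
	}
}
```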
### Scalability
@@ -835,7 +860,8 @@ For each of them, fill in the following information by copying the below template