- Components depending on the feature gate: kubelet
762
744
763
-
###### Does enabling the feature change any default behavior?
745
+
This feature uses project quotas to monitor emptyDir volume storage consumption
746
+
rather than a filesystem walk, for better performance and accuracy.
764
747
765
-
None. Behavior will not change.
766
-
When LocalStorageCapacityIsolation is enabled for local ephemeral storage and the backing filesystem for emptyDir volumes supports project quotas and they are enabled, use project quotas to monitor emptyDir volume storage consumption rather than filesystem walk for better performance and accuracy.
748
+
###### Does enabling the feature change any default behavior?
767
749
750
+
None. Behavior will not change. What changes is how the kubelet monitors volumes
751
+
such as ephemeral storage volumes and emptyDirs.
752
+
When LocalStorageCapacityIsolation is enabled for local ephemeral storage and the
753
+
backing filesystem for emptyDir volumes supports project quotas and they are enabled,
754
+
use project quotas to monitor emptyDir volume storage consumption rather than
755
+
a filesystem walk, for better performance and accuracy.
768
756
769
757
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
770
758
771
-
Yes. If the pod was created with enforcing quota, disable the feature gate will not change the running pod.
772
-
After setting the feature gate to false, the newly created pod will not use the enforcing quota.
759
+
Yes, but only for newly created pods.
760
+
- Existing Pods: If the pod was created with enforcing quota, disabling the feature gate
761
+
will not change the running pod.
762
+
- Newly Created Pods: After the feature gate is set to false, newly created pods
763
+
will not use the enforcing quota.
773
764
774
765
###### What happens if we reenable the feature if it was previously rolled back?
775
766
776
-
Performance changes. This feature uses project quotas to monitor emptyDir volume storage consumption rather than filesystem walk for better performance and accuracy.
767
+
As above, after the feature is reenabled, newly created pods will use it.
768
+
If a pod was created before rolling back, the pod will benefit from this feature as well.
777
769
778
770
###### Are there any tests for feature enablement/disablement?
779
771
780
-
Yes, test/e2e_node/quota_lsci_test.go
772
+
Yes, in `test/e2e_node/quota_lsci_test.go`
781
773
782
774
### Rollout, Upgrade and Rollback Planning
783
775
784
-
785
776
###### How can a rollout or rollback fail? Can it impact already running workloads?
786
777
787
-
None. The rollout/rollback will not impact running workloads.
778
+
No. The rollout/rollback will not impact running workloads.
788
779
789
780
###### What specific metrics should inform a rollback?
790
781
791
-
None. To see its status, read kubelet log for eviction related logs or using xfs_quota to check the quota settings.
782
+
`kubelet_volume_metric_collection_duration_seconds` was added in v1.24 and records the time, in
783
+
seconds, taken to calculate volume stats. This metric can be used to compare fsquota
784
+
monitoring with `du` for disk usage.
792
785
793
786
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
794
787
795
-
Yes.
788
+
Yes. I tested it locally and fixed [a bug after restarting kubelet](https://github.com/kubernetes/kubernetes/pull/107302).
796
789
797
790
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
798
791
799
-
LocalStorageCapacityIsolationFSQuotaMonitoring should be turned on only if LocalStorageCapacityIsolation is enabled as well.
792
+
LocalStorageCapacityIsolationFSQuotaMonitoring should be turned on only if LocalStorageCapacityIsolation is enabled as well.
800
793
If LocalStorageCapacityIsolationFSQuotaMonitoring is turned on but LocalStorageCapacityIsolation is false, the check will be skipped.
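
The gate interaction described above can be summarized as a small decision function. This is only an illustrative sketch in Python, not the kubelet's actual code: the function name `uses_project_quota_monitoring` and its parameters are hypothetical, and the filesystem conditions are taken from the description of when project quotas are used.

```python
def uses_project_quota_monitoring(feature_gates, fs_supports_project_quota,
                                  project_quota_enabled):
    """Return True when emptyDir usage would be monitored via project quotas.

    Hypothetical helper for illustration only; all names are assumptions.
    feature_gates maps feature gate names to booleans.
    """
    # The parent gate must be on; otherwise the quota check is skipped.
    if not feature_gates.get("LocalStorageCapacityIsolation", False):
        return False
    if not feature_gates.get("LocalStorageCapacityIsolationFSQuotaMonitoring", False):
        return False
    # The backing filesystem must support project quotas and have them enabled.
    return bool(fs_supports_project_quota and project_quota_enabled)
```

With only `LocalStorageCapacityIsolationFSQuotaMonitoring` set to true, the function returns `False`, which mirrors the "check will be skipped" case.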
801
794
802
795
### Monitoring Requirements
803
796
804
797
* **How can an operator determine if the feature is in use by workloads?**
798
+
805
799
- A cluster-admin can configure the feature gate in the kubelet on each node. If the feature gate is disabled, workloads on that node will not use it.
806
800
For example, run `xfs_quota -x -c 'report -h' /dev/sdc` to check quota settings on the device.
807
801
Check `spec.containers[].resources.limits.ephemeral-storage` of each container.
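
As a rough illustration of the last check, the sketch below collects the `ephemeral-storage` limit of each container from a pod manifest that has already been loaded as a Python dictionary (for example from the output of `kubectl get pod <name> -o json`). The function name `ephemeral_storage_limits` and the sample pod are hypothetical.

```python
def ephemeral_storage_limits(pod):
    """Map container name -> ephemeral-storage limit for containers that set one.

    `pod` is a pod manifest as a dict; missing fields are treated as "no limit".
    """
    limits = {}
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        value = (resources.get("limits") or {}).get("ephemeral-storage")
        if value is not None:
            limits[container["name"]] = value
    return limits


# Hypothetical pod with one limited and one unlimited container.
sample_pod = {
    "spec": {
        "containers": [
            {"name": "app", "resources": {"limits": {"ephemeral-storage": "2Gi"}}},
            {"name": "sidecar", "resources": {}},
        ]
    }
}
print(ephemeral_storage_limits(sample_pod))
```

Containers that do not appear in the result have no `ephemeral-storage` limit set.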
808
802
803
+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
804
+
805
+
- 99.9% of volume stats calculations should take less than 1s, or even 500ms.
806
+
This can be calculated from the `kubelet_volume_metric_collection_duration_seconds` metric.
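
One way to evaluate such an SLO from histogram data is sketched below. It assumes the cumulative bucket counts of the histogram have already been scraped into a dictionary keyed by upper bound in seconds. The bucket boundaries and counts in `sample_buckets` are made up for illustration, and the helper names are hypothetical.

```python
INF = float("inf")


def fraction_within(buckets, threshold):
    """Fraction of observations at or below `threshold` seconds.

    `buckets` maps each bucket upper bound to its cumulative count and must
    contain the +Inf bucket. The result is exact only when `threshold`
    matches a bucket bound; otherwise the next lower bound is used.
    """
    total = buckets[INF]
    if total == 0:
        return 1.0  # no observations, nothing violated the threshold
    eligible = [bound for bound in buckets if bound <= threshold]
    if not eligible:
        return 0.0
    return buckets[max(eligible)] / total


def meets_slo(buckets, threshold=1.0, target=0.999):
    """True when at least `target` of the observations finished within `threshold`."""
    return fraction_within(buckets, threshold) >= target


# Made-up cumulative counts, for illustration only.
sample_buckets = {0.1: 9000, 0.5: 9950, 1.0: 9995, 5.0: 10000, INF: 10000}
print(meets_slo(sample_buckets, threshold=1.0))
print(meets_slo(sample_buckets, threshold=0.5))
```

In this made-up sample the 1s target is met while the stricter 500ms target is not.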
807
+
809
808
* **What are the SLIs (Service Level Indicators) an operator can use to determine
810
809
the health of the service?**
811
-
- Set a quota for the specified volume and try to write to the volume to check if there is a limitation.
812
810
813
-
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
* **Are there any missing metrics that would be useful to have to improve observability of this feature?**
817
-
- Yes, there is a kubelet metrics `kubelet_evictions{eviction_signal="ephemeralpodfs.limit"}`([ALPHA] Cumulative number of pod evictions by eviction signal).
817
+
818
+
- Yes, there are no per-volume histogram metrics. The above metric is grouped by volume type because
819
+
recording a histogram for every volume would be too expensive.
818
820
819
821
### Dependencies
820
822
* **Does this feature depend on any specific services running in the cluster?**
821
-
- No.
823
+
824
+
- Yes, the feature depends on project quotas. Once quotas are enabled, the xfs_quota tool can be used to
825
+
set limits and report on disk usage.
826
+
822
827
823
828
### Scalability
824
829
* **Will enabling / using this feature result in any new API calls?**
@@ -856,31 +861,32 @@ details). For now, we leave it here.
856
861
857
862
###### What are other known failure modes?
858
863
859
-
If the ephemeral storage limitation is reached, the pod will be evicted by kubelet.
864
+
1. If the ephemeral storage limit is reached, the pod will be evicted by kubelet.
860
865
861
-
It should skip when the image is not configured correctly (unsupported FS or quota not enabled).
866
+
2. Quota monitoring should be skipped when the image is not configured correctly (unsupported FS or quota not enabled).
867
+
868
+
3. For an "out of space" failure, kubelet eviction should be triggered.
862
869
863
-
<!--
864
-
For each of them, fill in the following information by copying the below template:
865
-
- [Failure mode brief description]
866
-
- Detection: How can it be detected via metrics? Stated another way:
867
-
how can an operator troubleshoot without logging into a master or worker node?
868
-
- Mitigations: What can be done to stop the bleeding, especially for already
869
-
running user workloads?
870
-
- Diagnostics: What are the useful log messages and their required logging
871
-
levels that could help debug the issue?
872
-
Not required until feature graduated to beta.
873
-
- Testing: Are there any tests for failure mode? If not, describe why.
874
-
-->
875
870
876
871
###### What steps should be taken if SLOs are not being met to determine the problem?
877
872
873
+
- Restart kubelet and wait for 1 minute to make the SLOs clear. (The volume stats checking interval is determined by the kubelet flag `volumeStatsAggPeriod`, which defaults to 1m.)
874
+
878
875
879
876
## Implementation History
880
877
881
878
### Version 1.15
882
879
883
-
` LocalStorageCapacityIsolationFSMonitoring`implemented at Alpha
880
+
- `LocalStorageCapacityIsolationFSQuotaMonitoring` implemented at Alpha
881
+
882
+
### Version 1.24
883
+
884
+
- The `kubelet_volume_metric_collection_duration_seconds` metric was added
885
+
- A bug that prevented quotas from working after a kubelet restart was fixed
886
+
887
+
### Version 1.25
888
+
889
+
- Plan to promote `LocalStorageCapacityIsolationFSQuotaMonitoring` to Beta