keps/sig-node/1029-ephemeral-storage-quotas/README.md
Lines changed: 15 additions & 8 deletions
@@ -759,8 +759,8 @@ filesystem walk for better performance and accuracy.
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
 Yes, but only for newly created pods.
-- Existed Pods: If the pod was created with enforcing quota, disable the feature gate
-will not change the running pod.
+- Existing Pods: If the pod was created with enforcing quota, the pod will not use the enforcing
+quota after the feature gate is disabled.
 - Newly Created Pods: After setting the feature gate to false, the newly created pod
 will not use the enforcing quota.
 
@@ -798,9 +798,10 @@ If LocalStorageCapacityIsolationFSQuotaMonitoring is turned on but LocalStorageC
 
 * **How can an operator determine if the feature is in use by workloads?**
 
-- A cluster-admin can set kubelet on each node. If the feature gate is disabled, workloads on that node will not use it.
-For example, run `xfs_quota -x -c 'report -h' /dev/sdc` to check quota settings in the device.
-Check `spec.containers[].resources.limits.ephemeral-storage` of each container.
+- In kubelet metrics, an operator can check the histogram metric `kubelet_volume_metric_collection_duration_seconds`
+with `metric_source` equal to "fsquota". If there is no `metric_source=fsquota`, the feature is presumably disabled.
+- However, there is currently no direct way to tell whether a workload is using this feature; see the methods
+below for checking fsquota settings on a node.
 
 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
 
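The metric check described in the hunk above could be scripted roughly as follows. This is a minimal sketch, not part of the KEP: it assumes the kubelet metrics have already been fetched as Prometheus text (for example via the node proxy `/metrics` endpoint), and the function name `fsquota_in_use` is hypothetical. Only the metric name and the `metric_source="fsquota"` label come from the text.

```python
import re

# Metric name taken from the KEP text; the parsing below is a simplification
# of the Prometheus text exposition format.
METRIC = "kubelet_volume_metric_collection_duration_seconds"
_LABEL_RE = re.compile(r'metric_source="([^"]*)"')


def fsquota_in_use(metrics_text: str) -> bool:
    """Return True if any sample of the histogram has metric_source="fsquota".

    Hypothetical helper: it only inspects the text it is given and does not
    contact the kubelet.
    """
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comments and unrelated metrics.
        if line.startswith("#") or not line.startswith(METRIC):
            continue
        match = _LABEL_RE.search(line)
        if match and match.group(1) == "fsquota":
            return True
    return False
```

A `False` result only says that no `fsquota` samples were exported by that kubelet. As the text notes, it does not tell you whether a particular workload is using the feature.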
@@ -818,7 +819,12 @@ the health of the service?**
 * **Are there any missing metrics that would be useful to have to improve observability of this feature? **
 
 - Yes, there are no histogram metrics for each volume. The above metric was grouped by volume types because
-the cost for every volume is too expensive.
+the cost for every volume is too expensive. As a result, users cannot determine directly from the metrics
+whether the feature is used by a workload. A cluster-admin can check the kubelet configuration on each node. If the
+feature gate is disabled, workloads on that node will not use it.
+For example, run `xfs_quota -x -c 'report -h' /dev/sdc` to check quota settings on the device.
+Check `spec.containers[].resources.limits.ephemeral-storage` of each container to compare.
+
 
 ### Dependencies
 * **Does this feature depend on any specific services running in the cluster? **
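To make the "compare" step in the hunk above concrete, the sketch below extracts `spec.containers[].resources.limits.ephemeral-storage` from a pod object that has already been loaded as a dict (for example from `kubectl get pod -o json`) and converts it to bytes, so it can be compared by hand with the quota report. The helper names are hypothetical, and the quantity parser is an assumption-laden simplification that covers only common suffixes, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes (assumption: milli "m" and
# exa/peta suffixes are not needed for this check).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "2Gi" or "500M" to a byte count."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # plain number of bytes


def ephemeral_storage_limits(pod: dict) -> dict:
    """Map each container name to its ephemeral-storage limit in bytes.

    Containers without such a limit map to None.
    """
    limits = {}
    for container in pod.get("spec", {}).get("containers", []):
        raw = container.get("resources", {}).get("limits", {}).get("ephemeral-storage")
        limits[container["name"]] = quantity_to_bytes(raw) if raw is not None else None
    return limits
```

The byte values can then be checked against the limits shown by the `xfs_quota` report for the corresponding project on the node.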
@@ -872,8 +878,9 @@ details). For now, we leave it here.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
-- Restart kubelet and wait for 1 minute to make the SLOs clear.(The volume stats checking interval is determined by kubelet flag `volumeStatsAggPeriod`(default 1m).)
-
+If the metrics show problems, we can check the logs and the quota directory with the commands below.
+- There will be warning logs ([after the # is merged](https://github.com/kubernetes/kubernetes/pull/107490)) if volume calculation takes longer than 1 second
+- If quota is enabled, you can find the volume information and the processing time with `time repquota -P /var/lib/kubelet -s -v`
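The `time repquota ...` check above can also be wrapped in a small script when it needs to be repeated across nodes. The sketch below is only an illustration: `timed_run` is a hypothetical helper, and reusing the 1 second figure from the warning-log bullet as a default threshold is an assumption, not something the KEP prescribes for `repquota`.

```python
import subprocess
import time


def timed_run(argv, threshold_seconds=1.0):
    """Run a command and report how long it took.

    Returns (return_code, elapsed_seconds, is_slow), where is_slow is True
    when the wall-clock time exceeds threshold_seconds.
    """
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, elapsed > threshold_seconds


if __name__ == "__main__":
    # Command taken from the text above; it needs root and quota tools on the node.
    code, elapsed, slow = timed_run(["repquota", "-P", "/var/lib/kubelet", "-s", "-v"])
    print(f"exit={code} elapsed={elapsed:.2f}s slow={slow}")
```

Using a monotonic clock avoids skew from system time adjustments while the command is running.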