@@ -687,7 +687,9 @@ The feature can be disabled without any issues.

###### What happens if we reenable the feature if it was previously rolled back?

- Nothing interesting happens.
+ Nothing interesting happens. This feature changes how kubelet starts and how it
+ cleans up volume mounts. It has no visible effect on any API object, nor on the
+ structure of data or the mount table in the host OS.

###### Are there any tests for feature enablement/disablement?

@@ -773,8 +775,6 @@ For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field.
-->

- TODO whole chapter before GA.
-
###### How can an operator determine if the feature is in use by workloads?

<!--
@@ -783,6 +783,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

+ They can check whether the feature gate is enabled on a node, e.g. by monitoring
+ the `kubernetes_feature_enabled` metric, or by reading the kubelet logs.
+
###### How can someone using this feature know that it is working for their instance?

<!--
@@ -819,18 +822,30 @@ These goals will help you determine what you need to measure (SLIs) in the next
question.
-->

+ These two metrics are populated during kubelet startup:
+
+ * `reconstructed_volumes_total{result="error"}` should be zero. An error here
+ means that kubelet was not able to reconstruct its cache of mounted volumes
+ and the appropriate volume plugin was not called to clean up a volume mount.
+ There could be a leaked file or directory on the filesystem.
+
+ * `force_cleaned_failed_volumes_total{result="error"}` should be zero. An error
+ here means that kubelet was not able to unmount a volume even with all the
+ fallbacks it has. There *is* at least one leaked directory on the filesystem,
+ and there could also be a leaked mount.
+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one or more of these and delete the rest.
-->

- - [ ] Metrics
+ - [X] Metrics
  - Metric name:
-   - [Optional] Aggregation method:
-   - Components exposing the metric:
- - [ ] Other (treat as last resort)
-   - Details:
+     - `reconstructed_volumes_total`
+     - `force_cleaned_failed_volumes_total`
+     - `orphaned_volumes_cleanup_errors_total`
+   - Components exposing the metric: kubelet

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -839,6 +854,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
implementation difficulties, etc.).
-->

+ No
+
### Dependencies

<!--
@@ -988,6 +1005,23 @@ For each of them, fill in the following information by copying the below templat

###### What steps should be taken if SLOs are not being met to determine the problem?

+ Check the kubelet logs. There should be errors about a failed volume reconstruction,
+ together with the directory where the volume was supposed to be mounted.
+ Ensure that:
+
+ 1. No Pod on the node uses the volume.
+ 2. The volume's directory is not mounted.
+ 3. The directory and all its parents up to `/var/lib/kubelet/pods/<uid>/volumes`
+    are removed.
+ 4. If possible, locate the global mount of the volume (if it exists) in
+    `/var/lib/kubelet/plugins/<volume plugin name>`, unmount it, and remove it.
+    The actual directory varies by volume plugin.
+    * For CSI volumes, if the CSI driver supports the `NodeStageVolume` CSI call,
+      the location is `/var/lib/kubelet/plugins/kubernetes.io/csi/<csi driver name>/<sha256sum of pv.spec.csi.volumeHandle>/globalmount`.
+      Otherwise, there is no global mount directory.
+    * EmptyDir, Projected, DownwardAPI, Secret and ConfigMap volumes do not have a
+      global mount directory.
+
## Implementation History

* 1.26: Alpha version was implemented as part of