@@ -426,14 +426,14 @@ then periodically does:
Today, any errors during volume reconstruction are exposed only as log messages.
We propose adding these new metrics, both to the old and new VolumeManager code:
- * `reconstructed_volumes_total` with label `result={success, error}`: nr. of
- successfully / unsuccessfully reconstructed volumes.
+ * `reconstruct_volume_operations_total` / `reconstruct_volume_operations_errors_total`:
+ nr. of all / failed volume reconstruction attempts.
* In the new VolumeManager code, this will include all volume mounts in
`/var/lib/kubelet/pods/*/volumes`
* In the old VolumeManager it will include only volumes that were not already
in ASW (those are not reconstructed).
- * `force_cleaned_failed_volumes_total` with label `result={success, error}`: nr.
- of successful / unsuccessful cleanups of volumes that failed reconstruction.
+ * `force_cleaned_failed_volume_operations_total` / `force_cleaned_failed_volume_operation_errors_total`: nr.
+ of all / unsuccessful cleanups of volumes that failed reconstruction.
* `orphaned_volumes_cleanup_errors_total`: nr. of reports
like `orphaned pod "<uid>" found, but XYZ failed`
([example](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L215)).
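The new pair of counters replaces the old `result` label. The first counter records every reconstruction attempt and the second records only the failures, so the former `result="success"` count can be derived as total minus errors. The minimal Python sketch below shows that relationship. It assumes a hypothetical in-memory `ReconstructionMetrics` class that stands in for the real kubelet counters, plus hypothetical `reconstruct_all` and `fake_reconstruct` helpers. None of these names are kubelet code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ReconstructionMetrics:
    """Hypothetical in-memory stand-in for the two kubelet counters."""

    operations_total: int = 0         # reconstruct_volume_operations_total
    operations_errors_total: int = 0  # reconstruct_volume_operations_errors_total

    def record(self, ok: bool) -> None:
        # Every attempt counts toward the total; only failures bump the errors counter.
        self.operations_total += 1
        if not ok:
            self.operations_errors_total += 1

    @property
    def successes(self) -> int:
        # The old result="success" count is derived as total minus errors.
        return self.operations_total - self.operations_errors_total


def reconstruct_all(
    volumes: Iterable[str],
    reconstruct: Callable[[str], None],
    metrics: ReconstructionMetrics,
) -> None:
    """Try to reconstruct each volume; a raised exception counts as a failure."""
    for volume in volumes:
        try:
            reconstruct(volume)
        except Exception:
            metrics.record(ok=False)
        else:
            metrics.record(ok=True)


def fake_reconstruct(path: str) -> None:
    # Hypothetical stand-in: fails for any path containing "broken".
    if "broken" in path:
        raise RuntimeError(f"cannot reconstruct {path}")


m = ReconstructionMetrics()
reconstruct_all(["pod-a/vol1", "pod-b/broken", "pod-c/vol2"], fake_reconstruct, m)
print(m.operations_total, m.operations_errors_total, m.successes)  # 3 1 2
```

One consequence of this split is that failures get a dedicated series that is expected to stay at zero. That series is easy to alert on, at the cost of computing the success count by subtraction.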
@@ -740,7 +740,10 @@ What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->
- `reconstructed_volumes_total`, `force_cleaned_failed_volumes_total`,
+ `reconstruct_volume_operations_total`,
+ `reconstruct_volume_operations_errors_total`,
+ `force_cleaned_failed_volume_operations_total`,
+ `force_cleaned_failed_volume_operation_errors_total`,
`orphaned_volumes_cleanup_errors_total`
See Observability in the detailed design section. All newly introduced metrics
@@ -824,12 +827,12 @@ question.
These two metrics are populated during kubelet startup:
- * `reconstructed_volumes_total{result="error"}` should be zero. An error here
+ * `reconstruct_volume_operations_errors_total` should be zero. An error here
means that kubelet was not able to reconstruct its cache of mounted volumes
and the appropriate volume plugin was not called to clean up a volume mount.
There could be a leaked file or directory on the filesystem.
- * `force_cleaned_failed_volumes_total{result="error"}` should be zero. An error
+ * `force_cleaned_failed_volume_operation_errors_total` should be zero. An error
here means that kubelet was not able to unmount a volume even with all
fallbacks it has. There *is* at least a leaked directory on the filesystem,
and there could also be a leaked mount.
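Both error counters are expected to stay at zero after kubelet startup, so a post-startup check only needs to inspect those two series. The following Python sketch makes two assumptions. First, the kubelet metrics have already been scraped as Prometheus text exposition format. Second, the metric names appear exactly as written in this KEP, although the exposed names may carry an extra prefix. `parse_counters` and `startup_cleanup_problems` are hypothetical helpers, not part of any Kubernetes tool.

```python
from __future__ import annotations

# Error counters that this KEP expects to stay at zero after kubelet startup.
# Names are copied from the KEP; the exposed names may carry an extra prefix.
ERROR_COUNTERS = (
    "reconstruct_volume_operations_errors_total",
    "force_cleaned_failed_volume_operation_errors_total",
)


def parse_counters(exposition: str) -> dict[str, float]:
    """Sum sample values per metric name from Prometheus text-format lines."""
    totals: dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            name, _, rest = line.partition(" ")
        value = rest.split()[0]  # drop an optional trailing timestamp
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def startup_cleanup_problems(exposition: str) -> list[str]:
    """Return the error counters that are non-zero; absent counters count as zero."""
    counters = parse_counters(exposition)
    return [name for name in ERROR_COUNTERS if counters.get(name, 0.0) > 0]


sample = """
# TYPE reconstruct_volume_operations_errors_total counter
reconstruct_volume_operations_errors_total 0
force_cleaned_failed_volume_operation_errors_total 2
"""
print(startup_cleanup_problems(sample))
# ['force_cleaned_failed_volume_operation_errors_total']
```

In this sample, the non-zero force-cleanup counter would point to at least one leaked directory, and possibly a leaked mount, as described above. Treating an absent counter as zero is a deliberate simplification; a stricter check could flag missing series instead.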
@@ -842,8 +845,10 @@ Pick one more of these and delete the rest.
- [X] Metrics
- Metric name:
- - `reconstructed_volumes_total`
- - `force_cleaned_failed_volumes_total`
+ - `reconstruct_volume_operations_total`
+ - `reconstruct_volume_operations_errors_total`
+ - `force_cleaned_failed_volume_operations_total`
+ - `force_cleaned_failed_volume_operation_errors_total`
- `orphaned_volumes_cleanup_errors_total`
- Components exposing the metric: kubelet