@@ -434,21 +434,21 @@ We propose adding these new metrics, both to the old and new VolumeManager code:
in ASW (those are not reconstructed).
* `force_cleaned_failed_volume_operations_total` / `force_cleaned_failed_volume_operation_errors_total`: nr.
of all / unsuccessful cleanups of volumes that failed reconstruction.
- * `orphaned_volumes_cleanup_errors_total`: nr. of reports
+ * `orphan_pod_cleaned_volumes_errors`: nr. of pods that failed cleanup with errors
like `orphaned pod "<uid>" found, but XYZ failed`
- ([example](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L215)).
+ ([example](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L215)) in the last sync.
These messages can be a symptom of failed reconstruction (e.g.
[#105536](https://github.com/kubernetes/kubernetes/issues/105536)).
Note that kubelet logs this periodically and bumping this metric periodically
would not be useful.
[`cleanupOrphanedPodDirs`](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L168)
needs to be changed to collect errors found during
one `/var/lib/kubelet/pods/` check and report collected "nr of errors during
- the last housekeeping sweep (every 2 seconds)".
- * TODO: do we want to have a label to distinguish each error reason,
- e.g. "Pod found, but volumes are still mounted on disk" from say
- "orphaned pod %q found, but error occurred during reading of
- volume-subpaths dir from disk"?
+ the last housekeeping sweep (every 2 seconds)". There is no label that would
+ distinguish between each error cause.
+ * `orphan_pod_cleaned_volumes`: nr. of total pods that were attempted to be
+ cleaned up by `cleanupOrphanedPodDirs` in the last sync, both successful and
+ failed.
### Test Plan