Commit 28621e6

Merge pull request #4379 from jsafrane/update-orphaned-metric
3746: Update orphaned pod metrics name
2 parents f451a19 + cc27706 commit 28621e6

1 file changed: 7 additions, 7 deletions

keps/sig-storage/3756-volume-reconstruction/README.md

Lines changed: 7 additions & 7 deletions
@@ -434,21 +434,21 @@ We propose adding these new metrics, both to the old and new VolumeManager code:
 in ASW (those are not reconstructed).
 * `force_cleaned_failed_volume_operations_total` / `force_cleaned_failed_volume_operation_errors_total`: nr.
 of all / unsuccessful cleanups of volumes that failed reconstruction.
-* `orphaned_volumes_cleanup_errors_total`: nr. of reports
+* `orphan_pod_cleaned_volumes_errors`: nr. of pods that failed cleanup with errors
 like `orphaned pod "<uid>" found, but XYZ failed`
-([example](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L215)).
+([example](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L215)) in the last sync.
 These messages can be a symptom of failed reconstruction (e.g.
 [#105536](https://github.com/kubernetes/kubernetes/issues/105536)).
 Note that kubelet logs this periodically and bumping this metric periodically
 would not be useful.
 [`cleanupOrphanedPodDirs`](https://github.com/kubernetes/kubernetes/blob/4fac7486d41c033d6bba9dfeda2356e8189035cd/pkg/kubelet/kubelet_volumes.go#L168)
 needs to be changed to collect errors found during
 one `/var/lib/kubelet/pods/` check and report collected "nr of errors during
-the last housekeeping sweep (every 2 seconds)".
-* TODO: do we want to have a label to distinguish each error reason,
-e.g. "Pod found, but volumes are still mounted on disk" from say
-"orphaned pod %q found, but error occurred during reading of
-volume-subpaths dir from disk"?
+the last housekeeping sweep (every 2 seconds)". There is no label that would
+distinguish between each error cause.
+* `orphan_pod_cleaned_volumes`: nr. of total pods that were attempted to be
+cleaned up by `cleanupOrphanedPodDirs` in the last sync, both successful and
+failed.

 ### Test Plan

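The changed lines describe both orphaned-pod metrics as values for "the last sync" rather than running totals. A minimal sketch of that per-sweep reporting pattern is below. It is written in Python purely for illustration (the kubelet is written in Go), and the `Gauge` class, the `cleanup_one` callback, and the use of `OSError` as the failure signal are all hypothetical stand-ins, not the kubelet's actual code or metrics API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Gauge:
    """Hypothetical stand-in for a metrics gauge: keeps only the last value set."""
    value: int = 0

    def set(self, value: int) -> None:
        self.value = value


# Hypothetical gauges named after the two metrics in the diff.
orphan_pod_cleaned_volumes = Gauge()
orphan_pod_cleaned_volumes_errors = Gauge()


def cleanup_orphaned_pod_dirs(
    orphaned_pod_uids: Iterable[str],
    cleanup_one: Callable[[str], None],
) -> None:
    """Run one sweep over orphaned pods and report per-sweep totals.

    Errors are collected during the sweep and reported once at the end,
    so each gauge reflects only the most recent sweep.
    """
    attempted = 0
    failed = 0
    for uid in orphaned_pod_uids:
        attempted += 1
        try:
            cleanup_one(uid)  # assumed to raise OSError when cleanup fails
        except OSError:
            failed += 1  # no per-cause label: every failure counts the same
    orphan_pod_cleaned_volumes.set(attempted)
    orphan_pod_cleaned_volumes_errors.set(failed)
```

Because the values are set rather than incremented, a sweep that finds no orphaned pods resets both gauges to zero, which matches the "in the last sync" wording and avoids the periodic bumping the text calls not useful.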