Commit fde8cea

Add note for adding new metric
1 parent 1d0746e commit fde8cea

File tree

1 file changed, +9 -6 lines changed
  • keps/sig-storage/1790-recover-resize-failure

keps/sig-storage/1790-recover-resize-failure/README.md

Lines changed: 9 additions & 6 deletions
@@ -306,11 +306,8 @@ after expansion is complete even with older kubelet. No recovery from expansion
 _This section must be completed when targeting beta graduation to a release._
 
 * **How can an operator determine if the feature is in use by workloads?**
-For a PVC that has undergone recovery from expansion failure successfully, it is not possible
-to identify the fact that - PVC used this feature. But for PVCs for which
-recovery failed even after reducing size, an operator can determine the feature in-use
-by looking at newly introduced `pvc.Status.ResizeStatus` field.
-
+Any volume that has been recovered will emit a metric: `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`.
+
 * **What are the SLIs (Service Level Indicators) an operator can use to
 determine the health of the service?**
 - [ ] Metrics
@@ -340,7 +337,13 @@ _This section must be completed when targeting beta graduation to a release._
 
 * **Are there any missing metrics that would be useful to have to improve
 observability if this feature?**
-Not applicable.
+We are planning to add new counter metrics that will record success and failure of recovery operations.
+In cases where recovery fails, the counter will forever be increasing until an admin action resolves the error.
+
+Tentative name of metric is - `operation_operation_volume_recovery_total{state='success', volume_name='pvc-abce'}`
+
+The reason of using PV name as a label is - we do not expect this feature to be used in a cluster very often
+and hence it should be okay to use name of PVs that were recovered this way.
 
 ### Dependencies
 
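The counter behaviour described in the second hunk can be illustrated with a minimal sketch. It uses only the Python standard library and is not the implementation proposed in the KEP. The `RecoveryCounter` class and `reconcile` helper are hypothetical names introduced here, and the `failure` label value is an assumption, since the diff only shows `state='success'`. The sketch shows that a volume whose recovery keeps failing increments its failure series on every retry, and that each recovered PV name adds its own series.

```python
from collections import Counter


class RecoveryCounter:
    """Hypothetical stand-in for a counter labelled by (state, volume_name).

    The real metric would be registered through the Kubernetes metrics
    framework; this class only models how the labelled series behave.
    """

    def __init__(self):
        self._counts = Counter()

    def inc(self, state, volume_name):
        # Each distinct (state, volume_name) pair is its own time series.
        self._counts[(state, volume_name)] += 1

    def value(self, state, volume_name):
        return self._counts[(state, volume_name)]

    def series(self):
        # Number of distinct label combinations recorded so far.
        return len(self._counts)


def reconcile(counter, volume_name, attempt_recovery):
    """Run one recovery attempt and record its outcome on the counter."""
    # "failure" is an assumed label value; the diff only shows "success".
    state = "success" if attempt_recovery() else "failure"
    counter.inc(state, volume_name)
    return state


counter = RecoveryCounter()

# Recovery fails on three consecutive passes, so the failure series grows.
for _ in range(3):
    reconcile(counter, "pvc-abce", lambda: False)

# After the underlying error is resolved, the next pass records a success.
reconcile(counter, "pvc-abce", lambda: True)
```

Because every PV name becomes a label value, the number of series grows with the number of volumes that go through recovery. The commit text accepts this on the expectation that recovery is rare.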