Commit 22200d6

doc: address review comments
Signed-off-by: Shraddha Agrawal <[email protected]>
1 parent cae4231 commit 22200d6

doc/rados/operations/monitoring.rst

Lines changed: 26 additions & 13 deletions
@@ -756,19 +756,32 @@ Example output:
 .. prompt:: bash $
 
 POOL UPTIME DOWNTIME NUMFAILURES MTBF MTTR SCORE AVAILABLE
-rbd 2m 21s 1 2m 21s 0.888889 1
-.mgr 86s 0s 0 0s 0s 1 1
-cephfs.a.meta 77s 0s 0 0s 0s 1 1
-cephfs.a.data 76s 0s 0 0s 0s 1 1
+rbd 2m 21s 1 2m 21s 0.888889 1
+.mgr 86s 0s 0 0s 0s 1 1
+cephfs.a.meta 77s 0s 0 0s 0s 1 1
+cephfs.a.data 76s 0s 0 0s 0s 1 1

 A pool is considered ``unavailable`` when at least one PG in the pool
 becomes inactive or there is at least one unfound object in the pool.
-Otherwise the pool is considered ``available``.
-
-We first calculate the Mean Time Between Failures (MTBF) and
-Mean Time To Recover (MTTR) from the uptime and downtime recorded
-for each pool and arrive at the availability score
-by finding ratio of MTBF to total time (ie MTTR + MTBF).
-
-The score is updated every 5 seconds. This interval is currently
-not configurable.
+Otherwise the pool is considered ``available``. Depending on the
+current and previous state of the pool we update ``uptime`` and
+``downtime`` values:
+
+================ =============== =============== =================
+Previous State   Current State   Uptime Update   Downtime Update
+================ =============== =============== =================
+Available        Available       +diff time      no update
+Available        Unavailable     +diff time      no update
+Unavailable      Available       +diff time      no update
+Unavailable      Unavailable     no update       +diff time
+================ =============== =============== =================
+
+From the updated ``uptime`` and ``downtime`` values, we calculate
+the Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR)
+for each pool. The availability score is then calculated by finding
+the ratio of MTBF to the total time.
+
+The score is updated every five seconds. This interval is currently
+not configurable. Any intermittent changes to the pools that
+occur between this duration but are reset before we recheck the pool
+status will not be captured by this feature.
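
The state table and score description in the diff can be illustrated with a minimal Python sketch. This is not Ceph's implementation. The class and field names are hypothetical, and three points are assumptions the text does not state: a failure is counted on the Available to Unavailable transition, MTBF and MTTR are uptime and downtime divided by the failure count, and a pool with no recorded failures scores 1 (consistent with the example rows where NUMFAILURES is 0).

```python
from dataclasses import dataclass


@dataclass
class PoolAvailability:
    """Hypothetical per-pool record; names are illustrative, not Ceph's."""
    uptime: float = 0.0      # seconds
    downtime: float = 0.0    # seconds
    num_failures: int = 0
    available: bool = True

    def update(self, now_available: bool, diff: float) -> None:
        """Apply one polling step according to the state table."""
        if not self.available and not now_available:
            # Unavailable -> Unavailable: only downtime grows.
            self.downtime += diff
        else:
            # Every other transition credits the elapsed time to uptime.
            self.uptime += diff
        if self.available and not now_available:
            # Assumption: a failure is counted on Available -> Unavailable.
            self.num_failures += 1
        self.available = now_available

    def score(self) -> float:
        """Ratio of MTBF to total time (MTBF + MTTR)."""
        if self.num_failures == 0:
            # Assumption: no failures recorded means a score of 1.
            return 1.0
        # Assumption: both means are averaged over the failure count.
        mtbf = self.uptime / self.num_failures
        mttr = self.downtime / self.num_failures
        total = mtbf + mttr
        return mtbf / total if total > 0 else 1.0


# Four polls, five seconds apart: up, fails, still down, recovers.
pool = PoolAvailability()
for state in (True, False, False, True):
    pool.update(state, diff=5.0)
print(pool.uptime, pool.downtime, pool.num_failures, pool.score())
```

In this run only the Unavailable to Unavailable step adds to ``downtime``, so the pool ends with 15 seconds of uptime, 5 seconds of downtime, one failure, and a score of 0.75. A state change that begins and ends between two polls never reaches ``update``, which is the limitation the last added paragraph of the diff describes.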
