
Commit b2e9f38

Merge pull request ceph#63181 from shraddhaag/wip-shraddhaag-availability-docs
docs: add release notes and docs for availability score feature
2 parents 1eea37e + 22200d6 commit b2e9f38

2 files changed: +55 −0 lines changed

PendingReleaseNotes

Lines changed: 8 additions & 0 deletions
@@ -147,6 +147,14 @@
  `s3:GetObjectRetention` are also considered when fetching the source object.
  Replication of tags is controlled by the `s3:GetObject(Version)Tagging` permission.

+* RADOS: A new command, `ceph osd pool availability-status`, has been added that allows
+  users to view the availability score for each pool in a cluster. A pool is considered
+  unavailable if any PG in the pool is not in the active state or if there are unfound
+  objects. Otherwise the pool is considered available. The score is updated every
+  5 seconds. This feature is in tech preview.
+  Related trackers:
+  - https://tracker.ceph.com/issues/67777

  >=19.2.1

  * CephFS: Command `fs subvolume create` now allows tagging subvolumes through option

doc/rados/operations/monitoring.rst

Lines changed: 47 additions & 0 deletions
@@ -738,3 +738,50 @@ Print active connections and their TCP round trip time and retransmission counte
   248  89  1  mgr.0  863  1677  0
     3  86  2  mon.0  230   278  0
+
+Tracking Data Availability Score of a Cluster
+=============================================
+
+Ceph internally tracks the data availability of each pool in a cluster.
+To check the data availability score of each pool in a cluster, run
+the following command:
+
+.. prompt:: bash $
+
+   ceph osd pool availability-status
+
+Example output::
+
+   POOL           UPTIME  DOWNTIME  NUMFAILURES  MTBF  MTTR  SCORE     AVAILABLE
+   rbd            2m      21s       1            2m    21s   0.888889  1
+   .mgr           86s     0s        0            0s    0s    1         1
+   cephfs.a.meta  77s     0s        0            0s    0s    1         1
+   cephfs.a.data  76s     0s        0            0s    0s    1         1
+
+A pool is considered ``unavailable`` when at least one PG in the pool
+becomes inactive or there is at least one unfound object in the pool.
+Otherwise the pool is considered ``available``. Depending on the
+current and previous state of the pool, we update the ``uptime`` and
+``downtime`` values:
+
+================ =============== =============== =================
+Previous State   Current State   Uptime Update   Downtime Update
+================ =============== =============== =================
+Available        Available       +diff time      no update
+Available        Unavailable     +diff time      no update
+Unavailable      Available       +diff time      no update
+Unavailable      Unavailable     no update       +diff time
+================ =============== =============== =================
+
779+
From the updated ``uptime`` and ``downtime`` values, we calculate
780+
the Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR)
781+
for each pool. The availability score is then calculated by finding
782+
the ratio of MTBF to the total time.
783+
+The score is updated every five seconds. This interval is currently
+not configurable. Any intermittent availability changes that occur
+within this interval but are reset before the next check of the pool
+status will not be captured by this feature.
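One plausible reading of the score calculation, consistent with the sample output above, is MTBF = uptime / failures, MTTR = downtime / failures, and score = MTBF / (MTBF + MTTR). These formulas are an assumption inferred from the doc text and the example numbers, not taken from Ceph's source, and the function below is an invented illustration:

```python
def availability_score(uptime_s: float, downtime_s: float, failures: int) -> float:
    """Assumed formulas (not from Ceph's source):
    MTBF = uptime / failures, MTTR = downtime / failures,
    score = MTBF / (MTBF + MTTR), which reduces to
    uptime / (uptime + downtime)."""
    if failures == 0:
        return 1.0  # no recorded failure; the sample output shows SCORE 1
    mtbf = uptime_s / failures
    mttr = downtime_s / failures
    return mtbf / (mtbf + mttr)

# Hypothetical inputs: 168 s of uptime (which would display rounded as
# "2m"), 21 s of downtime, one failure, reproducing the 0.888889 score
# shown for the rbd pool in the sample output.
print(round(availability_score(168, 21, 1), 6))  # → 0.888889
```

Note that under these assumed definitions the number of failures cancels out of the score, so ``NUMFAILURES`` affects MTBF and MTTR individually but not the final ratio.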
