Commit e989e57

Merge pull request ceph#54656 from zdover23/wip-doc-2023-11-25-rados-troubleshooting-mon-mon-store-failures-1-of-x
doc/rados: edit "monitor store failures"

Reviewed-by: Anthony D'Atri <[email protected]>
2 parents 68e0d99 + 0a1ce00 commit e989e57

File tree

1 file changed: +23 -24 lines


doc/rados/troubleshooting/troubleshooting-mon.rst

Lines changed: 23 additions & 24 deletions
@@ -505,9 +505,9 @@ Monitor Store Failures
 Symptoms of store corruption
 ----------------------------
 
-Ceph monitors store the :term:`Cluster Map` in a key-value store. If key-value
-store corruption causes a monitor to fail, then the monitor log might contain
-one of the following error messages::
+Ceph Monitors maintain the :term:`Cluster Map` in a key-value store. If
+key-value store corruption causes a Monitor to fail, then the Monitor log might
+contain one of the following error messages::
 
     Corruption: error in middle of record
 
@@ -518,26 +518,25 @@ or::
 Recovery using healthy monitor(s)
 ---------------------------------
 
-If there are surviving monitors, we can always :ref:`replace
-<adding-and-removing-monitors>` the corrupted monitor with a new one. After the
-new monitor boots, it will synchronize with a healthy peer. After the new
-monitor is fully synchronized, it will be able to serve clients.
+If the cluster contains surviving Monitors, the corrupted Monitor can be
+:ref:`replaced <adding-and-removing-monitors>` with a new Monitor. After the
+new Monitor boots, it will synchronize with a healthy peer. After the new
+Monitor is fully synchronized, it will be able to serve clients.
 
 .. _mon-store-recovery-using-osds:
 
 Recovery using OSDs
 -------------------
 
 Even if all monitors fail at the same time, it is possible to recover the
-monitor store by using information stored in OSDs. You are encouraged to deploy
-at least three (and preferably five) monitors in a Ceph cluster. In such a
-deployment, complete monitor failure is unlikely. However, unplanned power loss
-in a data center whose disk settings or filesystem settings are improperly
-configured could cause the underlying filesystem to fail and this could kill
-all of the monitors. In such a case, data in the OSDs can be used to recover
-the monitors. The following is such a script and can be used to recover the
-monitors:
-
+Monitor store by using information that is stored in OSDs. You are encouraged
+to deploy at least three (and preferably five) Monitors in a Ceph cluster. In
+such a deployment, complete Monitor failure is unlikely. However, unplanned
+power loss in a data center whose disk settings or filesystem settings are
+improperly configured could cause the underlying filesystem to fail and this
+could kill all of the monitors. In such a case, data in the OSDs can be used to
+recover the Monitors. The following is a script that can be used in such a case
+to recover the Monitors:
 
 .. code-block:: bash
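The replace-with-a-healthy-peer procedure edited in the hunk above can be sketched as follows. This is a minimal sketch, assuming a cephadm-managed cluster; the Monitor ID ``foo`` and host ``newhost`` are hypothetical names, and the full procedure is in the adding-and-removing-monitors section the hunk links to:

```shell
# Hedged sketch: replace a corrupted Monitor when healthy peers survive.
# "foo" and "newhost" are hypothetical; adjust to your deployment.

# Remove the corrupted Monitor from the monmap.
ceph mon remove foo

# Deploy a replacement; it synchronizes with a healthy peer after boot.
ceph orch daemon add mon newhost

# Confirm the new Monitor has joined quorum before relying on it.
ceph mon stat
```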
@@ -590,10 +589,10 @@ monitors:
 
 This script performs the following steps:
 
-#. Collects the map from each OSD host.
-#. Rebuilds the store.
-#. Fills the entities in the keyring file with appropriate capabilities.
-#. Replaces the corrupted store on ``mon.foo`` with the recovered copy.
+#. Collect the map from each OSD host.
+#. Rebuild the store.
+#. Fill the entities in the keyring file with appropriate capabilities.
+#. Replace the corrupted store on ``mon.foo`` with the recovered copy.
 
 
 Known limitations
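The four steps listed in the hunk above can be sketched with the tools the script wraps, ``ceph-objectstore-tool`` and ``ceph-monstore-tool``. The store path, keyring path, and the Monitor ID ``foo`` are illustrative assumptions; OSDs must be stopped while their stores are read:

```shell
# Hedged sketch of the four recovery steps. Paths and "foo" are
# assumptions; run the collection loop on every OSD host.
ms=/tmp/monstore
mkdir -p "$ms"

# 1. Collect the cluster map from each OSD (OSD daemons stopped).
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" \
        --op update-mon-db --mon-store-path "$ms"
done

# 2. and 3. Rebuild the store; the rebuild fills keyring entities
# with capabilities from the supplied admin keyring.
ceph-monstore-tool "$ms" rebuild -- \
    --keyring /etc/ceph/ceph.client.admin.keyring

# 4. Replace the corrupted store on mon.foo with the recovered copy,
# keeping the old store aside in case it is needed for inspection.
mv /var/lib/ceph/mon/ceph-foo/store.db /var/lib/ceph/mon/ceph-foo/store.db.bad
cp -r "$ms/store.db" /var/lib/ceph/mon/ceph-foo/store.db
```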
@@ -605,15 +604,15 @@ The above recovery tool is unable to recover the following information:
   auth add`` command are recovered from the OSD's copy, and the
   ``client.admin`` keyring is imported using ``ceph-monstore-tool``. However,
   the MDS keyrings and all other keyrings will be missing in the recovered
-  monitor store. You might need to manually re-add them.
+  Monitor store. It might be necessary to manually re-add them.
 
 - **Creating pools**: If any RADOS pools were in the process of being created,
   that state is lost. The recovery tool operates on the assumption that all
   pools have already been created. If there are PGs that are stuck in the
-  'unknown' state after the recovery for a partially created pool, you can
+  ``unknown`` state after the recovery for a partially created pool, you can
   force creation of the *empty* PG by running the ``ceph osd force-create-pg``
-  command. Note that this will create an *empty* PG, so take this action only
-  if you know the pool is empty.
+  command. This creates an *empty* PG, so take this action only if you are
+  certain that the pool is empty.
 
 - **MDS Maps**: The MDS maps are lost.
 
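The ``ceph osd force-create-pg`` step in the hunk above can be sketched as follows. The PG ID ``2.5`` is hypothetical, and on recent releases the command also requires a confirmation flag; this is a sketch, not the commit's own procedure:

```shell
# Hedged sketch: locate PGs stuck "unknown" after recovery, then force
# creation of an EMPTY PG. PG id 2.5 is hypothetical; only run this when
# you are certain the affected pool held no data.
ceph health detail | grep -i unknown
ceph osd force-create-pg 2.5 --yes-i-really-mean-it
```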
