
Commit 434d896

doc/rados: edit t-mon "common issues" (1 of x)

Edit the first part of the section "Most Common Monitor Issues" in
doc/rados/troubleshooting/troubleshooting-mon.rst.

Signed-off-by: Zac Dover <[email protected]>

1 parent db6fbc9 commit 434d896

File tree: 1 file changed (+23, -18 lines)


doc/rados/troubleshooting/troubleshooting-mon.rst

Lines changed: 23 additions & 18 deletions
@@ -180,38 +180,43 @@ the quorum is formed by only two monitors, and *c* is in the quorum as a
Most Common Monitor Issues
===========================

-Have Quorum but at least one Monitor is down
----------------------------------------------
+The Cluster Has Quorum but at Least One Monitor is Down
+-------------------------------------------------------

-When this happens, depending on the version of Ceph you are running,
-you should be seeing something similar to::
+When the cluster has quorum but at least one monitor is down, ``ceph health
+detail`` returns a message similar to the following::

    $ ceph health detail
    [snip]
    mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)

-**How to troubleshoot this?**
+**How do I troubleshoot a Ceph cluster that has quorum but also has at least one monitor down?**

-First, make sure ``mon.a`` is running.
+#. Make sure that ``mon.a`` is running.

-Second, make sure you are able to connect to ``mon.a``'s node from the
-other mon nodes. Check the TCP ports as well. Check ``iptables`` and
-``nf_conntrack`` on all nodes and ensure that you are not
-dropping/rejecting connections.
+#. Make sure that you can connect to ``mon.a``'s node from the
+   other Monitor nodes. Check the TCP ports as well. Check ``iptables`` and
+   ``nf_conntrack`` on all nodes and make sure that you are not
+   dropping/rejecting connections.
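
For reference, the two checks added above can be approximated with standard tools
on a systemd-managed deployment; a minimal sketch, assuming monitor id ``a``, the
legacy v1 port 6789 (msgr2 listens on 3300), and the placeholder hostname
``<mon-a-host>``:

    $ systemctl status ceph-mon@a             # is the mon.a daemon running on its host?
    $ nc -z <mon-a-host> 6789 && echo open    # can this node reach the monitor's TCP port?
    $ sudo iptables -nL                       # look for rules that drop/reject monitor traffic
    $ sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
                                              # a full conntrack table also drops connections

Containerized (cephadm) deployments use different unit names, and if ``nf_conntrack``
is not loaded the ``sysctl`` keys above will not exist, so adjust these commands to
the deployment at hand.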

-If this initial troubleshooting doesn't solve your problems, then it's
-time to go deeper.
+If this initial troubleshooting doesn't solve your problem, then further
+investigation is necessary.

First, check the problematic monitor's ``mon_status`` via the admin
socket as explained in `Using the monitor's admin socket`_ and
`Understanding mon_status`_.
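
For concreteness, the ``mon_status`` check is run on the Monitor's own host through
its admin socket; a minimal sketch, assuming monitor id ``a`` and the default admin
socket path:

    $ ceph daemon mon.a mon_status
    # equivalently, addressing the socket file directly:
    $ ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status

Among other fields, the output reports the monitor's current ``state``, which is what
the next paragraph interprets.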

-If the monitor is out of the quorum, its state should be one of ``probing``,
-``electing`` or ``synchronizing``. If it happens to be either ``leader`` or
-``peon``, then the monitor believes to be in quorum, while the remaining
-cluster is sure it is not; or maybe it got into the quorum while we were
-troubleshooting the monitor, so check you ``ceph status`` again just to make
-sure. Proceed if the monitor is not yet in the quorum.
+If the Monitor is out of the quorum, then its state will be one of the
+following: ``probing``, ``electing`` or ``synchronizing``. If the state of
+the Monitor is ``leader`` or ``peon``, then the Monitor believes itself to be
+in quorum but the rest of the cluster believes that it is not in quorum. It
+is possible that a Monitor that is in one of the ``probing``, ``electing``,
+or ``synchronizing`` states has entered the quorum during the process of
+troubleshooting. Check ``ceph status`` again to determine whether the Monitor
+has entered quorum during your troubleshooting. If the Monitor remains out of
+the quorum, then proceed with the investigations described in this section of
+the documentation.
+
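
As a supplement to the re-check described in the paragraph above, quorum membership
can be confirmed from any node with admin credentials; a minimal sketch, assuming
the default cluster name:

    $ ceph status
    $ ceph quorum_status --format json-pretty

``quorum_status`` lists the monitors that are currently in quorum (see the
``quorum_names`` field); a Monitor that is still absent from that list remains out
of the quorum.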

**What if the state is ``probing``?**
