@@ -266,7 +266,7 @@ Each Ceph daemon provides an admin socket that bypasses the MONs (See
266266Running Various Ceph Tools
267267--------------------------------
268268
269- To run Ceph tools like ``ceph-objectstore-tool `` or
269+ To run Ceph tools such as ``ceph-objectstore-tool `` or
270270``ceph-monstore-tool ``, invoke the cephadm CLI with
271271``cephadm shell --name <daemon-name> ``. For example::
272272
@@ -283,98 +283,114 @@ To run Ceph tools like ``ceph-objectstore-tool`` or
283283 election_strategy: 1
284284 0: [v2:127.0.0.1:3300/0,v1:127.0.0.1:6789/0] mon.myhostname
285285
286- The cephadm shell sets up the environment in a way that is suitable
287- for extended daemon maintenance and running daemons interactively .
286+ The cephadm shell sets up the environment in a way that is suitable for
287+ extended daemon maintenance and for the interactive running of daemons.
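
For example, a hedged sketch of extended maintenance on an OSD's object store
(the OSD id is illustrative, the data path is the path as seen inside the
OSD's container, and the OSD daemon must be stopped before
``ceph-objectstore-tool`` opens it)::

    cephadm shell --name osd.0
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list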
288288
289289.. _cephadm-restore-quorum :
290290
291291Restoring the Monitor Quorum
292292----------------------------
293293
294- If the Ceph monitor daemons (mons) cannot form a quorum, cephadm will not be
295- able to manage the cluster until quorum is restored.
294+ If the Ceph Monitor daemons (mons) cannot form a quorum, ``cephadm `` will not
295+ be able to manage the cluster until quorum is restored.
296296
297297In order to restore the quorum, remove unhealthy monitors
298298from the monmap by following these steps:
299299
300- 1. Stop all mons. For each mon host::
300+ 1. Stop all Monitors. Use ``ssh `` to connect to each Monitor's host, and
301+    then, while connected to that host, use ``cephadm `` to stop the Monitor
302+    daemon:
303+
304+ .. prompt :: bash
305+
306+ ssh {mon-host}
307+ cephadm unit --name {mon.hostname} stop
301308
302- ssh {mon-host}
303- cephadm unit --name mon.`hostname` stop
304309
310+ 2. Identify a surviving Monitor and log in to its host:
305311
306- 2. Identify a surviving monitor and log in to that host::
312+ .. prompt :: bash
307313
308- ssh {mon-host}
309- cephadm enter --name mon.` hostname`
314+ ssh {mon-host}
315+ cephadm enter --name {mon.hostname}
310316
311- 3. Follow the steps in :ref: `rados-mon-remove-from-unhealthy `
317+ 3. Follow the steps in :ref: `rados-mon-remove-from-unhealthy `.
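
   In brief, that procedure extracts the monmap from the surviving Monitor,
   removes the unhealthy Monitors from it, and injects the edited monmap back
   into the surviving Monitor. A hedged sketch of those commands, run from
   within the surviving Monitor's container after the Monitor has been stopped
   (the Monitor IDs and the ``/tmp/monmap`` path are placeholders):

   .. prompt:: bash

      ceph-mon -i {mon-id} --extract-monmap /tmp/monmap
      monmaptool /tmp/monmap --rm {unhealthy-mon-id}
      ceph-mon -i {mon-id} --inject-monmap /tmp/monmap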
312318
313319.. _cephadm-manually-deploy-mgr :
314320
315321Manually Deploying a Manager Daemon
316322-----------------------------------
317- At least one manager (mgr) daemon is required by cephadm in order to manage the
318- cluster. If the last mgr in a cluster has been removed, follow these steps in
319- order to deploy a manager called (for example)
320- ``mgr.hostname.smfvfd `` on a random host of your cluster manually.
323+ At least one Manager (``mgr ``) daemon is required by cephadm in order to manage
324+ the cluster. If the last remaining Manager has been removed from the Ceph
325+ cluster, follow these steps in order to deploy a fresh Manager on an arbitrary
326+ host in your cluster. In this example, the freshly-deployed Manager daemon is
327+ called ``mgr.hostname.smfvfd ``.
328+
329+ #. Disable the cephadm scheduler, in order to prevent ``cephadm `` from removing
330+ the new Manager. See :ref: `cephadm-enable-cli `:
331+
332+ .. prompt :: bash #
321333
322- Disable the cephadm scheduler, in order to prevent cephadm from removing the new
323- manager. See :ref: `cephadm-enable-cli `::
334+ ceph config-key set mgr/cephadm/pause true
324335
325- ceph config-key set mgr/cephadm/pause true
336+ #. Retrieve or create the "auth entry" for the new Manager:
326337
327- Then get or create the auth entry for the new manager::
338+ .. prompt :: bash #
328339
329- ceph auth get-or-create mgr.hostname.smfvfd mon "profile mgr" osd "allow *" mds "allow *"
340+ ceph auth get-or-create mgr.hostname.smfvfd mon "profile mgr" osd "allow *" mds "allow *"
330341
331- Get the ceph.conf: :
342+ #. Retrieve the cluster's minimal configuration (``ceph.conf``):
332343
333- ceph config generate-minimal-conf
344+ .. prompt :: bash #
334345
335- Get the container image::
346+ ceph config generate-minimal-conf
336347
337- ceph config get "mgr.hostname.smfvfd" container_image
348+ #. Retrieve the container image:
338349
339- Create a file ``config-json.json `` which contains the information necessary to deploy
340- the daemon:
350+ .. prompt :: bash #
341351
342- .. code-block :: json
352+ ceph config get "mgr.hostname.smfvfd" container_image
343353
344- {
345- "config" : " # minimal ceph.conf for 8255263a-a97e-4934-822c-00bfe029b28f\n [global]\n\t fsid = 8255263a-a97e-4934-822c-00bfe029b28f\n\t mon_host = [v2:192.168.0.1:40483/0,v1:192.168.0.1:40484/0]\n " ,
346- "keyring" : " [mgr.hostname.smfvfd]\n\t key = V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4=\n "
347- }
354+ #. Create a file called ``config-json.json ``, which contains the information
355+ necessary to deploy the daemon:
348356
349- Deploy the daemon::
357+ .. code-block :: json
350358
351- cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json
359+ {
360+ "config" : " # minimal ceph.conf for 8255263a-a97e-4934-822c-00bfe029b28f\n [global]\n\t fsid = 8255263a-a97e-4934-822c-00bfe029b28f\n\t mon_host = [v2:192.168.0.1:40483/0,v1:192.168.0.1:40484/0]\n " ,
361+ "keyring" : " [mgr.hostname.smfvfd]\n\t key = V2VyIGRhcyBsaWVzdCBpc3QgZG9vZi4=\n "
362+ }
363+
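   In the file above, the ``config`` value is the minimal configuration
   retrieved earlier and the ``keyring`` value is the new daemon's keyring,
   with literal newlines encoded as ``\n``. As a hedged convenience sketch
   (it assumes that ``jq`` is installed and that the daemon is named
   ``mgr.hostname.smfvfd``), the file can be generated directly from the
   command output::

      jq -n --arg config "$(ceph config generate-minimal-conf)" \
            --arg keyring "$(ceph auth get mgr.hostname.smfvfd)" \
            '{config: $config, keyring: $keyring}' > config-json.json
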
364+ #. Deploy the Manager daemon:
365+
366+ .. prompt :: bash #
367+
368+ cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json
352369
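Once the cluster again has a working Manager, the scheduler that was paused in
the first step can be re-enabled whenever you are ready for cephadm to resume
managing daemons. A hedged sketch (removing the key reverses the earlier
``config-key set`` command):

.. prompt:: bash #

   ceph config-key rm mgr/cephadm/pause
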
353370Capturing Core Dumps
354371---------------------
355372
356- A Ceph cluster that uses cephadm can be configured to capture core dumps.
357- Initial capture and processing of the coredump is performed by
358- `systemd-coredump <https://www.man7.org/linux/man-pages/man8/systemd-coredump.8.html >`_.
373+ A Ceph cluster that uses ``cephadm `` can be configured to capture core dumps.
374+ The initial capture and processing of the coredump is performed by
375+ `systemd-coredump
376+ <https://www.man7.org/linux/man-pages/man8/systemd-coredump.8.html> `_.
359377
360378
361- To enable coredump handling, run:
379+ To enable coredump handling, run the following command:
362380
363381.. prompt :: bash #
364382
365- ulimit -c unlimited
383+ ulimit -c unlimited
366384
367- Core dumps will be written to ``/var/lib/systemd/coredump ``.
368- This will persist until the system is rebooted.
369385
370386.. note ::
371387
372- Core dumps are not namespaced by the kernel, which means
373- they will be written to ``/var/lib/systemd/coredump `` on
374- the container host.
388+ Core dumps are not namespaced by the kernel. This means that core dumps are
389+ written to ``/var/lib/systemd/coredump `` on the container host. The ``ulimit
390+ -c unlimited `` setting will persist only until the system is rebooted.
375391
376- Now, wait for the crash to happen again. To simulate the crash of a daemon, run
377- e.g. ``killall -3 ceph-mon ``.
392+ Wait for the crash to happen again. To simulate the crash of a daemon, run,
393+ for example, ``killall -3 ceph-mon ``.
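
After the next crash, the capture can be confirmed from the container host
before a debugging session is started. A hedged sketch using the standard
systemd tooling (``coredumpctl`` may need to be installed separately on some
distributions):

.. prompt:: bash #

   coredumpctl list ceph-mon

.. prompt:: bash #

   ls /var/lib/systemd/coredump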
378394
379395
380396Running the Debugger with cephadm
@@ -383,45 +399,58 @@ Running the Debugger with cephadm
383399Running a single debugging session
384400~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
385401
386- One can initiate a debugging session using the ``cephadm shell `` command.
402+ Initiate a debugging session by using the ``cephadm shell `` command.
387403From within the shell container we need to install the debugger and debuginfo
388404packages. To debug a core file captured by systemd, run the following:
389405
390- .. prompt :: bash #
391406
392- # start the shell session
393- cephadm shell --mount /var/lib/system/coredump
394- # within the shell:
395- dnf install ceph-debuginfo gdb zstd
407+ #. Start the shell session:
408+
409+ .. prompt :: bash #
410+
411+ cephadm shell --mount /var/lib/systemd/coredump
412+
413+ #. From within the shell session, run the following commands:
414+
415+ .. prompt :: bash #
416+
417+ dnf install ceph-debuginfo gdb zstd
418+
419+ .. prompt :: bash #
420+
396421 unzstd /mnt/coredump/core.ceph-*.zst -o /tmp/core
422+
423+ .. prompt :: bash #
424+
397425 gdb /usr/bin/ceph-mon /tmp/core
398426
399- You can then run debugger commands at gdb's prompt.
427+ #. Run debugger commands at gdb's prompt:
428+
429+ .. prompt :: bash (gdb)
400430
401- .. prompt ::
431+ bt
432+
433+ ::
402434
403- (gdb) bt
404- #0 0x00007fa9117383fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
405- #1 0x00007fa910d7f8f0 in std::condition_variable: :wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
406- #2 0x00007fa913d3f48f in AsyncMessenger::wait() () from /usr/lib64/ceph/libceph-common.so.2
407- #3 0x0000563085ca3d7e in main ()
435+ #0 0x00007fa9117383fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
436+ #1 0x00007fa910d7f8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
437+ #2 0x00007fa913d3f48f in AsyncMessenger::wait() () from /usr/lib64/ceph/libceph-common.so.2
438+ #3 0x0000563085ca3d7e in main ()
408439
409440
410441Running repeated debugging sessions
411442~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
412443
413- When using ``cephadm shell ``, like in the example above, the changes made to
414- the container the shell command spawned are ephemeral. Once the shell session
415- exits all of the files that were downloaded and installed are no longer
416- available. One can simply re-run the same commands every time ``cephadm shell ``
417- is invoked, but in order to save time and resources one can create a new
418- container image and use it for repeated debugging sessions.
444+ When using ``cephadm shell ``, as in the example above, any changes made to the
445+ container that is spawned by the shell command are ephemeral. After the shell
446+ session exits, the files that were downloaded and installed cease to be
447+ available. You can simply re-run the same commands every time ``cephadm
448+ shell `` is invoked, but to save time and resources you can instead create a
449+ new container image and use it for repeated debugging sessions.
419450
420- In the following example we create a simple file for constructing the
421- container image. The command below uses podman but it should work correctly
422- if ``podman `` is replaced with ``docker ``.
423-
424- .. prompt :: bash
451+ In the following example, we create a simple file that will construct the
452+ container image. The command below uses podman but it is expected to work
453+ correctly even if ``podman `` is replaced with ``docker ``::
425454
426455 cat >Containerfile <<EOF
427456 ARG BASE_IMG=quay.io/ceph/ceph:v18
@@ -432,16 +461,17 @@ if ``podman`` is replaced with ``docker``.
432461 podman build -t ceph:debugging -f Containerfile .
433462 # pass --build-arg=BASE_IMG=<your image> to customize the base image
434463
435- The result should be a new local image named ``ceph:debugging ``. This image can
436- be used on the same machine that built it. Later, the image could be pushed to
437- a container repository, or saved and copied to a node runing other ceph
438- containers. Please consult the documentation for ``podman `` or ``docker `` for
439- more details on the general container workflow.
464+ The above commands create a new local image named ``ceph:debugging ``. This image
465+ can be used on the same machine that built it. The image can also be pushed to
466+ a container repository or saved and copied to a node running other Ceph
467+ containers. Consult the ``podman `` or ``docker `` documentation for more
468+ information about the container workflow.
440469
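For example, a hedged sketch of copying the image to another node (the node
name and the tar file name are illustrative, and the ``podman`` commands can be
replaced with their ``docker`` equivalents)::

    podman save -o ceph-debugging.tar ceph:debugging
    scp ceph-debugging.tar other-node:
    ssh other-node podman load -i ceph-debugging.tar
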
441- Once the image has been built it can be used to initiate repeat debugging
442- sessions without having to re-install the debug tools and debuginfo packages.
443- To debug a core file using this image, in the same way as previously described,
444- run:
470+ After the image has been built, it can be used to initiate repeated debugging
471+ sessions. By using an image in this way, you avoid the trouble of having to
472+ re-install the debug tools and debuginfo packages every time you need to run a
473+ debug session. To debug a core file using this image, in the same way as
474+ previously described, run:
445475
446476.. prompt :: bash #
447477
@@ -451,29 +481,31 @@ run:
451481Debugging live processes
452482~~~~~~~~~~~~~~~~~~~~~~~~
453483
454- The gdb debugger has the ability to attach to running processes to debug them.
455- For a containerized process this can be accomplished by using the debug image
456- and attaching it to the same PID namespace as the process to be debugged.
484+ The gdb debugger can attach to running processes to debug them. This can be
485+ achieved with a containerized process by using the debug image and attaching it
486+ to the same PID namespace in which the process to be debugged resides.
457487
458- This requires running a container command with some custom arguments. We can generate a script that can debug a process in a running container.
488+ This requires running a container command with some custom arguments. We can
489+ generate a script that can debug a process in a running container.
459490
460491.. prompt :: bash #
461492
462493 cephadm --image ceph:debugging shell --dry-run > /tmp/debug.sh
463494
464- This creates a script with the container command cephadm would use to create a
465- shell. Now, modify the script by removing the ``--init `` argument and replace
466- that with the argument to join to the namespace used for a running running
467- container. For example, let's assume we want to debug the MGR, and have
468- determnined that the MGR is running in a container named
469- ``ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk ``. The new
470- argument
495+ This creates a script that includes the container command that ``cephadm ``
496+ would use to create a shell. Modify the script by removing the ``--init ``
497+ argument and replacing it with the argument that joins the namespace used by
498+ a running container. For example, assume that we want to debug the Manager
499+ and have determined that the Manager is running in a container named
500+ ``ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk ``. In this case,
501+ the argument
471502``--pid=container:ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk ``
472503should be used.
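
One way to make that change is with ``sed`` (a hedged sketch; the container
name below is the example name from above and must be replaced with the name of
the Manager container on your system). The script can also simply be edited by
hand:

.. prompt:: bash #

   sed -i 's/--init/--pid=container:ceph-bc615290-685b-11ee-84a6-525400220000-mgr-ceph0-sluwsk/' /tmp/debug.sh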
473504
474- Now, we can run our debugging container with ``sh /tmp/debug.sh ``. Within the shell
475- we can run commands such as ``ps `` to get the PID of the MGR process. In the following
476- example this will be ``2 ``. Running gdb, we can now attach to the running process:
505+ We can run our debugging container with ``sh /tmp/debug.sh ``. Within the shell,
506+ we can run commands such as ``ps `` to get the PID of the Manager process. In
507+ the following example this is ``2 ``. While running gdb, we can attach to the
508+ running process:
477509
478510.. prompt :: bash (gdb)
479511