
   # For each host:
   cephadm rm-cluster --force --zap-osds --fsid <fsid>


Replacing a device
==================

The ``ceph orch device replace`` command automates the process of replacing the underlying device of an OSD.
Previously, this process required manual intervention at several stages.
The command now performs all of the necessary operations automatically, which streamlines device replacement.

.. note:: This command supports only OSDs that were deployed with LVM.

.. prompt:: bash #

   ceph orch device replace <host> <device-path>

If the device being replaced is shared by multiple OSDs (for example, a DB/WAL device used by several OSDs), the orchestrator warns you and returns an error instead of proceeding:

.. prompt:: bash #

   [ceph: root@ceph /]# ceph orch device replace osd-1 /dev/vdd
   Error EINVAL: /dev/vdd is a shared device.
   Replacing /dev/vdd implies destroying OSD(s): ['0', '1'].
   Please, *be very careful*, this can be a very dangerous operation.
   If you know what you are doing, pass --yes-i-really-mean-it

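Before you pass ``--yes-i-really-mean-it``, it is worth checking exactly which OSDs the orchestrator plans to destroy. The following minimal sketch extracts the OSD IDs from the error message. It is only an illustration: ``shared_device_osds`` is a hypothetical shell function, not part of ``cephadm``, and it assumes the message contains an ``implies destroying OSD(s): [...]`` line in the format shown in this section.

.. code-block:: bash

   # Hypothetical helper (not a cephadm command): read the "shared device"
   # error text on stdin and print one affected OSD ID per line.
   shared_device_osds() {
       # Keep only the "destroying OSD(s): [...]" fragment, then extract
       # each quoted ID and strip the surrounding quotes.
       grep -o "destroying OSD(s): \[[^]]*]" \
           | grep -o "'[0-9][0-9]*'" \
           | tr -d "'"
   }

   # Usage (the command exits non-zero, so capture stderr as well):
   #   ceph orch device replace osd-1 /dev/vdd 2>&1 | shared_device_osds

You can then compare the printed IDs against ``ceph osd tree`` before deciding whether to proceed.
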
If you are sure that you want to proceed, pass ``--yes-i-really-mean-it``:

.. prompt:: bash #

   [ceph: root@ceph /]# ceph orch device replace osd-1 /dev/vdd --yes-i-really-mean-it
   Scheduled to destroy osds: ['0', '1'] and mark /dev/vdd as being replaced.

``cephadm`` then has ``ceph-volume`` zap and destroy all related devices, and it marks the corresponding OSDs as ``destroyed`` so that their OSD IDs are preserved:

.. prompt:: bash #

   [ceph: root@ceph-1 /]# ceph osd tree
   ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
   -1         0.97659  root default
   -3         0.97659      host devel-1
    0    hdd  0.29300          osd.0  destroyed   1.00000  1.00000
    1    hdd  0.29300          osd.1  destroyed   1.00000  1.00000
    2    hdd  0.19530          osd.2         up   1.00000  1.00000
    3    hdd  0.19530          osd.3         up   1.00000  1.00000

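To check from a script which OSDs are now marked ``destroyed``, you can filter the plain-text ``ceph osd tree`` output. This is a minimal sketch: ``destroyed_osds`` is a hypothetical helper, and it assumes that the ``STATUS`` value directly follows the ``osd.N`` name, as in the output shown in this section. Plain-text column layouts can change between releases, so a structured format such as ``ceph osd tree -f json`` is likely a safer basis for production tooling.

.. code-block:: bash

   # Hypothetical helper (not a cephadm command): read "ceph osd tree"
   # text output on stdin and print the numeric ID of every OSD whose
   # STATUS column is "destroyed".
   destroyed_osds() {
       awk '{
           for (i = 1; i < NF; i++) {
               # The OSD name (osd.N) is followed directly by its status.
               if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "destroyed") {
                   id = $i
                   sub(/^osd\./, "", id)
                   print id
               }
           }
       }'
   }

   # Usage:
   #   ceph osd tree | destroyed_osds
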
Finally, the device is flagged as ``Is being replaced`` in the ``REJECT REASONS`` column, which prevents ``cephadm`` from redeploying the OSDs on it prematurely:

.. prompt:: bash #

   [ceph: root@ceph-1 /]# ceph orch device ls
   HOST   PATH      TYPE  DEVICE ID  SIZE  AVAILABLE  REFRESHED  REJECT REASONS
   osd-1  /dev/vdb  hdd              200G  Yes        13s ago
   osd-1  /dev/vdc  hdd              200G  Yes        13s ago
   osd-1  /dev/vdd  hdd              200G  Yes        13s ago    Is being replaced
   osd-1  /dev/vde  hdd              200G  No         13s ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
   osd-1  /dev/vdf  hdd              200G  No         13s ago    Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected

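While a replacement is in progress, a script may need to know whether a device is still flagged. The following hypothetical helpers (not ``cephadm`` commands) sketch one way to do this by matching the ``Is being replaced`` reject reason in the plain-text ``ceph orch device ls`` output. They assume the column layout shown in this section, with ``HOST`` and ``PATH`` as the first two columns.

.. code-block:: bash

   # Hypothetical helpers that parse "ceph orch device ls" text output
   # read on stdin.

   # Print "<host> <path>" for every device flagged "Is being replaced".
   devices_being_replaced() {
       awk '/Is being replaced/ { print $1, $2 }'
   }

   # Exit 0 if the given host and path are flagged "Is being replaced",
   # and 1 otherwise.
   is_being_replaced() {
       awk -v host="$1" -v path="$2" '
           $1 == host && $2 == path && /Is being replaced/ { found = 1 }
           END { if (found) exit 0; exit 1 }'
   }

   # Usage:
   #   ceph orch device ls | devices_being_replaced
   #   ceph orch device ls | is_being_replaced osd-1 /dev/vdd && echo "still flagged"
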
If for any reason you need to clear the replacement header on a device, run ``ceph orch device replace <host> <device> --clear``:

.. prompt:: bash #

   [ceph: root@devel-1 /]# ceph orch device replace devel-1 /dev/vdk --clear
   Replacement header cleared on /dev/vdk

After the header is cleared, ``cephadm`` applies the OSD service spec again and redeploys the OSDs within a few minutes, unless the service is set to ``unmanaged``.