
Commit 5e7a3c2

mgr/cephadm: make SMB and NVMEoF upgrade last in staggered upgrade
This needs to happen as some work on the NVMEoF side (still unmerged as of writing this) will make the NVMEoF daemon dependent on the mon. Prior to this patch, in a staggered upgrade, all daemons not using the ceph image were upgraded after the mgr, since we typically only care about the default image changing or potential changes to how we handle our systemd units, and those only need the mgr to be upgraded in order to be applied. This NVMEoF dependency on the mon changes this, and we can no longer upgrade it directly after the mgr.

This patch changes it so the NVMEoF daemon is instead upgraded after all ceph image daemons have been upgraded in a staggered upgrade scenario. Non-staggered upgrades are unaffected as the NVMEoF daemon was already upgraded near the end in that scenario.

The SMB daemon has no reason it needs to be upgraded later, but it's in the (small) pool of daemons that don't use the ceph image and aren't for monitoring, so it's been affected by this as well.

NOTE: This is a bit of an ugly patch imo and shows that a refactoring of the upgrade code is likely required. Hopefully this patch is more of a stopgap until that larger effort can be made.

Fixes: https://tracker.ceph.com/issues/65809
Signed-off-by: Adam King <[email protected]>
1 parent 8a001cb commit 5e7a3c2

src/pybind/mgr/cephadm/upgrade.py

Lines changed: 19 additions & 3 deletions
@@ -9,7 +9,7 @@
 from cephadm.serve import CephadmServe
 from cephadm.services.cephadmservice import CephadmDaemonDeploySpec
 from cephadm.utils import ceph_release_to_major, name_to_config_section, CEPH_UPGRADE_ORDER, \
-    CEPH_TYPES, NON_CEPH_IMAGE_TYPES, GATEWAY_TYPES
+    CEPH_TYPES, CEPH_IMAGE_TYPES, NON_CEPH_IMAGE_TYPES, MONITORING_STACK_TYPES, GATEWAY_TYPES
 from cephadm.ssh import HostConnectionError
 from orchestrator import OrchestratorError, DaemonDescription, DaemonDescriptionStatus, daemon_type_to_service
 
@@ -1199,8 +1199,10 @@ def _do_upgrade(self):
             upgraded_daemon_count += done
             self._update_upgrade_progress(upgraded_daemon_count / len(daemons))
 
-            # make sure mgr and non-ceph-image daemons are properly redeployed in staggered upgrade scenarios
-            if daemon_type == 'mgr' or daemon_type in NON_CEPH_IMAGE_TYPES:
+            # make sure mgr and monitoring stack daemons are properly redeployed in staggered upgrade scenarios
+            # The idea here is to upgrade the monitoring daemons after the mgr is done upgrading as
+            # that means cephadm and the dashboard modules themselves have been upgraded
+            if daemon_type == 'mgr' or daemon_type in MONITORING_STACK_TYPES:
                 if any(d in target_digests for d in self.mgr.get_active_mgr_digests()):
                     need_upgrade_names = [d[0].name() for d in need_upgrade] + \
                         [d[0].name() for d in need_upgrade_deployer]
@@ -1214,6 +1216,20 @@ def _do_upgrade(self):
                 else:
                     # no point in trying to redeploy with new version if active mgr is not on the new version
                     need_upgrade_deployer = []
+            elif daemon_type in NON_CEPH_IMAGE_TYPES:
+                # Also handle daemons that are not on the ceph image but aren't monitoring daemons.
+                # This needs to be handled differently than the monitoring daemons as the nvmeof daemon,
+                # which falls in this category, relies on the mons being upgraded as well. This block
+                # sets these daemon types to be upgraded only when all ceph image daemons have been upgraded
+                if any(d in target_digests for d in self.mgr.get_active_mgr_digests()):
+                    ceph_daemons = [d for d in self.mgr.cache.get_daemons() if d.daemon_type in CEPH_IMAGE_TYPES]
+                    _, n1, n2, __ = self._detect_need_upgrade(ceph_daemons, target_digests, target_image)
+                    if not n1 and not n2:
+                        # no ceph daemons need upgrade
+                        dds = [d for d in self.mgr.cache.get_daemons_by_type(
+                            daemon_type) if d.name() not in need_upgrade_names]
+                        _, ___, n2, ____ = self._detect_need_upgrade(dds, target_digests, target_image)
+                        need_upgrade_deployer += n2
 
             if any(d in target_digests for d in self.mgr.get_active_mgr_digests()):
                 # only after the mgr itself is upgraded can we expect daemons to have
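
The branching added in this hunk can be read as a small decision function. The following is a minimal standalone sketch of that gating logic, not the cephadm implementation: the helper name `extra_redeploy_allowed`, its parameters, and the contents of the two sets are assumptions made for illustration (the real lists live in `cephadm.utils` and may differ).

```python
# Illustrative placeholder sets; the real lists are defined in cephadm.utils
# and their exact contents are an assumption here.
MONITORING_STACK_TYPES = {"prometheus", "grafana", "alertmanager", "node-exporter"}
NON_CEPH_IMAGE_TYPES = MONITORING_STACK_TYPES | {"nvmeof", "smb"}


def extra_redeploy_allowed(daemon_type: str,
                           active_mgr_upgraded: bool,
                           ceph_image_pending: int) -> bool:
    """Hypothetical helper mirroring the if/elif added by the patch.

    active_mgr_upgraded: the active mgr already runs the target image.
    ceph_image_pending:  number of ceph-image daemons still needing an upgrade.
    """
    if daemon_type == "mgr" or daemon_type in MONITORING_STACK_TYPES:
        # mgr and monitoring daemons only wait for the active mgr to be upgraded
        return active_mgr_upgraded
    elif daemon_type in NON_CEPH_IMAGE_TYPES:
        # nvmeof and smb additionally wait until no ceph-image daemon is pending
        return active_mgr_upgraded and ceph_image_pending == 0
    # other daemon types are not touched by either branch
    return False


if __name__ == "__main__":
    for dtype in ("grafana", "nvmeof", "smb", "osd"):
        print(dtype, extra_redeploy_allowed(dtype, True, 3),
              extra_redeploy_allowed(dtype, True, 0))
```

The order of the two checks matters in this sketch for the same reason it does in the patch: a monitoring type that is also listed among the non-ceph-image types is caught by the first branch and so is not held back until the ceph-image daemons finish.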
