Commit 6956808

mgr/cephadm: don't mark nvmeof daemons without pool and group in name as stray
Cephadm's naming of these daemons always includes the pool and group name associated with the nvmeof service. Nvmeof has recently started to register with the cluster using names that don't include them, resulting in warnings like

```
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
    stray daemon nvmeof.vm-01.hwwhfc on host vm-01 not managed by cephadm
```

where cephadm knew that nvmeof daemon as

```
[ceph: root@vm-00 /]# ceph orch ps --daemon-type nvmeof
NAME                            HOST   PORTS                   STATUS   REFRESHED  AGE  MEM USE  MEM LIM  VERSION    IMAGE ID
nvmeof.foo.group1.vm-01.hwwhfc  vm-01  *:5500,4420,8009,10008  stopped  5m ago     25m  -        -        <unknown>  <unknown>
```

Signed-off-by: Adam King <[email protected]>
1 parent 6ccf8a7 commit 6956808

1 file changed: +24 -0 lines changed

src/pybind/mgr/cephadm/services/nvmeof.py

Lines changed: 24 additions & 0 deletions
@@ -220,6 +220,30 @@ def ok_to_stop(self,
         warn_message = f'It is presumed safe to stop {names}'
         return HandleCommandResult(0, warn_message, '')
 
+    def ignore_possible_stray(
+        self, service_type: str, daemon_id: str, name: str
+    ) -> bool:
+        if service_type != 'nvmeof':
+            return False
+        # Some newer versions of nvmeof will register with the cluster
+        # with a name that does not include the pool or group name
+        # getting us from "nvmeof.<pool>.<group>.<hostname>.<6-random-chars>"
+        # to "nvmeof.<hostname>.<6-random-chars>"
+        #
+        # This isn't a perfect solution, but we're assuming here if the
+        # random chars at the end of the daemon name match a daemon
+        # we know, it's likely not a stray
+        try:
+            random_chars = daemon_id.split('.')[-1]
+        except ValueError:
+            logger.debug('got nvmeof daemon id: "%s" with no dots', daemon_id)
+            return False
+        for nvmeof_daemon in self.mgr.cache.get_daemons_by_type('nvmeof'):
+            if nvmeof_daemon.name().endswith(random_chars):
+                logger.debug('ignoring possibly stray nvmeof daemon: %s', name)
+                return True
+        return False
+
     def post_remove(self, daemon: DaemonDescription, is_failed_deploy: bool) -> None:
         """
         Called after the daemon is removed.

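The suffix matching described in the commit message can be tried outside of cephadm with a minimal standalone sketch. The helper name `matches_known_daemon` and the plain list of names are hypothetical stand-ins for the method and for `self.mgr.cache.get_daemons_by_type('nvmeof')`. The empty-suffix guard is an addition for this sketch and is not part of the commit.

```python
from typing import Iterable


def matches_known_daemon(daemon_id: str, known_names: Iterable[str]) -> bool:
    """Return True if the trailing random chars of daemon_id match a known daemon.

    Hypothetical helper mirroring the heuristic in the commit: compare only
    the last dot-separated component of the id against known daemon names.
    """
    random_chars = daemon_id.split('.')[-1]
    if not random_chars:
        # Guard added for this sketch: every string ends with '', so an
        # empty suffix would otherwise match every known daemon.
        return False
    return any(name.endswith(random_chars) for name in known_names)


# Name as cephadm knows it, taken from the commit message.
known = ['nvmeof.foo.group1.vm-01.hwwhfc']

# Id as registered by newer nvmeof versions (no pool or group).
print(matches_known_daemon('vm-01.hwwhfc', known))
# An id whose random suffix matches no known daemon.
print(matches_known_daemon('vm-01.abcdef', known))
```

Because only the trailing characters are compared, two daemons that happen to share a random suffix would be treated as the same daemon, which is the imperfection the code comment in the commit acknowledges.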