
Commit 0638892

mgr/cephadm: make .nfs pool before trying to deploy nfs daemons
I was confused about how the parallel deployment work seemed to be causing this, but it turns out that is only because the old flow worked in a silly way to begin with. What it would do is: try to apply an nfs spec, fail to deploy every daemon, create the ganesha pool in the service config function only after all those deployments had failed, wait until the serve loop came back around, fence all the daemons it had attempted to deploy, and then finally actually deploy the daemons.

NOTE: this would only happen with the first nfs service applied to the cluster, as the .nfs pool is global for all nfs services.

Logs of the above happening on a cluster without the parallel deployment work, but with one extra ERR-level log line added where we make the pool:

```
2025-06-06T13:46:51.045963+0000 mgr.vm-00.cfgvti [INF] Saving service nfs.foo spec with placement *
2025-06-06T13:46:51.156910+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.0.0.vm-02.idlvjq
2025-06-06T13:46:51.193775+0000 mgr.vm-00.cfgvti [INF] Ensuring 0 is in the ganesha grace table
2025-06-06T13:46:51.259610+0000 mgr.vm-00.cfgvti [WRN] ganesha-rados-grace tool failed: rados_pool_create: -1 Can't connect to cluster: -1
2025-06-06T13:46:51.286967+0000 mgr.vm-00.cfgvti [ERR] Failed while placing nfs.foo.0.0.vm-02.idlvjq on vm-02: grace tool failed: rados_pool_create: -1 Can't connect to cluster: -1
2025-06-06T13:46:51.287662+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.1.0.vm-00.vohpge
2025-06-06T13:46:51.320650+0000 mgr.vm-00.cfgvti [INF] Ensuring 1 is in the ganesha grace table
2025-06-06T13:46:51.386818+0000 mgr.vm-00.cfgvti [WRN] ganesha-rados-grace tool failed: rados_pool_create: -1 Can't connect to cluster: -1
2025-06-06T13:46:51.423210+0000 mgr.vm-00.cfgvti [ERR] Failed while placing nfs.foo.1.0.vm-00.vohpge on vm-00: grace tool failed: rados_pool_create: -1 Can't connect to cluster: -1
2025-06-06T13:46:51.424041+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.2.0.vm-01.wnains
2025-06-06T13:46:51.466844+0000 mgr.vm-00.cfgvti [INF] Ensuring 2 is in the ganesha grace table
2025-06-06T13:46:51.535651+0000 mgr.vm-00.cfgvti [WRN] ganesha-rados-grace tool failed: rados_pool_create: -1 Can't connect to cluster: -1
2025-06-06T13:46:51.564439+0000 mgr.vm-00.cfgvti [ERR] Failed while placing nfs.foo.2.0.vm-01.wnains on vm-01: grace tool failed: rados_pool_create: -1 Can't connect to cluster: -1
2025-06-06T13:46:51.570637+0000 mgr.vm-00.cfgvti [ERR] XXXXXXXXX Creating ganesha pool
2025-06-06T13:47:54.226334+0000 mgr.vm-00.cfgvti [INF] Fencing old nfs.foo.0.0.vm-02.idlvjq
2025-06-06T13:47:54.253066+0000 mgr.vm-00.cfgvti [INF] Fencing old nfs.foo.1.0.vm-00.vohpge
2025-06-06T13:47:54.305202+0000 mgr.vm-00.cfgvti [INF] Fencing old nfs.foo.2.0.vm-01.wnains
2025-06-06T13:47:54.347106+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.0.1.vm-02.rljjus
2025-06-06T13:47:54.364283+0000 mgr.vm-00.cfgvti [INF] Ensuring 0 is in the ganesha grace table
2025-06-06T13:47:54.489529+0000 mgr.vm-00.cfgvti [INF] Creating rados config object: conf-nfs.foo
2025-06-06T13:47:54.532110+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.0.1.vm-02.rljjus-rgw
2025-06-06T13:47:54.546062+0000 mgr.vm-00.cfgvti [WRN] Bind address in nfs.foo.0.1.vm-02.rljjus's ganesha conf is defaulting to empty
2025-06-06T13:47:54.551189+0000 mgr.vm-00.cfgvti [INF] Deploying daemon nfs.foo.0.1.vm-02.rljjus on vm-02
2025-06-06T13:47:55.670306+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.1.1.vm-00.mimqum
2025-06-06T13:47:55.685546+0000 mgr.vm-00.cfgvti [INF] Ensuring 1 is in the ganesha grace table
2025-06-06T13:47:58.750154+0000 mgr.vm-00.cfgvti [INF] Rados config object exists: conf-nfs.foo
2025-06-06T13:47:58.750212+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.1.1.vm-00.mimqum-rgw
2025-06-06T13:47:58.766454+0000 mgr.vm-00.cfgvti [WRN] Bind address in nfs.foo.1.1.vm-00.mimqum's ganesha conf is defaulting to empty
2025-06-06T13:47:58.767588+0000 mgr.vm-00.cfgvti [INF] Deploying daemon nfs.foo.1.1.vm-00.mimqum on vm-00
2025-06-06T13:48:00.030735+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.2.1.vm-01.sqhcdo
2025-06-06T13:48:00.073450+0000 mgr.vm-00.cfgvti [INF] Ensuring 2 is in the ganesha grace table
2025-06-06T13:48:03.043859+0000 mgr.vm-00.cfgvti [INF] Rados config object exists: conf-nfs.foo
2025-06-06T13:48:03.043996+0000 mgr.vm-00.cfgvti [INF] Creating key for client.nfs.foo.2.1.vm-01.sqhcdo-rgw
2025-06-06T13:48:03.073891+0000 mgr.vm-00.cfgvti [WRN] Bind address in nfs.foo.2.1.vm-01.sqhcdo's ganesha conf is defaulting to empty
2025-06-06T13:48:03.075855+0000 mgr.vm-00.cfgvti [INF] Deploying daemon nfs.foo.2.1.vm-01.sqhcdo on vm-01
2025-06-06T13:48:04.451854+0000 mgr.vm-00.cfgvti [ERR] XXXXXXXXX Creating ganesha pool
```

This commit changes the behavior so that we try to make the .nfs pool when we're applying an nfs spec, before attempting to deploy the daemons. The cephadm module remembers that it has made this pool and won't keep trying to do it again until the module is restarted. Note that create_ganesha_pool in the nfs module is implemented to be a no-op if the .nfs pool already exists, so calling it again isn't a problem; we just want to avoid doing so repeatedly when we don't need to.

Signed-off-by: Adam King <[email protected]>
(cherry picked from commit 04a4599)

Resolves: rhbz#2370541
1 parent d0c0b27 commit 0638892
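
For illustration, here is a minimal, self-contained sketch of the idempotent behavior the commit message relies on: creating a pool that already exists is treated as a no-op, so calling the creation routine more than once is harmless. The helper name and in-memory pool set below are hypothetical stand-ins, not the real create_ganesha_pool implementation from the nfs module.

```python
# Hypothetical stand-in for the cluster's pool list; the real nfs module
# queries the cluster itself rather than keeping an in-memory set.
existing_pools: set = set()


def create_pool_if_missing(pool_name: str) -> None:
    """'Create' pool_name, treating an already-existing pool as a no-op."""
    if pool_name in existing_pools:
        return  # pool already exists, nothing to do
    # real code would issue a pool-create command against the cluster here
    existing_pools.add(pool_name)


create_pool_if_missing('.nfs')  # first call creates the pool
create_pool_if_missing('.nfs')  # repeat calls are harmless no-ops
```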

3 files changed: +12, -6 lines

src/pybind/mgr/cephadm/module.py

Lines changed: 8 additions & 0 deletions

@@ -709,6 +709,8 @@ def __init__(self, *args: Any, **kwargs: Any):
         self.daemon_deploy_queue = DaemonDeployQueue()
         self.daemon_removal_queue = DaemonRemovalQueue()

+        self.created_ganesha_pool = False
+
     def shutdown(self) -> None:
         self.log.debug('shutdown')
         self._worker_pool.close()

@@ -3169,6 +3171,12 @@ def _check_pool_exists(self, pool: str, service_name: str) -> None:
             raise OrchestratorError(f'Cannot find pool "{pool}" for '
                                     f'service {service_name}')

+    def create_nfs_pool(self) -> None:  # type: ignore
+        from nfs.cluster import create_ganesha_pool
+
+        create_ganesha_pool(self)
+        self.created_ganesha_pool = True
+
     def _add_daemon(self,
                     daemon_type: str,
                     spec: ServiceSpec) -> List[str]:

src/pybind/mgr/cephadm/serve.py

Lines changed: 4 additions & 0 deletions

@@ -852,6 +852,10 @@ def _apply_service(self, spec: ServiceSpec) -> Optional[Dict[int, Dict[int, Opti
             # return a solid indication
             return None

+        if service_type == 'nfs':
+            if not self.mgr.created_ganesha_pool:
+                self.mgr.create_nfs_pool()
+
         try:
             slots_to_add, daemons_to_remove, rank_map = self.discover_daemons_to_add_and_remove_by_service(spec)
             self.mgr.daemon_deploy_queue.add_to_queue([(daemon_to_add, spec) for daemon_to_add in slots_to_add])
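
Taken together, the module.py and serve.py hunks implement a create-once guard: the pool is created before any nfs daemon is deployed, and later serve-loop passes skip the call. Below is a simplified, standalone sketch of that pattern; the created_ganesha_pool attribute and create_nfs_pool method mirror the diff, while the surrounding class and print calls are purely illustrative.

```python
class MiniOrchestrator:
    """Illustrative stand-in for the cephadm mgr module (not the real class)."""

    def __init__(self) -> None:
        # In-memory flag: reset whenever the mgr module restarts, so the
        # (idempotent) pool creation is simply retried once after a restart.
        self.created_ganesha_pool = False

    def create_nfs_pool(self) -> None:
        # The real method calls nfs.cluster.create_ganesha_pool(self).
        print('creating .nfs pool (no-op if it already exists)')
        self.created_ganesha_pool = True

    def apply_nfs_spec(self) -> None:
        # Make sure the pool exists *before* any nfs daemon is deployed,
        # mirroring the new check in serve.py's _apply_service().
        if not self.created_ganesha_pool:
            self.create_nfs_pool()
        print('deploying nfs daemons')


mgr = MiniOrchestrator()
mgr.apply_nfs_spec()  # first pass: creates the pool, then deploys
mgr.apply_nfs_spec()  # later serve-loop passes skip the pool creation
```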

src/pybind/mgr/cephadm/services/nfs.py

Lines changed: 0 additions & 6 deletions

@@ -61,12 +61,6 @@ def fence_old_ranks(self,
                         del rank_map[rank][gen]
                         self.mgr.spec_store.save_rank_map(spec.service_name(), rank_map)

-    def config(self, spec: NFSServiceSpec) -> None:  # type: ignore
-        from nfs.cluster import create_ganesha_pool
-
-        assert self.TYPE == spec.service_type
-        create_ganesha_pool(self.mgr)
-
     def prepare_create(self, daemon_spec: CephadmDaemonDeploySpec) -> CephadmDaemonDeploySpec:
         assert self.TYPE == daemon_spec.daemon_type
         daemon_spec.final_config, daemon_spec.deps = self.generate_config(daemon_spec)
