
Commit a03d331

mgr/cephadm: continue in nfs service purge if grace file is already deleted
The test_nfs task we run in teuthology creates and removes a number of nfs clusters during the task. I think that, depending on timing, it can end up trying to remove an nfs service before the grace file has been created. In that case, cephadm doesn't know the grace file was never created, so it keeps failing indefinitely as it retries removing the nonexistent file. This patch handles the error case where the command returns a nonzero rc but the error message indicates it failed because the file does not exist.

Fixes: https://tracker.ceph.com/issues/69736
Signed-off-by: Adam King <[email protected]>
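The decision rule described above (nonzero rc plus a "No such file" message means the grace file is already gone) can be sketched as a small Python helper. This is a minimal sketch, not the cephadm code: `check_grace_rm` is a hypothetical name, `OrchestratorError` is a local stand-in for the orchestrator exception, and the helper inspects a `subprocess.CompletedProcess` the way the patch inspects `result`.

```python
import subprocess
from typing import Optional


class OrchestratorError(Exception):
    """Hypothetical stand-in for orchestrator.OrchestratorError."""


def check_grace_rm(result: subprocess.CompletedProcess, service_id: str) -> Optional[str]:
    """Interpret a finished 'rm grace' run (hypothetical helper).

    Returns None on success, an info message if the grace file was
    already gone, and raises OrchestratorError for any other failure.
    """
    if result.returncode == 0:
        return None
    # errors='replace' is an assumption here; the patch decodes strictly.
    stderr = result.stderr.decode('utf-8', errors='replace')
    if 'No such file' in stderr:
        # A missing grace file counts as already purged, so purge can finish.
        return f'Grace file for nfs.{service_id} already deleted'
    raise OrchestratorError(
        f'Failed to remove ganesha grace file for nfs.{service_id} service: {stderr}'
    )
```

The check relies on the tool's stderr wording, as the patch does, so a differently worded message would fall through to the error branch.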
1 parent 66a90a9 commit a03d331


1 file changed (+19, -7 lines)

src/pybind/mgr/cephadm/services/nfs.py

Lines changed: 19 additions & 7 deletions
```diff
@@ -14,7 +14,7 @@
 from ceph.deployment.service_spec import ServiceSpec, NFSServiceSpec
 from .service_registry import register_cephadm_service
 
-from orchestrator import DaemonDescription
+from orchestrator import DaemonDescription, OrchestratorError
 
 from cephadm.services.cephadmservice import AuthEntity, CephadmDaemonDeploySpec, CephService
 
@@ -319,12 +319,24 @@ def purge(self, service_name: str) -> None:
             '--namespace', cast(str, spec.service_id),
             'rm', 'grace',
         ]
-        subprocess.run(
-            cmd,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
-            timeout=10
-        )
+        try:
+            result = subprocess.run(
+                cmd,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                timeout=10
+            )
+        except Exception as e:
+            err_msg = f'Got unexpected exception trying to remove ganesha grace file for nfs.{spec.service_id} service: {str(e)}'
+            self.mgr.log.warning(err_msg)
+            raise OrchestratorError(err_msg)
+        if result.returncode:
+            if "No such file" in result.stderr.decode('utf-8'):
+                logger.info(f'Grace file for nfs.{spec.service_id} already deleted')
+            else:
+                err_msg = f'Failed to remove ganesha grace file for nfs.{spec.service_id} service: {result.stderr.decode("utf-8")}'
+                self.mgr.log.warning(err_msg)
+                raise OrchestratorError(err_msg)
 
     def _haproxy_hosts(self) -> List[str]:
         # NB: Ideally, we would limit the list to IPs on hosts running
```
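The `try`/`except` around `subprocess.run` in this change matters because some failures never produce a return code at all: `subprocess.run` raises `FileNotFoundError` when the executable cannot be launched and `subprocess.TimeoutExpired` once `timeout` elapses. Below is a minimal sketch of that wrapping, assuming a hypothetical `run_grace_cmd` helper and a local `OrchestratorError` stand-in. Unlike the patch, which catches `Exception`, it catches only `OSError` and `subprocess.SubprocessError`.

```python
import subprocess
from typing import List


class OrchestratorError(Exception):
    """Hypothetical stand-in for orchestrator.OrchestratorError."""


def run_grace_cmd(cmd: List[str], timeout: float = 10) -> subprocess.CompletedProcess:
    """Run cmd (hypothetical helper), converting launch errors and timeouts.

    A nonzero exit status is returned to the caller for inspection; only
    exceptions raised by subprocess.run itself become OrchestratorError.
    """
    try:
        return subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError) as e:
        # FileNotFoundError is an OSError; TimeoutExpired is a SubprocessError.
        raise OrchestratorError(f'Got unexpected exception running {cmd[0]}: {e}') from e
```

Chaining with `from e` keeps the original exception available as `__cause__` for debugging.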
