ClusterShell.Propagation.RouteResolvingError: No route available to pm4-nod01 #566

@MarbolanGos

Hello,

When using ClusterShell with MilkCheck, I get the following error:

ClusterShell.Propagation.RouteResolvingError: No route available to pm4-nod01

The error appears as soon as a topology file is present.

The debug mode of milkcheck shows:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/MilkCheck/UI/Cli.py", line 538, in execute
    self.manager.call_services(services, action, conf=self._conf)
  File "/usr/lib/python3.6/site-packages/MilkCheck/ServiceManager.py", line 173, in call_services
    self.run(action)
  File "/usr/lib/python3.6/site-packages/MilkCheck/Engine/Service.py", line 236, in run
    action_manager_self().run()
  File "/usr/lib/python3.6/site-packages/MilkCheck/Engine/Action.py", line 182, in run
    self._master_task.run()
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 877, in run
    self.resume(timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 831, in resume
    self._resume()
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 794, in _resume
    self._run(self.timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Task.py", line 404, in _run
    self._engine.run(timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
    self.runloop(timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/EPoll.py", line 170, in runloop
    self.remove_stream(client, stream)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 520, in remove_stream
    self.remove(client)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 495, in remove
    self._remove(client, abort, did_timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Engine/Engine.py", line 483, in _remove
    client._close(abort=abort, timeout=did_timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Exec.py", line 142, in _close
    self.worker._check_fini()
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Exec.py", line 384, in _check_fini
    self._has_timeout)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Worker.py", line 55, in _eh_sigspec_invoke_compat
    return method(*args)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Propagation.py", line 417, in ev_close
    mw._relaunch(gateway)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Tree.py", line 404, in _relaunch
    self._launch(targets)
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Tree.py", line 265, in _launch
    next_hops = self._distribute(self.task.info("fanout"), nodes.copy())
  File "/usr/lib/python3.6/site-packages/ClusterShell/Worker/Tree.py", line 342, in _distribute
    for gw, dstset in self.router.dispatch(dst_nodeset):
  File "/usr/lib/python3.6/site-packages/ClusterShell/Propagation.py", line 106, in dispatch
    yield self.next_hop(host), host
  File "/usr/lib/python3.6/site-packages/ClusterShell/Propagation.py", line 141, in next_hop
    str(dst))
ClusterShell.Propagation.RouteResolvingError: No route available to pm4-nod01

I cannot reproduce the error with clush alone:

$ clush --remote=no -u2 -bw pm4-nod01 hostname
---------------
pm4-nod01
---------------
mngt0-2
$ clush -u2 -bw pm4-nod01 hostname
---------------
pm4-nod01
---------------
pm4-nod01
$ cat /etc/clustershell/topology.conf
[routes]
mngt0-1: mngt0-2
mngt0-2: @compute
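
For readers unfamiliar with the topology format, the two routes above can be read as a chain: mngt0-1 reaches mngt0-2, and mngt0-2 reaches the nodes in the @compute group. The snippet below is only a sketch that reads the same text with Python's standard `configparser`. The `read_routes` helper is hypothetical, and this is not claimed to be how ClusterShell itself loads the file.

```python
import configparser

# Same content as /etc/clustershell/topology.conf shown above.
TOPOLOGY = """\
[routes]
mngt0-1: mngt0-2
mngt0-2: @compute
"""


def read_routes(text):
    """Hypothetical helper: return {source: destination} from [routes]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["routes"])


routes = read_routes(TOPOLOGY)
for src, dst in routes.items():
    print(f"{src} -> {dst}")
```

With this topology, a route to pm4-nod01 exists only through the gateway mngt0-2, and only if pm4-nod01 is a member of @compute.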

Python version 3.6.8

As a temporary workaround, I changed this:

--- /usr/lib/python3.6/site-packages/ClusterShell/Propagation.py.orig   2023-06-27 15:00:39.099237135 +0200
+++ /usr/lib/python3.6/site-packages/ClusterShell/Propagation.py        2023-06-27 15:00:47.504344461 +0200
@@ -405,7 +405,7 @@ class PropagationChannel(Channel):
         self.logger.debug("ev_close rc=%s", self._rc) # may be None

         # NOTE: self._rc may be None if the communication channel has aborted
-        if self._rc != 0:
+        if self._rc != 0 and not self._rc == None:
             self.logger.debug("error on gateway %s (setup=%s)", gateway,
                               self.setup)
             self.task.router.mark_unreachable(gateway)
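
To make the effect of that one-line change explicit, here is a minimal standalone sketch of the two conditions. The function names are hypothetical and not part of ClusterShell. It only models the `if` test, on the assumption (taken from the NOTE comment in the diff) that `self._rc` is None when the channel has aborted.

```python
def marks_unreachable_original(rc):
    """Original test: anything other than 0 counts as a gateway error."""
    return rc != 0


def marks_unreachable_patched(rc):
    """Workaround test: None (no return code) is no longer an error."""
    # `rc is not None` behaves like `not rc == None` for int/None values.
    return rc != 0 and rc is not None


for rc in (0, 1, None):
    print(rc, marks_unreachable_original(rc), marks_unreachable_patched(rc))
```

With the original condition, `rc=None` marks the gateway as unreachable, which would leave no route to pm4-nod01 on relaunch. With the workaround, only a real non-zero return code does.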

And this:

--- /bin/milkcheck.orig 2024-09-04 09:19:15.826180684 +0200
+++ /bin/milkcheck      2024-09-04 09:19:22.076099490 +0200
@@ -1,4 +1,4 @@
-#!/usr/libexec/platform-python
+#!/usr/bin/python3
 #
 # Copyright CEA (2011)
 #  Contributor: Jeremie TATIBOUET
