You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
remote/coordinator: refactor handling of orphaned resources
Labgrid currently uses bidirectional streams between coordinator and
client/exporter. For the client side this is a good fit, since the
client sends requests and the coordinator can answer directly. However
for the exporter we have a case where nested calls were used in the old
crossbar infrastructure, namely when re-acquiring a resource after the
exporter was offline but the place was kept acquired. We call these
orphaned resources. They replace the real resource on the coordinator
side until the resource can be reacquired on the respective exporter
after it has restarted.
With crossbar, when seeing the resource update, the coordinator could
directly call the exporter to acquire the resource for the specific
place. This was possible since crossbar did the RPC route handling and
arbitrary services connected to the crossbar could provide RPC calls to
the service. With GRPC, we are more constrained. Since we only have a
single Input/Output stream which needs to multiplex different objects,
nested calls are not directly supported, since the exporter side would
still wait for the coordinator to answer its own request.
A different approach to orphaned resource handling is required. The
coordinator now uses a loop where it checks the orphaned resources and
tries to reacquire them if the exporter reappears. This however
introduces another problem, the exporter can be under high load and thus
the acquire request from the coordinator can time out. In this case, we
need to abort the acquisition during a regular lock and in case of an
orphaned resource need to replace the orphaned resource with the
eventually acquired resource from the exporter.
We also need to handle the case where the exporter has an acquired
resource, but the place has been released in the meantime (perhaps due
to a timeout on a normal place acquire), the same poll loop handles this
in the coordinator as well.
All in all this means that the resource acquired state for each place is
not necessarily consistent on the coordinator, but will reach an
eventual consistent state. This should be sufficient, since exporter
restarts with orphaned resources should be relatively rare.
Signed-off-by: Jan Luebbe <[email protected]>
0 commit comments