
Commit aff2d5f

broker: provision dead brokers for flub replacement
Problem: there is no way to replace a node in a Flux instance that goes down. Call overlay_flub_provision () when a rank goes offline so that the flub allocator can allocate its rank to a replacement. Unprovision ranks when they come back online.
1 parent 99a107b commit aff2d5f


1 file changed, +19 -0 lines changed

src/broker/state_machine.c

```diff
@@ -836,6 +836,25 @@ static void broker_online_cb (flux_future_t *f, void *arg)
         return;
     }
 
+    /* A broker that drops out of s->quorum.online is provisioned
+     * for replacement via flub, and unprovisioned if returns.
+     */
+    if (s->quorum.online) {
+        unsigned int id;
+        id = idset_first (s->quorum.online);
+        while (id != IDSET_INVALID_ID) { // online -> offline
+            if (!idset_test (ids, id))
+                (void)overlay_flub_provision (s->ctx->overlay, id, id, true);
+            id = idset_next (s->quorum.online, id);
+        }
+        id = idset_first (ids);
+        while (id != IDSET_INVALID_ID) { // offline -> online
+            if (!idset_test (s->quorum.online, id))
+                (void)overlay_flub_provision (s->ctx->overlay, id, id, false);
+            id = idset_next (ids, id);
+        }
+    }
+
     idset_destroy (s->quorum.online);
     s->quorum.online = ids;
     if (idset_count (s->quorum.online) >= s->quorum.size)
```
