Filter out single active consumers from dead connection #13671
Closed
[Why]
When re-evaluating single active consumer (SAC) groups because of a connection DOWN message, the stream SAC coordinator may register mod_call Ra effects to send messages to other connections. These messages aim at activating or deactivating consumers. This works fine if one connection dies at a time, but when a node goes down, several connections may go down at once and the stream SAC coordinator may send messages to these dead connections. SAC groups can then get stuck with only inactive consumers, because the coordinator considers only one connection during a group evaluation.
[How]
While evaluating the consumers of a SAC group after a DOWN message, the stream SAC coordinator not only removes the consumers of the "current" dead connection, but also checks whether the connections of the other consumers in the group are still alive and removes the consumers of those that are not.
These consumers are removed from the group preemptively, so they are not considered when the new active consumer is chosen.
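
The filtering step can be illustrated with a small, simplified model. This is only a sketch in Python, not the coordinator's actual Erlang code: the `Consumer` type, the `handle_connection_down` function, the `is_alive` callback, and the "first survivor becomes active" election rule are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Consumer:
    connection: str        # hypothetical connection identifier
    subscription_id: int
    active: bool = False


def handle_connection_down(
    consumers: List[Consumer],
    dead_connection: str,
    is_alive: Callable[[str], bool],
) -> Tuple[List[Consumer], List[tuple]]:
    """Re-evaluate one group after `dead_connection` went down.

    Consumers of the dead connection are dropped, and so are consumers
    whose connection is reported dead by `is_alive`, before a new active
    consumer is picked.
    """
    survivors = [
        c for c in consumers
        if c.connection != dead_connection and is_alive(c.connection)
    ]
    effects = []
    # Assumed election rule: promote the first survivor if none is active.
    if survivors and not any(c.active for c in survivors):
        survivors[0] = replace(survivors[0], active=True)
        first = survivors[0]
        effects.append(("notify", first.connection, first.subscription_id, True))
    return survivors, effects


# Node failure: connections c1 and c2 are both gone, only c3 is alive.
group = [Consumer("c1", 0, active=True), Consumer("c2", 0), Consumer("c3", 0)]
alive = {"c3"}
survivors, effects = handle_connection_down(group, "c1", lambda c: c in alive)
```

Without the `is_alive` check, this model would promote the consumer on `c2` and address the activation message to a dead connection, which corresponds to the stuck-group situation described in [Why].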