-
Notifications
You must be signed in to change notification settings - Fork 4k
Prevent blocked groups in stream SAC with fine-grained status #13672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
acogoluegnes
merged 4 commits into
main
from
stream-sac-coordinator-status-instead-of-active-flag
Jun 17, 2025
Merged
Prevent blocked groups in stream SAC with fine-grained status #13672
acogoluegnes
merged 4 commits into
main
from
stream-sac-coordinator-status-instead-of-active-flag
Jun 17, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
62b7b64 to
255153f
Compare
481472d to
6b1a27c
Compare
fcb00f9 to
b80cec2
Compare
0f4d90a to
a243e5b
Compare
965ee4a to
a956572
Compare
2bfb092 to
6c3f27f
Compare
2b7dfc3 to
42f9878
Compare
b3e4b32 to
2fc5ca0
Compare
Member
|
I can't validate the code but I can't break the SAC anymore. The PR looks to me. |
2fc5ca0 to
d4c9049
Compare
aa2e159 to
5d3d4a4
Compare
kjnilsson
approved these changes
Jun 17, 2025
5d3d4a4 to
78b21c2
Compare
A boolean status in the stream SAC coordinator is not enough to follow
the evolution of a consumer. For example a former active consumer that
is stepping down can go down before another consumer in the group is
activated, letting the coordinator expect an activation request that
will never arrive, leaving the group without any active consumer.
This commit introduces 3 status: active (formerly "true"), waiting
(formerly "false"), and deactivating. The coordinator will now know when
a deactivating consumer goes down and will trigger a rebalancing to
avoid a stuck group.
This commit also introduces a status related to the connectivity state
of a consumer. The possible values are: connected, disconnected, and
presumed_down. Consumers are by default connected, they can become
disconnected if the coordinator receives a down event with a
noconnection reason, meaning the node of the consumer has been
disconnected from the other nodes. Consumers can become connected again when
their node joins the other nodes again.
Disconnected consumers are still considered part of a group, as they are
expected to come back at some point. For example there is no rebalancing
in a group if the active consumer got disconnected.
The coordinator sets a timer when a disconnection occurs. When the timer
expires, corresponding disconnected consumers pass into the "presumed
down" state. At this point they are no longer considered part of their
respective group and are excluded from rebalancing decision. They are expected
to get removed from the group by the appropriate down event of a
monitor.
So the consumer status is now a tuple, e.g. {connected, active}. Note
this is an implementation detail: only the stream SAC coordinator deals with
the status of stream SAC consumers.
2 new configuration entries are introduced:
* rabbit.stream_sac_disconnected_timeout: this is the duration in ms of the
disconnected-to-forgotten timer.
* rabbit.stream_cmd_timeout: this is the timeout in ms to apply RA commands
in the coordinator. It used to be a fixed value of 30 seconds. The
default value is still the same. The setting has been introduced to
make integration tests faster.
Fixes #14070
The clean-up of a stream connection state when a stream member goes down can remove subscriptions not affected by the member. The subscription state is removed from the connection, but the subscription is not removed from the SAC state (if the subscription is a SAC), because the subscription member PID does not match the down member PID. When the actual member of the subscription goes down, the subscription is no longer part of the state, so the clean-up does not find the subscription and does not remove it from the SAC state. This lets a ghost consumer in the corresponding SAC group. This commit makes sure only the affected subscriptions are removed from the state when a stream member goes down. Fixes #13961
Calls to the stream SAC coordinator can fail for various reason (e.g. a timeout because of a network partition). The stream reader does not take into account what the SAC coordinator returns and moves on even in case of errors. This can lead to inconsistent state for SAC groups. This commit changes this behavior by handling unexpected errors from the SAC coordinator and closing the connection. The client is expected to reconnect. This is safer than risking inconsistent state. Fixes #14040
New CLI command to trigger a rebalancing in a SAC group and activate a
consumer. This is a last resort solution if all consumers in a group
accidently end up in {connected, waiting} state.
The command re-uses an existing function, which only picks the consumer
that should be active. This means it does not try to "fix" the state
(e.g. removing a disconnected consumer because its node is definitely
gone from the cluster).
Fixes #14055
78b21c2 to
41acc11
Compare
acogoluegnes
added a commit
that referenced
this pull request
Jun 17, 2025
Prevent blocked groups in stream SAC with fine-grained status (backport #13672)
acogoluegnes
added a commit
to rabbitmq/rabbitmq-stream-java-client
that referenced
this pull request
Jun 17, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.