Single active consumers locked in "waiting" state #447
jonnepmyra started this conversation in General
Thanks for the report. We identified a bug that could be the cause of this deadlock; see rabbitmq/rabbitmq-server#15353.
Hi!
We’ve recently adopted RabbitMQ Streams (currently ~800 in production, some with up to 20 partitions) and they’ve been an excellent fit for multiple use cases.
However, we’re struggling with a scenario where all consumers in a consumer group end up “locked” in the “waiting” state. We don’t have a structured way to reproduce this yet, but it typically occurs during RabbitMQ node restarts.
Cluster setup
What we observe
Given a SuperStream with 4 partitions (1, 2, 3, 4), it’s almost always Partition 3 where consumers get stuck. Partition 3 has many consumer groups (11 in this example).
Each group has 3 consumers (so 33 consumers total for that partition).
For each group, we therefore expect exactly one consumer in the “active” state and the other two in the “waiting” state.
But from time to time, all consumers within a consumer group end up in “waiting”. Once this happens, the client app never receives any activation event in the .NET client’s ConsumerUpdateListener callback. This remains true even if we try:
… “waiting” again

Only workaround we’ve found
The only thing that recovers the system is to:
In other words:
Additional details
This screenshot shows the output of rabbitmq-streams list_stream_group_consumers for the affected stream and consumer group:
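The stuck state described in this thread breaks the single active consumer invariant: in a healthy group, exactly one consumer is “active” and the rest are “waiting”. As a rough sketch (not part of the original report), a monitoring check could flag groups in which no consumer is active. The input format is an assumption: each consumer is represented by a plain “active” or “waiting” string, for example transcribed from the list_stream_group_consumers output, which this sketch does not parse. The function names classify_group and find_stuck_groups are hypothetical.

```python
from collections import Counter


def classify_group(states):
    """Classify one single-active-consumer group from its consumer states.

    Assumption: `states` is a list of plain strings, "active" or "waiting",
    one per consumer. This is a hypothetical, hand-built input, not the
    exact output format of `rabbitmq-streams list_stream_group_consumers`.
    """
    counts = Counter(states)
    active = counts["active"]
    if active == 1:
        return "ok"               # exactly one active consumer, as expected
    if active == 0 and counts["waiting"] > 0:
        return "stuck"            # every consumer is waiting: the reported symptom
    if active > 1:
        return "multiple-active"  # should never happen with single active consumer
    return "empty"                # no active or waiting consumers at all


def find_stuck_groups(groups):
    """Return the sorted names of groups whose consumers are all waiting.

    `groups` maps a (hypothetical) group name to its list of consumer states.
    """
    return sorted(
        name for name, states in groups.items()
        if classify_group(states) == "stuck"
    )


if __name__ == "__main__":
    groups = {
        "group-a": ["active", "waiting", "waiting"],
        "group-b": ["waiting", "waiting", "waiting"],
    }
    print(find_stuck_groups(groups))  # ['group-b']
```

A check like this only detects the symptom; it says nothing about the cause, which the reply above attributes to a possible broker-side bug.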