A question about a quorum queue behavior on 3.13.0 #12404
Replies: 2 comments
-
RabbitMQ 3.13.x is out of community support. Even before 4.0.x shipped, we would not help you with anything on that series. I'm afraid a screenshot with a few metrics that are all zeroes does not prove or demonstrate anything. The logs demonstrate that two nodes lost network connectivity and then a few quorum queues went through a leader election. Finally, "could not infer the number of file handles" has nothing to do with quorum queues per se; it is a message logged by an external/infrastructure node stats collector.
-
@GroovyRice GitHub does not send out notifications when an issue is moved to a discussion, so here is your notification. See my response above.
-
Describe the bug
This might not be the right place for this; please change or reassign it if necessary. First, some contextual understanding of the environment. It's a cluster of 4 nodes:
(2) in an air-gapped environment: rabbit@5BFVPMC01 and rabbit@5BFVPMC02
(2) outside the air-gapped environment: rabbit@EPCVMSG01 and rabbit@EPCVMSG02
They can communicate with each other as all the ports required are open.
On the 27th we received this issue in the RabbitMQ log:
At 18:22:53.886000 it appears that the connection from the air-gapped nodes to the others was lost. Upon reconnection, the quorum queues elected new leaders. The interesting part is that certain queues, like 5BF.PAMTransfers or 5BF.BurdenRecipe.Service, started experiencing a peculiar issue: they were not delivering any messages for particular routing keys, because the quorum queue would automatically acknowledge the messages and prevent the consumer from ever seeing them. To show that this was the issue, I dropped the consumer, and the queue kept acknowledging every message that came into it (I wasn't sure of the best way to demonstrate this).
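Rather than a screenshot, a more concrete way to capture this would probably be to poll the management HTTP API and watch the queue's cumulative ack counter while no consumer is attached. A rough sketch, assuming the management plugin is enabled on port 15672, the default vhost, and placeholder host/credentials:

```python
import time
import requests

# Assumptions: management plugin enabled, default vhost "/",
# placeholder host and credentials.
BASE = "http://localhost:15672/api"
AUTH = ("guest", "guest")
QUEUE = "5BF.PAMTransfers"  # one of the affected queues

def snapshot():
    # "%2F" is the URL-encoded default vhost "/"
    q = requests.get(f"{BASE}/queues/%2F/{QUEUE}", auth=AUTH).json()
    stats = q.get("message_stats", {})
    return {
        "ready": q.get("messages_ready"),
        "unacked": q.get("messages_unacknowledged"),
        "consumers": q.get("consumers"),
        "acked_total": stats.get("ack"),  # cumulative acks against this queue
    }

# With no consumer attached, "acked_total" should stay flat.
# If it keeps climbing while "consumers" is 0, something is removing
# messages without a consumer explicitly acknowledging them.
before = snapshot()
time.sleep(60)
after = snapshot()
print("before:", before)
print("after: ", after)
```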

I resolved the issue by restarting all of the nodes. I'm not sure why this occurred, or why it affected only certain queues and not all of them.
A hypothesis, though again I'm not sure: these queues have a routing key that is used frequently. I wonder if, midway through consumption, the air-gapped environment dropped, and on reconnection it kept telling all nodes that the message had been acknowledged, causing an endless cycle. I'm not sure what to do to check this theory.
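One thing that might help check this (a sketch only, with placeholder host/credentials and assuming the management plugin is enabled) is to look at how the consumers on the affected queues are registered. A consumer subscribed with auto-ack (ack_required false) causes the broker to treat messages as acknowledged on delivery, which would look exactly like the queue "acknowledging everything by itself":

```python
import requests

# Placeholder host and credentials; management plugin assumed enabled.
BASE = "http://localhost:15672/api"
AUTH = ("guest", "guest")
AFFECTED = {"5BF.PAMTransfers", "5BF.BurdenRecipe.Service"}

# /api/consumers lists every consumer in the cluster together with
# its queue, its channel, and whether it requires explicit acks.
for consumer in requests.get(f"{BASE}/consumers", auth=AUTH).json():
    queue = consumer.get("queue", {}).get("name")
    if queue in AFFECTED:
        print(
            queue,
            "tag:", consumer.get("consumer_tag"),
            "ack_required:", consumer.get("ack_required"),
            "channel:", consumer.get("channel_details", {}).get("name"),
        )
# ack_required == False means the consumer subscribed with auto-ack,
# so the broker considers messages acknowledged as soon as they are delivered.
```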
Reproduction steps
Unsure how to reproduce it, as it's more of an edge-case issue, or possibly a configuration issue on my end rather than a bug. Please move this to the correct label/branch if that's the case.
Expected behavior
Messages should not be automatically acknowledged and removed by quorum queues; they should remain until the consumers themselves acknowledge them.
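For reference, this is the consumer behaviour I expect, sketched with the Python pika client (host, credentials and prefetch are placeholders; the queue name is one of the affected queues). With a subscription like this, deliveries should sit in the "unacked" state until basic_ack is sent, and a dropped connection should make the broker requeue them rather than discard them:

```python
import pika

# Placeholder connection details.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=10)  # limit in-flight deliveries

def on_message(ch, method, properties, body):
    # ... process the message ...
    # Only now is the message removed from the queue.
    ch.basic_ack(delivery_tag=method.delivery_tag)

# auto_ack=False: the broker keeps each delivery as "unacked" until basic_ack.
channel.basic_consume(
    queue="5BF.PAMTransfers",
    on_message_callback=on_message,
    auto_ack=False,
)
channel.start_consuming()
```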
Additional context
Please let me know if there is anything else I can provide to narrow down the issue or describe it more accurately.