Quorum queue ignores some received acknowledgments when channel is closed #10328
-
Describe the bug
When manually acknowledging messages received from a quorum queue, some acks sent immediately before the channel is closed are sometimes ignored by the RabbitMQ server: these messages are returned to the queue instead of being removed. I could not reproduce this on non-quorum queues. In my test code I first fetch all messages using basic.get and send all the acks afterwards. I know this is not optimal usage; we could issue a single ack with the multiple flag set, or ack after each get, and so on. However, the scenario that led to this discovery was more complex, with a mix of acks and nacks, and this approach simply shows the issue more easily. AFAIK the .NET client blocks on each ack call, so we should not proceed to channel close before the server has confirmed all of these acks. I have sample .NET code which demonstrates this issue: https://github.com/grujicd/RabbitMQAckProblemTest
The workaround is to add some delay between the last ack and the channel close; Thread.Sleep(200) solves the problem in my setup.

Reproduction steps
See the sample project linked above: fetch all messages with basic.get, send all the acks afterwards, then close the channel.
Expected behavior
All acked messages should be removed from the quorum queue. However, sometimes some messages are returned to the queue.

Additional context
RabbitMQ 3.12.4 on Windows. RabbitMQ 3.12.10 on macOS. RabbitMQ .NET client 6.8.1 (the latest official release).
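For reference, a minimal C# sketch of the pattern described above (not the code from the linked repository): it drains the queue with basic.get, sends all the acks at the end, and then lets the channel close immediately. The queue name, host, and the commented-out Thread.Sleep(200) workaround are assumptions based on this report, and it targets the RabbitMQ .NET client 6.x API.

```csharp
using System.Collections.Generic;
using System.Threading;
using RabbitMQ.Client;

// Assumes a pre-declared quorum queue named "test-quorum-queue" (hypothetical)
// that already contains messages, and RabbitMQ.Client 6.x.
var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();

using (var channel = connection.CreateModel())
{
    var deliveryTags = new List<ulong>();

    // Fetch everything first with basic.get and manual acknowledgements.
    BasicGetResult result;
    while ((result = channel.BasicGet("test-quorum-queue", autoAck: false)) != null)
    {
        deliveryTags.Add(result.DeliveryTag);
    }

    // Send all acks only after the queue has been drained.
    foreach (var tag in deliveryTags)
    {
        channel.BasicAck(tag, multiple: false);
    }

    // Workaround from the report: a short delay before closing the channel.
    // Thread.Sleep(200);
} // Disposing the channel closes it; acks still in flight may never take effect.
```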
-
@grujicd your code creates a race condition between confirmations and channel closure. Depending on the timing of these events, the ack may or may not reach the node, the channel, or eventually the queue leader. You are positively acknowledging the delivery, which means the message will only be removed from the queue if the acknowledgement reaches the queue leader before the channel is closed.
As with all race conditions, you have to run this enough times to observe both behaviors. I do not see any evidence of a bug, and I see a very limited number of options for how this could potentially be avoided, all of them potentially having a negative effect on throughput with an unclear benefit.
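To make the timing-dependent nature easier to observe, a rough, self-contained harness (an illustration, not an official reproduction): it repeats the publish / drain / ack / close cycle and reports any messages that end up back in the queue. The queue name, host, batch sizes, and sleep durations are arbitrary assumptions; it uses the RabbitMQ .NET client 6.x API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using RabbitMQ.Client;

// Assumes a pre-declared quorum queue named "test-quorum-queue" (hypothetical)
// and RabbitMQ.Client 6.x.
const string queue = "test-quorum-queue";
var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();

for (int iteration = 0; iteration < 50; iteration++)
{
    // Publish a small batch of messages.
    using (var channel = connection.CreateModel())
    {
        for (int i = 0; i < 20; i++)
            channel.BasicPublish("", queue, null, Encoding.UTF8.GetBytes($"msg-{i}"));
    }

    // Drain the queue, send all acks at the end, then close the channel immediately.
    using (var channel = connection.CreateModel())
    {
        var tags = new List<ulong>();
        BasicGetResult result;
        while ((result = channel.BasicGet(queue, autoAck: false)) != null)
            tags.Add(result.DeliveryTag);

        foreach (var tag in tags)
            channel.BasicAck(tag, multiple: false);
    } // the race described above: the channel closes right after the last ack

    Thread.Sleep(500); // give the queue time to requeue any deliveries it never saw acked

    // Any remaining messages were requeued because their acks did not make it in time.
    using (var channel = connection.CreateModel())
    {
        uint remaining = channel.QueueDeclarePassive(queue).MessageCount;
        if (remaining > 0)
            Console.WriteLine($"Iteration {iteration}: {remaining} messages returned to the queue");
    }
}
```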
-
To summarize: this is not a bug in RabbitMQ, and I'm afraid this scenario, if I understand it correctly, is not something our team would be willing to spend much time on.
-
Someone on our team suggests that this is important to mention: the above limitation means that consumers can sometimes get redeliveries (which will be explicitly marked as such in the delivery properties). This is true in other scenarios as well: if a proper consumer (and not a polling loop) with a long-lived channel loses its TCP connection to the node, it will get redeliveries and will have to be ready to deal with them. This is just one specific case of that, where the client itself initiates channel closure with consumer acknowledgements still pending. In this specific case, with some adjustments, all received but pending acks on the node could be processed by the queue as long as the queue replica or the entire node does not fail. But that would involve a lot of work and risk for a very specific and relatively unlikely scenario, and it would by no means free consumer implementations from having to have an answer to the question "what do we do with redeliveries?"
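As a rough illustration of what "being ready for redeliveries" can look like, a sketch of a consumer that inspects the Redelivered flag and deduplicates on a message id. The queue name, host, and the use of MessageId as an idempotency key are assumptions (any application-level key backed by a durable store would do in practice); it targets the RabbitMQ .NET client 6.x API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

// Assumes a pre-declared quorum queue named "test-quorum-queue" (hypothetical)
// and publishers that set a unique MessageId on every message.
var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

var processedIds = new ConcurrentDictionary<string, bool>(); // in-memory example only

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (sender, ea) =>
{
    if (ea.Redelivered)
        Console.WriteLine($"Redelivery of {ea.BasicProperties.MessageId}");

    // Idempotent handling: skip work that was already done for this message id.
    if (ea.BasicProperties.MessageId is string id && !processedIds.TryAdd(id, true))
    {
        channel.BasicAck(ea.DeliveryTag, multiple: false);
        return;
    }

    Console.WriteLine(Encoding.UTF8.GetString(ea.Body.ToArray()));
    channel.BasicAck(ea.DeliveryTag, multiple: false);
};

channel.BasicConsume(queue: "test-quorum-queue", autoAck: false, consumer: consumer);
Console.ReadLine(); // keep the connection, channel, and consumer alive
```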
-
I have added a note to the channel lifecycle section in the doc guide on channels. It should be live in a few minutes.
To clarify: RabbitMQ channels will begin processing the basic.ack frame before basic.close; this is guaranteed. What is not guaranteed is that the channel can finish the first step before it is shut down by the connection, as the application instructs it to do. It does not coordinate received-but-pending ack processing with either its connection or the target queue, and making it coordinate with either or both is guaranteed to result in a throughput hit and suboptimal CPU utilization for no gain for the absolute majority of users.
Use long-lived channels, as the docs explicitly recommend, and you will avoid this problem.
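For completeness, a minimal sketch of the polling pattern from the original report restructured around a single long-lived channel: the channel is opened once, each delivery is acknowledged on that same channel, and the channel is only closed when the application shuts down, so acknowledgements never race against a channel closure. A push-based consumer, as in the sketch above, is generally preferable to polling with basic.get; the queue name and host are again assumptions, and this targets the RabbitMQ .NET client 6.x API.

```csharp
using System.Threading;
using RabbitMQ.Client;

// Assumes a pre-declared quorum queue named "test-quorum-queue" (hypothetical)
// and RabbitMQ.Client 6.x.
var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel(); // long-lived: one channel for the app's lifetime

while (true) // run until the process is stopped
{
    BasicGetResult result;
    while ((result = channel.BasicGet("test-quorum-queue", autoAck: false)) != null)
    {
        // Process result.Body here, then acknowledge on the same long-lived channel.
        channel.BasicAck(result.DeliveryTag, multiple: false);
    }
    Thread.Sleep(1000); // back off while the queue is empty
}
```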