Consistency guarantees between publishers to a queue and conditional delete operations on the same queue #4747
Replies: 2 comments 11 replies
-
This looks like a natural race condition between asynchronous publishing and consuming. Queue TTL is a better solution for this problem: https://www.rabbitmq.com/ttl.html#queue-ttl
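For reference, queue TTL is set through the `x-expires` queue argument (a positive number of milliseconds). The following is only a minimal sketch of building that argument map; the class name `QueueTtlArgs`, the helper `withExpiry`, the queue name in the comment, and the 60-second value are illustrative assumptions, not taken from the attached project.

```java
import java.util.HashMap;
import java.util.Map;

class QueueTtlArgs {
    // Builds the optional-arguments map for a queue that the broker may
    // expire after it has been unused for `expiryMillis` milliseconds.
    static Map<String, Object> withExpiry(int expiryMillis) {
        if (expiryMillis <= 0) {
            throw new IllegalArgumentException("x-expires must be a positive integer");
        }
        Map<String, Object> arguments = new HashMap<>();
        arguments.put("x-expires", expiryMillis);
        return arguments;
    }

    public static void main(String[] args) {
        Map<String, Object> queueArgs = withExpiry(60_000);
        System.out.println(queueArgs);
        // With the Java client, this map would be passed as the last
        // parameter of queueDeclare (queue name here is hypothetical):
        // channel.queueDeclare("my-queue", true, false, false, queueArgs);
    }
}
```

With this approach the broker, not the client, decides when the unused queue goes away, so no client-side conditional delete has to race with publishers.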
-
Publishers and consumers do not coordinate, and the publishing party in this scenario (a channel process) does not and should not coordinate with the queue leader. The channel can only guarantee [...] And then there's always "in flight" data that has been read from the socket and parsed but hasn't [...] This is a practical concurrent/distributed system consistency limitation we are willing to accept.
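To make the "in flight" point concrete, here is a deliberately simplified toy model of one possible interleaving. It is not RabbitMQ's implementation; the class `InFlightRaceModel`, the nested `ToyQueue`, and the three-step ordering are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class InFlightRaceModel {
    // Toy stand-in for a queue; not how the broker actually stores messages.
    static final class ToyQueue {
        final Deque<Integer> messages = new ArrayDeque<>();
        boolean deleted = false;

        // Conditional delete: succeeds only if the queue is empty right now.
        boolean deleteIfEmpty() {
            if (messages.isEmpty()) {
                deleted = true;
            }
            return deleted;
        }

        // Returns true if the message was stored, false if the queue is gone.
        boolean deliver(int message) {
            if (deleted) {
                return false;
            }
            messages.add(message);
            return true;
        }
    }

    // Plays the interleaving: route, then delete-if-empty, then deliver.
    static boolean messageSurvives() {
        ToyQueue queue = new ToyQueue();
        // Step 1: the publishing side sees the queue, so nothing is returned.
        boolean routed = !queue.deleted;
        // Step 2: the queue is still empty, so the conditional delete succeeds.
        boolean deleteSucceeded = queue.deleteIfEmpty();
        // Step 3: the in-flight message arrives after the queue is gone.
        boolean stored = routed && queue.deliver(42);
        return deleteSucceeded && stored;
    }

    public static void main(String[] args) {
        System.out.println("message survives: " + messageSurvives());
    }
}
```

In this model nobody does anything wrong: the routing check and the emptiness check are each correct at the moment they run, and the message is lost only because the two parties do not coordinate between those moments.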
-
A message gets a confirmation (ACK) instead of being returned (RETURN) because of a race condition between publishing to a direct exchange bound to a single queue and deleting that queue with the if-empty flag set.
The following test scenario describes the issue:
The experiment waits until all 1000 messages have been confirmed, and then until the queue is empty or deleted (so that the consumer has a chance to consume the messages). I expect every message to be published (and confirmed) by the publisher and received by the consumer. However, a few messages are confirmed but never reach the consumer. The moment they receive their ACK is very close to the moment the queue is deleted, so I believe there is a race condition between publishing and deleting a queue with the if-empty flag.
The experiment was done with rabbitmq-java-client version 5.14.2. The same experiment with the spring-amqp library produces the same results. Further, the experiment succeeds (all messages are published and consumed correctly) with RabbitMQ server version 3.7.20-management and fails with 3.8.0-management, 3.8.23-management, 3.9.16-management, and 3.10-rc-management, so the issue was likely introduced in version 3.8.0.
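One way to pin down which messages were confirmed yet never delivered is to diff the set of confirmed message numbers against the set the consumer saw. This is a minimal sketch under that assumption; the class `ConfirmedButNotConsumed`, the method `missing`, and the sample data are hypothetical and are not taken from the attached project or its results.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

class ConfirmedButNotConsumed {
    // Returns, in ascending order, the messages that were confirmed to the
    // publisher but never observed by the consumer.
    static Set<Integer> missing(Collection<Integer> confirmed, Collection<Integer> consumed) {
        Set<Integer> result = new TreeSet<>(confirmed);
        result.removeAll(consumed);
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data only, not output from the original experiment.
        Set<Integer> confirmed = Set.of(0, 1, 2, 3, 4);
        Set<Integer> consumed = Set.of(0, 1, 3);
        System.out.println(missing(confirmed, consumed)); // prints [2, 4]
    }
}
```

An empty result would mean every confirmed message reached the consumer; any non-empty result lists exactly the messages affected by the race.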
I am attaching the project on which I was able to reproduce the issue for your reference. publishAndDelete.zip
The messages are the numbers from 0 to 999. At the INFO logger level you'll see logs similar to these:
At the DEBUG level, you'll see that particular messages (those that were published but did not reach the consumer) receive an ACK within a very small time difference of the successful queue deletion. Search for 'message: ' to find all the log entries for a given message. Sometimes a message is returned correctly, is retried, and then gets an ACK but never reaches the consumer: