quorum queue with no consumers and TTL can have continually increasing segment file size #10516
-
I'm running RabbitMQ 3.12.10 with Erlang 26.0.2 on macOS. The quorum queue setup is as follows:
Thus, the messages are endlessly looping from q1 to q2 and back to q1. The segment file for q2 grows endlessly. With the rates described above, the file increases in size by about 250 MB every minute or so. After 25 or so minutes this is the segment file size on disk:
I have tried both specifically setting the dead letter routing key for each dead letter and leaving the routing key untouched; that does not appear to matter. If there is ever a time when q2 has 0 messages, the segment file is deleted and a new one starts, so with the setup described above I ensure that there is always at least 1 message in q2. There is no growth of segment file size in q1.
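For reference, a minimal sketch of this kind of topology (two quorum queues whose expired messages dead-letter into each other), using pika; the TTL value and payload here are only illustrative:

```python
# Sketch: two quorum queues, q1 and q2, each dead-lettering expired
# messages to the other via the default exchange. TTL and payload are
# illustrative placeholders.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

for src, dst in (("q1", "q2"), ("q2", "q1")):
    ch.queue_declare(
        queue=src,
        durable=True,
        arguments={
            "x-queue-type": "quorum",
            "x-message-ttl": 1000,            # expire after 1 second
            "x-dead-letter-exchange": "",     # default exchange...
            "x-dead-letter-routing-key": dst, # ...routes to the other queue
        },
    )

# Seed the loop: with no consumers, this message bounces between q1 and q2.
ch.basic_publish(exchange="", routing_key="q1", body=b"payload")
conn.close()
```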
-
So what is your question? A segment file is not guaranteed to be deleted immediately when the queue reaches 0 messages in the ready state. Dead lettering has (and has always had) a loop detection mechanism, but you can still construct a topology that tricks this basic protection. I'm afraid I do not understand the meaning of your dead letter routing key experiment without an executable example. Finally, messages of 50 MiB in size belong in a blob store; you can then pass their ID(s) in the message.
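A rough sketch of that claim-check idea; the store_blob helper below is a hypothetical stand-in for whatever blob store you use (S3, a shared filesystem, etc.):

```python
# Sketch of the claim-check pattern: keep the large payload out of the
# message and publish only a small reference to it. store_blob() is a
# hypothetical stand-in for a real blob store.
import json
import pathlib
import uuid

def store_blob(blob_id: str, payload: bytes) -> None:
    # Stand-in implementation: write the payload to a local directory.
    pathlib.Path("blobs").mkdir(exist_ok=True)
    pathlib.Path("blobs", blob_id).write_bytes(payload)

def publish_reference(channel, queue: str, large_payload: bytes) -> str:
    # Publish only a small JSON reference; a consumer later fetches the
    # payload by ID from the blob store.
    blob_id = str(uuid.uuid4())
    store_blob(blob_id, large_payload)
    channel.basic_publish(
        exchange="",
        routing_key=queue,
        body=json.dumps({"blob_id": blob_id}).encode(),
    )
    return blob_id
```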
-
Here is the question: why does the segment file grow endlessly? I would expect a new segment file to be created once the current one exceeds the default maximum size, but that does not happen; I have seen one file grow to 100 GB. Additionally, I would expect that once a message is dead-lettered from q2, the queue would no longer have any record of it, so when that message comes back it would be treated as a 'new' message. Thus there would be no growth of the segment files at all, save for the standard sawtooth growth and decay.

Regarding the dead letter implementation question: I specifically set the dead letter routing key in one test and did not set it in another, and that made no difference.
-
You certainly can exhaust disk space with a queue that has no consumers, TTL, or length limit but does have publishers (and a DLX is one special case of a publisher). The docs mention that the quorum queue design assumes that eventually consumers do come online and begin consuming (and acknowledging deliveries).

Quorum queues have certain protections against faulty consumers in place, but they can only do so much. There are other protection mechanisms, most notably the delivery acknowledgement timeout, that protect against buggy consumers that consume and never acknowledge. #10500 is somewhat relevant. If consumers in your system can go offline for long periods of time, then a queue length limit (length limits are very easy to reason about), or a message TTL (a much more nuanced option), or both, are mandatory. Maybe even a TTL on the queue itself.
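For example, a sketch of capping such a queue with per-queue arguments (a policy applied to matching queues achieves the same thing without redeclaring the queue; the values below are only illustrative):

```python
# Sketch: declare the queue with a length limit and a message TTL so an
# unconsumed backlog cannot grow without bound. Values are illustrative.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(
    queue="q2",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-max-length": 100_000,  # queue length limit (easy to reason about)
        "x-message-ttl": 60_000,  # message TTL in milliseconds (more nuanced)
    },
)
conn.close()
```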
-
Repeated requeues are not exactly the same scenario but there are similarities, and the QQ docs explicitly state that such consumers have a chance of growing the Raft log indefinitely, since they never acknowledge (usually not by design, of course).

You can use classic queues for these intermediary delays; they do not have a Raft-style log of operations and largely avoid this particular problem. They are also a non-replicated queue type, since classic queue mirroring has been deprecated for years and will be removed completely in RabbitMQ 4.x this year. Whether using a classic queue is acceptable for such "message staging", I don't know, but these are the design space constraints. See the sketch after this reply.

If your system is entirely designed around such retries, you might be better off using a stream and re-consuming with whatever frequency works best; at least technically, streams can be used as an alternative to retries. Frameworks like NServiceBus likely support their timed retries magic by setting up a sequence of queues, and likely queues of the same type. That sucks, but if a queue never has any consumers or acknowledgements yet has a message flow, it will run into the issue described in the doc section above.

I see acknowledgements performed in the DLX worker module. If you can put together an executable way to reproduce this, we should be able to trace where it is or is not called.
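If that trade-off is acceptable, a sketch of such a classic "staging" queue (the same TTL/DLX idea as before, just a different queue type; values illustrative):

```python
# Sketch: use a classic (non-replicated, no Raft log) queue as the
# intermediary staging queue that expires and dead-letters messages back.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(
    queue="q2",
    durable=True,
    arguments={
        "x-queue-type": "classic",
        "x-message-ttl": 1000,
        "x-dead-letter-exchange": "",
        "x-dead-letter-routing-key": "q1",
    },
)
conn.close()
```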
-
You need to set a much lower value than the default for the raft.max_segment_entries configuration key. It controls how many entries will be added to each segment. The default is 4096; I suggest you try 128.
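A minimal rabbitmq.conf sketch, using the key name as given above (double-check the exact name against the quorum queue docs for your RabbitMQ version):

```
# rabbitmq.conf (sketch): lower the per-segment entry count from the
# default of 4096
raft.max_segment_entries = 128
```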