Quorum queue segment files growing unbounded #8860
-
Hi,

We have been struggling with an issue where disk space usage on RabbitMQ is growing uncontrollably. Inspecting the disk shows the space is consumed by an ever-growing number of segment files for a single quorum queue (i.e. inside one folder that has a config file stating the name of the queue). This is an active queue with data coming in and getting consumed, so while the symptoms sound similar to #6447, the conditions don't line up. The queue itself is not growing and shows nothing unusual (to us) such as a growing number of unacknowledged messages.

We are on version 3.8.12 at the moment, running the RabbitMQ cluster operator on Kubernetes. How would we go about figuring out what is filling up the space?
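For reference, this is roughly how we have been poking at it so far (a sketch; the data directory layout below is what our Kubernetes pods use and may differ elsewhere, and `my-queue` is a placeholder name):

```sh
# Find which quorum queue directory is consuming the disk
# (adjust the data dir / node name for your deployment)
du -sh /var/lib/rabbitmq/mnesia/*/quorum/*/* 2>/dev/null | sort -h | tail

# Confirm the queue itself is not backing up
rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

# Check the Raft state of the suspect queue
rabbitmq-queues quorum_status "my-queue"
```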
Replies: 3 comments
-
RabbitMQ 3.8 reached end of life a year ago. Most likely you have a consumer or a group of consumers that consume but never acknowledge, which prevents log compaction from happening.
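One way to check for such a consumer is via the standard CLI listings (a sketch; run on any cluster node):

```sh
# Per-queue view: a large, non-draining messages_unacknowledged count is the red flag
rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

# Per-channel view: find the channel holding deliveries it never acknowledges
rabbitmqctl list_channels name consumer_count prefetch_count messages_unacknowledged
```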
-
Another possibility is that you use MQTT and connection churn is fairly high. The directory name should hint at whether it is the MQTT client ID tracking log or a quorum queue. This has been addressed in later versions, and more importantly, a Raft cluster is no longer used for client ID tracking as of 3.12.

In any case, the only path forward is to upgrade from 3.8 to the latest 3.9, then the latest 3.10 plus enabling all feature flags, then the latest 3.11 and enabling all feature flags one more time, then 3.12. Or go straight to 3.12 via a Blue/Green deployment.
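The feature flag step between those versions can be done with rabbitmqctl (a sketch; run it on one node of the cluster after each upgrade hop):

```sh
# See which feature flags exist and their current state
rabbitmqctl list_feature_flags

# Enable every stable feature flag before moving to the next minor version
rabbitmqctl enable_feature_flag all
```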
-
Thank you Michael,

Connection churn is another plausible explanation we considered after finding similar reports. We concluded it's not related, since the number of connections is stable and there is no indication of disconnects in the logs (which we have seen on other occasions).

We were also trying to peek into the segment files to see what's in them (connection info or data, and what kind of commands), since there was a hint elsewhere that basic.get could cause something similar. Because the files are mostly binary this was inconclusive, but they appeared to contain queue payloads and queue names, so we think it is unlikely to be connection-related. The queues are AMQP 0.9.1, and although the client is not auto-ack, the consumers are definitely acknowledging messages.

We had to intervene today and tried purging the queue, but that didn't achieve anything. However, deleting and re-creating the queue cleaned it all up. A bit drastic, but we had to put a stop to it.

Also, thank you for the upgrade hints. We were planning to catch up on versions soon, and a Blue/Green deployment is something to consider!
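For the record, the intervention was roughly the following (a sketch; queue and vhost names are placeholders, and the re-declare arguments assume the queue is declared as a quorum queue):

```sh
# Purging removed the ready messages but did not shrink the segment files
rabbitmqctl purge_queue my-queue -p /

# Deleting and re-declaring the queue (here via rabbitmqadmin) is what
# finally released the disk space
rabbitmqadmin --vhost=/ delete queue name=my-queue
rabbitmqadmin --vhost=/ declare queue name=my-queue durable=true \
    arguments='{"x-queue-type": "quorum"}'
```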