Quorum queue segment files growing unbounded #8860
-
Hi,

We have been struggling with an issue where disk space usage on RabbitMQ is growing uncontrollably. Inspecting the disk shows the space is consumed by an ever-growing number of segment files for a single quorum queue (i.e. inside one folder that has a config file stating the name of the queue). This is an active queue with data coming in and getting consumed, so while the symptoms sound similar to #6447, the conditions don't line up. The queue itself is not growing and shows nothing unusual (to us) such as a growing number of unacknowledged messages.

We are on version 3.8.12 at the moment, running the RabbitMQ cluster operator on Kubernetes. How would we go about figuring out what is filling up the space?
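For reference, this is roughly how we have been poking at it so far (a sketch; the data directory layout below is what our Kubernetes pods use and may differ elsewhere, and `my-queue` is a placeholder name):

```sh
# Find which quorum queue directory is consuming the disk
# (adjust the data dir / node name for your deployment)
du -sh /var/lib/rabbitmq/mnesia/*/quorum/*/* 2>/dev/null | sort -h | tail

# Confirm the queue itself is not backing up
rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

# Check the Raft state of the suspect queue
rabbitmq-queues quorum_status "my-queue"
```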
Replies: 3 comments
-
RabbitMQ 3.8 reached end of life a year ago. Most likely you have a consumer or a group of consumers that consume but never acknowledge, which prevents log compaction from happening.
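One way to check for such a consumer is via the standard CLI listings (a sketch; run on any cluster node):

```sh
# Per-queue view: a large, non-draining messages_unacknowledged count is the red flag
rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

# Per-channel view: find the channel holding deliveries it never acknowledges
rabbitmqctl list_channels name consumer_count prefetch_count messages_unacknowledged
```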
-
Another possibility is that you use MQTT and connection churn is fairly high. The directory name should hint at whether it is the MQTT client ID tracking log or a quorum queue. This has been addressed in later versions, and more importantly, a Raft cluster is no longer used for client ID tracking as of 3.12.

In any case, the only path forward is to upgrade from 3.8 to the latest 3.9, then the latest 3.10 plus enabling all feature flags, then the latest 3.11 and enabling all feature flags one more time, then 3.12. Or go straight to 3.12 via a Blue/Green deployment.
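The feature flag step between those versions can be done with rabbitmqctl (a sketch; run it on one node of the cluster after each upgrade hop):

```sh
# See which feature flags exist and their current state
rabbitmqctl list_feature_flags

# Enable every stable feature flag before moving to the next minor version
rabbitmqctl enable_feature_flag all
```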
-
Thank you Michael,

Connection churn is another plausible explanation we considered after finding similar reports. We concluded it's not related, since the number of connections is stable and there is no indication of disconnects in the logs (which we have seen on other occasions).

We were also trying to peek into the segment files to see what's in them (connection info or data, and what kind of commands), since there was a hint elsewhere that basic.get could cause something similar. Because the files are mostly binary this was inconclusive, but they appeared to contain queue payloads and queue names, so we think it is unlikely to be connection-related. The queues are AMQP 0.9.1, and although the client is not auto-ack, the consumers are definitely acknowledging messages.

We had to intervene today and tried purging the queue, but that didn't achieve anything. However, deleting and re-creating the queue cleaned it all up. A bit drastic, but we had to put a stop to it.

Also, thank you for the upgrade hints. We were planning to catch up on versions soon, and a Blue/Green deployment is something to consider!
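For the record, the intervention was roughly the following (a sketch; queue and vhost names are placeholders, and the re-declare arguments assume the queue is declared as a quorum queue):

```sh
# Purging removed the ready messages but did not shrink the segment files
rabbitmqctl purge_queue my-queue -p /

# Deleting and re-declaring the queue (here via rabbitmqadmin) is what
# finally released the disk space
rabbitmqadmin --vhost=/ delete queue name=my-queue
rabbitmqadmin --vhost=/ declare queue name=my-queue durable=true \
    arguments='{"x-queue-type": "quorum"}'
```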