Replies: 3 comments 6 replies
-
Can you share a folder with such segment files? This would allow us to identify what's special about it.
-
If consumers on a quorum queue consume a message but do not acknowledge it, quorum queues won't be able to reclaim disk space from that moment on. In other words, a single stuck acknowledgement can lead to similar behavior. Restarting your consumers would be an easy way to test this hypothesis.
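To rule that out, `rabbitmqctl list_queues name messages_unacknowledged` should show whether deliveries are piling up unacked on that queue. Below is a minimal pika consumer sketch that acks every delivery explicitly after processing; the connection settings and the queue name are placeholders, not taken from this thread:

```python
# Minimal sketch of a pika consumer with explicit acknowledgements.
# Connection settings and the queue name ("my.quorum.queue") are placeholders.
import pika

def on_message(channel, method, properties, body):
    print(f"received {len(body)} bytes")   # stand-in for real processing
    # Without this ack, the delivery stays unacknowledged and the queue
    # cannot reclaim the segments that still reference it.
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=100)      # cap unacknowledged deliveries per consumer
channel.basic_consume(queue="my.quorum.queue", on_message_callback=on_message)
channel.start_consuming()
```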
-
Most likely something like a faulty consumer held a lock on a message and
eventually released it. You didn't really provide any information we could
use to even make a guess.
…On Sun, 5 Nov 2023 at 09:48, 4danmi ***@***.***> wrote:
Hi,
After a few days, the folder was cleared without any explanation.
Do you have explanation for this behavior?
-
Hi,
We had a similar issue to #8860, though we upgraded a month ago from version 3.8.7 to version 3.12.4 (not in place, but on completely different machines). After the upgrade it seemed that the issue had been resolved, but we have now found that the segment files folder of one of the queues keeps growing uncontrollably. This folder belongs (according to the config file inside it) to a quorum queue that is replicated on 3 nodes, and the folder keeps growing on all of its nodes.
As far as we can tell, it happens only in a single queue (out of hundreds). This queue has a much higher message rate compared to most other queues in the cluster, but other queues with slightly lower message rates don't seem to have this issue. The queue is almost always empty as the consumers keep reading from it with no apparent problem. We also have the same queue with the same configuration and data, running as a backup in a different cluster, and it doesn't show the same behaviour.
We considered removing each one of the nodes from this specific queue, deleting the segment files folder and re-adding the node back to the queue (using rabbitmq-queues), but we are afraid that it might break the queue and we can't risk it at this time. All of our consumers (also for other queues) are using AMQP with Python's pika library. The server is running RHEL 8.4 with Erlang version 26.0.2.
The only solution we have found for this problem is to reset one of the cluster nodes and then join it back to the cluster, but during this process we risk the cluster's availability. We already performed this operation once, but the problem seems to occur again after a few hours.
We also considered upgrading to version 3.12.8, but none of the bug fixes in the changelog seems to address something that might resolve this problem.
What else can we do?
Thanks for your help.
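For anyone checking the same thing, here is a minimal sketch of how the queue's unacknowledged-message count could be polled over the management HTTP API, to see whether stuck acknowledgements line up with the segment growth. The host, credentials, vhost and queue name below are placeholders, and the rabbitmq_management plugin is assumed to be enabled:

```python
# Rough sketch: poll the management HTTP API for a queue's message counts.
# Host, credentials, vhost and queue name are placeholders.
import requests

MGMT = "http://localhost:15672"
VHOST = "%2F"                      # URL-encoded default vhost "/"
QUEUE = "my.quorum.queue"

resp = requests.get(f"{MGMT}/api/queues/{VHOST}/{QUEUE}", auth=("guest", "guest"))
resp.raise_for_status()
q = resp.json()
print("ready:", q.get("messages_ready"))
print("unacked:", q.get("messages_unacknowledged"))
```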