Node has consumed 500 GiB of disk space within a short time window #8871
Replies: 4 comments
-
Currently you are asking us to mostly guess how to reproduce your issue. Two log files and some prose are insufficient. Please provide the following:
Thanks
-
RabbitMQ 3.9 reached end of life for users without a support subscription some six months ago. Classic mirrored queues are deprecated. Disk space usage cannot be considered a bug in itself: it is almost always a function of what applications do, sometimes intentionally and sometimes not. See the scenarios described in this discussion in the context of quorum queues.

My best guess without any data to work with

Neither short-lived connections nor classic queue mirroring per se contribute to such massive spikes in disk space usage. When that happens, nodes will continue accumulating messages in memory because applications […]

How modern versions are different

Classic queues v2 and modern quorum queues move data to disk with a very small "working set" in memory, ignoring […]

Or the disk space was used by something else; it's not uncommon to see people co-locate services […]
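Not from the original comment, but as an illustration of the modern approach mentioned above: a minimal sketch, assuming the Python pika client, of declaring a quorum queue instead of a classic mirrored one. The host, credentials and queue name are placeholders.

```python
import pika

# Sketch (not from the thread): declare a quorum queue with the pika client
# as an alternative to deprecated classic mirrored queues.
# Host, credentials and queue name are placeholders.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Quorum queues must be declared durable; the x-queue-type argument selects the type.
channel.queue_declare(
    queue="example-queue",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)

connection.close()
```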
-
@lukebakken unless this behavior can be reproduced with 3.11 or 3.12, we should disengage. "End of life" for a series means "end of life", in particular since the OpenStack community in general never contributes or pays for support.
-
One more relevant topic here is the ratio of the node memory limit to its available disk space. Historically the recommendation has been one to one. For a node to use up 500 GiB of disk space, it must either have a comparable backlog (including unconfirmed messages) or a comparable heap size. For OpenStack installations the latter is fairly rare to see. That makes me think of another couple of hypotheses: something else on the host has used up the disk space, or RabbitMQ uses a different filesystem volume from the one being monitored.
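One way to test the last hypothesis is to compare what the node itself reports with the external monitoring. A minimal sketch, assuming the rabbitmq_management plugin is enabled and default guest/guest credentials on localhost:15672 (adjust for a real deployment):

```python
import requests

# Sketch: compare what each node itself reports with external disk monitoring.
# Assumes the rabbitmq_management plugin and default guest/guest credentials
# on localhost:15672.
resp = requests.get("http://localhost:15672/api/nodes", auth=("guest", "guest"))
resp.raise_for_status()

for node in resp.json():
    print(node["name"])
    print("  memory used / limit:", node["mem_used"], "/", node["mem_limit"])
    print("  disk free / limit:  ", node["disk_free"], "/", node["disk_free_limit"])
```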
-
Describe the bug
We have had issues running RabbitMQ 3.9.24 for our OpenStack services with mirrored queues: we've observed quite frequent crashes with a "no space left" error, even though the filesystem that hosts RabbitMQ has ~500 GiB of free space. We set the low disk space watermark to 10 GiB in order to try to catch what is actually writing so much data, but with no luck: the node still crashed. After the RabbitMQ crash the disk space is reclaimed and frees up instantly.
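Not part of the original report, but a sketch of how one might poll the management API during such a spike to see whether any queue backlog grows along with the disk usage; it assumes the management plugin and guest/guest credentials on localhost:15672.

```python
import time
import requests

# Sketch: poll the management API during a disk-usage spike and print the five
# queues holding the most message bytes. Assumes guest/guest on localhost:15672.
URL = "http://localhost:15672/api/queues?columns=name,messages,message_bytes"

while True:
    queues = requests.get(URL, auth=("guest", "guest")).json()
    for q in sorted(queues, key=lambda q: q.get("message_bytes", 0), reverse=True)[:5]:
        print(q["name"], q.get("messages", 0), "messages,", q.get("message_bytes", 0), "bytes")
    print("---")
    time.sleep(30)
```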

Log provided below:
rabbitmq_crash.log
erl crash dump:
erl_crash_dump.log
Screenshot of our disk space monitoring, showing it filling up quickly:
I would appreciate support and guidance on how to resolve this issue.
Reproduction steps
Expected behavior
A running RabbitMQ node does not take up 500 GiB of disk space out of nowhere, and when the disk resource limit alarm is raised it should not keep writing to disk.
Additional context
No response