-
We are facing an intermittent crash with RabbitMQ v3.13.0 running on Erlang/OTP v26.2.3, occurring once every few weeks. It leaves an erl_crash.dump file with the following slogan:
I have attached one such erl_crash.dump file to this email. This typically happens when a few hundred messages are published into the queue simultaneously on 3-4 different production systems running their own dedicated RabbitMQ instances. The system has 15 GB of RAM available and RAM utilization is around 20% under normal usage, so it makes very little sense to me why the system ran out of memory while allocating a block of only around 1.5 MB.

I have no specific steps to reproduce this, as it doesn't happen on our test systems, where I am able to push loads of up to 600K messages without any issues.

Can anyone help me pinpoint possible reasons why this is happening? Which RabbitMQ parameters should we tune to avoid it?

Thanks,
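For anyone triaging a similar dump, here is a minimal Python sketch that pulls the `Slogan:` line out of an `erl_crash.dump` header and, if it has the usual allocator-failure shape, extracts the allocator name and requested size. The function names are made up for illustration, and the sample slogan is a hypothetical one chosen to match the "around 1.5 MB" description; it is not copied from the attached dump.

```python
import re
from typing import Iterable, Optional

# Typical shape of an allocation-failure slogan. Other slogans exist and will
# simply not match this pattern.
_ALLOC_RE = re.compile(
    r'^(?P<allocator>\w+): Cannot allocate (?P<bytes>\d+) bytes of memory'
    r'(?: \(of type "(?P<type>[^"]+)"\))?'
)


def read_slogan(lines: Iterable[str]) -> Optional[str]:
    """Return the text after 'Slogan: ' from a crash dump, or None if absent."""
    prefix = "Slogan: "
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def parse_alloc_failure(slogan: str) -> Optional[dict]:
    """Split an allocator-failure slogan into allocator, size and block type."""
    match = _ALLOC_RE.match(slogan)
    if match is None:
        return None
    return {
        "allocator": match["allocator"],
        "bytes": int(match["bytes"]),
        "type": match["type"],
    }


# Hypothetical header lines, for illustration only.
sample_header = [
    "=erl_crash_dump:0.5",
    "Tue Jan  1 00:00:00 2024",
    'Slogan: eheap_alloc: Cannot allocate 1573200 bytes of memory (of type "heap").',
]

slogan = read_slogan(sample_header)
failure = parse_alloc_failure(slogan) if slogan else None
if failure:
    print(f"{failure['allocator']} failed to allocate "
          f"{failure['bytes'] / 2**20:.2f} MiB ({failure['type']})")
```

To run it against a real dump, pass an open file instead of the sample list, for example `read_slogan(open(path, encoding="utf-8", errors="replace"))`.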
-
Interesting issue indeed. I suspect the memory allocated by the Erlang VM is very fragmented, as the crash dump reports that only ~153 MB is in use. Searching more for
It looks like
But why does the queue process terminate? There is something like a stack trace here (sorry for the lengthy dump)
that suggests in
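A figure like the ~153 MB above can be cross-checked against the `=memory` section of the crash dump, which lists byte counts such as `total` and `processes`. Below is a minimal Python sketch of reading that section; it assumes the usual dump layout where each section starts with a line beginning with `=`. The helper names and all sample numbers are illustrative, not taken from the attached dump.

```python
from typing import Dict, Iterable


def read_memory_section(lines: Iterable[str]) -> Dict[str, int]:
    """Collect 'key: bytes' pairs from the =memory section of a crash dump."""
    stats: Dict[str, int] = {}
    in_section = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("="):
            if in_section:
                break  # the next section header ends =memory
            in_section = (line == "=memory")
            continue
        if in_section:
            key, sep, value = line.partition(": ")
            if sep and value.isdigit():
                stats[key] = int(value)
    return stats


def to_mib(n_bytes: int) -> float:
    """Convert bytes to MiB."""
    return n_bytes / 2**20


# Hypothetical excerpt, for illustration only.
sample_dump = [
    "=erl_crash_dump:0.5",
    "Slogan: (elided)",
    "=memory",
    "total: 160432128",
    "processes: 52428800",
    "=hash_table:atom_tab",
    "size: 100",
]

memory = read_memory_section(sample_dump)
for name, n_bytes in memory.items():
    print(f"{name}: {to_mib(n_bytes):.1f} MiB")
```

A small `total` next to an out-of-memory slogan does not by itself prove fragmentation; it only shows that the VM's own accounting was far below the machine's physical RAM when the allocation failed.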
-
@gomoripeti Thank you so much for your response and detailed explanation. I have tried to correlate this with two earlier crash dumps from the same system, but the underlying reasons seem to be different each time. I have attached these two crash dumps here as well. More importantly, is there a way to prevent the memory allocated by the Erlang VM from becoming so fragmented? I'm currently using the following parameters.
Should I try changing the memory allocation strategy or any other parameters? Or is there any other suggestion that could help avoid this? Regards,
-
@ssurya is there anything in the Windows Event Viewer that corresponds to these crashes?
-
After extensive testing with a manually configured low virtual memory setting in the operating system (Windows Server), I'm able to reproduce these Erlang VM crashes quite frequently.

During the investigation, it also came as a surprise that the stock VMs from the cloud services provider that we have been using for many years had their virtual memory configuration set to the "Custom" option instead of the "System managed" option, with a relatively small maximum size for the paging file. With the virtual memory configuration set to the standard "System managed" option, no such issues have been reported in the past two weeks.

Besides this, I have also configured antivirus exclusions for the Erlang/RabbitMQ processes, and that has helped reduce the overall resource consumption on these servers. While it may be a bit early to reach a final conclusion, I'm fairly confident that this was the root cause of the problem.

Thank you so much @lukebakken, @michaelklishin and @gomoripeti for all your help and guidance in narrowing down this issue.

Regards,
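For readers who want to check whether their own Windows hosts are in a similar state, here is a minimal Python sketch that reads the system commit limit through the Win32 `GlobalMemoryStatusEx` call and flags low headroom. The `CommitInfo` type, the helper names and both thresholds are assumptions made for illustration; they are not values recommended by RabbitMQ or Microsoft, and the query only works on Windows (it returns `None` elsewhere).

```python
import ctypes
import sys
from typing import List, NamedTuple, Optional


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),   # commit limit (RAM + page file)
        ("ullAvailPageFile", ctypes.c_uint64),   # commit still available
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


class CommitInfo(NamedTuple):
    total_phys: int
    commit_limit: int
    commit_avail: int


def query_commit_info() -> Optional[CommitInfo]:
    """Return commit figures on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return CommitInfo(status.ullTotalPhys, status.ullTotalPageFile,
                      status.ullAvailPageFile)


def commit_warnings(info: CommitInfo,
                    min_limit_ratio: float = 1.25,       # assumed threshold
                    min_avail_bytes: int = 2 * 2**30     # assumed threshold
                    ) -> List[str]:
    """Flag a commit limit close to physical RAM or low available commit."""
    warnings = []
    if info.commit_limit < info.total_phys * min_limit_ratio:
        warnings.append("commit limit is close to physical RAM; "
                        "the paging file may be capped")
    if info.commit_avail < min_avail_bytes:
        warnings.append("available commit is low")
    return warnings


info = query_commit_info()
if info is None:
    print("Not running on Windows; nothing to check.")
else:
    for message in commit_warnings(info) or ["commit headroom looks fine"]:
        print(message)
```

The point of the check is that an allocation can fail once the commit limit is exhausted even when plenty of physical RAM appears free, which would be consistent with the capped paging file described above.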