Description
We have a Rocky OpenStack deployment with 3 controllers and 500 computes. At one point, nova-compute detected that the RabbitMQ connection was broken and then reconnected. Within 15 minutes, memory consumption on rabbitmq-server increased abruptly, from roughly 3 GB to 150 GB, reaching the 40% memory watermark.
nova-compute log (oslo.messaging):
2021-07-05 15:58:28.633 8 ERROR oslo.messaging._drivers.impl_rabbit [req-a09d4a8b-c24b-4b30-b433-64fe4f6bace5 - - - - -] [8ed1f425-ad67-4b98-874c-e4516aaf3134] AMQP server on 145.247.103.16:5671 is unreachable: . Trying again in 1 seconds.: timeout
2021-07-05 15:58:29.656 8 INFO oslo.messaging._drivers.impl_rabbit [req-a09d4a8b-c24b-4b30-b433-64fe4f6bace5 - - - - -] [8ed1f425-ad67-4b98-874c-e4516aaf3134] Reconnected to AMQP server on 145.247.103.16:5671 via [amqp] client with port 28205.
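For reference, the client-side reconnect behaviour in the log above is governed by the RabbitMQ driver options of oslo.messaging. A sketch of the relevant section of nova.conf follows; the values shown are the usual Rocky-era defaults and are assumptions, not taken from this deployment:

```ini
[oslo_messaging_rabbit]
# Seconds after which an unacknowledged heartbeat marks the connection dead;
# the "AMQP server ... is unreachable" message is logged when this fires.
heartbeat_timeout_threshold = 60
# Initial delay before reconnecting (matches "Trying again in 1 seconds" above).
rabbit_retry_interval = 1
# Backoff added per failed attempt, and the cap on the retry interval.
rabbit_retry_backoff = 2
rabbit_interval_max = 30
```

With 500 computes sharing these defaults, a single broker hiccup makes all clients drop and re-dial within a second or two, which matches the burst of closed-connection warnings RabbitMQ reports below.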
Then RabbitMQ reported that a huge number of connections had been closed by clients.
=WARNING REPORT==== 5-Jul-2021::15:57:59 ===
closing AMQP connection <0.6345.754> (20.16.36.44:2451 -> 145.247.103.14:5671 - nova-compute:8:b4ce7b09-b9b5-4db1-983b-a071dc031c64, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
After 10 minutes, the cluster was blocked at the 0.4 memory watermark.
=INFO REPORT==== 5-Jul-2021::16:19:29 ===
vm_memory_high_watermark set. Memory used:111358541824 allowed:107949065830
*** Publishers will be blocked until this alarm clears ***
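The numbers in the alarm are consistent with a 0.4 watermark on a host with roughly 270 GB of RAM. A quick check of the arithmetic (the total-RAM figure is inferred from the log, not stated in it):

```python
# Values copied from the vm_memory_high_watermark log entry above.
memory_used = 111_358_541_824      # bytes actually in use when the alarm fired
memory_allowed = 107_949_065_830   # bytes allowed before publishers are blocked

watermark = 0.4                    # vm_memory_high_watermark from the report
# The allowed figure is watermark * total RAM, so total RAM can be back-computed.
total_ram = memory_allowed / watermark

print(f"used:              {memory_used / 2**30:.1f} GiB")
print(f"allowed:           {memory_allowed / 2**30:.1f} GiB")
print(f"implied total RAM: {total_ram / 1e9:.0f} GB")
print(f"used exceeds allowed: {memory_used > memory_allowed}")
```

So the broker was already about 3 GB past its limit when the alarm was set, and the blocking of publishers evidently did not stop the growth.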
However, even after publishers were blocked, the rabbitmq pod kept leaking memory; in the end the node hit OOM and the system forced the pod to restart.
rabbitmq-management: on
rabbitmq-server: 3.6.16
erlang: 19.3.6
amqp release: 2.5.2
oslo-messaging release: 8.1.4
openstack: Rocky