3.11.17: rabbitmq-upgrade drain fails with an exception
#8415
-
Describe the bug
We use Ansible code to drain the node before restarting services on it. After upgrading to 3.11.17 it fails with an error:
RUNNING HANDLER [rabbitmq : Put RabbitMQ node into maintenance mode] ***********
Reproduction steps
Expected behavior
Success (as in pre-3.11.17)
Additional context
No response
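The flow described under "Describe the bug" is roughly the following, in terms of the stock CLI commands (a simplified sketch, not the exact Ansible tasks; reviving afterwards is an assumption, not something stated in the report):
# simplified sketch of the drain-before-restart flow
rabbitmq-upgrade drain                 # put the node into maintenance mode
systemctl restart rabbitmq-server      # or restart the container, depending on the deployment
rabbitmq-upgrade revive                # assumption: the node is revived once it is back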
-
"Deploy OpenStack" is not a reproduction step. There is a billion settings you can configure differently. I cannot reproduce: ./sbin/rabbitmq-upgrade drain
Will put node rabbit@sunnyside into maintenance mode. The node will no longer serve any client traffic!
21:24:55.249 [warning] This node is being put into maintenance (drain) mode
21:24:55.250 [warning] Suspended all listeners and will no longer accept client connections
21:24:55.250 [warning] Closed 0 local client connections
21:24:55.250 [warning] Skipping leadership transfer of quorum queues: no candidate (online, not under maintenance) nodes to transfer to!
./sbin/rabbitmq-diagnostics status | grep "under maintenance"
Is under maintenance?: true
-
My best guess is that the node was not fully booted before it was put into maintenance, so it does not have any Ranch ETS tables. In 3.11.17, there were no
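If that is the case, gating the drain on the node reporting a full boot should rule it out. A minimal sketch:
rabbitmqctl await_startup              # blocks until the node has fully booted
rabbitmq-diagnostics check_running     # confirms the RabbitMQ application is running
rabbitmq-upgrade drain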
-
I'll have a go at reproducing it on a VM (the error was from OpenStack CI) and come back with better diagnostics.
-
Is there a precheck that we could implement to check if the Ranch ETS tables are there?
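For example, would something along these lines be a reasonable check? (A sketch; I am assuming the relevant named table is ranch's ranch_server, which may be wrong.)
# crude precheck: does ranch's named ETS table exist on the node?
rabbitmqctl eval 'ets:info(ranch_server, size).'
# prints an integer when the table exists, undefined otherwise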
-
When I try to run the following loop:
for i in {1..1000}; do ./sbin/rabbitmq-upgrade drain; done
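A variant that also revives between iterations and stops on the first failure might be closer to what the Ansible flow does (a sketch, assuming the standard revive command):
# drain and revive in a loop, stopping at the first non-zero exit code
for i in {1..1000}; do
  ./sbin/rabbitmq-upgrade drain  || { echo "drain failed on iteration $i";  break; }
  ./sbin/rabbitmq-upgrade revive || { echo "revive failed on iteration $i"; break; }
done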
-
So, I did add those two commands - and we're still failing in CI:
RUNNING HANDLER [rabbitmq : Make sure RabbitMQ node is fully booted] ***********
RUNNING HANDLER [rabbitmq : Make sure RabbitMQ cluster is in good shape] *******
RUNNING HANDLER [rabbitmq : Put RabbitMQ node into maintenance mode] ***********
RUNNING HANDLER [rabbitmq : Restart rabbitmq container] ************************
RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start] *********************
RUNNING HANDLER [rabbitmq : Restart remaining rabbitmq containers] *************
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
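For reference, the two new prechecks are roughly of this shape (a sketch of the intent, not the exact Ansible tasks):
# "Make sure RabbitMQ node is fully booted"
rabbitmqctl await_startup
rabbitmq-diagnostics check_running
# "Make sure RabbitMQ cluster is in good shape"
rabbitmqctl cluster_status
rabbitmq-diagnostics check_if_node_is_quorum_critical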
-
And these are rabbitmq logs: https://0efb01d52986d97b758b-f2a50e6f05df4b0e5be60c7b98112049.ssl.cf1.rackcdn.com/884759/6/check/kolla-ansible-centos9s/42010bc/primary/logs/kolla/rabbitmq/index.html
-
It's still interesting that 3.11.16 doesn't behave that way, while 3.11.17 breaks.
-
RUNNING HANDLER [rabbitmq : Print listeners] ***********************************
And then it fails as in the previous snippets.
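For reference, listener information can also be queried directly on the node with:
rabbitmq-diagnostics listeners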
-
Well, what is interesting: out of the whole set of jobs that we have (centos9, rocky9, debian bullseye and ubuntu jammy), it didn't fail on centos9. Here's the rabbitmq-diagnostics listeners output:
RUNNING HANDLER [rabbitmq : Print listeners] ***********************************
-
Ah, centos9 is pinned to 3.11.16 - that's why it works.
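A quick way to double-check which broker version a given job actually runs:
rabbitmq-diagnostics server_version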
-
After inspecting the function in question, I so far fail to see what could have changed in
We'll need to collect the list of ETS tables on the node. I will put together a
Also, this happens in Docker, and I am trying this on a host OS.
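For collecting them, a one-liner along these lines should do (a sketch using rabbitmqctl eval):
# dump all named ETS tables on the node, sorted
rabbitmqctl eval 'lists:sort([T || T <- ets:all(), is_atom(T)]).'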
-
I managed to reproduce the issue with the following config file:
management.tcp.ip = 192.168.10.29
listeners.tcp.default = 192.168.10.29:5672
listeners.ssl.default = 192.168.10.29:5671
ssl_options.cacertfile = /path/to/tls-gen.git/basic/result/ca_certificate.pem
ssl_options.certfile = /path/to/tls-gen.git/basic/result/server_certificate.pem
ssl_options.keyfile = /path/to/tls-gen.git/basic/result/server_key.pem
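A minimal sketch of exercising drain against that config (assuming certificates generated with tls-gen as in the paths above):
./sbin/rabbitmq-upgrade drain
./sbin/rabbitmq-diagnostics status | grep "under maintenance"
./sbin/rabbitmq-upgrade revive
./sbin/rabbitmq-diagnostics check_port_listener 5671   # the TLS listener should be back after revive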
-
Trying to reproduce with
#8440