Running rabbitmqctl on node during startup of said node leads to an error #7622

tijoer · 2023-03-15T14:42:49Z

Hi everybody,
I set up a small repository with a Devcontainer and a short script that reproduces the behavior. You can find it here: https://github.com/tijoer/RabbitMQStartupBug/tree/main. Click the button in the Readme.md to start a Devcontainer, then run the script in a terminal.

Situation

https://github.com/tijoer/RabbitMQStartupBug/blob/main/demonstratedBug.sh

I want to start three RabbitMQ Nodes in a Docker network:

docker network create rabbits

docker volume create --name DataVolume1

docker run --detach --rm --net rabbits --hostname rabbit-1 --name rabbit-1 --publish 8081:15672 -v DataVolume1:/datavolume1 --env RABBITMQ_ERLANG_COOKIE=ASDFQWER rabbitmq:3.12-rc-management
docker run --detach --rm --net rabbits --hostname rabbit-2 --name rabbit-2 --publish 8082:15672 -v DataVolume1:/datavolume1 --env RABBITMQ_ERLANG_COOKIE=ASDFQWER rabbitmq:3.12-rc-management
docker run --detach --rm --net rabbits --hostname rabbit-3 --name rabbit-3 --publish 8083:15672 -v DataVolume1:/datavolume1 --env RABBITMQ_ERLANG_COOKIE=ASDFQWER rabbitmq:3.12-rc-management

In the next step I want to join these three instances in a cluster, using the following code:

docker exec -it rabbit-2 rabbitmqctl stop_app
docker exec -it rabbit-2 rabbitmqctl reset
docker exec -it rabbit-2 rabbitmqctl join_cluster rabbit@rabbit-1
docker exec -it rabbit-2 rabbitmqctl start_app
docker exec -it rabbit-2 rabbitmqctl cluster_status

docker exec -it rabbit-3 rabbitmqctl stop_app
docker exec -it rabbit-3 rabbitmqctl reset
docker exec -it rabbit-3 rabbitmqctl join_cluster rabbit@rabbit-1
docker exec -it rabbit-3 rabbitmqctl start_app
docker exec -it rabbit-3 rabbitmqctl cluster_status

However, as soon as the first rabbitmqctl cluster_status is run, startup fails with the following output:

https://github.com/tijoer/RabbitMQStartupBug/blob/main/output.txt

Expected behavior

Three running nodes, all joined into one cluster. The Ports tab in the Devcontainer should show three running instances, and the management UI of each should be reachable.

What workaround did I try instead

Not working

Running rabbitmqctl cluster_status on rabbit-1 in a loop to check whether the node is up and healthy.

Working

Adding a sleep 2 into the script. (Horrible solution but it works.)
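A fixed sleep can usually be replaced by a bounded retry loop that stops as soon as a readiness probe succeeds. The sketch below is one way to write that; `retry_until` is a hypothetical helper name, not part of Docker or RabbitMQ, and the probe command is left as a parameter because polling cluster_status is reported above as not working.

```shell
# Hypothetical helper: run a probe command until it succeeds
# or the attempt budget is used up.
#   $1 = maximum number of attempts
#   $2 = delay between attempts, in seconds
#   remaining arguments = the probe command and its arguments
retry_until() {
  _attempts="$1"
  _delay="$2"
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # Stop at the first successful probe.
    if "$@"; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  # The probe never succeeded within the budget.
  return 1
}
```

A call could look like `retry_until 30 1 docker exec rabbit-1 rabbitmqctl await_startup`, where the choice of probe is an assumption rather than something confirmed in this thread.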

Affected versions

I tried 3.11 and 3.12-RC1. I only tried the official images and did not install it myself.

Assumption

Starting the RabbitMQ nodes with a passed Erlang cookie and then running rabbitmqctl commands against the node results in unexpected behavior. There might also be some connection with running the containers with --detach, but I am not sure about that.

Additional information

This bug is mostly a problem in integration test scenarios, where you want to spin RabbitMQ clusters up and down to run your test cases against them. In most production scenarios the startup would simply be retried and everything would work fine.

mkuratczyk · 2023-03-15T14:58:08Z

Maintainer

Try rabbitmqctl await_online_nodes to wait for startup:
https://www.rabbitmq.com/rabbitmqctl.8.html#Cluster_management

Right now, you are asking the CLI to tell you about the status of all nodes, but some of them don't respond yet, so the error seems to be the correct behaviour. They could be unresponsive for all kinds of other reasons, including actual cluster issues.

Cluster membership will be completely reworked in 4.0, so the behaviour in this situation might change in the relatively near future.
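The advice to wait for startup before issuing cluster commands might be sketched as follows. This is an illustration under assumptions, not the reporter's script: `rmq` and `join_node` are hypothetical helper names, `DOCKER` is an assumed override hook that makes a dry run possible, and `rabbitmqctl await_startup` is used here as the per-node wait instead of `await_online_nodes`, which takes a node count. The `-it` flags are dropped on the assumption that the script runs without a terminal.

```shell
# Hypothetical helper: run rabbitmqctl inside a named container.
# DOCKER is an assumed override hook; it defaults to the real docker CLI.
rmq() {
  _node="$1"
  shift
  "${DOCKER:-docker}" exec "$_node" rabbitmqctl "$@"
}

# Hypothetical helper: join one node to a seed node's cluster,
# but only after the joining node reports that it has started.
#   $1 = container and host name of the joining node, e.g. rabbit-2
#   $2 = container and host name of the seed node, e.g. rabbit-1
join_node() {
  _joiner="$1"
  _seed="$2"
  # Each step runs only if the previous one succeeded.
  rmq "$_joiner" await_startup &&
    rmq "$_joiner" stop_app &&
    rmq "$_joiner" reset &&
    rmq "$_joiner" join_cluster "rabbit@$_seed" &&
    rmq "$_joiner" start_app
}
```

With these helpers, `join_node rabbit-2 rabbit-1` followed by `join_node rabbit-3 rabbit-1` would replace the two command groups from the original post. Whether `await_startup` alone is enough when the node's runtime has not yet registered with epmd is not established in this thread, so wrapping the first step in a retry may still be necessary.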

2 replies

tijoer Mar 15, 2023
Author

Thank you very much for this answer. I will try the await command tomorrow, and I am sure it will work. However, it feels odd that the whole process gets nuked and not only rabbitmqctl.

If this is a problem with Docker, I am sorry for opening this as an issue instead of writing to your mailing list :).

mkuratczyk Mar 15, 2023
Maintainer

Yes, I hadn't realized that is what you meant, but this is absolutely a Docker-level problem. The RabbitMQ server would just continue the boot process; it is Docker that aborts the whole thing because a command failed.

michaelklishin · 2023-03-15T14:59:11Z

Maintainer

rabbit@rabbit-2:
  * connected to epmd (port 4369) on rabbit-2
  * epmd reports: node 'rabbit' not running at all

suggests that the node is still booting, the hostname is off, or CLI tools do not resolve the hostname of their own container the way the host or other containers do. This is not a crash.

See RabbitMQ node logs for evidence.
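For the containers in this report, which log to standard output in the official image, the node logs can be read with `docker logs`. The snippet below is a small sketch for collecting them from several nodes at once; `dump_node_logs` is a hypothetical helper name, `DOCKER` is an assumed override hook, and the 50-line tail is an arbitrary choice.

```shell
# Hypothetical helper: print the last 50 log lines of each named container.
# DOCKER is an assumed override hook; it defaults to the real docker CLI.
dump_node_logs() {
  for _n in "$@"; do
    printf '=== %s ===\n' "$_n"
    # Merge stderr into stdout so the output stays in one stream.
    "${DOCKER:-docker}" logs --tail 50 "$_n" 2>&1
  done
}
```

Calling `dump_node_logs rabbit-1 rabbit-2 rabbit-3` right after a failed run would show how far each node got in its boot sequence. Because the containers were started with --rm, the logs are only available while the container still exists.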

1 reply

tijoer Mar 15, 2023
Author

Thanks, I will try this tomorrow and look into the logs.

lukebakken · 2023-03-16T15:13:36Z

Maintainer

@tijoer - forming dev clusters in Docker works much better using Docker Compose. Please see the following: https://github.com/lukebakken/docker-rabbitmq-cluster

0 replies
