Node fails to start due to incompatible feature flags #6755
Replies: 8 comments 27 replies
-
Please ATTACH your complete Kubernetes configuration. In addition, RabbitMQ and Kubernetes logs will be helpful. At the moment you're asking us to guess how to reproduce this issue.
-
This does not happen in most environments so we need to know exactly how the nodes are started. It's also curious that this affects one specific feature flag. I can think of two workarounds:
As of #6682 this feature flag will become required (auto-enabled), but that would only ship in
-
#6701 may also be relevant here.
-
@ares-the-god-of-war, as @lukebakken wrote, please provide exact reproduction steps including any configuration files and how the RabbitMQ nodes get clustered. You don't use https://github.com/rabbitmq/cluster-operator, do you?
-
Dear all,
I have done several tests. Attached you can find a minimal definitions.json which reproduces the error for me.
On one occasion, not only the direct_exchange_routing_v2 feature flag was disabled, but also drop_unroutable_metric and empty_basic_get_metric. So the issue does not seem to be limited to direct_exchange_routing_v2, but that is the one feature flag it occurs with in ALL reproduction attempts.
I use the following configuration in rabbitmq.conf for peer discovery:
cluster_formation.k8s.address_type = hostname
cluster_formation.node_cleanup.interval = 30
@ansd, we do not use the cluster-operator.
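For anyone trying to narrow this down, here is a minimal sketch in Python of how the per-node flag states could be compared. It assumes the states were copied by hand from the output of `rabbitmqctl list_feature_flags` on each pod into plain dictionaries. The helper name `diff_feature_flags` and the sample values are hypothetical illustrations, not RabbitMQ code and not taken from the attached logs.

```python
from typing import Dict, List, Tuple


def diff_feature_flags(
    node_a: Dict[str, str], node_b: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (flag, state_on_a, state_on_b) for every flag whose state differs.

    A flag that one node does not report at all is shown as "missing".
    """
    mismatches = []
    for flag in sorted(set(node_a) | set(node_b)):
        state_a = node_a.get(flag, "missing")
        state_b = node_b.get(flag, "missing")
        if state_a != state_b:
            mismatches.append((flag, state_a, state_b))
    return mismatches


# Illustrative values only (assumption): node 1 has all three flags enabled,
# node 2 reports direct_exchange_routing_v2 as disabled.
rabbitmq_0 = {
    "direct_exchange_routing_v2": "enabled",
    "drop_unroutable_metric": "enabled",
    "empty_basic_get_metric": "enabled",
}
rabbitmq_1 = {
    "direct_exchange_routing_v2": "disabled",
    "drop_unroutable_metric": "enabled",
    "empty_basic_get_metric": "enabled",
}

for flag, state_a, state_b in diff_feature_flags(rabbitmq_0, rabbitmq_1):
    print(f"{flag}: rabbitmq-0={state_a}, rabbitmq-1={state_b}")
```

Running the comparison after each node starts would show at which point a flag state diverges between nodes.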
-
Dear @ansd, I received permission to test the provided image (3.11.x branch) and have been able to perform the same upgrade with the new image. Unfortunately, the issue remains. There is a slight change in behaviour: the feature flag direct_exchange_routing_v2 is no longer disabled when node 2 (rabbitmq-1) starts up (see image). As before, the nodes are started one after the other, each waiting for the previous node to complete its startup. We start with an empty database disk. Attached you will find the complete logs of RabbitMQ node 2 (rabbitmq-1), which still crashes.
-
We're running rabbitmq-3.11.7 (which includes a potential fix, #6847) in Azure Kubernetes Service, and are still running into this issue most of the time.
-
@ansd I'm sorry for the unresponsiveness. I had a short vacation. In my mail I saw there might be another fix for this issue. I will try to find time tomorrow to test again with the linked image, and I will confirm whether or not this is fixed for us as well.
-
Good day,
I am running RabbitMQ in a Kubernetes environment with 3 nodes. I want to start from scratch with a predefined configuration on RabbitMQ version 3.11.5 (docker image rabbitmq:3.11.5). The predefined configuration is stored in a ConfigMap containing definitions.json. The following steps cause nodes 2 and 3 to crash.