3.11: two Raft replicas are in the timeout state, one is a candidate #10934
-
**Describe the bug**

We have a 3-node RabbitMQ cluster deployed on Azure AKS, RabbitMQ version 3.11.24. Two Raft replicas are in the timeout state and one is a candidate.

**Reproduction steps**

Create a 3-node RabbitMQ cluster on AKS.

**Expected behavior**

The cluster should work and the Raft state should not go into the timeout state.

**Additional context**

No response
-
RabbitMQ 3.11 is out of community support. There will be no guidance besides the absolute basics.
Node identity matters for Raft-based features. If you (or AKS) remove a node, it must be removed explicitly; simply deleting a pod is not enough, and it can eventually have side effects on the rest of the cluster, similar to what you are observing: the replicas cannot elect a leader. #10786 is one example of how removing nodes aggressively during a grow-then-shrink upgrade can leave certain queues or streams unable to elect a new leader, because their original peers are gone without explicit removal. Please take it from here.
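For reference, a minimal sketch of what "explicit removal" looks like with the standard CLI tools. The node name `rabbit@old-node` is a placeholder, and exact command behavior should be checked against the docs for your specific 3.11.x patch release:

```shell
# Run from a surviving cluster node, BEFORE the pod/VM is destroyed.

# 1. Move quorum queue / stream replicas off the departing node,
#    so no Raft member set is left pointing at a vanished peer.
rabbitmq-queues shrink rabbit@old-node

# 2. Tell the remaining nodes to forget the departing member,
#    so cluster metadata no longer lists it.
rabbitmqctl forget_cluster_node rabbit@old-node

# 3. Verify cluster membership and quorum queue health afterwards.
rabbitmqctl cluster_status
rabbitmq-queues quorum_status my-queue   # placeholder queue name
```

If the pod is already gone, `forget_cluster_node` can still be run from a remaining node, but any quorum queues whose majority lived on removed members may need their membership repaired (or the queues recreated) before they can elect a leader again.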