Commit 6387e43

markgoddard authored and yoctozepto committed
Fix RabbitMQ restart ordering
The host list order seen during Ansible handlers may differ to the usual
play host list order, due to race conditions in notifying handlers. This
means that restart_services.yml for RabbitMQ may be included in a
different order than the rabbitmq group, resulting in a node other than
the 'first' being restarted first. This can cause some nodes to fail to
join the cluster. The include_tasks loop was introduced in [1].

This change fixes the issue by splitting the handler into two tasks, and
restarting the first node before all others.

[1] https://review.opendev.org/c/openstack/kolla-ansible/+/763137

Change-Id: I1823301d5889589bfd48326ed7de03c6061ea5ba
Closes-Bug: #1930293
(cherry picked from commit 0cd5b02)
1 parent 595eec1 commit 6387e43

2 files changed: +22 -1 lines changed

Lines changed: 17 additions & 1 deletion
@@ -1,10 +1,26 @@
 ---
-- name: Restart rabbitmq container
+# NOTE(mgoddard): These tasks perform a 'full stop upgrade', which is necessary when moving between
+# major releases. In future kolla-ansible releases we may be able to change this to a rolling
+# restart. For info on this process see https://www.rabbitmq.com/upgrade.html
+
+- name: Restart first rabbitmq container
+  vars:
+    service_name: "rabbitmq"
+    service: "{{ rabbitmq_services[service_name] }}"
+  include_tasks: 'restart_services.yml'
+  when:
+    - kolla_action != "config"
+    - inventory_hostname == groups[service.group] | first
+  listen: Restart rabbitmq container
+
+- name: Restart remaining rabbitmq containers
   vars:
     service_name: "rabbitmq"
     service: "{{ rabbitmq_services[service_name] }}"
   include_tasks: 'restart_services.yml'
   when:
     - kolla_action != "config"
     - inventory_hostname == item
+    - inventory_hostname != groups[service.group] | first
   loop: "{{ groups[service.group] }}"
+  listen: Restart rabbitmq container

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+fixes:
+  - |
+    Fixes more-than-2-node RabbitMQ upgrade failing randomly.
+    `LP#1930293 <https://launchpad.net/bugs/1930293>`__.
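
The ordering argument in the commit message can be illustrated with a small Python model. This is a simplified sketch, not Ansible's actual handler machinery: it assumes the old single handler restarts nodes in whatever order the handler happens to see the hosts, and that the two new tasks run strictly one after the other. The function names and the host names `rabbit1` to `rabbit3` are hypothetical and do not appear in the change.

```python
from itertools import permutations


def old_restart_order(handler_hosts, group):
    """Model of the single looped handler (assumption: restarts follow
    the order in which the handler sees the hosts)."""
    return [h for h in handler_hosts if h in group]


def new_restart_order(handler_hosts, group):
    """Model of the split handler: two tasks that run sequentially."""
    first = group[0]  # corresponds to groups[service.group] | first
    # Task 1: only the first node of the group passes the `when` condition.
    first_pass = [h for h in handler_hosts if h == first]
    # Task 2: every other group member, in the order the handler sees them.
    second_pass = [h for h in handler_hosts if h in group and h != first]
    return first_pass + second_pass


group = ["rabbit1", "rabbit2", "rabbit3"]  # hypothetical inventory group

# Whatever order the handler sees the hosts in, the split version
# always restarts the first node of the group before any other node.
for seen in permutations(group):
    assert new_restart_order(list(seen), group)[0] == group[0]
```

Under this model the old handler can restart `rabbit3` first when the handler host order is shuffled, whereas the split version only leaves the relative order of the remaining nodes unspecified.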
