
Commit 3321964

[ceph_migrate] trigger mgr failover when cluster health is degraded
Add a block to post.yaml that triggers a Ceph manager failover when the cluster health status is not HEALTH_OK. This helps the cluster recover from degraded states after migration operations.

The block first installs the cephadm package on every ComputeHCI node, then runs 'cephadm shell -- ceph mgr fail' on the first node of that group. Rather than setting up a separate containerized Ceph CLI, this relies on the cephadm tool on the compute nodes where the Ceph daemons run ('cephadm shell' starts the CLI container itself).

The block runs only when the ComputeHCI group is defined and non-empty, a health status is available, and that status is anything other than HEALTH_OK (that is, HEALTH_WARN or HEALTH_ERR).

Signed-off-by: Roberto Alfieri <[email protected]>
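The health guard described above reads `ceph.health.status`, a variable this commit does not define; it is presumably populated earlier in the role. As a hedged sketch only, one way to produce such a variable is to parse the JSON emitted by `ceph status`, whose top-level `health.status` field carries HEALTH_OK, HEALTH_WARN or HEALTH_ERR. The task names and the `ceph_status_raw` register are hypothetical, and the sketch assumes the play host has a working `ceph` CLI and admin keyring.

```yaml
# Hypothetical sketch, not part of this commit: build a `ceph` variable
# whose `health.status` field the guard can test.
# Assumes the play host has a working `ceph` CLI and admin keyring.
- name: Read cluster status as JSON
  become: true
  ansible.builtin.command: ceph status --format json
  register: ceph_status_raw
  changed_when: false  # read-only query

- name: Expose the parsed status as the ceph variable
  ansible.builtin.set_fact:
    ceph: "{{ ceph_status_raw.stdout | from_json }}"
```

With a variable shaped like this, `ceph.health.status` evaluates to a plain string such as HEALTH_WARN, which the `!= 'HEALTH_OK'` condition compares directly.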
1 parent 2df6d47 commit 3321964


1 file changed, +22 -0 lines changed


tests/roles/ceph_migrate/tasks/post.yaml

Lines changed: 22 additions & 0 deletions
```diff
@@ -11,3 +11,25 @@
   vars:
     shell_header: "set -euo pipefail"
   when: ceph_daemons_layout.rgw | default(true) | bool
+
+- name: Remove faulty mgr
+  when:
+    - groups['ComputeHCI'] is defined
+    - groups['ComputeHCI'] | length > 0
+    - ceph is defined
+    - ceph.health.status is defined
+    - ceph.health.status != 'HEALTH_OK'
+  block:
+    - name: Install cephadm on all compute nodes
+      become: true
+      ansible.builtin.package:
+        name: cephadm
+        state: present
+      loop: "{{ groups['ComputeHCI'] }}"
+      delegate_to: "{{ item }}"
+
+    - name: Force fail ceph mgr on first compute node
+      become: true
+      ansible.builtin.command: cephadm shell -- ceph mgr fail
+      changed_when: false
+      delegate_to: "{{ groups['ComputeHCI'][0] }}"
```
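The failover task in this diff fires `ceph mgr fail` and moves on without waiting for a standby to take over. A possible follow-up, sketched under assumptions and not part of this commit, polls `ceph mgr stat` until the mgr map reports an available active manager. The task name, the `mgr_stat` register, the retry counts and the reliance on an `available` field in the JSON output are all assumptions to verify against the deployed Ceph release.

```yaml
# Hypothetical follow-up task, not part of this commit.
# Polls the mgr map until an active manager is reported as available.
- name: Wait for a standby mgr to become active
  become: true
  ansible.builtin.command: cephadm shell -- ceph mgr stat --format json
  register: mgr_stat
  changed_when: false  # read-only query
  retries: 10
  delay: 6
  # Check rc first so a failed call is retried instead of parsing empty stdout.
  until: mgr_stat.rc == 0 and (mgr_stat.stdout | from_json).available | bool
  delegate_to: "{{ groups['ComputeHCI'][0] }}"
```

This only confirms that some manager is available; it does not prove that a different daemon took over. Separately, since `ceph mgr fail` does change which daemon is active, `changed_when: true` on the failover task would arguably report its effect more accurately than `changed_when: false`.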
