Commit 46909aa

[ceph_migrate] trigger mgr failover when cluster health is degraded
Add a task in post.yaml to trigger Ceph manager failover when the cluster health status is not HEALTH_OK. This helps recover from degraded cluster states after migration operations.

The task delegates to ComputeHCI nodes and uses /etc/ceph as the config directory, ensuring proper execution context. It only runs when ComputeHCI nodes are available and the cluster health is degraded (HEALTH_WARN or HEALTH_ERR).

Also update fail_mgr.yaml to accept ceph_config_home as a parameter, allowing callers to override the default temporary client home directory. This enables the task to work correctly when executed on ComputeHCI nodes where /etc/ceph is the appropriate config location.

Signed-off-by: Roberto Alfieri <[email protected]>
1 parent 2df6d47 commit 46909aa

2 files changed, 21 additions and 1 deletion
tests/roles/ceph_migrate/tasks/fail_mgr.yaml

Lines changed: 5 additions & 1 deletion
@@ -1,8 +1,12 @@
 # Get a client using -v /home/tripleo-admin/ceph_config:/etc/ceph:z as input
+- name: Set ceph_config_home if not provided
+  ansible.builtin.set_fact:
+    _ceph_config_home: "{{ ceph_config_home | default(ceph_config_tmp_client_home) }}"
+
 - name: Refresh ceph_cli
   ansible.builtin.include_tasks: ceph_cli.yaml
   vars:
-    ceph_config_home: "{{ ceph_config_tmp_client_home }}"
+    ceph_config_home: "{{ _ceph_config_home }}"
     ceph_fsid: "{{ mon_dump.fsid }}"
     ceph_cluster: ceph
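
The fallback introduced in fail_mgr.yaml can be exercised in two ways. The following is a minimal sketch, not taken from the commit: both task names are hypothetical, and it assumes the tasks live in the same role so that `fail_mgr.yaml` resolves, that `mon_dump` has already been registered, and that `ceph_config_home` is not defined elsewhere (for example in role defaults), since otherwise the `default()` filter would not fall back.

```yaml
# Hypothetical caller 1: do not pass ceph_config_home.
# fail_mgr.yaml then falls back to ceph_config_tmp_client_home,
# which matches the behaviour before this commit.
- name: Fail mgr using the temporary client home
  ansible.builtin.include_tasks: fail_mgr.yaml

# Hypothetical caller 2: override the config directory.
# _ceph_config_home resolves to /etc/ceph and is passed on to ceph_cli.yaml.
- name: Fail mgr using /etc/ceph
  ansible.builtin.include_tasks: fail_mgr.yaml
  vars:
    ceph_config_home: /etc/ceph
```

Because `set_fact` stores `_ceph_config_home` as a host fact, the resolved value stays set for that host for the rest of the play, until another `set_fact` overwrites it.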

tests/roles/ceph_migrate/tasks/post.yaml

Lines changed: 16 additions & 0 deletions
@@ -11,3 +11,19 @@
   vars:
     shell_header: "set -euo pipefail"
   when: ceph_daemons_layout.rgw | default(true) | bool
+
+- name: Remove faulty mgr
+  delegate_to: "{{ groups['ComputeHCI'][0] | default(inventory_hostname) }}"
+  when:
+    - groups['ComputeHCI'] is defined
+    - groups['ComputeHCI'] | length > 0
+    - ceph is defined
+    - ceph.health.status is defined
+    - ceph.health.status != 'HEALTH_OK'
+  block:
+    - name: Include fail_mgr tasks
+      ansible.builtin.include_tasks: fail_mgr.yaml
+      vars:
+        ceph_config_home: /etc/ceph
+        ceph_fsid: "{{ mon_dump.fsid }}"
+        ceph_cluster: ceph
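
To see how the health gate behaves in isolation, the conditions can be tried in a standalone playbook. This is an illustrative sketch only: the play, the task name, and the hard-coded `ceph` variable are assumptions made for the example (in the role, `ceph` is presumably populated from the parsed cluster status), and the `ComputeHCI` group checks are left out so the play runs on localhost.

```yaml
# Illustrative playbook, not part of the commit.
- name: Show when the mgr failover gate would open
  hosts: localhost
  gather_facts: false
  vars:
    # Assumed shape of the parsed cluster status; change the value to
    # HEALTH_OK to see the task being skipped.
    ceph:
      health:
        status: HEALTH_WARN
  tasks:
    - name: Report that the failover would be triggered
      ansible.builtin.debug:
        msg: "Cluster health is {{ ceph.health.status }}, mgr failover would run"
      when:
        - ceph is defined
        - ceph.health.status is defined
        - ceph.health.status != 'HEALTH_OK'
```

Since the last condition is a plain inequality against `HEALTH_OK`, any other status value opens the gate, which covers both `HEALTH_WARN` and `HEALTH_ERR`.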
