
Commit 5ccd086

averdagu authored and elvgarrui committed
Migration revert plan
As a failsafe, the migration code can create a backup of the controllers to use in case the migration fails and leaves the environment in an unusable state.

The revert plan has two stages:

1- Backup stage: included in the current ovn-migration.yml. It can be configured using the environment variable CREATE_BACKUP (True by default). This stage runs the new Ansible role, recovery-backup. It stores the backup in `/ctl_plane_backup` on the host that BACKUP_MIGRATION_IP belongs to (this can be changed by modifying the environment variable). To restore the controllers, boot them using the ISO created by ReaR (stored in /ctl_plane_backup) and perform `automatic recover`.

2- Revert stage: this stage has its own Ansible playbook (revert.yml). This playbook cleans the environment of all the OVN resources that could have been created (breaking the data plane connectivity), leaving the environment in a state where an overcloud deploy with the OVS templates can be run.

Note: If the user creates new resources after running the backup stage and then performs the recovery of the controllers, those resources will be lost.

Change-Id: I7093f6a5f282b06fb2267cf2c88c533c1eae685d
(cherry picked from commit 7003817)
1 parent 8ab8495 commit 5ccd086

7 files changed: +168 -5 lines changed


doc/source/ovn/migration.rst

Lines changed: 8 additions & 0 deletions
@@ -147,6 +147,14 @@ Perform the following steps in the undercloud
   during migration to ensure a synchronized MTU switch across the networks.
   Default: 30

+* CREATE_BACKUP - Flag to create a backup of the controllers that can be
+  used as a revert mechanism.
+  Default: True
+
+* BACKUP_MIGRATION_IP - Only used if CREATE_BACKUP is enabled, IP of the
+  server that will be used as an NFS server to store the backup.
+  Default: 192.168.24.1
+
 .. warning::

    Please note that VALIDATE_MIGRATION requires enough quota (2
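
Both new options are plain environment variables with defaults, so they can be overridden before running the migration script. A minimal sketch of how the `: ${VAR:=default}` idiom resolves them follows; `resolve_backup_settings` is a hypothetical helper and `10.0.0.5` is a made-up address, neither of which appears in the commit.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_backup_settings is a hypothetical helper that applies
# the same ':=' defaults as ovn_migration.sh and prints the resulting values.
# The function body is a subshell, so the caller's environment is untouched.
resolve_backup_settings() (
    : "${CREATE_BACKUP:=True}"
    : "${BACKUP_MIGRATION_IP:=192.168.24.1}"
    echo "$CREATE_BACKUP $BACKUP_MIGRATION_IP"
)

# No overrides: both defaults apply.
( unset CREATE_BACKUP BACKUP_MIGRATION_IP; resolve_backup_settings )
# prints: True 192.168.24.1

# Override only the NFS server address (10.0.0.5 is illustrative).
( unset CREATE_BACKUP; export BACKUP_MIGRATION_IP=10.0.0.5; resolve_backup_settings )
# prints: True 10.0.0.5
```

Note that `:=` also substitutes the default when the variable is set but empty, so exporting an empty `BACKUP_MIGRATION_IP` would still yield `192.168.24.1`.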

tools/ovn_migration/tripleo_environment/ovn_migration.sh

Lines changed: 30 additions & 5 deletions
@@ -41,6 +41,8 @@ LANG=C
 : ${SERVER_USER_NAME:=cirros}
 : ${VALIDATE_MIGRATION:=True}
 : ${DHCP_RENEWAL_TIME:=30}
+: ${CREATE_BACKUP:=True}
+: ${BACKUP_MIGRATION_IP:=192.168.24.1} # TODO: Document this new var


 check_for_necessary_files() {
@@ -51,11 +53,14 @@ check_for_necessary_files() {
     fi

     # Check if the user has generated overcloud-deploy-ovn.sh file
+    # With correct permissions
     # If it is not generated. Exit
-    if [ ! -e $OVERCLOUD_OVN_DEPLOY_SCRIPT ]; then
+    if [ ! -x $OVERCLOUD_OVN_DEPLOY_SCRIPT ]; then
         echo "overcloud deploy migration script :" \
-             "$OVERCLOUD_OVN_DEPLOY_SCRIPT is not present. Please" \
-             "make sure you create that file before running this script."
+             "$OVERCLOUD_OVN_DEPLOY_SCRIPT is not present" \
+             "or execution permission is missing. Please" \
+             "make sure you create that file with correct" \
+             "permissions before running this script."
         exit 1
     fi

@@ -96,6 +101,17 @@ check_for_necessary_files() {
         fi
         exit 1
     fi
+    # Check if backup is enabled
+    if [[ $CREATE_BACKUP = True ]]; then
+        # Check if backup server is reachable
+        ping -c4 $BACKUP_MIGRATION_IP
+        if [[ $? -eq 1 ]]; then
+            echo -e "It is not possible to reach the backup migration server IP" \
+                    "($BACKUP_MIGRATION_IP). Make sure this IP is accessible before" \
+                    "starting the migration." \
+                    "Change this value by doing: export BACKUP_MIGRATION_IP=x.x.x.x"
+        fi
+    fi
 }

 get_host_ip() {
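
The hunk above treats only exit status 1 from `ping` as a failure and prints a message without aborting. With iputils `ping`, other errors are reported with status 2, so a stricter variant might treat any non-zero status as unreachable. The following is a sketch under that assumption, not the commit's code; `backup_server_reachable` and the `PROBE_CMD` hook are hypothetical names introduced here so the check can be exercised without network access.

```shell
#!/usr/bin/env bash
# Sketch only. PROBE_CMD is a hypothetical hook (not in the commit); by
# default it runs the same ping invocation used by ovn_migration.sh.
: "${PROBE_CMD:=ping -c4}"

backup_server_reachable() {
    local ip="$1"
    # Word splitting of PROBE_CMD is intentional: it holds a command plus flags.
    # Any non-zero exit status counts as unreachable, not only status 1.
    if ! $PROBE_CMD "$ip" >/dev/null 2>&1; then
        echo "It is not possible to reach the backup migration server IP ($ip)." >&2
        return 1
    fi
    return 0
}

# Possible use inside a pre-flight check (illustrative):
#   if [[ $CREATE_BACKUP = True ]]; then
#       backup_server_reachable "$BACKUP_MIGRATION_IP" || exit 1
#   fi
```

Whether an unreachable backup server should abort the run or only warn is a design choice; the commit only warns.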
@@ -297,14 +313,23 @@ reduce_network_mtu () {
 start_migration() {
     source $STACKRC_FILE
     echo "Starting the Migration"
+    local inventory_file="$OOO_WORKDIR/$STACK_NAME/config-download/$STACK_NAME/tripleo-ansible-inventory.yaml"
+    if ! test -f $inventory_file; then
+        inventory_file=''
+    fi
     ansible-playbook -vv $OPT_WORKDIR/playbooks/ovn-migration.yml \
     -i hosts_for_migration -e working_dir=$OPT_WORKDIR \
     -e public_network_name=$PUBLIC_NETWORK_NAME \
     -e image_name=$IMAGE_NAME \
     -e flavor_name=$FLAVOR_NAME \
+    -e undercloud_node_user=$UNDERCLOUD_NODE_USER \
     -e overcloud_ovn_deploy_script=$OVERCLOUD_OVN_DEPLOY_SCRIPT \
-    -e server_user_name=$SERVER_USER_NAME \
-    -e overcloudrc=$OVERCLOUDRC_FILE \
+    -e server_user_name=$SERVER_USER_NAME \
+    -e overcloudrc=$OVERCLOUDRC_FILE \
+    -e stackrc=$STACKRC_FILE \
+    -e backup_migration_ip=$BACKUP_MIGRATION_IP \
+    -e create_backup=$CREATE_BACKUP \
+    -e ansible_inventory=$inventory_file \
     -e validate_migration=$VALIDATE_MIGRATION $*

     rc=$?
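
`start_migration` passes the pre-generated inventory path when the file exists and an empty string otherwise; the recovery-backup role then uses that value to choose between copying the inventory and generating one. The lookup can be sketched in isolation as below, where `find_inventory` is a hypothetical helper rather than a function in ovn_migration.sh.

```shell
#!/usr/bin/env bash
# Sketch only: find_inventory is a hypothetical helper that mirrors the
# path construction and fallback used in start_migration.
find_inventory() {
    local workdir="$1" stack="$2"
    local candidate="$workdir/$stack/config-download/$stack/tripleo-ansible-inventory.yaml"
    if [ -f "$candidate" ]; then
        # The inventory already exists, so it can be copied as-is.
        printf '%s\n' "$candidate"
    else
        # Empty output signals that the inventory has to be generated.
        printf '\n'
    fi
}
```

A caller would use it as `inventory_file=$(find_inventory "$OOO_WORKDIR" "$STACK_NAME")`. Quoting the path in the test also keeps the check correct if the work directory contains spaces.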

tools/ovn_migration/tripleo_environment/playbooks/ovn-migration.yml

Lines changed: 17 additions & 0 deletions
@@ -1,11 +1,25 @@
 # This is the playbook used by ovn-migration.sh.

+#
+# Back up the controllers so they can be restored in case the
+# migration fails and leaves the testbed in a broken state.
+#
+
+- name: Backup controllers pre-migration
+  hosts: localhost
+  roles:
+    - recovery-backup
+  tags:
+    - recovery-backup
+
+
 #
 # Pre migration and validation tasks will make sure that the initial cloud
 # is functional, and will create resources which will be checked after
 # migration.
 #

+
 - name: Pre migration and validation tasks
   hosts: localhost
   roles:
@@ -50,6 +64,7 @@
     - setup
   become: false

+
 #
 # Once everything is migrated prepare everything by syncing the neutron DB
 # into the OVN NB database, and then switching the dataplane to br-int
@@ -67,6 +82,7 @@
   tags:
     - migration

+
 #
 # Verify that the initial resources are still reachable, remove them,
 # and afterwards create new resources and repeat the connectivity tests.
@@ -80,6 +96,7 @@
   tags:
     - post-migration

+
 #
 # Final step to make sure tripleo knows about OVNIntegrationBridge == br-int.
 #
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- name: Clean computes
+  hosts: ovn-controllers
+  roles:
+    - revert
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
---
2+
3+
# Name of the group hosts where the NFS instalation will take place
4+
# If the NFS server is the undercloud (and there is only one) will
5+
# not be a problem, but if multiple servers exist on the server_name group
6+
# it is possible that the nfs will be installed on every server, eventho the
7+
# storage of the backup will only be done in the backup_ip.
8+
#
9+
# This can be solved if a new tripleo-inventory is manually created specifying
10+
# a [BackupNode] section, with the nfs server info
11+
revert_preparation_server_name: "Undercloud"
12+
backup_and_recover_temp_folder: /tmp/backup-recover-temp
Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
---
2+
3+
- name: Create controller's backup
4+
block:
5+
- name: Create temp folder related to backup
6+
file:
7+
state: directory
8+
path: "{{ backup_and_recover_temp_folder }}"
9+
10+
# Using this task on OSP17
11+
- name: Copy tripleo-inventory
12+
copy:
13+
src: "{{ ansible_inventory }}"
14+
dest: "{{ backup_and_recover_temp_folder }}/tripleo-inventory.yaml"
15+
when:
16+
- create_backup|bool
17+
- ansible_inventory is defined
18+
- ansible_inventory != ""
19+
20+
# Using this task in OSP16.x
21+
- name: Generate tripleo inventory
22+
shell: |
23+
source {{ stackrc }} &&
24+
tripleo-ansible-inventory \
25+
--ansible_ssh_user {{ undercloud_node_user }} \
26+
--static-yaml-inventory {{ backup_and_recover_temp_folder }}/tripleo-inventory.yaml
27+
when:
28+
- create_backup|bool
29+
- ansible_inventory is not defined or ansible_inventory == ""
30+
31+
- name: Setup NFS on the backup node using IP {{ backup_migration_ip }}
32+
shell: |
33+
source {{ stackrc }} &&
34+
openstack overcloud backup \
35+
--inventory {{ backup_and_recover_temp_folder }}/tripleo-inventory.yaml \
36+
--setup-nfs \
37+
--extra-vars '{
38+
"tripleo_backup_and_restore_server": {{ backup_migration_ip }},
39+
"nfs_server_group_name": {{ revert_preparation_server_name }}
40+
}'
41+
42+
- name: Setup REAR on the controllers
43+
shell: |
44+
source {{ stackrc }} &&
45+
openstack overcloud backup \
46+
--inventory {{ backup_and_recover_temp_folder }}/tripleo-inventory.yaml \
47+
--setup-rear \
48+
--extra-vars '{
49+
"tripleo_backup_and_restore_server": {{ backup_migration_ip }}
50+
}'
51+
52+
- name: Backup the controllers
53+
shell: |
54+
source {{ stackrc }} &&
55+
openstack overcloud backup \
56+
--inventory {{ backup_and_recover_temp_folder }}/tripleo-inventory.yaml
57+
58+
# Ensure that after the controller backups the api responds
59+
- name: Ensure that the OSP api is working
60+
shell: >
61+
source {{ overcloudrc }} && openstack flavor list
62+
retries: 20
63+
register: api_rc
64+
delay: 5
65+
ignore_errors: yes
66+
until: api_rc.rc == "0"
67+
when: create_backup|bool
68+
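
The last task in this file polls the API with `retries`, `delay` and `until`, so that a slow API recovery after the backup does not fail the play immediately. The same idea can be sketched as a plain shell loop; `retry_until_success` is a hypothetical helper, and its attempt counting is a simplification that may not match Ansible's exact semantics.

```shell
#!/usr/bin/env bash
# Sketch only: retry_until_success is a hypothetical helper that runs a
# command up to <attempts> times, sleeping <delay> seconds between tries.
retry_until_success() {
    local attempts="$1" delay="$2"
    shift 2
    local attempt
    for (( attempt = 1; attempt <= attempts; attempt++ )); do
        # Success is an exit status of 0, compared as a number.
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if (( attempt < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}

# Illustrative use, assuming the overcloud credentials are already sourced:
#   retry_until_success 20 5 openstack flavor list
```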
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
---
2+
- name: Stop ovn containers
3+
become: yes
4+
shell: |
5+
for agent in $(podman ps -a --format {% raw %}"{{.ID}}"{% endraw %} --filter "name=(ovn_.*|ovnmeta)"); do
6+
echo "Cleaning up agent $agent"
7+
podman rm -f $agent
8+
done
9+
10+
- name: Clean OVN netns
11+
become: yes
12+
shell: |
13+
for netns in $(ip netns ls | grep ovnmeta | cut -d' ' -f1); do
14+
echo "delete netns $netns"
15+
ip netns del $netns
16+
done
17+
18+
- name: Delete OVN ports
19+
become: yes
20+
shell: |
21+
for port in $(ovs-vsctl list interface | grep ^name | grep 'ovn-\|patch-provnet\|patch-br-int-to' | cut -d':' -f2); do
22+
echo "Removing port $port"
23+
ovs-vsctl del-port $port
24+
done
25+
26+
- name: Revert cleanup completed.
27+
debug:
28+
msg: Revert cleanup done, please run overcloud deploy with the OVS configuration.
29+
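
Because the revert tasks delete whatever their pipelines select, it can help to preview the selection before running them. The sketch below applies the same `grep ovnmeta | cut -d' ' -f1` filter as the "Clean OVN netns" task to sample text; `select_ovn_netns` is a hypothetical helper and the namespace names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: select_ovn_netns is a hypothetical helper that applies the
# filter from the "Clean OVN netns" task to text read from stdin.
select_ovn_netns() {
    grep ovnmeta | cut -d' ' -f1
}

# Illustrative input in the shape printed by `ip netns ls`; names are made up.
sample='ovnmeta-11111111-2222-3333-4444-555555555555 (id: 0)
qdhcp-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee (id: 1)
ovnmeta-66666666-7777-8888-9999-000000000000'

# Prints only the two ovnmeta-* names; the qdhcp namespace is left alone.
printf '%s\n' "$sample" | select_ovn_netns
```

On a real node the preview would be `ip netns ls | select_ovn_netns`, which lists the namespaces the task would delete without removing anything.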
