Integrating MicroShift with Greenboot allows for automatic software upgrade rollbacks in case of a failure.
This document describes techniques for:
- Adding user workload health check procedures in a production environment
- Simulating software upgrade failures in a development environment
These guidelines can be used by developers to implement user workload health checks using Greenboot facilities, and to simulate failures for testing MicroShift integration with Greenboot in CI/CD pipelines.
Follow the instructions in the Auto-applying Manifests section to install a dummy user workload, without restarting the MicroShift service at this time.
Proceed by creating a health check script in the `/etc/greenboot/check/required.d`
directory. Choose a name prefix for the user script that ensures it runs after the
`40_microshift_running_check.sh` script, which implements the health check procedure for the MicroShift core services.
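Greenboot executes the scripts in the `required.d` directory in lexical filename order, which is why the numeric prefix matters. The ordering can be illustrated with the two filenames alone, independent of Greenboot:

```shell
# Greenboot runs required.d scripts in lexical order, so the 50_ prefix
# guarantees the user check runs after MicroShift's own 40_ core check.
# Illustration only: sort the two filenames the way a directory scan would.
printf '%s\n' \
    '50_busybox_running_check.sh' \
    '40_microshift_running_check.sh' | sort
```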
```shell
SCRIPT_FILE=/etc/greenboot/check/required.d/50_busybox_running_check.sh

sudo curl -s https://raw.githubusercontent.com/openshift/microshift/main/docs/config/busybox_running_check.sh \
    -o "${SCRIPT_FILE}" && echo SUCCESS || echo ERROR
sudo chmod 755 "${SCRIPT_FILE}"
```
Reboot the system and run the following command to examine the output of the Greenboot health checks. Note that the MicroShift core service health checks run before the user workload health checks.

```shell
sudo journalctl -o cat -u greenboot-healthcheck.service
```

The script utilizes the `microshift healthcheck` command, which is part of the `microshift` binary, to reuse the procedures already implemented for the MicroShift core services.
The script starts by running sanity checks to verify that it is executed from
the root account.
Then, it executes the `microshift healthcheck` command with the following options:
- `-v=2` to increase the verbosity of the output
- `--timeout="${WAIT_TIMEOUT_SECS}s"` to override the default 300s timeout value
- `--namespace busybox` to specify the namespace of the workloads
- `--deployments busybox-deployment` to specify the Deployment whose readiness should be checked
Internally, `microshift healthcheck` checks whether a workload of the provided type exists and verifies its status for the specified timeout duration, waiting until the number of ready replicas (Pods) matches the expected count.
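Putting the options above together yields the invocation used by the sample script. Because `microshift healthcheck` must run as root on a host with the `microshift` binary installed, the assembled command is only echoed here rather than executed:

```shell
# Assemble the healthcheck invocation from the options described above.
# WAIT_TIMEOUT_SECS matches the variable name used in the sample script;
# the command itself requires root on a MicroShift host, so only echo it.
WAIT_TIMEOUT_SECS=300
HEALTHCHECK_CMD="microshift healthcheck -v=2 --timeout=${WAIT_TIMEOUT_SECS}s --namespace busybox --deployments busybox-deployment"
echo "${HEALTHCHECK_CMD}"
```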
`microshift healthcheck` also accepts parameters for other kinds of workloads: `--daemonsets` and `--statefulsets`. These options take a
comma-delimited list of resources, e.g. `--daemonsets ovnkube-master,ovnkube-node`.
Alternatively, the `--custom` option can be used with a JSON string, for example:

```shell
microshift healthcheck --custom '{"openshift-storage":{"deployments": ["lvms-operator"], "daemonsets": ["vg-manager"]}, "openshift-ovn-kubernetes":{"daemonsets": ["ovnkube-master", "ovnkube-node"]}}'
```
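The `--custom` payload is a JSON object keyed by namespace, where each value maps a workload kind to a list of resource names. Pretty-printing the example makes that structure easier to see (`python3` is used here purely for display; it is not required by `microshift` itself):

```shell
# The --custom argument maps namespace -> workload kind -> resource names.
# Pretty-print the example payload to expose that structure.
CUSTOM='{"openshift-storage":{"deployments":["lvms-operator"],"daemonsets":["vg-manager"]},"openshift-ovn-kubernetes":{"daemonsets":["ovnkube-master","ovnkube-node"]}}'
echo "${CUSTOM}" | python3 -m json.tool
```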
To simulate a MicroShift service failure after an upgrade, remove the hostname RPM package, which MicroShift uses during its startup. Run the following command to remove the package and reboot.

```shell
sudo rpm-ostree override remove hostname -r
```

Examine the system log to monitor the upgrade verification procedure, which fails
and reboots the system a few times before rolling back to the previous state. Note the values
of the `boot_success` and `boot_counter` GRUB variables.
```
$ sudo journalctl -o cat -u greenboot-healthcheck.service
...
Running Required Health Check Scripts...
STARTED
GRUB boot variables:
boot_success=0
boot_indeterminate=0
boot_counter=2
...
Waiting 600s for MicroShift service to be active and not failed
FAILURE
...
```
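The `boot_counter` variable drives the rollback: it is decremented on each boot, and once it is exhausted without a successful health check, the bootloader selects the previous deployment. The gating can be sketched as a simplified pure-shell simulation (an illustration only; the real logic lives in Greenboot's GRUB integration, not in a shell loop):

```shell
# Simplified simulation of the boot-counter fallback gating: boot_counter
# starts at 2, each failed boot decrements it, and when it drops below
# zero the previous deployment is selected. Illustration only.
boot_counter=2
attempts=0
while [ "${boot_counter}" -ge 0 ]; do
    echo "boot attempt failed health checks; boot_counter=${boot_counter}"
    boot_counter=$((boot_counter - 1))
    attempts=$((attempts + 1))
done
echo "boot_counter exhausted after ${attempts} attempts: rolling back"
```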
Run the `sudo journalctl -o cat -u redboot-task-runner.service` command to see the output of the pre-rollback script.
After a few failed verification attempts, the system should roll back to the
previous state. Use the `rpm-ostree status` command to verify the current deployment.
```
$ rpm-ostree status
State: idle
Deployments:
  edge:rhel/9/x86_64/edge
    Version: 9.1 (2022-12-26T10:28:32Z)
    Diff: 1 removed
    RemovedBasePackages: hostname 3.20-6.el8
* edge:rhel/9/x86_64/edge
    Version: 9.1 (2022-12-26T10:28:32Z)
```

Finish by checking that all MicroShift pods run normally and clean up the failed rollback deployment.

```shell
sudo rpm-ostree cleanup -b -r
```

To simulate a MicroShift pod failure after an upgrade, set the `network.serviceNetwork` MicroShift configuration option to a
non-default `10.66.0.0/16` value without resetting the MicroShift data in the
`/var/lib/microshift` directory.
Start by checking out the current file system state and modifying it by adding
the `config.yaml` file to its `usr/etc/microshift` directory.
```shell
NEW_BRANCH="microshift-config"
OSTREE_REF=$(ostree refs | head -1)
OSTREE_COM=$(ostree log ${OSTREE_REF} | grep ^commit | awk '{print $2}')

sudo ostree checkout ${OSTREE_COM} ${NEW_BRANCH}
pushd ${NEW_BRANCH}

sudo tee usr/etc/microshift/config.yaml &>/dev/null <<EOF
network:
  serviceNetwork:
  - 10.66.0.0/16
EOF
```

Commit the updated file system, clean up the checked-out tree, and compare the
base reference with the newly created one to verify that the `config.yaml` file
was added in the `/usr/etc/microshift` directory.
```shell
sudo ostree commit --subject="MicroShift config.yaml update" --branch="${NEW_BRANCH}"
popd && sudo rm -rf ${NEW_BRANCH}
ostree diff ${OSTREE_REF} ${NEW_BRANCH}
```

Switch to the new branch as the default deployment and reboot for the changes to take effect.

```shell
BRANCH_COM=$(ostree log ${NEW_BRANCH} | grep ^commit | awk '{print $2}')
sudo rpm-ostree rebase --branch=${NEW_BRANCH} ${BRANCH_COM} -r
```

Examine the system log to monitor the upgrade verification procedure, which fails
and reboots the system a few times before rolling back to the previous state. Note the pod
readiness failure in the `openshift-ingress` namespace.
```
$ sudo journalctl -o cat -u greenboot-healthcheck.service
...
Running Required Health Check Scripts...
STARTED
...
Waiting 600s for 1 pod(s) from the 'openshift-ingress' namespace to be in 'Ready' state
FAILURE
...
```
Run the `sudo journalctl -o cat -u redboot-task-runner.service` command to see the output of the pre-rollback script.
After a few failed verification attempts, the system should roll back to the
previous state. Use the `rpm-ostree status` command to verify the current deployment.
```
$ rpm-ostree status
State: idle
Deployments:
* edge:rhel/9/x86_64/edge
    Version: 9.1 (2022-12-28T16:50:54Z)
  edge:eae8486a204bd72eb56ac35ca9c911a46aff3c68e83855f377ae36a3ea4e87ef
    Timestamp: 2022-12-29T14:44:48Z
```

Finish by checking that all MicroShift pods run normally and clean up the failed rollback deployment.

```shell
NEW_BRANCH="microshift-config"

sudo rpm-ostree cleanup -b -r
sudo ostree refs --delete ${NEW_BRANCH}
```