Commit 5f7bb32

[WIP]OSDOCS-5265: Greenboot for MicroShift
1 parent 65ef3e0 commit 5f7bb32

14 files changed

+397
-4
lines changed

_topic_maps/_topic_map_ms.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -149,6 +149,8 @@ Topics:
149149
File: microshift-applications
150150
- Name: Operators
151151
File: microshift-operators
152+
- Name: Greenboot health check
153+
File: microshift-greenboot
152154
# ---
153155
# Name: Networking
154156
# Dir: networking
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
:_content-type: ASSEMBLY
2+
[id="microshift-greenboot"]
3+
= The greenboot health check
4+
include::_attributes/attributes-microshift.adoc[]
5+
:context: microshift-greenboot
6+
7+
toc::[]
8+
9+
Greenboot is the generic health check framework for the `systemd` service on RPM-OSTree-based systems. The `microshift-greenboot` and `greenboot-default-health-check` RPM packages are optional. Greenboot assesses system health and automates a rollback to the last healthy state if software problems occur.
10+
11+
This health check framework is especially useful when you need to check for software problems and perform system rollbacks on edge devices where direct serviceability is either limited or non-existent. When health check scripts are installed and configured, health checks run every time the system starts.
12+
13+
Using greenboot can reduce your risk of being locked out of edge devices during updates and prevent a significant interruption of service if an update fails. When a failure is detected, the system boots into the last known working configuration using the `rpm-ostree` rollback capability.
14+
15+
A {product-title} health check script is included in the `microshift-greenboot` RPM. The `greenboot-default-health-check` RPM includes health check scripts verifying that DNS and `ostree` services are accessible. You can also create your own health check scripts based on the workloads you are running. For example, you can write a script that verifies that an application has started.
16+
17+
[NOTE]
18+
====
19+
Health check scripts can run on a system that does not use an OSTree file system, but in that case no rollback is possible if an update fails.
20+
====
21+
22+
include::modules/microshift-greenboot-dir-structure.adoc[leveloffset=+1]
23+
include::modules/microshift-greenboot-microshift-health-script.adoc[leveloffset=+1]
24+
include::modules/microshift-greenboot-systemd-journal-data.adoc[leveloffset=+1]
25+
//include::modules/microshift-greenboot-create-health-check-script.adoc[leveloffset=+1]
26+
27+
[role="_additional-resources"]
28+
.Additional resources
29+
* xref:../microshift_running_apps/microshift-applications.adoc#microshift-manifests-example_applications-microshift[Auto applying manifests]
30+
31+
include::modules/microshift-greenboot-updates-workloads.adoc[leveloffset=+1]
32+
include::modules/microshift-greenboot-workloads-validation.adoc[leveloffset=+1]
33+
include::modules/microshift-greenboot-health-check-log.adoc[leveloffset=+1]
34+
include::modules/microshift-greenboot-prerollback-log.adoc[leveloffset=+1]
35+
include::modules/microshift-greenboot-check-update.adoc[leveloffset=+1]

modules/microshift-adding-service-to-blueprint.adoc

Lines changed: 7 additions & 2 deletions
@@ -10,7 +10,7 @@ Add the {product-title} RPM package to a blueprint and enable the {product-title
1010

1111
.Image Builder blueprint example
1212

13-
[source,toml]
13+
[source,text]
1414
----
1515
name = "minimal-microshift"
1616
@@ -21,8 +21,13 @@ groups = []
2121
2222
[[packages]]
2323
name = "microshift"
24-
version = "4.12.0-1"
24+
version = "4.13.0-1"
25+
26+
[[packages]]
27+
name = "microshift-greenboot" <1>
28+
version = "4.13.0-1"
2529
2630
[customizations.services]
2731
enabled = ["microshift"]
2832
----
33+
<1> Optional `microshift-greenboot` RPM. For more information, read the "Greenboot health check" guide in the "Running Applications" section.

modules/microshift-configuring-ovn.adoc

Lines changed: 5 additions & 2 deletions
@@ -36,7 +36,7 @@ To customize your configuration, use the following table that lists the valid va
3636
|bool
3737
|false
3838
|Skip configuring OVS bridge `br-ex` in `microshift-ovs-init.service`
39-
|true ^1^
39+
|true ^[1]^
4040

4141
|`ovsInit.gatewayInterface`
4242
|Alpha
@@ -56,7 +56,10 @@ To customize your configuration, use the following table that lists the valid va
5656
|MTU value used for the pods
5757
|1300
5858
|===
59-
^1^ The OVS bridge is required. When `disableOVSInit` is true, OVS bridge `br-ex` must be configured manually.
59+
[.small]
60+
--
61+
1. The OVS bridge is required. When `disableOVSInit` is true, OVS bridge `br-ex` must be configured manually.
62+
--
6063

6164
.Example `ovn.yaml` config file:
6265

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * microshift_running_apps/microshift-greenboot.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="greenboot-check-updates_{context}"]
7+
= Checking updates with a health script
8+
9+
After an update, you can check whether the health checks passed by reading the `boot_success` GRUB environment variable. Use the following procedure.
10+
11+
.Procedure
12+
13+
* To access the result of update checks, run the following command:
14+
+
15+
[source, terminal]
16+
----
17+
$ sudo grub2-editenv - list | grep ^boot_success
18+
----
19+
20+
.Example output for a successful update
21+
22+
[source, terminal]
23+
----
24+
boot_success=1
25+
----
26+
27+
If the command returns `boot_success=0`, either the greenboot health check is still running or the update failed.
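
The interpretation of `boot_success` described above can be sketched as a small shell helper. This is an illustration only: the function name `interpret_boot_success` and the labels it prints are hypothetical and are not part of greenboot or {product-title}. The helper reads text in the same `name=value` form that `grub2-editenv - list` prints.

```shell
#!/bin/bash
# Hypothetical helper (not part of greenboot): reads `name=value` lines in the
# format printed by `grub2-editenv - list` on stdin and prints a short
# interpretation of the boot_success variable.
interpret_boot_success() {
    local value
    # Keep only the value of the first boot_success line, if present.
    value=$(sed -n 's/^boot_success=//p' | head -n 1)
    case "${value}" in
        1) echo "healthy" ;;
        0) echo "pending-or-failed" ;;   # check still running, or update failed
        *) echo "unknown" ;;             # variable missing or unexpected value
    esac
}

# Demo with canned input. On a real system, pipe in the output of
# `sudo grub2-editenv - list` instead.
printf 'boot_success=1\nboot_indeterminate=0\n' | interpret_boot_success
```

Because `boot_success=0` is ambiguous, a caller would typically also look at the `greenboot-healthcheck.service` journal output before concluding that an update failed.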
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * microshift_running_apps/microshift-greenboot.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="microshift-greenboot-create-health-check-script_{context}"]
7+
= Creating a health check script
8+
9+
You can create health check scripts for installed workloads by placing them in the `/etc/greenboot/check/required.d` directory. The following procedure shows an example of creating a health check script for the busybox application. You can use this example as a general guide for creating health check scripts for your applications.
10+
11+
.Prerequisite
12+
13+
* You have installed a workload. For this example, the busybox application is used as a workload. The "Additional resources" section that follows this procedure has a link to instructions on deploying workloads using manifests.
14+
15+
.Procedure
16+
17+
. To create a health check script, run the following command:
18+
+
19+
[source, terminal]
20+
----
21+
$ SCRIPT_FILE=/etc/greenboot/check/required.d/50_busybox_running_check.sh
22+
sudo curl -s https://raw.githubusercontent.com/openshift/microshift/3b7f6025cd77bd1bf827416fd026783ead82b7c8/docs/config/busybox_running_check.sh \
23+
-o ${SCRIPT_FILE} && echo SUCCESS || echo ERROR
24+
sudo chmod 755 ${SCRIPT_FILE}
25+
----
26+
+
27+
In this example, the script verifies that busybox is running as expected. You can replace `/etc/greenboot/check/required.d/50_busybox_running_check.sh` with your own workload details.
28+
+
29+
[NOTE]
30+
====
31+
In this example, the {product-title} core service health checks run before the user workload health checks.
32+
====
33+
34+
. To test that your script is running as expected:
35+
36+
.. Restart the system.
37+
38+
.. After the system restarts, run the following command:
39+
+
40+
[source, terminal]
41+
----
42+
$ sudo journalctl -o cat -u greenboot-healthcheck.service
43+
----
44+
+
45+
.Example output for the busybox health check script
46+
+
47+
[source, terminal]
48+
----
49+
...
50+
...
51+
STARTED
52+
Waiting 300s for pod image(s) from the 'busybox' namespace to be downloaded
53+
Waiting 300s for 1 pod(s) from the 'busybox' namespace to be in 'Ready' state
54+
Checking pod restart count in the 'busybox' namespace
55+
FINISHED
56+
Script '50_busybox_running_check.sh' SUCCESS
57+
----
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * microshift_running_apps/microshift-greenboot.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="microshift-greenboot-dir-structure_{context}"]
7+
= How greenboot uses directories to run scripts
8+
9+
Health check scripts run from four `/etc/greenboot` directories. These scripts run in alphabetical order. Keep this in mind when you configure the scripts for your workloads.
10+
11+
When the system starts, greenboot runs the scripts in the `required.d` and `wanted.d` directories. Depending on the outcome of those scripts, greenboot continues the startup or attempts a rollback as follows:
12+
13+
. System as expected: When all of the scripts in the `required.d` directory are successful, greenboot runs any scripts present in the `/etc/greenboot/green.d` directory.
14+
15+
. System trouble: If any of the scripts in the `required.d` directory fail, greenboot runs any prerollback scripts present in the `red.d` directory, then restarts the system.
16+
17+
[NOTE]
18+
====
19+
Greenboot redirects script and health check output to the system log. When you are logged in, a daily message provides the overall system health output.
20+
====
21+
22+
[id="greenboot-directories-details_{context}"]
23+
== Greenboot directory details
24+
25+
Returning a nonzero exit code from any script means that script has failed. Greenboot restarts the system a few times to retry the scripts before attempting to roll back to the previous version.
26+
27+
* `/etc/greenboot/check/required.d` contains the health checks that must not fail.
28+
29+
** If the scripts fail, greenboot retries them three times by default. You can configure the number of retries in the `/etc/greenboot/greenboot.conf` file by setting the `GREENBOOT_MAX_BOOTS` parameter to the desired number of retries.
30+
31+
** After all retries fail, greenboot automatically initiates a rollback if one is available. If a rollback is not available, the system log output shows that manual intervention is required.
32+
33+
** The `40_microshift_running_check.sh` health check script for {product-title} is installed into this directory.
34+
35+
* `/etc/greenboot/check/wanted.d` contains health scripts that are allowed to fail without causing the system to be rolled back.
36+
37+
** If any of these scripts fail, greenboot logs the failure but does not initiate a rollback.
38+
39+
* `/etc/greenboot/green.d` contains scripts that run after greenboot has declared the startup successful.
40+
41+
* `/etc/greenboot/red.d` contains scripts that run after greenboot has declared the startup as failed, including the `40_microshift_pre_rollback.sh` prerollback script. This script is executed right before a system rollback. The script performs {product-title} pod and OVN-Kubernetes cleanup to avoid potential conflicts after the system is rolled back to a previous version.
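
The ordering and exit-code rules described above can be sketched with a simplified runner. This is an illustration only and is not greenboot's implementation: the function name `run_checks_in_order` and the two demo script names are hypothetical, and the demo uses a temporary directory instead of `/etc/greenboot`.

```shell
#!/bin/bash
# Illustrative simplification, NOT greenboot's implementation:
# run every executable *.sh file in a directory in sorted (glob) order and
# return nonzero if any script exits with a nonzero code.
run_checks_in_order() {
    local dir=$1 script failed=0
    for script in "${dir}"/*.sh; do
        [ -x "${script}" ] || continue   # skip non-executable files and an unmatched glob
        if "${script}"; then
            echo "Script '$(basename "${script}")' SUCCESS"
        else
            echo "Script '$(basename "${script}")' FAILURE"
            failed=1
        fi
    done
    return "${failed}"
}

# Demo in a throwaway directory with two hypothetical checks.
# The 40_ prefix sorts before the 50_ prefix, so the core check runs first.
demo_dir=$(mktemp -d)
printf '#!/bin/bash\nexit 0\n' > "${demo_dir}/40_core_check.sh"
printf '#!/bin/bash\nexit 1\n' > "${demo_dir}/50_workload_check.sh"
chmod 755 "${demo_dir}"/*.sh
run_checks_in_order "${demo_dir}" || echo "at least one check failed"
rm -rf "${demo_dir}"
```

The numeric prefix in a file name is what controls the order, which is why a workload check named with a `50_` prefix runs after the `40_microshift_running_check.sh` core check.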
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * microshift_running_apps/microshift-greenboot.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="microshift-greenboot-access-health-check_{context}"]
7+
= Accessing health check output in the system log
8+
9+
You can manually access the output of health checks in the system log by using the following procedure.
10+
11+
.Procedure
12+
13+
* To access the results of a health check, run the following command:
14+
+
15+
[source, terminal]
16+
----
17+
$ sudo journalctl -o cat -u greenboot-healthcheck.service
18+
----
19+
20+
.Example output of a failed health check
21+
[source, terminal]
22+
----
23+
...
24+
...
25+
Running Required Health Check Scripts...
26+
STARTED
27+
GRUB boot variables:
28+
boot_success=0
29+
boot_indeterminate=0
30+
boot_counter=2
31+
...
32+
...
33+
Waiting 300s for MicroShift service to be active and not failed
34+
FAILURE
35+
...
36+
...
37+
----
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * microshift_running_apps/microshift-greenboot.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="microshift-health-script_{context}"]
7+
= The {product-title} health script
8+
9+
The `40_microshift_running_check.sh` health check script only performs validation of core {product-title} services. Install your customized workload validation scripts in the greenboot directories to ensure successful application operations after system updates. Scripts run in alphabetical order.
10+
11+
{product-title} health checks are listed in the following table:
12+
13+
.Validation statuses and outcome for {product-title}
14+
15+
[cols="3", options="header"]
16+
|===
17+
|Validation
18+
|Pass
19+
|Fail
20+
21+
|Check that the script runs with `root` permissions
22+
|Next
23+
|`exit 0`
24+
25+
|Check that the `microshift.service` is enabled
26+
|Next
27+
|`exit 0`
28+
29+
|Wait for the `microshift.service` to be active and not failed
30+
|Next
31+
|`exit 1`
32+
33+
|Wait for Kubernetes API health endpoints to be working and receiving traffic
34+
|Next
35+
|`exit 1`
36+
37+
|Wait for any pod to start
38+
|Next
39+
|`exit 1`
40+
41+
|For each core namespace, wait for images to be pulled
42+
|Next
43+
|`exit 1`
44+
45+
|For each core namespace, wait for pods to be ready
46+
|Next
47+
|`exit 1`
48+
49+
|For each core namespace, check that pods are not restarting
50+
|`exit 0`
51+
|`exit 1`
52+
|===
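
Each "Wait for" row in the table implies polling a condition until it succeeds or a deadline passes. The following sketch shows one way to write such a step in a custom health check script. The helper name `wait_for` and its arguments are hypothetical; this is not the code used by `40_microshift_running_check.sh`.

```shell
#!/bin/bash
# Hypothetical helper: poll a command until it succeeds or a timeout expires.
# Arguments: <timeout seconds> <poll interval seconds> <command> [args...]
# Returns 0 as soon as the command succeeds and 1 if the timeout is reached.
wait_for() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is a bash built-in that counts seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0
        fi
        [ "${SECONDS}" -ge "${deadline}" ] && return 1
        sleep "${interval}"
    done
}

# Possible use in a health check script (commented out because it needs systemd):
# wait_for 300 5 systemctl is-active --quiet microshift.service || exit 1

# Harmless demo: the `true` command succeeds on the first attempt.
wait_for 2 1 true && echo "check passed"
```

A script that ends a failed wait with `exit 1` follows the same pass and fail convention as the rows in the table.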
53+
54+
[id="validation-wait-period_{context}"]
55+
== Validation wait period
56+
The wait period in each validation is five minutes by default. After the wait period, if the validation has not succeeded, it is declared a failure. This wait period is incrementally increased by the base wait period after each boot in the verification loop.
57+
58+
* You can override the base wait period by setting the `MICROSHIFT_WAIT_TIMEOUT_SEC` environment variable in the `/etc/greenboot/greenboot.conf` configuration file. For example, to change the wait time to three minutes, set `MICROSHIFT_WAIT_TIMEOUT_SEC=180`.
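
The growth of the wait period can be sketched as follows. This is one reading of "incrementally increased by the base wait period after each boot" and is an assumption: the exact formula used by the {product-title} health check script is not stated here, and both the function name `effective_wait_timeout` and the 1-based boot attempt counter are hypothetical.

```shell
#!/bin/bash
# Assumption: the wait period on boot attempt N of the verification loop is
# N times the base period. The real script might compute this differently.
effective_wait_timeout() {
    local attempt=$1
    # Base period: MICROSHIFT_WAIT_TIMEOUT_SEC if set, else 300 seconds (five minutes).
    local base=${MICROSHIFT_WAIT_TIMEOUT_SEC:-300}
    echo $(( base * attempt ))
}

# Print the assumed wait period for three consecutive boot attempts.
for attempt in 1 2 3; do
    echo "boot attempt ${attempt}: wait up to $(effective_wait_timeout "${attempt}")s"
done
```

Under this assumption, lowering the base period to 180 seconds shortens every attempt proportionally, for example to 360 seconds on the second boot.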
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
2+
// Module included in the following assemblies:
3+
//
4+
// * microshift_running_apps/microshift-greenboot.adoc
5+
6+
:_content-type: PROCEDURE
7+
[id="microshift-greenboot-access-prerollback-check_{context}"]
8+
= Accessing prerollback health check output in the system log
9+
10+
You can access the output of health check scripts in the system log. For example, to check the results of a prerollback script, use the following procedure.
11+
12+
.Procedure
13+
14+
* To access the results of a prerollback script, run the following command:
15+
+
16+
[source, terminal]
17+
----
18+
$ sudo journalctl -o cat -u redboot-task-runner.service
19+
----
20+
21+
.Example output of a prerollback script
22+
[source, terminal]
23+
----
24+
...
25+
...
26+
Running Red Scripts...
27+
STARTED
28+
GRUB boot variables:
29+
boot_success=0
30+
boot_indeterminate=0
31+
boot_counter=0
32+
The ostree status:
33+
* rhel c0baa75d9b585f3dd989a9cf05f647eb7ca27ee0dbd4b94fe8c93ed3a4b9e4a5.0
34+
Version: 9.1
35+
origin: <unknown origin type>
36+
rhel 6869c1347b0e0ba1bbf0be750cdf32da5138a1fcbc5a4c6325ab9eb647b64663.0 (rollback)
37+
Version: 9.1
38+
origin refspec: edge:rhel/9/x86_64/edge
39+
System rollback imminent - preparing MicroShift for a clean start
40+
Stopping MicroShift services
41+
Removing MicroShift pods
42+
Killing conmon, pause and OVN processes
43+
Removing OVN configuration
44+
Finished greenboot Failure Scripts Runner.
45+
Cleanup succeeded
46+
Script '40_microshift_pre_rollback.sh' SUCCESS
47+
FINISHED
48+
redboot-task-runner.service: Deactivated successfully.
49+
----
