The `upgrade` suite is used to verify that upgrades can complete successfully
without disrupting any ongoing workloads.

The diagram below represents the upgrade test directory from the squid release
branch. Each release branch upgrade directory includes X-2 upgrade testing,
meaning we can test the upgrade from either of the two previous releases to the
current one.
```
upgrade
├── quincy-x
│   ├── filestore-remove-check
│   ├── parallel
│   │   ├── 0-start.yaml
│   │   ├── 1-tasks.yaml
│   │   ├── upgrade-sequence.yaml
│   │   └── workload
│   └── stress-split
├── reef-x
│   ├── parallel
│   │   └── workload
│   └── stress-split
├── squid-p2p
│   ├── squid-p2p-parallel
│   └── squid-p2p-stress-split
└── telemetry-upgrade
    ├── quincy-x
    └── reef-x
```

Based on the above example, where X=squid, it is possible to test upgrades
from Quincy (X-2) or from Reef (X-1) to Squid (X).

- The `upgrade/quincy-x/parallel` and `upgrade/reef-x/parallel` sub-suites
  install a Quincy or Reef cluster, then upgrade the cluster to Squid (X). In
  parallel, some workloads are run against the cluster, including telemetry
  workunits.
- The `upgrade/telemetry-upgrade` sub-suite is identical to the
  `upgrade/quincy-x/parallel` and `upgrade/reef-x/parallel` sub-suites above,
  but it only tests the telemetry workunits and does not run any other
  workloads.

A simple upgrade test contains these steps in order, divided into separate yaml
files:
```
├── 0-start.yaml
├── 1-tasks.yaml
├── upgrade-sequence.yaml
└── workload
```

- `0-start.yaml`: contains the Ceph cluster configuration (number of OSDs,
  monitors, etc.) for the test.
- `1-tasks.yaml`: contains the tasks we want to run on the cluster. It is here
  that we install an older release, then begin running the given `workload` and
  `upgrade-sequence` in parallel.
- `upgrade-sequence.yaml`: contains the steps for upgrading the cluster to the
  designated release.
- `workload`: a directory of yaml files with workloads we want to run while the
  upgrade is in progress.

The parallel step in `1-tasks.yaml` looks like this:
```
- print: "**** done start parallel"
- parallel:
  - workload
  - upgrade-sequence
- print: "**** done end parallel"
```

The `workload` directory contains the workload yaml files, just like any other
suite, and `upgrade-sequence` is responsible for initiating the upgrade and
waiting for it to complete:

```
# renamed tasks: to upgrade-sequence:
upgrade-sequence:
  sequential:
    - print: "**** done start upgrade, wait"
    ...
    mon.a:
      - ceph config set global log_to_journald false --force
      - ceph orch upgrade start --image quay.ceph.io/ceph-ci/ceph:$sha1
      - while ceph orch upgrade status | jq '.in_progress' | grep true && ! ceph orch upgrade status | jq '.message' | grep Error ; do ceph orch ps ; ceph versions ; ceph orch upgrade status ; sleep 30 ; done
    ...
    - print: "**** done end upgrade, wait..."
```
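
The one-line polling loop above is dense. The sketch below rewrites the same
logic in a readable form; it is an illustration, not the code the suite runs.
The `ceph` CLI is stubbed out so the snippet runs standalone (a real run polls
the live orchestrator and sleeps ~30 seconds between polls), and it matches the
JSON with `grep` instead of `jq` to stay dependency-free.

```shell
# Hypothetical readable rewrite of the polling one-liner above. The `ceph`
# stub simulates an upgrade that reports in-progress twice, then finishes.
_POLLS=0

ceph() {
    # stub: a real test would invoke the actual ceph CLI
    _POLLS=$((_POLLS + 1))
    if [ "$_POLLS" -le 2 ]; then
        echo '{"in_progress": true, "message": ""}'
    else
        echo '{"in_progress": false, "message": ""}'
    fi
}

wait_for_upgrade() {
    # Poll `ceph orch upgrade status` until the upgrade completes or errors.
    local tmp
    tmp=$(mktemp)
    while true; do
        # redirect (not command substitution) so the stub's counter persists
        ceph orch upgrade status > "$tmp"
        if grep -q 'Error' "$tmp"; then
            echo "upgrade failed" >&2
            rm -f "$tmp"
            return 1
        fi
        grep -q '"in_progress": true' "$tmp" || break
        sleep 0   # a real test sleeps ~30 seconds between polls
    done
    rm -f "$tmp"
    echo "upgrade complete"
}
```

With the stub above, `wait_for_upgrade` polls three times and then prints
`upgrade complete`.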

## Telemetry Upgrade tests

The telemetry upgrade sub-suite verifies that telemetry is emitting the correct
collections after the upgrade. This integration test coverage is done via
workunits, which are bash scripts that run commands against a Ceph cluster.
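
As a rough illustration of the shape of such a script (hypothetical, not an
actual file in `qa/workunits`; the `ceph` CLI and its output are stubbed out so
the sketch is self-contained):

```shell
#!/usr/bin/env bash
# Hypothetical minimal workunit sketch. A workunit is a plain bash script:
# it runs commands against the cluster and exits non-zero on failure.

ceph() {
    # stub so the sketch runs standalone; a real workunit calls the real CLI
    echo '["basic_base", "basic_crash", "perf_perf"]'
}

check_collection() {
    # succeed only if the named collection appears in the reported list
    local collections
    collections=$(ceph telemetry collection ls)
    [[ $collections == *"$1"* ]]
}

check_collection perf_perf || { echo "perf_perf missing"; exit 1; }
echo OK
```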

In the same manner as the `upgrade/parallel` tests, each release branch
references the `qa/workunits` directory, which includes telemetry bash scripts
for X-2 releases. That means we can test telemetry before and after the upgrade
from the previous two releases to the current one.

For instance, the relevant telemetry workunits for the `squid` release are:
```
qa/workunits
├── test_telemetry_quincy.sh
├── test_telemetry_quincy_x.sh
├── test_telemetry_reef.sh
└── test_telemetry_reef_x.sh
```

- `test_telemetry_quincy.sh` tests the presence of telemetry collections on a
  Quincy cluster before the upgrade.
- `test_telemetry_quincy_x.sh` tests the presence of new telemetry collections
  on the X-version cluster after it has been upgraded from Quincy.
- `test_telemetry_reef.sh` tests the presence of telemetry collections on a
  Reef cluster before the upgrade.
- `test_telemetry_reef_x.sh` tests the presence of new telemetry collections on
  the X-version cluster after it has been upgraded from Reef.

A sample telemetry upgrade test file contains the following check:
```
...
# Assert that new collections are available
COLLECTIONS=$(ceph telemetry collection ls)
NEW_COLLECTIONS=("perf_perf" "basic_mds_metadata" "basic_pool_usage"
                 "basic_rook_v01" "perf_memory_metrics" "basic_pool_options_bluestore")
for col in "${NEW_COLLECTIONS[@]}"; do
    if ! [[ $COLLECTIONS == *$col* ]]; then
        echo "COLLECTIONS does not contain '$col'."
        exit 1
    fi
done
...
```

These workunits are used in the `upgrade` suite, specifically in:
- [upgrade/quincy-x/parallel](https://github.com/ceph/ceph/blob/squid/qa/suites/upgrade/quincy-x/parallel/1-tasks.yaml)
- [upgrade/reef-x/parallel](https://github.com/ceph/ceph/blob/squid/qa/suites/upgrade/reef-x/parallel/1-tasks.yaml)
- [upgrade/telemetry-upgrade](https://github.com/ceph/ceph/tree/squid/qa/suites/upgrade/telemetry-upgrade)

```
upgrade
├── quincy-x
│   └── parallel
│       └── 1-tasks.yaml
├── reef-x
│   └── parallel
│       └── 1-tasks.yaml
└── telemetry-upgrade
    ├── quincy-x
    └── reef-x
```

The `upgrade/quincy-x/parallel` and `upgrade/reef-x/parallel` sub-suites
install a Quincy or Reef cluster, then upgrade the cluster to Squid. In
parallel, some workloads are run against the cluster, including telemetry
workunits. The `1-tasks.yaml` file is where the workunits are run.

For instance, the `upgrade/quincy-x/parallel/1-tasks.yaml` file from the
`squid` release branch looks like this:

```
...
- print: "**** done start telemetry quincy..."
- workunit:
    clients:
      client.0:
        - test_telemetry_quincy.sh
- print: "**** done end telemetry quincy..."

- print: "**** done start parallel"
- parallel:
  - workload
  - upgrade-sequence
- print: "**** done end parallel"

- print: "**** done start telemetry x..."
- workunit:
    clients:
      client.0:
        - test_telemetry_quincy_x.sh
- print: "**** done end telemetry x..."
```

The `test_telemetry_quincy.sh` workunit is run on the Quincy cluster before the
upgrade, and `test_telemetry_quincy_x.sh` is run on the X-version cluster (in
this example, `squid`) after the upgrade.

The `upgrade/telemetry-upgrade` sub-suite is identical to the
`upgrade/quincy-x/parallel` and `upgrade/reef-x/parallel` suites above, but its
tests ONLY run the telemetry workunits and do not run any other workloads.

`upgrade/telemetry-upgrade` is therefore a good choice when you just want to
verify that the telemetry workunits are working as expected, since its jobs
complete much faster. The `upgrade/[quincy|reef]-x/parallel` suites are a good
choice when you want to verify that the workunits work correctly alongside all
the other workloads.
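
Scheduling might look like the following sketch. The `teuthology-suite` flags
shown are assumptions based on common usage (check `teuthology-suite --help`
for your version), and the commands are only printed, not executed, since real
scheduling requires access to a teuthology lab.

```shell
# Build (but do not run) hypothetical teuthology-suite invocations for the
# fast telemetry-only suite and the full parallel suite.
schedule_cmd() {
    local suite=$1
    echo "teuthology-suite --suite $suite --ceph squid --machine-type smithi"
}

schedule_cmd upgrade/telemetry-upgrade
schedule_cmd upgrade/quincy-x/parallel
```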

### What tests to update when a telemetry collection is added/removed

- If the collection is added only to the `main` branch or the current release
  (`tentacle`), update only `test_telemetry_{X-2}_x.sh` and
  `test_telemetry_{X-1}_x.sh` (where *`X-2` is Reef and `X-1` is Squid for the
  `tentacle` release branch*).
- If the collection is backported to the X-2 releases, also update the
  `test_telemetry_{X-2}.sh` and `test_telemetry_{X-1}.sh` files (same
  convention: *`X-2` is Reef and `X-1` is Squid for the `tentacle` release
  branch*) to reflect the collection changes there.