Commit 9058107

akrzos and chaitanyaenr authored and committed
Adding workload guidelines and contributing docs
1 parent 422cec6 commit 9058107

File tree

3 files changed: 188 additions & 4 deletions

README.md

Lines changed: 4 additions & 4 deletions

@@ -2,6 +2,10 @@
 
 Tool an OpenShift cluster and run OpenShift Performance and Scale Workloads.
 
+## Documentation, Usage and Examples
+
+[See docs directory](docs/)
+
 ## Usage
 
 1. Git clone the repo
@@ -46,7 +50,3 @@ oc get job -n scale-ci-tooling scale-ci-nodevertical -o json | jq -e '.status.su
 ```
 
 For Pass/Fail functionality, jobs will not have a succeeded status and thus have failed in CI due to the last statement in the above build job.
-
-## More Docs
-
-[See docs directory](docs/)

docs/README.md

Lines changed: 4 additions & 0 deletions

@@ -28,6 +28,10 @@
 | Pod to Pod with HostNetwork | Labeling Nodes, Open firewall ports at Host |
 | Pod to Service | Labeling Nodes |
 
+## Workload Contributing Guidelines
+
+[See this page for workload contributing guidelines.](workload_guidelines.md)
+
 ## CI Pass/Fail
 
 Each workload will implement a form of pass/fail criteria in order to flag if the tests have failed in CI.

docs/workload_guidelines.md

Lines changed: 180 additions & 0 deletions

# Workload Contributing Guidelines

## Overview

Each workload in the openshift-scale/workloads repo attempts to adhere as closely as possible to a set of guidelines regarding structure and format.

The guidelines help reduce the overall debugging time needed to get a workload up and running. When every workload follows the same set of guidelines, debugging and fixing a problem is much easier because you do not have to learn an entirely different method or structure for each workload.

For example, each workload slurps the kubeconfig and provides it as a Kubernetes secret to the scale-ci-workload pod, so there is only one way that the kubeconfig is obtained and provided. You only have to know one method and where the relevant lines of code are in case there is an issue with the kubeconfig.
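
For illustration only, a minimal sketch of that flow using `oc`; the playbooks actually template this step with Ansible, and the secret name below is an assumption:

```sh
# Read the kubeconfig once and hand it to the workload pod as a secret.
# The secret name is illustrative; scale-ci-tooling is the namespace the
# workload jobs run in.
oc create secret generic scale-ci-kubeconfig \
  --namespace scale-ci-tooling \
  --from-file=config="${KUBECONFIG}"
# The scale-ci-workload job pod mounts this secret and points its own
# KUBECONFIG environment variable at the mounted file.
```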

Guidelines:

* Each workload should map to a single playbook
* All files for a workload (playbook, vars, docs file) should strive to use the same name
* One single vars file per workload playbook
* All externally-provided variables to a workload should be defined in the workload vars file
* Variable plumbing for workloads and [scale-ci-deploy](https://github.com/openshift-scale/scale-ci-deploy) playbooks should "mimic" each other
* Externally-provided variables are plumbed in via environment variables (see the sketch after this list)
* Environment variables are always UPPER case; keep Ansible vars lower case
* Binaries and executable code should be built into the [scale-ci-workload](https://github.com/openshift-scale/images/blob/master/scale-ci-workload/Dockerfile) image
* Nothing should be installed while the workload is running (unless installation itself is the workload)
* Workloads should be as deterministic as possible (nothing should change itself in a non-reproducible/non-predictable way)
* Each workload needs a docs file that describes its variables
* Fail as early as possible
* Workload scripts should strive to have as few templating sources as possible (for example, consider not making the workload script a Jinja2 template, since a workload script can already template bash environment variables and OpenShift templates; less templating means less confusion over the source of a variable)
* Workloads should avoid `git clone` of another repo as a best practice
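
As a rough illustration of the plumbing convention (the variable and file names below are made up for the example): the vars file defines a lower-case Ansible variable that reads an UPPER-case environment variable, and the caller exports the environment variable before running the playbook.

```sh
# Hypothetical entry in workloads/vars/example.yml (lower-case Ansible var):
#   example_node_count: "{{ lookup('env', 'EXAMPLE_NODE_COUNT')|default(4, true) }}"
#
# The external value is plumbed in through the UPPER-case environment variable:
export EXAMPLE_NODE_COUNT=10
ansible-playbook -vv -i inventory workloads/example.yml
```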

## What does a typical workload consist of?

A workload consists of:

* Workload playbook
* Workload vars file
* Workload configmap script
* Workload environment vars
* Workload job pod
* Binaries/executables provided in the container image
* Workload docs file

Example (Network workload):

* [Network playbook](../workloads/network.yml)
* [Network vars](../workloads/vars/network.yml)
* [Network configmap script](../workloads/files/workload-network-script-cm.yml)
* [Network environment vars](../workloads/templates/workload-env.yml.j2)
* [Workload job resource](../workloads/templates/workload-job.yml.j2)
* [Network uperf image Dockerfile](https://github.com/openshift-scale/images/blob/master/scale-ci-uperf/Dockerfile)
* [Network workload docs](network.md)

## Creating your own workload

The recommended method for creating or adding your own workload is to use the NodeVertical workload as a template.

### Suggested steps to add a workload:

1. Determine if any required binaries are not in the scale-ci-workload container image
   1. Create a pull request to add the binary to the [workload container image](https://github.com/openshift-scale/images)
2. Determine if any additional workload container images are required
   1. Create a pull request to the [images repo](https://github.com/openshift-scale/images) to add a container image
3. Determine what external variables you will need to populate for your workload
4. Copy and use the nodevertical playbook, vars file, configmap script, and readme as a template (see the sketch after this list)
5. Adjust the playbook and vars to what your workload needs
6. Replace the configmap script with the required test/workload
7. Test your workload before opening a pull request
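
A rough sketch of steps 4 and 6 for a hypothetical workload named `example`, assuming the NodeVertical files follow the same layout as the Network workload files linked above:

```sh
# Copy the NodeVertical pieces as a starting point (paths are illustrative).
cp workloads/nodevertical.yml workloads/example.yml
cp workloads/vars/nodevertical.yml workloads/vars/example.yml
cp workloads/files/workload-nodevertical-script-cm.yml workloads/files/workload-example-script-cm.yml
cp docs/nodevertical.md docs/example.md
# Then adjust the playbook/vars and replace the configmap script with your test.
```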

### But my workload also needs a binary/tool/thing or special container image

Create a pull request against the [openshift-scale/images](https://github.com/openshift-scale/images) repo adding the appropriate command to install the tool into the image. Once the pull request is merged, those container images will be rebuilt by Quay.io and will be available for consumption.

If you need another container image (for example, the uperf or fio container images), also create a pull request adding the files to build your container image to the [openshift-scale/images](https://github.com/openshift-scale/images) repo.

### What exactly should the workload script do?

Basic workflow for a workload script:

1. Configure pbench agents
2. Perform any pretest configuration
3. Invoke workload script
   1. Invoke with pbench-user-benchmark
   2. Invoke without pbench-user-benchmark
4. Check for Pass/Fail Criteria

Example `run.sh` file that is mounted into the scale-ci-workload container and executed. Note that the numbered steps above match the comments in the example below.

```sh
#!/bin/sh
set -eo pipefail
workload_log() { echo "$(date -u) $@" >&2; }
export -f workload_log
# (1) Configure pbench agents
workload_log "Configuring pbench for Example"
mkdir -p /var/lib/pbench-agent/tools-default/
echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
if [ "${ENABLE_PBENCH_AGENTS}" = true ]; then
  echo "" > /var/lib/pbench-agent/tools-default/disk
  echo "" > /var/lib/pbench-agent/tools-default/iostat
  echo "workload" > /var/lib/pbench-agent/tools-default/label
  echo "" > /var/lib/pbench-agent/tools-default/mpstat
  echo "" > /var/lib/pbench-agent/tools-default/oc
  echo "" > /var/lib/pbench-agent/tools-default/perf
  echo "" > /var/lib/pbench-agent/tools-default/pidstat
  echo "" > /var/lib/pbench-agent/tools-default/sar
  master_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/master= --no-headers | awk '{print $1}'`
  for node in $master_nodes; do
    echo "master" > /var/lib/pbench-agent/tools-default/remote@$node
  done
  infra_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/infra= --no-headers | awk '{print $1}'`
  for node in $infra_nodes; do
    echo "infra" > /var/lib/pbench-agent/tools-default/remote@$node
  done
  worker_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/worker= --no-headers | awk '{print $1}'`
  for node in $worker_nodes; do
    echo "worker" > /var/lib/pbench-agent/tools-default/remote@$node
  done
fi
source /opt/pbench-agent/profile
workload_log "Done configuring pbench for Example"

workload_log "Configuring Example"
# (2) Perform any pretest configuration
workload_log "Done configuring Example"

workload_log "Running Example workload"
if [ "${PBENCH_INSTRUMENTATION}" = "true" ]; then
  # (3i) Invoke with pbench-user-benchmark
  pbench-user-benchmark -- sh /root/workload/workload.sh
  result_dir="/var/lib/pbench-agent/$(ls -t /var/lib/pbench-agent/ | grep "pbench-user" | head -2 | tail -1)"/1/sample1
  if [ "${ENABLE_PBENCH_COPY}" = "true" ]; then
    pbench-copy-results --prefix ${EXAMPLE_TEST_PREFIX}
  fi
else
  # (3ii) Invoke without pbench-user-benchmark
  sh /root/workload/workload.sh
  result_dir=/tmp
fi
workload_log "Completed Example workload run"

workload_log "Checking Test Results"
# (4) Check for Pass/Fail Criteria
workload_log "Checking Exit Code"
if [ "$(jq '.exit_code==0' ${result_dir}/exit.json)" = "false" ]; then
  workload_log "Example workload Failure"
  workload_log "Test Analysis: Failed"
  exit 1
fi
workload_log "Comparing Example duration to expected duration"
workload_log "Example Duration: $(jq '.duration' ${result_dir}/exit.json)"
if [ "$(jq '.duration>'${EXPECTED_EXAMPLE_DURATION}'' ${result_dir}/exit.json)" = "true" ]; then
  workload_log "EXPECTED_EXAMPLE_DURATION (${EXPECTED_EXAMPLE_DURATION}) exceeded ($(jq '.duration' ${result_dir}/exit.json))"
  workload_log "Test Analysis: Failed"
  exit 1
fi
# TODO: Check pbench-agent collected metrics for Pass/Fail
# TODO: Check prometheus collected metrics for Pass/Fail
workload_log "Test Analysis: Passed"
```

Example `workload.sh`, which is also mounted into the container image. A separate `workload.sh` script is used for workloads that require `pbench-user-benchmark` and want the ability to run both with and without pbench agent instrumentation as well as pbench results collection.

```sh
#!/bin/sh
set -o pipefail

result_dir=/tmp
if [ "${PBENCH_INSTRUMENTATION}" = "true" ]; then
  result_dir=${benchmark_results_dir}
fi
start_time=$(date +%s)
# Insert Your workload here
sleep 60 # Sleep just for example
exit_code=$?
end_time=$(date +%s)
duration=$((end_time-start_time))

workload_log "Writing Exit Code"
jq -n '. | ."exit_code"='${exit_code}' | ."duration"='${duration}'' > "${result_dir}/exit.json"
workload_log "Finished workload script"
```
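
To make the contract between the two scripts concrete (the values below are examples only): `workload.sh` writes `exit.json`, and `run.sh` then applies its pass/fail checks against that file with `jq`.

```sh
# Example exit.json produced by workload.sh for a 60 second run that exited 0:
cat > /tmp/exit.json <<'EOF'
{ "exit_code": 0, "duration": 60 }
EOF
# The kind of checks run.sh performs against it:
jq '.exit_code==0' /tmp/exit.json   # prints "true"  -> workload exited cleanly
jq '.duration>300' /tmp/exit.json   # prints "false" -> under an example 300s expected duration
```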

## Adding the workload to the scale-ci-pipeline

Each workload, if it conforms to the above guidelines, should be easy to plug into the scale-ci-pipeline for continuous scale testing of OpenShift releases. Follow [these docs](https://github.com/openshift-scale/scale-ci-pipeline#modifyingadding-new-workloads-to-the-scale-ci-pipeline) to add a job for the new workload once it has been merged into the repo.
