Commit cd5541c

Merge pull request #69 from chaitanyaenr/conformance
Add conformance to the workloads framework
2 parents 8357cfc + 4642b94 commit cd5541c

6 files changed: +323 -1 lines changed

docs/README.md

Lines changed: 2 additions & 1 deletion
@@ -13,7 +13,7 @@
 | [Network](network.md) | TCP/UDP Throughput/Latency | Labeling Nodes, [See below](#network) |
 | [Deployments Per Namespace](deployments-per-ns.md) | Maximum Deployments | None |
 | [PVCscale](pvscale.md) | PVCScale test | Working storageclass |
-
+| [Conformance](conformance.md) | OCP/Kubernetes e2e tests | None |
 * Baseline job without a tooled cluster just idles a cluster. The goal is to capture resource consumption over a period of time to characterize resource requirements thus tooling is required. (For now)
 
 ## Network
@@ -41,3 +41,4 @@ Each workload will implement a form of pass/fail criteria in order to flag if th
 | [Network](network.md) | No |
 | [Deployments Per Namespace](deployments-per-ns.md) | No |
 | [PVCscale](pvscale.md) | No |
+| [Conformance](conformance.md) | No |

docs/conformance.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# Conformance Workload
+
+The conformance workload playbook is `workloads/conformance.yml` and will run the conformance workload on your cluster.
+
+The purpose of the conformance workload is to validate that the OpenShift cluster is sane by running e2e tests.
+
+Running from CLI:
+
+```sh
+$ cp workloads/inventory.example inventory
+$ # Add orchestration host to inventory
+$ # Edit vars in workloads/vars/conformance.yml or define Environment vars (See below)
+$ time ansible-playbook -vv -i inventory workloads/conformance.yml
+```
+
+## Environment variables
+
+### PUBLIC_KEY
+Default: `~/.ssh/id_rsa.pub`
+Public ssh key file for Ansible.
+
+### PRIVATE_KEY
+Default: `~/.ssh/id_rsa`
+Private ssh key file for Ansible.
+
+### ORCHESTRATION_USER
+Default: `root`
+User for Ansible to log in as. Must authenticate with PUBLIC_KEY/PRIVATE_KEY.
+
+### WORKLOAD_IMAGE
+Default: `quay.io/openshift-scale/scale-ci-workload`
+Container image that runs the workload script.
+
+### WORKLOAD_JOB_NODE_SELECTOR
+Default: `false`
+Enables/disables the node selector that places the workload job on the `workload` node.
+
+### WORKLOAD_JOB_TAINT
+Default: `false`
+Enables/disables the toleration on the workload job to permit the `workload` taint.
+
+### WORKLOAD_JOB_PRIVILEGED
+Default: `false`
+Enables/disables running the workload pod as privileged.
+
+### KUBECONFIG_FILE
+Default: `~/.kube/config`
+Location of kubeconfig on orchestration host.
+
+### PBENCH_INSTRUMENTATION
+Default: `false`
+Enables/disables running the workload wrapped by pbench-user-benchmark. When enabled, pbench agents can then be enabled (`ENABLE_PBENCH_AGENTS`) for further instrumentation data and pbench-copy-results can be enabled (`ENABLE_PBENCH_COPY`) to export captured data for further analysis.
+
+### ENABLE_PBENCH_AGENTS
+Default: `false`
+Enables/disables the collection of pbench data on the pbench agent Pods. These Pods are deployed by the tooling playbook.
+
+### ENABLE_PBENCH_COPY
+Default: `false`
+Enables/disables the copying of pbench data to a remote results server for further analysis.
+
+### PBENCH_SSH_PRIVATE_KEY_FILE
+Default: `~/.ssh/id_rsa`
+Location of ssh private key to authenticate to the pbench results server.
+
+### PBENCH_SSH_PUBLIC_KEY_FILE
+Default: `~/.ssh/id_rsa.pub`
+Location of the ssh public key to authenticate to the pbench results server.
+
+### PBENCH_SERVER
+Default: There is no public default.
+DNS address of the pbench results server.
+
+### SCALE_CI_RESULTS_TOKEN
+Default: There is no public default.
+Reserved for future use: lets pbench and the Prometheus scraper place results into the git repo that holds results data.
+
+### JOB_COMPLETION_POLL_ATTEMPTS
+Default: `360`
+Number of retries for Ansible to poll if the workload job has completed. Ansible waits 10s between polls, plus whatever time each polling action takes on the orchestration host.
+
+### CONFORMANCE_TEST_PREFIX
+Default: `conformance`
+Prefix for the pbench results.
+
+## Smoke test variables
+
+```
+CONFORMANCE_TEST_PREFIX=conformance_smoke
+```
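
As a rough illustration of the `JOB_COMPLETION_POLL_ATTEMPTS` description, the sketch below estimates the minimum time the playbook waits for the job before giving up. The helper name `poll_budget_seconds` is hypothetical and not part of the repository. It only multiplies the attempt count by the 10s delay and ignores the extra time each `oc` call takes, so treat the result as a lower bound.

```sh
#!/bin/sh
# Hypothetical helper (not part of this commit): lower bound, in seconds,
# on how long the playbook polls for job completion.
# $1 = number of poll attempts, $2 = delay between polls (defaults to 10s)
poll_budget_seconds() {
  attempts=$1
  delay=${2:-10}
  echo $((attempts * delay))
}

# With the documented default of 360 attempts:
poll_budget_seconds 360   # prints 3600, i.e. about one hour
```

If the conformance suite is expected to run longer than that on a given cluster, raising `JOB_COMPLETION_POLL_ATTEMPTS` is the knob the documentation exposes.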

workloads/conformance.yml

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+---
+#
+# Runs Conformance on OpenShift 4.x cluster
+#
+
+- name: Runs conformance on a RHCOS OpenShift cluster
+  hosts: orchestration
+  gather_facts: true
+  remote_user: "{{orchestration_user}}"
+  vars_files:
+    - vars/conformance.yml
+  vars:
+    workload_job: "conformance"
+  tasks:
+    - name: Create scale-ci-tooling directory
+      file:
+        path: "{{ansible_user_dir}}/scale-ci-tooling"
+        state: directory
+
+    - name: Copy workload files
+      copy:
+        src: "{{item.src}}"
+        dest: "{{item.dest}}"
+      with_items:
+        - src: scale-ci-tooling-ns.yml
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/scale-ci-tooling-ns.yml"
+        - src: workload-conformance-script-cm.yml
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/workload-conformance-script-cm.yml"
+
+    - name: Slurp kubeconfig file
+      slurp:
+        src: "{{kubeconfig_file}}"
+      register: kubeconfig_file_slurp
+
+    - name: Slurp ssh private key file
+      slurp:
+        src: "{{pbench_ssh_private_key_file}}"
+      register: pbench_ssh_private_key_file_slurp
+
+    - name: Slurp ssh public key file
+      slurp:
+        src: "{{pbench_ssh_public_key_file}}"
+      register: pbench_ssh_public_key_file_slurp
+
+    - name: Template workload templates
+      template:
+        src: "{{item.src}}"
+        dest: "{{item.dest}}"
+      with_items:
+        - src: pbench-cm.yml.j2
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/pbench-cm.yml"
+        - src: pbench-ssh-secret.yml.j2
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/pbench-ssh-secret.yml"
+        - src: kubeconfig-secret.yml.j2
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/kubeconfig-secret.yml"
+        - src: workload-job.yml.j2
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/workload-job.yml"
+        - src: workload-env.yml.j2
+          dest: "{{ansible_user_dir}}/scale-ci-tooling/workload-conformance-env.yml"
+
+    - name: Check if scale-ci-tooling namespace exists
+      shell: |
+        oc get project scale-ci-tooling
+      ignore_errors: true
+      changed_when: false
+      register: scale_ci_tooling_ns_exists
+
+    - name: Ensure any stale scale-ci-conformance job is deleted
+      shell: |
+        oc delete job scale-ci-conformance -n scale-ci-tooling
+      register: scale_ci_tooling_project
+      failed_when: scale_ci_tooling_project.rc == 0
+      until: scale_ci_tooling_project.rc == 1
+      retries: 60
+      delay: 1
+      when: scale_ci_tooling_ns_exists.rc == 0
+
+    - name: Block for non-existing tooling namespace
+      block:
+        - name: Create tooling namespace
+          shell: |
+            oc create -f {{ansible_user_dir}}/scale-ci-tooling/scale-ci-tooling-ns.yml
+
+        - name: Create tooling service account
+          shell: |
+            oc create serviceaccount useroot -n scale-ci-tooling
+            oc adm policy add-scc-to-user privileged -z useroot -n scale-ci-tooling
+          when: enable_pbench_agents|bool
+      when: scale_ci_tooling_ns_exists.rc != 0
+
+    - name: Create/replace kubeconfig secret
+      shell: |
+        oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/kubeconfig-secret.yml"
+
+    - name: Create/replace the pbench configmap
+      shell: |
+        oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/pbench-cm.yml"
+
+    - name: Create/replace pbench ssh secret
+      shell: |
+        oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/pbench-ssh-secret.yml"
+
+    - name: Create/replace workload script configmap
+      shell: |
+        oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/workload-conformance-script-cm.yml"
+
+    - name: Create/replace workload script environment configmap
+      shell: |
+        oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/workload-conformance-env.yml"
+
+    - name: Create/replace workload job that runs workload script
+      shell: |
+        oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/workload-job.yml"
+
+    - name: Poll until job pod is running
+      shell: |
+        oc get pods --selector=job-name=scale-ci-conformance -n scale-ci-tooling -o json
+      register: pod_json
+      retries: 60
+      delay: 2
+      until: pod_json.stdout | from_json | json_query('items[0].status.phase==`Running`')
+
+    - name: Poll until job is complete
+      shell: |
+        oc get job scale-ci-conformance -n scale-ci-tooling -o json
+      register: job_json
+      retries: "{{job_completion_poll_attempts}}"
+      delay: 10
+      until: job_json.stdout | from_json | json_query('status.succeeded==`1` || status.failed==`1`')
+      failed_when: job_json.stdout | from_json | json_query('status.succeeded==`1`') == false
+      when: job_completion_poll_attempts|int > 0
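
The two polling tasks at the end of the playbook rely on Ansible's `until`/`retries`/`delay` keywords. For readers less familiar with that pattern, here is a minimal shell sketch of the same idea. `poll_until` is a hypothetical name, and its attempt accounting only approximates Ansible's behavior; it is not taken from this commit.

```sh
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds or the
# attempt budget is used up, sleeping between attempts.
# $1 = maximum attempts, $2 = delay in seconds, remaining args = command
poll_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0            # condition met, stop polling
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"      # wait only if another attempt will follow
    fi
  done
  return 1                # budget exhausted without success
}

# Trivial demonstration with a command that succeeds immediately.
poll_until 3 0 true && echo "condition met"
```

In the playbook the polled command is an `oc get ... -o json` call and the success condition is a `json_query` expression over its output, but the control flow is the same.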
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: scale-ci-workload-script
+data:
+  run.sh: |
+    #!/bin/sh
+    set -eo pipefail
+    workload_log() { echo "$(date -u) $@" >&2; }
+    export -f workload_log
+    workload_log "Configuring pbench for Conformance"
+    mkdir -p /var/lib/pbench-agent/tools-default/
+    echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
+    if [ "${ENABLE_PBENCH_AGENTS}" = true ]; then
+      echo "" > /var/lib/pbench-agent/tools-default/disk
+      echo "" > /var/lib/pbench-agent/tools-default/iostat
+      echo "workload" > /var/lib/pbench-agent/tools-default/label
+      echo "" > /var/lib/pbench-agent/tools-default/mpstat
+      echo "" > /var/lib/pbench-agent/tools-default/oc
+      echo "" > /var/lib/pbench-agent/tools-default/perf
+      echo "" > /var/lib/pbench-agent/tools-default/pidstat
+      echo "" > /var/lib/pbench-agent/tools-default/sar
+      master_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/master= --no-headers | awk '{print $1}'`
+      for node in $master_nodes; do
+        echo "master" > /var/lib/pbench-agent/tools-default/remote@$node
+      done
+      infra_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/infra= --no-headers | awk '{print $1}'`
+      for node in $infra_nodes; do
+        echo "infra" > /var/lib/pbench-agent/tools-default/remote@$node
+      done
+      worker_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/worker= --no-headers | awk '{print $1}'`
+      for node in $worker_nodes; do
+        echo "worker" > /var/lib/pbench-agent/tools-default/remote@$node
+      done
+    fi
+    source /opt/pbench-agent/profile
+    workload_log "Done configuring pbench for Conformance"
+
+    workload_log "Running Conformance"
+    if [ "${PBENCH_INSTRUMENTATION}" = "true" ]; then
+      pbench-user-benchmark -- sh /root/workload/workload.sh
+      result_dir="/var/lib/pbench-agent/$(ls -t /var/lib/pbench-agent/ | grep "pbench-user" | head -2 | tail -1)"/1/sample1
+      if [ "${ENABLE_PBENCH_COPY}" = "true" ]; then
+        pbench-copy-results --prefix ${CONFORMANCE_TEST_PREFIX}
+      fi
+    else
+      sh /root/workload/workload.sh
+      result_dir=/tmp
+    fi
+    workload_log "Completed Conformance run"
+  workload.sh: |
+    #!/bin/sh
+    set -o pipefail
+
+    result_dir=/tmp
+    if [ "${PBENCH_INSTRUMENTATION}" = "true" ]; then
+      result_dir=${benchmark_results_dir}
+    fi
+    start_time=$(date +%s)
+    export KUBECONFIG=/root/.kube/config; /usr/bin/openshift-tests run openshift/conformance/parallel || exit 0
+    end_time=$(date +%s)
+    duration=$((end_time-start_time))
+    workload_log "Finished running conformance and it took $duration"
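
In `workload.sh`, the `|| exit 0` ends the script as soon as `openshift-tests` fails, so the duration line is only logged when the run succeeds. The sketch below shows one possible alternative that records the duration and exit status either way. `timed_run` is a hypothetical helper, not part of this commit, and whether the job should surface the failing status is a design decision left to the workload authors.

```sh
#!/bin/sh
# Same logging helper shape as in run.sh: timestamped message on stderr.
workload_log() { echo "$(date -u) $*" >&2; }

# Hypothetical helper: run a command, log how long it took and how it
# exited, then hand the exit status back to the caller.
timed_run() {
  start_time=$(date +%s)
  status=0
  "$@" || status=$?          # capture the status without aborting
  end_time=$(date +%s)
  duration=$((end_time - start_time))
  workload_log "Command exited with status $status after ${duration}s"
  return "$status"
}

# Demonstration with a command that always succeeds.
timed_run true
```

A caller that wants to keep the current "never fail the job" behavior could still append `|| true` to the `timed_run` invocation.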

workloads/templates/workload-env.yml.j2

Lines changed: 4 additions & 0 deletions
@@ -68,4 +68,8 @@ data:
   SCALE_WORKER_COUNT: "{{scale_worker_count}}"
   SCALE_POLL_ATTEMPTS: "{{scale_poll_attempts}}"
   EXPECTED_SCALE_DURATION: "{{expected_scale_duration}}"
+{% elif workload_job == "conformance" %}
+  PBENCH_INSTRUMENTATION: "{{pbench_instrumentation|bool|lower}}"
+  ENABLE_PBENCH_COPY: "{{enable_pbench_copy|bool|lower}}"
+  CONFORMANCE_TEST_PREFIX: "{{conformance_test_prefix}}"
 {% endif %}
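
The `|bool|lower` filter chain matters because `run.sh` compares these values against the literal string `true`. The sketch below mimics that normalization in shell. `to_bool_string` is a hypothetical helper, and the list of accepted truthy spellings reflects my understanding of Ansible's `bool` filter rather than anything stated in this commit.

```sh
#!/bin/sh
# Hypothetical helper: map common truthy spellings to the lowercase
# string "true" and everything else to "false".
to_bool_string() {
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lowered" in
    yes|on|1|true) echo true ;;
    *)             echo false ;;
  esac
}

# The script-side check then only needs a plain string comparison.
if [ "$(to_bool_string "True")" = "true" ]; then
  echo "instrumentation enabled"
fi
```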

workloads/vars/conformance.yml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+###############################################################################
+# Ansible SSH variables.
+###############################################################################
+ansible_public_key_file: "{{ lookup('env', 'PUBLIC_KEY')|default('~/.ssh/id_rsa.pub', true) }}"
+ansible_private_key_file: "{{ lookup('env', 'PRIVATE_KEY')|default('~/.ssh/id_rsa', true) }}"
+
+orchestration_user: "{{ lookup('env', 'ORCHESTRATION_USER')|default('root', true) }}"
+###############################################################################
+# Conformance workload variables.
+###############################################################################
+workload_image: "{{ lookup('env', 'WORKLOAD_IMAGE')|default('quay.io/openshift-scale/scale-ci-workload', true) }}"
+
+workload_job_node_selector: "{{ lookup('env', 'WORKLOAD_JOB_NODE_SELECTOR')|default(false, true)|bool }}"
+workload_job_taint: "{{ lookup('env', 'WORKLOAD_JOB_TAINT')|default(false, true)|bool }}"
+workload_job_privileged: "{{ lookup('env', 'WORKLOAD_JOB_PRIVILEGED')|default(false, true)|bool }}"
+
+kubeconfig_file: "{{ lookup('env', 'KUBECONFIG_FILE')|default('~/.kube/config', true) }}"
+
+# pbench variables
+pbench_instrumentation: "{{ lookup('env', 'PBENCH_INSTRUMENTATION')|default(false, true)|bool|lower }}"
+enable_pbench_agents: "{{ lookup('env', 'ENABLE_PBENCH_AGENTS')|default(false, true)|bool }}"
+enable_pbench_copy: "{{ lookup('env', 'ENABLE_PBENCH_COPY')|default(false, true)|bool|lower }}"
+pbench_ssh_private_key_file: "{{ lookup('env', 'PBENCH_SSH_PRIVATE_KEY_FILE')|default('~/.ssh/id_rsa', true) }}"
+pbench_ssh_public_key_file: "{{ lookup('env', 'PBENCH_SSH_PUBLIC_KEY_FILE')|default('~/.ssh/id_rsa.pub', true) }}"
+pbench_server: "{{ lookup('env', 'PBENCH_SERVER')|default('', true) }}"
+
+# Other variables for workload tests
+scale_ci_results_token: "{{ lookup('env', 'SCALE_CI_RESULTS_TOKEN')|default('', true) }}"
+job_completion_poll_attempts: "{{ lookup('env', 'JOB_COMPLETION_POLL_ATTEMPTS')|default(360, true)|int }}"
+
+# Conformance workload specific parameters:
+conformance_test_prefix: "{{ lookup('env', 'CONFORMANCE_TEST_PREFIX')|default('conformance', true) }}"
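
Every entry in this vars file follows the same pattern: read an environment variable and fall back to a default. The second argument `true` to `default` makes the fallback apply when the variable is empty as well as when it is unset. A rough shell equivalent is sketched below; `resolve_setting` is a hypothetical helper used only for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the value of the named variable, or the
# fallback when the variable is unset or empty.
# $1 = variable name, $2 = default value
resolve_setting() {
  name=$1
  fallback=$2
  eval "value=\${$name:-}"   # indirect lookup of the variable called $name
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$fallback"
  fi
}

# Smoke-test override from the documentation:
CONFORMANCE_TEST_PREFIX=conformance_smoke
resolve_setting CONFORMANCE_TEST_PREFIX conformance   # prints conformance_smoke
```

With the variable unset or set to an empty string, the same call prints the documented default `conformance`.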
