Migrate stressng and uperf workloads from benchmark-operator/snafu to native resource creation by arpsharm · Pull Request #1186 · redhat-performance/benchmark-runner

arpsharm · 2026-03-13T05:12:14Z

What changed

Remove benchmark-operator and snafu/benchmark-wrapper dependencies
Pod workloads use native Kubernetes Jobs
VM workloads use native KubeVirt VirtualMachines with cloud-init
VM results extracted via qemu-guest-agent
Added helper methods to oc.py for pod/VM introspection and guest-agent operations
Replaced all subprocess.run calls with oc.py methods
Added @typechecked annotations, moved initializations to __init__
New templates for native Job and VirtualMachine creation
Fixed stressng_timeout variable collision with general timeout env var
ES upload handled by benchmark-runner (pod workloads) and cloud-init curl (stressng pod)
95th percentile latency with numpy-equivalent linear interpolation
All ES fields match OG schema for Grafana compatibility
Prometheus metrics populated
380 golden files auto-regenerated from template changes
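The description mentions a 95th-percentile latency computed with "numpy-equivalent linear interpolation". The sketch below shows what that method means; it is not the PR's code, and the function name percentile_linear is hypothetical. As in numpy.percentile's default ("linear") method, the target rank on the sorted data is (n - 1) * pct / 100, and the result interpolates between the two neighbouring samples.

```python
import math
from typing import Sequence


def percentile_linear(values: Sequence[float], pct: float) -> float:
    """Percentile with linear interpolation between the closest ranks.

    Hypothetical helper: equivalent (up to float rounding) to numpy.percentile's
    default 'linear' method, where the rank is (n - 1) * pct / 100.
    """
    if not values:
        raise ValueError('values must be non-empty')
    if not 0 <= pct <= 100:
        raise ValueError('pct must be in [0, 100]')
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = math.floor(rank)
    # Clamp so pct == 100 (or a single sample) does not index past the end.
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac
```

For example, percentile_linear([1, 2, 3, 4], 95) lands at rank 2.85, i.e. 85% of the way from 3 to 4, giving about 3.85.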

openshift-ci · 2026-03-13T05:12:18Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

arpsharm · 2026-03-13T08:28:16Z

/test all

ebattat · 2026-03-15T09:24:22Z

@arpsharm,
let's have a meeting regarding it

arpsharm · 2026-03-17T12:44:08Z

/test all

arpsharm · 2026-03-17T13:34:50Z

/test all

openshift-ci · 2026-03-24T06:04:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: arpsharm
Once this PR has been reviewed and has the lgtm label, please assign robertkrawitz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ebattat · 2026-03-24T10:02:26Z

benchmark_runner/workloads/uperf_vm.py

+            'node_range': self._environment_variables_dict.get('node_range', ''),
+            'pod_id': '',
+            'hostnetwork': self._environment_variables_dict.get('hostnetwork', 'False')
+        }


Try to create uperf_data.yaml with default values

ebattat · 2026-03-24T10:25:10Z

benchmark_runner/workloads/uperf_vm.py

+            self._environment_variables_dict['test_user'] = os.environ.get('TEST_USER', 'ripsaw')
+            self._environment_variables_dict['port'] = os.environ.get('PORT', '30000')
+            self._environment_variables_dict['run_id'] = os.environ.get('RUN_ID', 'NA')
+


Add it to __init__ and put it inside the YAML.
Please check in Elastic whether we need all these fields, and remove the ones we don't, for example:
self._environment_variables_dict['test_user'] = os.environ.get('TEST_USER', 'ripsaw')
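A minimal sketch of the reviewer's suggestion, assuming the defaults move out of the run path and are resolved once in __init__. To stay stdlib-only, a plain dict stands in for the proposed uperf_data.yaml; the class name UperfSettings and the "upper-case env var overrides the key" convention are hypothetical, while the default values are the ones hard-coded in the diff.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical defaults copied from the values hard-coded in uperf_vm.py;
# the review suggests keeping them in a uperf_data.yaml file instead.
UPERF_DEFAULTS: Dict[str, str] = {
    'test_user': 'ripsaw',
    'port': '30000',
    'run_id': 'NA',
}


class UperfSettings:
    """Illustrative holder that resolves workload settings once, in __init__."""

    def __init__(self, env: Optional[Mapping[str, str]] = None):
        source = os.environ if env is None else env
        # An upper-case environment variable (e.g. PORT) overrides the default for 'port'.
        self.values: Dict[str, str] = {
            key: source.get(key.upper(), default)
            for key, default in UPERF_DEFAULTS.items()
        }
```

Any field that turns out to be unused in Elasticsearch can then be dropped from the defaults in one place.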

ebattat · 2026-03-24T10:28:54Z

benchmark_runner/workloads/uperf_vm.py

+            for _ in range(30):
+                if not self._oc.vm_exists(vm_name=self.__client_vm_name):
+                    break
+                time.sleep(1)


Please use the existing method delete_vm_sync.
Go over all the places that use subprocess.run, check whether an existing method in the oc class already does the job, and always use the sync method.
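The internals of delete_vm_sync are not shown in this thread. As a rough illustration of the delete-then-poll pattern a "sync" method replaces the hand-rolled loop with, here is a sketch where the delete and existence checks are passed in as callables (for example, a VM delete and the vm_exists check from the diff); delete_and_wait is a hypothetical name, not an oc.py method.

```python
import time
from typing import Callable


def delete_and_wait(delete: Callable[[], None],
                    exists: Callable[[], bool],
                    timeout: float = 30.0,
                    interval: float = 1.0) -> bool:
    """Call delete(), then poll exists() until it is False or the timeout elapses.

    Returns True if the resource disappeared within the timeout, else False.
    """
    delete()
    deadline = time.monotonic() + timeout
    while True:
        if not exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike the fixed range(30) loop, a deadline based on time.monotonic() keeps the timeout accurate even when each existence check itself takes noticeable time.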

ebattat · 2026-03-24T10:30:17Z

benchmark_runner/workloads/uperf_vm.py

+            yaml_path = os.path.join(f'{self._run_artifacts_path}', f'{self.__name}.yaml')
+            apply_cmd = f"oc apply -f {yaml_path}"
+            result = subprocess.run(apply_cmd, shell=True, capture_output=True, text=True)
+            if result.returncode != 0:


Use create_vm_sync here.

ebattat · 2026-03-24T10:37:34Z

benchmark_runner/workloads/uperf_vm.py

+
+            # Wait for client workload to complete by polling for signal file via guest agent
+            logger.info("Waiting for uperf client workload to complete...")
+            max_wait = 600  # 10 minutes timeout


Take the timeout from an env variable or from the oc class.
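A small sketch of reading the wait limit from the environment instead of hard-coding max_wait = 600. The variable name TIMEOUT and the 3600-second fallback are assumptions for illustration; the PR only says a general timeout env var exists, and the real code would read it from benchmark-runner's environment-variables dict or the oc class.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback used only when the env var is missing or invalid.
DEFAULT_TIMEOUT_SECONDS = 3600


def resolve_timeout(env: Optional[Mapping[str, str]] = None,
                    key: str = 'TIMEOUT',
                    default: int = DEFAULT_TIMEOUT_SECONDS) -> int:
    """Return a positive timeout in seconds from the environment, else the default."""
    source = os.environ if env is None else env
    raw = source.get(key, '')
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default
```

Usage would look like max_wait = resolve_timeout(), so long-running workloads can raise the limit without a code change.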

ebattat · 2026-03-24T10:54:08Z

benchmark_runner/workloads/uperf_pod.py

+            self._environment_variables_dict['clustername'] = cluster_name
+            self._environment_variables_dict['test_user'] = os.environ.get('TEST_USER', 'ripsaw')
+            self._environment_variables_dict['port'] = os.environ.get('PORT', '30000')
+            self._environment_variables_dict['run_id'] = os.environ.get('RUN_ID', 'NA')


Please try to put it in __init__.

ebattat · 2026-03-24T10:55:07Z

benchmark_runner/workloads/uperf_pod.py

+            time.sleep(5)
+
+            # Re-generate client YAML with server IP (template needs it)
+            from benchmark_runner.common.template_operations.template_operations import TemplateOperations


Move this import to the beginning of the file.

ebattat · 2026-03-24T10:55:35Z

benchmark_runner/workloads/uperf_pod.py

+            # Re-generate client YAML with server IP (template needs it)
+            from benchmark_runner.common.template_operations.template_operations import TemplateOperations
+            template_ops = TemplateOperations(workload=self._workload)
+            template_ops.set_environment_variables(self._environment_variables_dict)


Add it in __init__.

ebattat · 2026-03-24T10:56:22Z

benchmark_runner/workloads/uperf_pod.py

+            logger.info(f"Client IP: {client_ip}")
+
+            # Get pod logs using oc command
+            logs_cmd = f"oc logs -n {self._environment_variables_dict['namespace']} {client_pod}"


This already exists in the oc file: save_pod_log.

ebattat · 2026-03-24T11:02:18Z

benchmark_runner/workloads/uperf_pod.py

+        self.__server_job_name = ''
+        self.__client_job_name = ''
+
+    def _parse_uperf_pod_logs(self, pod_logs, server_ip, server_node, client_node, pod_id, client_ip):


Please add data types for pod_logs, server_ip, server_node, client_node, pod_id, and client_ip, and also check them by using
@typechecked
@logger_time_stamp

ebattat · 2026-03-29T14:47:14Z

benchmark_runner/workloads/uperf_vm.py

+            logger.info("Server VM is ready, getting server IP")
+
+            # Get server VMI IP - retry until IP is assigned
+            namespace = self._environment_variables_dict['namespace']


This should be in __init__.

ebattat · 2026-03-31T13:51:33Z

benchmark_runner/common/oc/oc.py

+            logger.warning(f"virtctl ssh error: {e}")
+            return None
+
+    def wait_for_virtctl_ssh(self, vm_name: str, namespace: str = '', key_path: str = '', username: str = 'fedora', timeout: int = 180) -> bool:


username: str = 'fedora' should be an environment variable.

ebattat · 2026-03-31T14:00:34Z

benchmark_runner/workloads/uperf_vm.py

+        self.__server_vm_name = f'uperf-server-{self._trunc_uuid}'
+        self.__client_vm_name = f'uperf-client-{self._trunc_uuid}'
+        self.__template_ops = TemplateOperations(workload=self._workload)
+        self.__ssh_key_path = self._environment_variables_dict.get('ssh_key_path', '/tmp/benchmark-runner-ssh-key')


We need a dynamic key that is generated for every workload.

ebattat · 2026-03-31T14:01:51Z

benchmark_runner/workloads/uperf_vm.py

+
+            # Wait for SSH to be ready on client VM
+            logger.info("Waiting for SSH on client VM...")
+            self._oc.wait_for_virtctl_ssh(vm_name=self.__client_vm_name, namespace=namespace, key_path=self.__ssh_key_path, username='fedora', timeout=180)


The username should be an environment variable.

ebattat · 2026-03-31T14:21:23Z

benchmark_runner/workloads/uperf_vm.py

+            workload_complete = False
+
+            for elapsed in range(0, max_wait, poll_interval):
+                check_result = self._oc.virtctl_ssh(vm_name=self.__client_vm_name, command='test -f /opt/uperf/workload_complete.signal && echo done', namespace=namespace, key_path=self.__ssh_key_path, username='fedora')


Virtctl class

def wait_for_vm_workload_completed(file_path, local_path) => should be in the virtctl dir

def ssh ready

def wait for file created

def scp the file to local
** Do not use a hard-coded PEM secret.

uperf_vm.py

def parse uperf vm result

workloads_operations.py => put the log parser here if it is the same for uperf and stressng
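One way the "wait for file created" helper listed above could be structured, sketched so that it does not depend on any particular virtctl flags: the SSH call is injected as a callable (run_ssh is hypothetical; in the PR it would wrap the virtctl_ssh helper, which returns output or None on error), and the timeout is passed in by the caller so it can come from the environment rather than a fixed default.

```python
import shlex
import time
from typing import Callable, Optional

# Hypothetical stand-in for a virtctl ssh wrapper: runs a shell command in the
# guest and returns its stdout, or None if the SSH call failed.
RunSsh = Callable[[str], Optional[str]]


def wait_for_file_created(run_ssh: RunSsh, file_path: str,
                          timeout: float, poll_interval: float = 10.0) -> bool:
    """Poll the guest until file_path exists; return False if the timeout elapses first."""
    check_cmd = f"test -f {shlex.quote(file_path)} && echo done"
    deadline = time.monotonic() + timeout
    while True:
        output = run_ssh(check_cmd)
        if output is not None and output.strip() == 'done':
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

Keeping transport and polling separate also makes the helper easy to unit-test with a fake run_ssh, and the same loop works for the completion signal file or for /tmp/uperf.json.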

ebattat · 2026-03-31T14:39:31Z

...ommon/template_operations/templates/uperf/internal_data/uperf_client_cloudinit_template.yaml

@@ -0,0 +1,43 @@
+apiVersion: v1


uperf_vm_secret_template.yaml

ebattat · 2026-03-31T14:44:02Z

benchmark_runner/workloads/workloads_operations.py

+                           check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+            with open(f'{ssh_key_path}.pub', 'r') as f:
+                self._environment_variables_dict['ssh_public_key'] = f.read().strip()
+            self._environment_variables_dict['ssh_key_path'] = ssh_key_path


self._ssh_key_path = generate_ssh_key()
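A sketch of a generate_ssh_key() helper that creates a fresh key pair per workload, in line with the request for a dynamic key instead of a fixed /tmp path. The ed25519 key type, the temp-dir naming, and the separate build_keygen_cmd helper are assumptions; the ssh-keygen options used (-q, -t, -N, -f, -C) are standard. As in the diff, the public key ends up at the private key path plus ".pub".

```python
import os
import subprocess
import tempfile
from typing import List


def build_keygen_cmd(key_path: str, comment: str = 'benchmark-runner') -> List[str]:
    """Build an ssh-keygen invocation for a passphrase-less ed25519 key."""
    # -q: quiet, -t: key type, -N '': empty passphrase, -f: output file, -C: comment
    return ['ssh-keygen', '-q', '-t', 'ed25519', '-N', '', '-f', key_path, '-C', comment]


def generate_ssh_key() -> str:
    """Create a new key pair in a private temp dir and return the private key path."""
    # mkdtemp creates a unique directory readable only by the current user.
    key_dir = tempfile.mkdtemp(prefix='benchmark-runner-ssh-')
    key_path = os.path.join(key_dir, 'id_ed25519')
    subprocess.run(build_keygen_cmd(key_path), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return key_path
```

Because each call uses a new directory, concurrent workloads never overwrite each other's keys, and no PEM secret has to be committed to a template.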

ebattat · 2026-03-31T15:28:11Z

...unner/common/template_operations/templates/uperf/internal_data/uperf_vm_direct_template.yaml

+                - mkdir -p /opt/uperf && chmod 777 /opt/uperf
+                - systemctl enable --now qemu-guest-agent
+                - for nic in $(ls /sys/class/net/ | grep -v lo); do ethtool -L $nic combined $(nproc) 2>/dev/null; done || true
+                - uperf -s -P 30000 > /opt/uperf/server.log 2>&1 &


export HOME=/root

export TMP=/tmp

export TEMP=/tmp

/tmp/uperf.log

python3 uperf_parser.py /tmp/uperf.log => should generate /tmp/uperf.json (create a ConfigMap with uperf_parser.py in the same YAML as the cloud-init)

uperf_vm.py
-- so we need to wait for /tmp/uperf.json
-- copy /tmp/uperf.json to local

arpsharm · 2026-04-02T07:58:35Z

/test all

ebattat · 2026-04-02T15:54:40Z

benchmark_runner/common/virtctl/virtctl.py

+
+    @typechecked
+    def wait_for_file_created(self, vm_name: str, file_path: str, namespace: str = '', key_path: str = '', username: str = '', timeout: int = 3600) -> bool:
+        """


timeout: int = 3600 => please use the timeout from the environment variable, because some workloads run for more than an hour

openshift-ci bot added the do-not-merge/work-in-progress label Mar 13, 2026

arpsharm force-pushed the migrate-native-workloads branch 2 times, most recently from 594eece to 950a41c on March 13, 2026 05:21

arpsharm requested a review from ebattat March 13, 2026 05:28

arpsharm changed the title ~~Migrate stressng and uperf workloads to native Kubernetes~~ Migrate stressng and uperf workloads from benchmark-operator to direct resource creation Mar 16, 2026

arpsharm force-pushed the migrate-native-workloads branch 2 times, most recently from 98c26bb to fe01110 on March 17, 2026 12:18

arpsharm force-pushed the migrate-native-workloads branch from fe01110 to 360e410 on March 17, 2026 13:32

arpsharm force-pushed the migrate-native-workloads branch 8 times, most recently from fb7a63b to 8012f77 on March 24, 2026 06:04

arpsharm force-pushed the migrate-native-workloads branch from 8012f77 to 66fb387 on March 24, 2026 08:10

arpsharm marked this pull request as ready for review March 24, 2026 11:26

openshift-ci bot removed the do-not-merge/work-in-progress label Mar 24, 2026

openshift-ci bot requested a review from RobertKrawitz March 24, 2026 11:26

arpsharm marked this pull request as draft March 24, 2026 11:26

openshift-ci bot added the do-not-merge/work-in-progress label Mar 24, 2026

arpsharm marked this pull request as ready for review March 24, 2026 11:26

openshift-ci bot removed the do-not-merge/work-in-progress label Mar 24, 2026

arpsharm marked this pull request as draft March 24, 2026 11:27

openshift-ci bot added the do-not-merge/work-in-progress label Mar 24, 2026

ebattat reviewed Mar 24, 2026

arpsharm force-pushed the migrate-native-workloads branch from 66fb387 to ddb6980 on March 25, 2026 11:36

arpsharm changed the title ~~Migrate stressng and uperf workloads from benchmark-operator to direct resource creation~~ Migrate stressng and uperf workloads from benchmark-operator/snafu to native resource creation Mar 25, 2026

arpsharm force-pushed the migrate-native-workloads branch 2 times, most recently from 7922313 to 15cb446 on March 26, 2026 13:57

arpsharm marked this pull request as ready for review March 26, 2026 14:15

openshift-ci bot removed the do-not-merge/work-in-progress label Mar 26, 2026

openshift-ci bot requested a review from ebattat March 26, 2026 14:15

arpsharm force-pushed the migrate-native-workloads branch from 15cb446 to c86e5fd on March 31, 2026 13:33

ebattat reviewed Mar 31, 2026

arpsharm force-pushed the migrate-native-workloads branch from c86e5fd to c6ea527 on April 2, 2026 07:54

arpsharm force-pushed the migrate-native-workloads branch from c6ea527 to cbde4b2 on April 2, 2026 09:58

ebattat reviewed Apr 2, 2026

Migrate stressng and uperf workloads to native Kubernetes (no snafu/wrapper/operator)

acc8202

arpsharm force-pushed the migrate-native-workloads branch from cbde4b2 to acc8202 on April 3, 2026 04:58

2 participants
