@bogdando
Contributor

@bogdando bogdando commented Aug 13, 2025

Add DT and related VA for full PCI device passthrough
for testing GPU workloads on RHOSO.

The VA is based on nvidia-mdev-passthrough; the DT is based on
nova02beta, plus the bmo01 DT for provisioning outside of ci-framework stages.

Jira: OSPRH-18904

@openshift-ci

openshift-ci bot commented Aug 13, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@bogdando
Contributor Author

Naming conflict: this nova03gamma, created from nova02beta, clashes with James's nova03gamma (#486), a virtualized solution to cover gaps in Nova team CI for the component line. Either consolidate the two, or simply rename this one to nova04delta.

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,d4f84c5285176dbf188cbe3ed9f727085d89c9af

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,9018ac6fad3c1f14c660a891e4da9c722f958315

@bogdando bogdando changed the title from "Create Nova GPU full device PCI passthrough job" to "Create Nova GPU full device PCI passthrough VA/DT" Sep 9, 2025
@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,6e6a27ad1975aa67112042345148a7a59202c4b0

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,33073bf6eb95a761b45d2ba75e44b2f59438b6ee

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,eda2ae67dabd0bd4ed428ddc39942d6bbc21fc8c

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,fb37ed90db9056cffb67048051bef910e75af4e6

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,8ebd055983545101aa7bb90f12e3b7ffc6ea8544

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,9c7ef06b8b1c0385b6f292fc9b976b67f39e94a6

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,db074b59e4ab04fc9e4b2f043fecc9817db5011a

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,c9994ccfe96ef7e45624a413e1ba32ceaa35005f

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,74b09c5325988550926e68525f1b847e91b4d382

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,99c9c74b63515abeff20d282beeb2126af23efbb

@softwarefactory-project-zuul
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/architecture for 602,74b09c5325988550926e68525f1b847e91b4d382

Comment on lines +47 to +50
Wait for the BareMetalHosts to become available. You can monitor the status with:
```
oc get bmh -w
```
Contributor

As a follow-up, if you split the BMHs into their own stage, this can be improved like so:

oc wait bmh -n openstack -l <labels added to BMH> --for jsonpath='{.status.provisioning.state}=available'

# or if the generated YAML file has only BMHs in it:
oc wait -f <path to generated YAML> --for jsonpath='{.status.provisioning.state}=available'

Contributor Author

@bogdando bogdando Nov 4, 2025

Yes, I planned that for follow-up pull requests, thanks.
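
For reference, the readiness check discussed in this thread can also be sketched in Python. This is a minimal illustration, not part of the PR: the function names, the `openstack` namespace default, and the sample host names are assumptions. Only the `.status.provisioning.state` path and the `available` value come from the comment above.

```python
import json
import subprocess


def bmh_states(bmh_list_json):
    """Map each BareMetalHost name to its provisioning state ('' if unset)."""
    items = json.loads(bmh_list_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("status", {})
        .get("provisioning", {})
        .get("state", "")
        for item in items
    }


def all_available(bmh_list_json):
    """True only if at least one BMH exists and every one is 'available'."""
    states = bmh_states(bmh_list_json)
    return bool(states) and all(s == "available" for s in states.values())


def fetch_bmh_json(namespace="openstack"):
    """Return the output of `oc get bmh -o json` (namespace is an assumption)."""
    result = subprocess.run(
        ["oc", "get", "bmh", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical sample, shaped like the List object `oc get ... -o json` returns.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "bmh-0"},
         "status": {"provisioning": {"state": "available"}}},
        {"metadata": {"name": "bmh-1"},
         "status": {"provisioning": {"state": "provisioning"}}},
    ]
})
print(bmh_states(sample))
print(all_available(sample))
```

A caller could poll `all_available(fetch_bmh_json())` with a sleep and a deadline; `oc wait` remains the simpler choice when it is available.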

@bogdando bogdando force-pushed the OSPRH-18904 branch 2 times, most recently from 14d3e36 to 1c0026c November 4, 2025 13:36
@softwarefactory-project-zuul
Contributor

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3db8f13e9ed74cc9bd56d26de973478e

✔️ noop SUCCESS in 0s
rhoso-architecture-validate-nova04delta FAILURE in 3m 58s
✔️ rhoso-architecture-validate-nova04delta-adoption SUCCESS in 4m 30s
✔️ rhoso-architecture-validate-nvidia-vfio-passthrough SUCCESS in 5m 03s
✔️ rhoso-architecture-validate-nvidia-vfio-passthrough-adoption SUCCESS in 4m 33s

@bogdando bogdando requested a review from abays November 4, 2025 13:56
@bogdando
Contributor Author

bogdando commented Nov 4, 2025

recheck-rdo

@softwarefactory-project-zuul
Contributor

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2c1ad407a0214fafae8e75ec73508088

✔️ noop SUCCESS in 0s
rhoso-architecture-validate-nova04delta FAILURE in 3m 49s
✔️ rhoso-architecture-validate-nova04delta-adoption SUCCESS in 3m 45s
✔️ rhoso-architecture-validate-nvidia-vfio-passthrough SUCCESS in 5m 06s
✔️ rhoso-architecture-validate-nvidia-vfio-passthrough-adoption SUCCESS in 4m 32s

@fultonj
Contributor

fultonj commented Nov 4, 2025

TASK [ci_gen_kustomize_values : Generate CI snippet backup=True, dest={{
  (snippet_datadir,
   '02_ci_data.yaml') | path_join
}}, src={{ _tmpl_check_path | first }}, mode=0644] ***
Tuesday 04 November 2025  14:01:35 +0000 (0:00:00.170)       0:00:43.568 ****** 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleUndefinedVariable: 'cifmw_baremetal_hosts' is undefined. 'cifmw_baremetal_hosts' is undefined
task path: /home/zuul/src/github.com/openstack-k8s-operators/ci-framework/roles/ci_gen_kustomize_values/tasks/generate_snippets.yml:118
fatal: [localhost]: FAILED! => 
    changed: false
    msg: 'AnsibleUndefinedVariable: ''cifmw_baremetal_hosts'' is undefined. ''cifmw_baremetal_hosts''
      is undefined'

from https://softwarefactory-project.io/zuul/t/rdoproject.org/build/061010eddb254320ac1fa7627344dca4
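
The failure above comes from rendering a template that references a variable the job never defined. As a rough illustration of that failure mode, and not how ci-framework or Ansible actually checks it, the sketch below scans template text for simple `{{ name }}` references and reports those missing from a variables mapping. The helper name, the regex, and the sample template are all hypothetical, and the regex deliberately ignores complex expressions, loops, and conditionals.

```python
import re

# Matches the leading identifier of simple references such as
# "{{ name }}" or "{{ name | some_filter }}". Anything more complex
# (tuples, attribute chains, control blocks) is out of scope here.
_VAR_REF = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def undefined_variables(template_text, variables):
    """Return referenced variable names absent from `variables`, in first-seen order."""
    missing = []
    for name in _VAR_REF.findall(template_text):
        if name not in variables and name not in missing:
            missing.append(name)
    return missing


# Hypothetical template snippet; only the variable name is taken from the log.
template = (
    "hosts: {{ cifmw_baremetal_hosts | to_nice_yaml }}\n"
    "namespace: {{ namespace }}\n"
)
print(undefined_variables(template, {"namespace": "openstack"}))
```

A check like this only flags candidates; the authoritative answer is still the rendering error reported by the job.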

Contributor

@fultonj fultonj left a comment

This PR looks reasonable to me.

Once the CI errors are fixed I'd be fine with merging.

Add DT and related VA for full PCI device passthrough
for testing GPU workloads on RHOSO.

The VA is based on nvidia-mdev-passthrough, DT is based on
nova02beta.

Jira: #OSPRH-18904

Signed-off-by: Bohdan Dobrelia <[email protected]>
@bogdando
Contributor Author

bogdando commented Nov 5, 2025

All done.

@bogdando bogdando requested a review from fultonj November 5, 2025 09:33
Contributor

@abays abays left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Nov 5, 2025
@openshift-ci

openshift-ci bot commented Nov 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, bogdando

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Nov 5, 2025
@abays
Contributor

abays commented Nov 5, 2025

recheck-gate

@softwarefactory-project-zuul
Contributor

@softwarefactory-project-zuul softwarefactory-project-zuul bot merged commit 8a41ca4 into openstack-k8s-operators:main Nov 5, 2025
9 checks passed
@bogdando
Contributor Author

bogdando commented Nov 5, 2025

/cherry-pick 18.0-fr4

@openshift-cherrypick-robot

@bogdando: new pull request created: #649

In response to this:

/cherry-pick 18.0-fr4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

softwarefactory-project-zuul bot added a commit that referenced this pull request Nov 7, 2025
…02-to-18.0-fr4

[18.0-fr4] Create a DT and VA for GPU full device PCI pass-through for GPU workloads testing

This is an automated cherry-pick of #602
/assign bogdando

Reviewed-by: Andrew Bays <[email protected]>
@bogdando bogdando deleted the OSPRH-18904 branch November 7, 2025 11:50