@igsilya
Contributor

@igsilya igsilya commented Aug 14, 2025

OVN-Kubernetes is always lagging behind on the version of OVN it pins. This makes it hard to keep up with bug fixes, and especially CVE fixes, on older branches. As a result, scanners flag this image with poor security grades, and bug fixes take much longer to reach customers, since the PR backporting process can take weeks or even months.

This PR removes the pin, so every time a new build is released in FDP, it automatically gets into the versions of OpenShift that use it. There is a pre-release testing process in place between FDP and OCP QE that ensures the required test coverage before a new build is released through FDP.

OKD versions are kept separate because new major versions are sometimes not released at the same time in FDP/RHEL and CentOS, so we may need them to differ at some point.

OVN-Kubernetes is currently using the latest OVN builds already, so this PR doesn't actually change anything for the current images. But it will bring newer OVN builds automatically as soon as they are released in the future. Major version upgrades still require a separate PR.
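
To illustrate the difference between a pinned and an unpinned package spec, here is a minimal sketch in Python. It is not the actual image build logic: the build names, the `resolve` helper, and the simplified version ordering are all assumptions made for illustration. It shows that an exact pin stays on one build, while a spec that names only the major-version package picks up the newest build of that stream and never jumps to a different major version on its own.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical builds available in a repository (made-up names and versions).
AVAILABLE_BUILDS = [
    "ovn24.09-24.09.1-10.el9fdp",
    "ovn24.09-24.09.2-41.el9fdp",
    "ovn24.09-24.09.2-69.el9fdp",
    "ovn25.03-25.03.0-12.el9fdp",
]


def parse_nvr(nvr: str) -> Tuple[str, str, str]:
    """Split 'name-version-release' into its three parts."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def sort_key(nvr: str) -> Tuple[int, ...]:
    """Order builds by their numeric fields only.

    This is a simplification; real RPM version comparison has more rules.
    """
    _, version, release = parse_nvr(nvr)
    return tuple(int(n) for n in re.findall(r"\d+", version + "." + release))


def resolve(package: str, pinned: Optional[str], builds: List[str]) -> str:
    """Return the build that would be installed for a package spec.

    With `pinned` set, only that exact version-release matches.
    Without it, the newest build of the named package wins.
    """
    candidates = [b for b in builds if parse_nvr(b)[0] == package]
    if pinned is not None:
        candidates = [b for b in candidates if b == f"{package}-{pinned}"]
    if not candidates:
        raise LookupError(f"no build matches {package} (pin: {pinned})")
    return max(candidates, key=sort_key)


# Pinned: stays on the old build until someone updates the pin in a PR.
pinned_choice = resolve("ovn24.09", "24.09.2-41.el9fdp", AVAILABLE_BUILDS)

# Unpinned: follows the newest build of the same major-version package.
unpinned_choice = resolve("ovn24.09", None, AVAILABLE_BUILDS)
```

In this toy model, moving to `ovn25.03` still means changing the package name in the spec, which mirrors the note above that major version upgrades require a separate PR.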

Signed-off-by: Ilya Maximets <[email protected]>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 14, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 14, 2025

@igsilya: This pull request references CORENET-6055 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@igsilya
Contributor Author

igsilya commented Aug 14, 2025

The change doesn't affect the version of RPMs being installed, but just in case:
/test e2e-aws-ovn-fdp-qe

@igsilya
Contributor Author

igsilya commented Aug 15, 2025

This PR contains no functional changes, so all the failures are just failures on the current master...
/retest

@tssurya
Contributor

tssurya commented Aug 15, 2025

/hold

remove hold after 22nd feature freeze for 4.20 - this should land in 4.21

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 15, 2025
@asood-rh
Contributor

asood-rh commented Sep 4, 2025

/test e2e-aws-ovn-fdp-qe

@asood-rh
Contributor

asood-rh commented Sep 5, 2025

/test e2e-aws-ovn-fdp-qe

@igsilya
Contributor Author

igsilya commented Sep 9, 2025

@tssurya I guess we can remove the hold now, right?
The tests are failing, but, as previously said, all the failures here are just failures on master as this PR doesn't bring any functional changes.

@asood-rh
Contributor

OVN FDP CI
IPSec Issue can be ignored as it is being worked on.

OCP-83672:anusaxen:SDN:[sig-networking] SDN IPSEC EW [FdpOvnOvs][Skipped Setup] IPSec Functionality check for FDP usecase. [Disruptive] [Serial] 
fail [github.com/openshift/openshift-tests-private/test/extended/networking/ipsec.go:859]: Interrupted by User

 OCP-47028:huirwang:SDN:[sig-networking] SDN OVN EgressIP [FdpOvnOvs] After remove EgressIP node tag, EgressIP will failover to other availabel egress nodes. [Serial] 
error: Failed to update egress node:timed out waiting for the condition

@martinkennelly
Contributor

/unhold
/lgtm
/approve

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 30, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: igsilya, martinkennelly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2025
@martinkennelly
Contributor

@asood-rh can you add the verified label?
It's already verified by CI + OVN FDP CI.

@martinkennelly
Contributor

/retest

@asood-rh
Contributor

/verified by CI + OVN FDP CI.

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Sep 30, 2025
@openshift-ci-robot
Contributor

@asood-rh: This PR has been marked as verified by CI + OVN FDP CI..

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD e712193 and 2 for PR HEAD 8aa4fb7 in total

@igsilya
Contributor Author

igsilya commented Sep 30, 2025

Failures seem unrelated to the change.
/retest-required

@igsilya
Contributor Author

igsilya commented Oct 1, 2025

/retest-required

@martinkennelly
Contributor

@igsilya CI has been a bit challenging for the last two weeks. There are lots of unrelated issues now, especially on AWS.

@igsilya
Contributor Author

igsilya commented Oct 1, 2025

@martinkennelly yeah, e2e-aws-ovn-edge-zones was last green on Sep 22nd and the lint job has never worked. Should these be overridden, or are we awaiting fixes for them in the near future?

@martinkennelly
Contributor

/override ci/prow/lint

https://issues.redhat.com/browse/CORENET-6207

@martinkennelly
Contributor

@igsilya I'll merge this shortly, I just want to give the FDP job another run. It's not looking good recently, so it's probably unrelated to your PR.

@martinkennelly
Contributor

/override ci/prow/e2e-aws-ovn-edge-zones

Job is permafailing and the single test that failed is unrelated to this patch.

pod/edge-app-f5f866c79-4m9q6   0/1     ImagePullBackOff   0          5m1s
pod/edge-app-f5f866c79-7zk9w   0/1     ImagePullBackOff   0          5m1s

Hunting for a bug.

Found https://issues.redhat.com/browse/OCPBUGS-60182# but it doesn't capture the error we see here. I asked the person who originally added the code for this edge-app who maintains it.

@asood-rh
Contributor

asood-rh commented Oct 3, 2025

@martinkennelly FDP job is also tracked in openshift/release PR.

I had marked it verified based on the run that had only two failures. Since then there have been changes to the openshift test repo.

@martinkennelly
Contributor

martinkennelly commented Oct 3, 2025

/test e2e-aws-ovn-fdp-qe

A bunch of tests failed for unknown reasons. I don't think this job is stable enough to be made required. (Unrelated to this PR, but proposed on release.)

@martinkennelly
Contributor

Ya, this PR is fine with the FDP job after looking at the history. I'll unhold it shortly.

@martinkennelly
Contributor

/unhold

FDP qe job is too unstable to get a signal.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 9, 2025
@martinkennelly
Contributor

martinkennelly commented Oct 9, 2025

It was fine in the run before that, except for an unrelated failure and the known IPsec failure.

@asood-rh
Contributor

asood-rh commented Oct 9, 2025

@martinkennelly If it is of concern, I will create a cluster with the PR image and execute the test, taking CI out of the picture.

There are a couple of CI-related PRs out that stabilize the job and result in a better signal.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2663105 and 1 for PR HEAD 8aa4fb7 in total

@martinkennelly
Contributor

It's OK. I saw a clean run minus the known issue with IPsec. Thanks.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 627ddba and 0 for PR HEAD 8aa4fb7 in total

@openshift-ci-robot
Contributor

/hold

Revision 8aa4fb7 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2025
@igsilya
Contributor Author

igsilya commented Oct 10, 2025

/retest-required
On the chance that e2e-metal-ipi-ovn-dualstack-bgp-local-gw will pass. It timed out for some reason.
The other two failed jobs have never passed in their lifetime, so we need to restore the overrides that were cleared by the previous automated reruns.

@martinkennelly
Contributor

/override ci/prow/lint

https://issues.redhat.com/browse/CORENET-6207

@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

@martinkennelly: Overrode contexts on behalf of martinkennelly: ci/prow/lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@martinkennelly
Contributor

/override ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

It hit the timeout in the previous run, and it's an issue that's being addressed here: openshift/release#69287

The job is borked because of flakes and because it now consistently reaches the timeout. Unrelated to this PR.

@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

@martinkennelly: Overrode contexts on behalf of martinkennelly: ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

@martinkennelly
Contributor

@igsilya bgp-local-gw has been bad recently. It did pass previously on this PR. I'll let it run to completion and then take action later today if it flakes for the reasons I've seen on other PRs.

I'll try to create bugs for the flakes I am seeing. The toil is intolerable.

@martinkennelly
Contributor

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2025
@igsilya
Contributor Author

igsilya commented Oct 10, 2025

FWIW, bgp-local-gw passed this time. But why the re-run of half the jobs was triggered again is beyond my understanding...

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 627ddba and 2 for PR HEAD 8aa4fb7 in total

@openshift-ci
Contributor

openshift-ci bot commented Oct 11, 2025

@igsilya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview | 8aa4fb7 | link | false | /test e2e-aws-ovn-hypershift-conformance-techpreview |
| ci/prow/okd-scos-e2e-aws-ovn | 8aa4fb7 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/security | 8aa4fb7 | link | false | /test security |
| ci/prow/qe-perfscale-aws-ovn-small-udn-density-l3 | 8aa4fb7 | link | false | /test qe-perfscale-aws-ovn-small-udn-density-l3 |
| ci/prow/qe-perfscale-aws-ovn-small-udn-density-churn-l3 | 8aa4fb7 | link | false | /test qe-perfscale-aws-ovn-small-udn-density-churn-l3 |
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | 8aa4fb7 | link | false | /test e2e-aws-ovn-hypershift-kubevirt |
| ci/prow/e2e-aws-ovn-fdp-qe | 8aa4fb7 | link | false | /test e2e-aws-ovn-fdp-qe |

@igsilya
Contributor Author

igsilya commented Oct 13, 2025

/test e2e-aws-ovn-local-to-shared-gateway-mode-migration
/test e2e-aws-ovn-shared-to-local-gateway-mode-migration

@openshift-merge-bot openshift-merge-bot bot merged commit 53c8c29 into openshift:master Oct 13, 2025
48 of 53 checks passed
@igsilya
Contributor Author

igsilya commented Oct 16, 2025

/cherry-pick release-4.20

@openshift-cherrypick-robot

@igsilya: new pull request created: #2808
