Conversation

@jcaamano
Contributor

@jcaamano jcaamano commented Oct 13, 2025

Cherry-picks from master for upstream fix ovn-kubernetes/ovn-kubernetes#5626 and ovn-kubernetes/ovn-kubernetes#5658.
Extra commit "Fix naming of 'Secondary' to be 'User-Defined'" cherry-picked to avoid conflicts.
No conflicts

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 13, 2025
@jcaamano jcaamano changed the title from "Ocpbugs real 56783 4.20" to "kubevirt: fix bad release of IPs of live migratable pods" Oct 13, 2025
@jcaamano
Contributor Author

/jira cherrypick OCPBUGS-56783

@openshift-ci-robot
Contributor

@jcaamano: Jira Issue OCPBUGS-56783 has been cloned as Jira Issue OCPBUGS-63007. Will retitle bug to link to clone.
/retitle OCPBUGS-63007: kubevirt: fix bad release of IPs of live migratable pods

In response to this:

/jira cherrypick OCPBUGS-56783

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot changed the title from "kubevirt: fix bad release of IPs of live migratable pods" to "OCPBUGS-63007: kubevirt: fix bad release of IPs of live migratable pods" Oct 13, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 13, 2025
@openshift-ci-robot
Contributor

@jcaamano: This pull request references Jira Issue OCPBUGS-63007, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected dependent Jira Issue OCPBUGS-56783 to target a version in 4.21.0, but it targets "4.21" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Cherry-picks from master for upstream fix ovn-kubernetes/ovn-kubernetes#5626.
Extra commit "Fix naming of 'Secondary' to be 'User-Defined'" cherry-picked to avoid conflicts.

@jcaamano
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 13, 2025
@openshift-ci-robot
Contributor

@jcaamano: This pull request references Jira Issue OCPBUGS-63007, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-56783 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-56783 targets the "4.21.0" version, which is one of the valid target versions: 4.21.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

/jira refresh

@jcaamano
Contributor Author

/retest

@martinkennelly
Contributor

martinkennelly commented Oct 14, 2025

/payload 4.20 ci blocking
/payload 4.20 nightly blocking

@openshift-ci
Contributor

openshift-ci bot commented Oct 14, 2025

@martinkennelly: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@openshift-ci
Contributor

openshift-ci bot commented Oct 14, 2025

@martinkennelly: trigger 5 job(s) of type blocking for the ci release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aks
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/efc04de0-a8e5-11f0-8974-124b476eb6ac-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial-1of2
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial-2of2
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-3of3
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/efc04de0-a8e5-11f0-8974-124b476eb6ac-1

@jcaamano
Contributor Author

/override ci/prow/lint

@openshift-ci
Contributor

openshift-ci bot commented Oct 14, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint

In response to this:

/override ci/prow/lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@martinkennelly
Contributor

payloads that passed and got infra up look fine.

@martinkennelly
Contributor

martinkennelly commented Oct 14, 2025

qe-perfscale-payload-control-plane-6nodes: unfortunately the job numbers cannot be trusted. There was a discussion last Friday about removing this as a required job until it is stable and its results are comprehensible: https://redhat-internal.slack.com/archives/CU9HKBZKJ/p1759941599018259

@martinkennelly
Contributor

For job e2e-metal-ipi-ovn-dualstack-bgp-local-gw: unrelated failure.

It looks like it got killed prematurely, but I'm unsure. I also see an error logging in.

ERRO[2025-10-14T13:55:59Z] Some steps failed:                           
ERRO[2025-10-14T13:55:59Z] 
  * could not run steps: execution cancelled 
INFO[2025-10-14T13:55:59Z] Reporting job state 'failed' with reason 'executing_graph:interrupted' 
error: build error: Failed to push image: trying to reuse blob sha256:25c75c34b2e2b68ba9245d9cddeb6b8a0887371ed30744064f85241a75704d87 at destination: unable to retrieve auth token: invalid username/password: authentication required

@martinkennelly
Contributor

For job e2e-gcp-ovn, the failure seems unrelated:

: [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] removes definition from spec when one version gets changed to not be served [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] (28s)
{  fail [k8s.io/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:475]: failed to wait for definition "com.example.crd-publish-openapi-test-multi-to-single-ver.v5.e2e-test-crd-publish-openapi-4112-crd" to be served with the right OpenAPI schema: failed to wait for OpenAPI spec validating condition: Get "https://api.ci-op-nh7bd963-04dd4.XXXXXXXXXXXXXXXXXXXXXX:6443/openapi/v2": EOF; lastMsg:}

@martinkennelly
Contributor

For job 4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade I see some extra disruption when talking to the API server, but I don't know whether it is related or a flake. I'll let the retry decide, I guess.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2025
@jcaamano
Contributor Author

/hold

I think I might have introduced an issue. Will keep you updated @martinkennelly

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2025
@martinkennelly
Contributor

/backport-risk-assessed

After looking at the non-test and non-renaming code, I think the risk is low. I'd try to avoid putting this in until 4.20 is cut, but I'll let staff make the final decision.

@jcaamano
Contributor Author

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: martinkennelly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jcaamano
Contributor Author

@martinkennelly ovn-kubernetes/ovn-kubernetes#5658

trozet and others added 8 commits October 24, 2025 09:04
When multiple networks support was first added, all controllers that
were added used the label "Secondary" to indicate they were not
"Default". When UDN was added, it allowed "Secondary" networks to
function as the primary network for a pod, creating terminology
confusion. We now treat non-default networks all as "User-Defined
Networks". This commit changes all naming to conform to the latter.

The only place "secondary" is used now is to distinguish whether a UDN
is acting as a primary or secondary network for a pod (its role).

The only exception to this is udn-isolation. I did not touch this
because it relies on dbIDs, which would impact functionality for
upgrade.

There is no functional change in this commit.

Signed-off-by: Tim Rozet <[email protected]>
(cherry picked from commit bbca874)

The k8s e2e utility functions AddOrUpdateLabelOnNode/RemoveLabelOffNode
don't work for labels without a value. The incorrect handling of these
labels caused migrations to follow a sequence of nodes different from
the one the tests intended to exercise.

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit 434b48f)
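
For context on the label fix above: Kubernetes labels are a string-to-string map, and a label without a value is simply a key mapped to the empty string. The commit message does not say how the e2e helpers mishandle such labels, so the sketch below is illustrative only and does not reproduce those helpers or the actual fix. The function names and the label key are hypothetical. It shows why label presence has to be checked on the key rather than inferred from a non-empty value.

```go
package main

import "fmt"

// hasLabel reports whether key is present, regardless of its value.
// This is the check that works for value-less labels.
func hasLabel(labels map[string]string, key string) bool {
	_, ok := labels[key]
	return ok
}

// hasLabelByValue is a deliberately naive variant: it treats an empty
// value as "label absent", so it misses labels that have no value.
func hasLabelByValue(labels map[string]string, key string) bool {
	return labels[key] != ""
}

func main() {
	// Hypothetical label key, set without a value.
	labels := map[string]string{"example.com/migration-target": ""}

	fmt.Println("presence check:", hasLabel(labels, "example.com/migration-target"))
	fmt.Println("value check:   ", hasLabelByValue(labels, "example.com/migration-target"))
}
```

The two checks disagree only for value-less labels, which is the case the commit message calls out.
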
There are two circumstances in which IPs were being released incorrectly:

* when a live migratable pod completed with no migration ongoing, its
  IPs were not being released because IsMigratedSourcePodStale outright
  assumed a completed pod was stale.
* when a live migratable pod completed on a different node than the VM's
  original as part of a migration, its IPs were being released when they
  shouldn't have been; we were simply not checking if it was a migration.

This commit also improves the tests to check for IP release.

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit 4c34982)
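
The intended behaviour described in the commit above can be summarised as a small decision function. This is a minimal sketch under stated assumptions, not the ovn-kubernetes implementation: the `podState` type, its fields, and `shouldReleaseIPs` are hypothetical names, and the real code (IsMigratedSourcePodStale, AllVMPodsAreCompleted, ConditionalIPRelease) considers more state than is modelled here. It also assumes that a completed pod that is not live migratable releases its IPs.

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of the facts that matter
// for the IP release decision.
type podState struct {
	Completed        bool // the pod has finished (succeeded or failed)
	LiveMigratable   bool // the pod backs a live migratable VM
	MigrationOngoing bool // another pod of the same VM is taking over the IPs
}

// shouldReleaseIPs returns true when the pod's IPs can be released.
func shouldReleaseIPs(p podState) bool {
	if !p.Completed {
		// A running pod keeps its IPs.
		return false
	}
	if !p.LiveMigratable {
		// Assumption: ordinary completed pods release their IPs.
		return true
	}
	// A completed live migratable pod releases its IPs only when no
	// migration is in progress; otherwise the VM still needs them.
	return !p.MigrationOngoing
}

func main() {
	cases := []podState{
		{Completed: true, LiveMigratable: true, MigrationOngoing: false},
		{Completed: true, LiveMigratable: true, MigrationOngoing: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> release=%v\n", c, shouldReleaseIPs(c))
	}
}
```

The two cases in `main` correspond to the two bullets: the first should release, the second should not.
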
Don't attempt to release IPs that are not managed by the local zone,
which can happen with live migratable pods; otherwise we would get
distracting error logs on release.

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit 7a155cc)
ConditionalIPRelease would always return false when checking IPs not
tracked in the local zone so in that case we were not correctly checking
for colliding pods.

This was hidden by the fact that IsMigratedSourcePodStale was used just
before instead of AllVMPodsAreCompleted until a very recent fix and that
would always return false for a completed live migratable pod.

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit 0dc8f27)

Or completion of a failed target pod

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit c1b02b5)

As it is the most complex scenario and a superset of testing without it

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit ef92f78)

I accidentally removed the check in a recent PR [1], which could have
performance consequences as checking against other pods has a cost.
Reintroduce the check with a hopefully useful comment to prevent it from
happening again.

[1] ovn-kubernetes/ovn-kubernetes#5626

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
(cherry picked from commit 76f6439)

@openshift-ci-robot
Contributor

@jcaamano: This pull request references Jira Issue OCPBUGS-63007, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.z) matches configured target version for branch (4.20.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-56783 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-56783 targets the "4.21.0" version, which is one of the valid target versions: 4.21.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

Cherry-picks from master for upstream fix ovn-kubernetes/ovn-kubernetes#5626 and ovn-kubernetes/ovn-kubernetes#5658.
Extra commit "Fix naming of 'Secondary' to be 'User-Defined'" cherry-picked to avoid conflicts.
No conflicts

@jcaamano
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2025
@jcaamano jcaamano force-pushed the ocpbugs-real-56783-4.20 branch from d6bab55 to d644e2a Compare October 24, 2025 09:07
@jcaamano
Contributor Author

/override ci/prow/lint
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw, ci/prow/lint

In response to this:

/override ci/prow/lint
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@jcaamano
Contributor Author

/override ci/prow/lint

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint

In response to this:

/override ci/prow/lint

@jcaamano
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

@jcaamano: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | d644e2a | link | false | /test security |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade-ipsec | d644e2a | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade-ipsec |
| ci/prow/qe-perfscale-payload-control-plane-6nodes | d644e2a | link | true | /test qe-perfscale-payload-control-plane-6nodes |
| ci/prow/lint | d644e2a | link | true | /test lint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

4 participants