OCPBUGS-63007: kubevirt: fix bad release of IPs of live migratable pods #2801
base: release-4.20
Conversation
| /jira cherrypick OCPBUGS-56783 | 
| @jcaamano: Jira Issue OCPBUGS-56783 has been cloned as Jira Issue OCPBUGS-63007. Will retitle bug to link to clone. In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. | 
| @jcaamano: This pull request references Jira Issue OCPBUGS-63007, which is invalid: 
 Comment  The bug has been updated to refer to the pull request using the external bug tracker. In response to this: 
 |
| /jira refresh | 
| @jcaamano: This pull request references Jira Issue OCPBUGS-63007, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
 Requesting review from QA contact: In response to this: 
 |
| /retest | 
| /payload 4.20 ci blocking | 
| @martinkennelly: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info. | 
| @martinkennelly: trigger 5 job(s) of type blocking for the ci release of OCP 4.20 
 See details on https://pr-payload-tests.ci.openshift.org/runs/ci/efc04de0-a8e5-11f0-8974-124b476eb6ac-0 trigger 13 job(s) of type blocking for the nightly release of OCP 4.20 
 See details on https://pr-payload-tests.ci.openshift.org/runs/ci/efc04de0-a8e5-11f0-8974-124b476eb6ac-1 | 
| /override ci/prow/lint | 
| @jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint In response to this: 
 Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. | 
| payloads that passed and got infra up look fine. | 
| For job  Looks like it got killed prematurely, but I'm unsure. I also see an error for logging in.  | 
| for job   | 
| for job  /lgtm | 
| /hold I think I might have introduced an issue. Will keep you updated @martinkennelly | 
| /backport-risk-assessed After looking at the non-test and non-renaming code, I think the risk is low. I'd try to avoid putting this in until 4.20 is cut, but I'll let staff make the final decision. | 
| /lgtm cancel | 
| [APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: martinkennelly The full list of commands accepted by this bot can be found here. The pull request process is described here 
Needs approval from an approver in each of these files:
 
 Approvers can indicate their approval by writing  | 
When multiple networks support was first added, all controllers that were added used the label "Secondary" to indicate they were not "Default". When UDN was added, it allowed "Secondary" networks to function as the primary network for a pod, creating terminology confusion. We now treat all non-default networks as "User-Defined Networks". This commit changes all naming to conform to the latter. The only place "secondary" is still used is to distinguish whether a UDN is acting as a primary or secondary network for a pod (its role). The only exception to this is udn-isolation. I did not touch this because it relies on dbIDs, and renaming them would impact functionality on upgrade. There is no functional change in this commit. Signed-off-by: Tim Rozet <[email protected]> (cherry picked from commit bbca874)
The k8s e2e utility functions AddOrUpdateLabelOnNode/RemoveLabelOffNode don't work for labels without a value. The incorrect handling of these labels caused the migrations to use a different sequence of nodes than the one the tests intended to exercise. Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit 434b48f)
There are two circumstances in which the IPs of live migratable pods were being handled incorrectly:
* when a live migratable pod completed with no migration ongoing, its IPs were not being released because IsMigratedSourcePodStale outright assumed a completed pod was stale.
* when a live migratable pod completed on a different node than the VM's original node as part of a migration, its IPs were being released when they shouldn't have been; we were simply not checking whether it was a migration.
This commit also improves the tests to check for IP release. Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit 4c34982)
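The two cases in this commit message can be illustrated with a minimal sketch. It is not the code from this PR: the `vmPod` type and the `shouldReleaseIPs` function are hypothetical, and the rule they encode (release only once every pod backing the same VM has completed) is an assumption drawn from the description and from the AllVMPodsAreCompleted name mentioned in a later commit.

```go
package main

import "fmt"

// vmPod is a hypothetical, simplified view of a pod that backs a VM.
type vmPod struct {
	Name      string
	VM        string // name of the VM this pod belongs to
	Completed bool   // the pod reached a terminal phase
}

// shouldReleaseIPs reports whether the IPs of the completed pod "done" can be
// released. Assumption: the IPs follow the VM across a live migration, so they
// are released only when no other pod of the same VM is still running.
func shouldReleaseIPs(done vmPod, all []vmPod) bool {
	if !done.Completed {
		return false
	}
	for _, p := range all {
		if p.VM == done.VM && !p.Completed {
			// A migration target (or source) still uses the IPs.
			return false
		}
	}
	return true
}

func main() {
	// Case 1: the pod completed and no migration is ongoing.
	alone := []vmPod{{Name: "virt-launcher-a", VM: "vm1", Completed: true}}
	fmt.Println("no migration:", shouldReleaseIPs(alone[0], alone))

	// Case 2: the source pod completed while the migration target keeps running.
	migrating := []vmPod{
		{Name: "virt-launcher-a", VM: "vm1", Completed: true},
		{Name: "virt-launcher-b", VM: "vm1", Completed: false},
	}
	fmt.Println("during migration:", shouldReleaseIPs(migrating[0], migrating))
}
```

In this model the first case releases the IPs and the second keeps them, which matches the behavior the commit message describes as intended.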
Don't attempt to release IPs that are not managed by the local zone which can happen with live migratable pods, otherwise we would get distracting error logs on release. Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit 7a155cc)
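As a rough illustration of the guard described here, the sketch below filters a pod's IPs down to those that fall inside subnets owned by the local zone before attempting a release. The `releasableIPs` function, and the idea that zone ownership can be modeled as a list of CIDRs, are assumptions made for this example; they are not taken from the ovn-kubernetes code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// releasableIPs returns the IPs that fall inside one of the subnets managed by
// the local zone. Hypothetical helper: IPs outside those subnets are skipped
// instead of being passed to the allocator, which would only log an error.
func releasableIPs(ips, localSubnets []string) ([]string, error) {
	prefixes := make([]netip.Prefix, 0, len(localSubnets))
	for _, s := range localSubnets {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return nil, fmt.Errorf("parsing subnet %q: %w", s, err)
		}
		prefixes = append(prefixes, p)
	}

	var out []string
	for _, s := range ips {
		ip, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("parsing IP %q: %w", s, err)
		}
		for _, p := range prefixes {
			if p.Contains(ip) {
				out = append(out, s)
				break
			}
		}
	}
	return out, nil
}

func main() {
	// The second IP belongs to a subnet of another zone and is skipped.
	ips := []string{"10.128.0.5", "10.129.0.7"}
	local := []string{"10.128.0.0/23"}

	got, err := releasableIPs(ips, local)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```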
ConditionalIPRelease would always return false when checking IPs not tracked in the local zone so in that case we were not correctly checking for colliding pods. This was hidden by the fact that IsMigratedSourcePodStale was used just before instead of AllVMPodsAreCompleted until a very recent fix and that would always return false for a completed live migratable pod. Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit 0dc8f27)
Or completion of a failed target pod Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit c1b02b5)
As it is the most complex scenario and a superset of testing without it Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit ef92f78)
I accidentally removed the check in a recent PR [1], which could have performance consequences as checking against other pods has a cost. Reintroduce the check with a hopefully useful comment to prevent it from happening again. [1] ovn-kubernetes/ovn-kubernetes#5626 Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit 76f6439)
| @jcaamano: This pull request references Jira Issue OCPBUGS-63007, which is valid. 7 validation(s) were run on this bug
 Requesting review from QA contact: In response to this: 
 |
| /hold cancel | 
d6bab55 to d644e2a
    | /override ci/prow/lint | 
| @jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw, ci/prow/lint In response to this: 
 |
| /override ci/prow/lint | 
| @jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint In response to this: 
 |
| /retest | 
| @jcaamano: The following tests failed, say  
 Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. | 
Cherry-picks from master for the upstream fixes ovn-kubernetes/ovn-kubernetes#5626 and ovn-kubernetes/ovn-kubernetes#5658.
Extra commit "Fix naming of 'Secondary' to be 'User-Defined'" cherry-picked to avoid conflicts.
No conflicts