Skip to content

Conversation

@ormergi
Copy link
Contributor

@ormergi ormergi commented Oct 23, 2025

📑 Description

This is manual cherry-pick of IP conflict detection [1] (commits 1-3) and MAC conflict detection [2] (commits 4-12).
Commits were cherry-picked from release-4.21 branch, one by one, using git cherry-pick -x.

[1] ovn-kubernetes/ovn-kubernetes#5411
[2] ovn-kubernetes/ovn-kubernetes#5492

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

RamLavi and others added 12 commits October 23, 2025 16:15
Currently when a pod is reconciled and IPs are requested, the
ipAllocator's "already allocated" errors are ignored.

In order to block IP conflicts with other pods, only ignore duplicate IP
errors in cases when PreconfiguredUDNAddressesEnabled FG is enabled and:
- the NAD's network annotation has already persiste/allocated,
indicating that the pod was already reconciled and assigned with IPs.
- the ipamClaim already has been updated with status.IPs, indicating
that OVNK already allocated the IPs for this object requesting
persistent IPs (pod/VM).

Under these new terms, when ErrAllocated is discovered and not ignored -
publish this as a pod event.

Signed-off-by: Ram Lavi <[email protected]>
(cherry picked from commit 66298cf)
Signed-off-by: Ram Lavi <[email protected]>
(cherry picked from commit 9b16a7f)
Enable tracking used MAC addresses with owner identification.
Enabling MAC addresses conflict detection when multiple entities try
to use the same address within the same network.

The MAC manager will be integrated with cluster-manager's pod-allocator
code in follow-up commits.

Since pod-allocator run by multiple Goroutines, use Mutex to prevent
race conditions on reserve and release MACs.

Signed-off-by: Ram Lavi <[email protected]>
Co-authored-by: Or Mergi <[email protected]>
(cherry picked from commit c8b18bd)
Integrate the MAC manager to podAllocator, instantiate the MAC manager
on primary L2 networks UDNs with persistentIPs enabled, when
EnablePreconfiguredUDNAddresses is enabled.

The pod-allocator is instantiated for each network, thus network
isolation is maintained. MACs can reused in different UDNs.

On pod allocation, record the used MAC address and its owner-id,
if already used raise MAC conflict error.
Compose the owner-id in the following format:
  <pod.metadata.namespace>/<metadata.name>
E.g: Given pod namespace=blue, name=mypod, owner-id is blue/mypod

To allow VM migration scenario, where two pods should use the same MAC,
relax MAC conflicts by composing the owner-id from the associated VM name:
  <pod.metadata.namespace>/<VM name label value>
E.g: Given pod namespace=blue, name=virt-launcher-myvm-abc123 VM name=myvm,
owner id is "blue/mypod".
The VM name is reflected by the "vm.kubevirt.io/name" label

In addition, in a scenario of repeated request (same mac & owner) that was
already handled, being rollback due to failure (e.g.: pod update failure),
do not release the reserved MAC as part of the pod-allocation rollback.

MAC addresses release on pod deletion, and initializing the MAC manager
on start up will be done in follow-up commits.

Signed-off-by: Ram Lavi <[email protected]>
Co-authored-by: Or Mergi <[email protected]>
(cherry picked from commit 1d5045a)
Emit pod event when MAC conflict is detected during pod allocation
process.

Avoid user-defined network name leak to pod events, as they are
visible by non cluster-admin users.

Signed-off-by: Or Mergi <[email protected]>
(cherry picked from commit 0b38dd8)
On pod deletion, remove the MAC address used by the pod from the MAC
manager store.

To allow VM migration scenario, do not release the MAC when there is
at least one VM pod that is not in complete state.
Resolve the VM pod owner-id by composing the owner-id from the
associated VM name.

Initializing the MAC manager on start up will be done in follow-up
commits.

Signed-off-by: Ram Lavi <[email protected]>
Co-authored-by: Or Mergi <[email protected]>
(cherry picked from commit b6965d9)
Initialize the pod allocator MAC manager MACs of the network
GW and management ports, preventing conflicts with new pods
requesting those MACs.

The MAC manager is instantiated on primary L2 UDNs with
persistent IPs enabled, when EnablePreconfiguredUDNAddresses.

The network logical switch has GW (.1) and management (.2) ports.
Their MAC address is generated from the IP address.
Calculate the GW and management MAC addresses from their IP addresses.

Signed-off-by: Or Mergi <[email protected]>
Co-authored-by: Ram Lavi <[email protected]>
(cherry picked from commit eb4789b)
Initialize the pod-allocator MAC manager with MACs of existing pods
in the network.
Preventing unexpected conflicts in scenarios where the control-plane
restarts.

The MAC manager is instantiated on primary L2 UDNs with
persistent IPs enabled, when EnablePreconfiguredUDNAddresses.

VMs can have multiple associated pods with the same MAC address
(migration scenario). Allow VM associated pods have the same MAC,
by composing the owner-id from the associated VM name.

Signed-off-by: Ram Lavi <[email protected]>
(cherry picked from commit 89d1696)
Signed-off-by: Or Mergi <[email protected]>
(cherry picked from commit f46a21c)
Signed-off-by: Or Mergi <[email protected]>
(cherry picked from commit e4835ab)
In a scenario of primary CUDN where multiple NAD exist all with the same spec,
NetworkInfo.GetNADs return multiple NADs of the selected namespaces.

The GetPodNADToNetworkMappingWithActiveNetwork helper, assume the
active-network (NetworkInfo{}) consist of single NAD, and return
the mapping with the first NAD of the active-network it found.

This approach fall short when the given pod is connected to CUDN that span
over multiple namespaces, i.e.: active network consist of multiple NADs.
The helper return inconsistent mapping where the NAD key doesn't match
the pod namespace (NAD of another namespaces).

Chagne the helper to find the active-network matching NAD; the NAD that
reside at the same namespace as the given pod (matching namespace)

Change test to always set an appropriate namespace to the tested pod.

Extend the test suite to allow injecting multiple NADs for the
active-network, and simulating the CUDN use-case.

Signed-off-by: Or Mergi <[email protected]>
(cherry picked from commit cedfc13)
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 23, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 23, 2025

@ormergi: This pull request references CORENET-6475 which is a valid jira issue.

In response to this:

📑 Description

This is manual cherry-pick of IP conflict detection [1] and MAC conflict detection [2].
Commits were cherry-picked from release-4.21 branch.

[1] ovn-kubernetes/ovn-kubernetes#5411
[2] ovn-kubernetes/ovn-kubernetes#5492

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 23, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ormergi
Once this PR has been reviewed and has the lgtm label, please assign martinkennelly for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ormergi
Copy link
Contributor Author

ormergi commented Oct 23, 2025

/test unit

@ormergi
Copy link
Contributor Author

ormergi commented Oct 23, 2025

/test lint

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 23, 2025

@ormergi: This pull request references CORENET-6475 which is a valid jira issue.

In response to this:

📑 Description

This is manual cherry-pick of IP conflict detection [1] (commits 1-3) and MAC conflict detection [2] (commits 4-12).
Commits were cherry-picked from release-4.21 branch.

[1] ovn-kubernetes/ovn-kubernetes#5411
[2] ovn-kubernetes/ovn-kubernetes#5492

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 23, 2025

@ormergi: This pull request references CORENET-6475 which is a valid jira issue.

In response to this:

📑 Description

This is manual cherry-pick of IP conflict detection [1] (commits 1-3) and MAC conflict detection [2] (commits 4-12).
Commits were cherry-picked from release-4.21 branch, one by one, using git cherry-pick -x.
The last commit "utils,multinet: GetPodNADToNetworkMappingWithActiveNetwork CUDN support" introduced conflicts that were resolved manually.

[1] ovn-kubernetes/ovn-kubernetes#5411
[2] ovn-kubernetes/ovn-kubernetes#5492

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 23, 2025

@ormergi: This pull request references CORENET-6475 which is a valid jira issue.

In response to this:

📑 Description

This is manual cherry-pick of IP conflict detection [1] (commits 1-3) and MAC conflict detection [2] (commits 4-12).
Commits were cherry-picked from release-4.21 branch, one by one, using git cherry-pick -x.

[1] ovn-kubernetes/ovn-kubernetes#5411
[2] ovn-kubernetes/ovn-kubernetes#5492

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ormergi ormergi marked this pull request as ready for review October 23, 2025 14:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 23, 2025

@ormergi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn eae34b3 link true /test e2e-aws-ovn
ci/prow/e2e-azure-ovn-upgrade eae34b3 link true /test e2e-azure-ovn-upgrade
ci/prow/lint eae34b3 link true /test lint
ci/prow/security eae34b3 link false /test security
ci/prow/qe-perfscale-payload-control-plane-6nodes eae34b3 link true /test qe-perfscale-payload-control-plane-6nodes

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants