Conversation

@kyrtapz
Contributor

@kyrtapz kyrtapz commented Oct 3, 2025

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci bot commented Oct 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kyrtapz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2025
RamLavi and others added 13 commits October 8, 2025 13:09
Enable tracking of used MAC addresses with owner identification.
This enables MAC address conflict detection when multiple entities try
to use the same address within the same network.

The MAC manager will be integrated with cluster-manager's pod-allocator
code in follow-up commits.

Since the pod-allocator is run by multiple goroutines, use a mutex to
prevent race conditions when reserving and releasing MACs.

Signed-off-by: Ram Lavi <[email protected]>
Co-authored-by: Or Mergi <[email protected]>
Integrate the MAC manager into podAllocator. Instantiate the MAC manager
on primary L2 UDNs with persistentIPs enabled, when
EnablePreconfiguredUDNAddresses is enabled.

The pod-allocator is instantiated for each network, thus network
isolation is maintained. MACs can be reused in different UDNs.

On pod allocation, record the used MAC address and its owner-id;
if the MAC is already in use, raise a MAC conflict error.
Compose the owner-id in the following format:
  <pod.metadata.namespace>/<metadata.name>
E.g: Given pod namespace=blue, name=mypod, owner-id is blue/mypod

To allow the VM migration scenario, where two pods should use the same MAC,
relax MAC conflicts by composing the owner-id from the associated VM name:
  <pod.metadata.namespace>/<VM name label value>
E.g: Given pod namespace=blue, name=virt-launcher-myvm-abc123, VM name=myvm,
owner-id is blue/myvm.
The VM name is reflected by the "vm.kubevirt.io/name" label
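
The owner-id rules above can be sketched as a small helper. This is an illustration under assumptions, not the PR's code: `OwnerID` is a hypothetical name, and the pod is reduced to its namespace, name, and labels.

```go
package main

import "fmt"

// vmNameLabel is the label the commit message names as carrying the VM name.
const vmNameLabel = "vm.kubevirt.io/name"

// OwnerID (hypothetical helper) returns "<namespace>/<VM name>" for pods
// that carry the VM name label and "<namespace>/<pod name>" otherwise.
func OwnerID(namespace, podName string, labels map[string]string) string {
	if vm := labels[vmNameLabel]; vm != "" {
		return namespace + "/" + vm
	}
	return namespace + "/" + podName
}

func main() {
	// Plain pod: the owner is the pod itself.
	fmt.Println(OwnerID("blue", "mypod", nil)) // blue/mypod

	// VM launcher pod: the owner is the VM, so migration source and
	// target pods resolve to the same owner-id.
	labels := map[string]string{vmNameLabel: "myvm"}
	fmt.Println(OwnerID("blue", "virt-launcher-myvm-abc123", labels)) // blue/myvm
}
```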

In addition, when a repeated request (same MAC and owner) that was
already handled is rolled back due to a failure (e.g. pod update failure),
do not release the reserved MAC as part of the pod-allocation rollback.

Releasing MAC addresses on pod deletion and initializing the MAC manager
on startup will be done in follow-up commits.

Signed-off-by: Ram Lavi <[email protected]>
Co-authored-by: Or Mergi <[email protected]>
Emit pod event when MAC conflict is detected during pod allocation
process.

Avoid leaking the user-defined network name into pod events, as they are
visible to non-cluster-admin users.

Signed-off-by: Or Mergi <[email protected]>
On pod deletion, remove the MAC address used by the pod from the MAC
manager store.

To allow the VM migration scenario, do not release the MAC when at least
one VM pod is not in a completed state.
Resolve the VM pod owner-id by composing the owner-id from the
associated VM name.
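
A rough sketch of the release decision described above. The `vmPod` type and `canReleaseMAC` function are hypothetical, and excluding the deleted pod itself from the check is an assumption of this sketch rather than something the commit message states.

```go
package main

import "fmt"

// vmPod is a trimmed-down, hypothetical view of a pod that belongs to a VM.
type vmPod struct {
	Name      string
	Completed bool // true once the pod has finished running
}

// canReleaseMAC reports whether the MAC used by the deleted pod may be
// released: only when no other pod of the same VM is still not completed.
func canReleaseMAC(deleted string, vmPods []vmPod) bool {
	for _, p := range vmPods {
		if p.Name != deleted && !p.Completed {
			return false
		}
	}
	return true
}

func main() {
	pods := []vmPod{
		{Name: "virt-launcher-myvm-src", Completed: true},
		{Name: "virt-launcher-myvm-dst", Completed: false},
	}
	// The migration target still runs, so the MAC stays reserved.
	fmt.Println(canReleaseMAC("virt-launcher-myvm-src", pods)) // false
	// Deleting the last running pod frees the MAC.
	fmt.Println(canReleaseMAC("virt-launcher-myvm-dst", pods)) // true
}
```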

Initializing the MAC manager on start up will be done in follow-up
commits.

Signed-off-by: Ram Lavi <[email protected]>
Co-authored-by: Or Mergi <[email protected]>
Initialize the pod-allocator MAC manager with the MACs of the network
GW and management ports, preventing conflicts with new pods
requesting those MACs.

The MAC manager is instantiated on primary L2 UDNs with
persistent IPs enabled, when EnablePreconfiguredUDNAddresses is enabled.

The network logical switch has GW (.1) and management (.2) ports.
Their MAC address is generated from the IP address.
Calculate the GW and management MAC addresses from their IP addresses.

Signed-off-by: Or Mergi <[email protected]>
Co-authored-by: Ram Lavi <[email protected]>
Initialize the pod-allocator MAC manager with the MACs of existing pods
in the network.
This prevents unexpected conflicts in scenarios where the control plane
restarts.

The MAC manager is instantiated on primary L2 UDNs with
persistent IPs enabled, when EnablePreconfiguredUDNAddresses is enabled.

VMs can have multiple associated pods with the same MAC address
(migration scenario). Allow VM-associated pods to have the same MAC
by composing the owner-id from the associated VM name.

Signed-off-by: Ram Lavi <[email protected]>
In a primary CUDN scenario where multiple NADs exist, all with the same spec,
NetworkInfo.GetNADs returns multiple NADs of the selected namespaces.

The GetPodNADToNetworkMappingWithActiveNetwork helper assumes the
active network (NetworkInfo{}) consists of a single NAD, and returns
the mapping with the first NAD of the active network it finds.

This approach falls short when the given pod is connected to a CUDN that spans
multiple namespaces, i.e. the active network consists of multiple NADs.
The helper returns an inconsistent mapping where the NAD key doesn't match
the pod namespace (a NAD of another namespace).

Change the helper to find the active network's matching NAD: the NAD that
resides in the same namespace as the given pod (matching namespace).

Change test to always set an appropriate namespace to the tested pod.

Extend the test suite to allow injecting multiple NADs for the
active-network, and simulating the CUDN use-case.
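
The namespace-matching lookup described in this commit message can be sketched as follows. This is not the GetPodNADToNetworkMappingWithActiveNetwork helper itself: `matchingNAD` is a hypothetical function, and it assumes NAD keys have the form `<namespace>/<name>`.

```go
package main

import (
	"fmt"
	"strings"
)

// matchingNAD returns the NAD key that resides in podNamespace, instead of
// the first NAD of the active network. Keys are assumed to look like
// "<namespace>/<name>".
func matchingNAD(nadKeys []string, podNamespace string) (string, bool) {
	for _, key := range nadKeys {
		ns, _, ok := strings.Cut(key, "/")
		if ok && ns == podNamespace {
			return key, true
		}
	}
	return "", false
}

func main() {
	// A CUDN spanning two namespaces yields one NAD per namespace.
	nads := []string{"red/cudn-net", "blue/cudn-net"}

	key, ok := matchingNAD(nads, "blue")
	fmt.Println(key, ok) // blue/cudn-net true

	key, ok = matchingNAD(nads, "green")
	fmt.Printf("%q %v\n", key, ok) // "" false
}
```

Picking the first key would have returned `red/cudn-net` for a pod in `blue`, which is the inconsistency the commit fixes.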

Signed-off-by: Or Mergi <[email protected]>
Not waiting for `killall` to terminate can cause the Kubevirt console
expecter/matcher to incorrectly match the negative case.
This occurs because the "Exit 1" string may prematurely appear in the output.

Signed-off-by: Enrique Llorente <[email protected]>
…heck

udn, primary, layer2: Detect MAC conflicts
docs: Add instructions for CI failures
@kyrtapz kyrtapz marked this pull request as ready for review October 9, 2025 18:41
@openshift-ci openshift-ci bot requested review from jcaamano and tssurya October 9, 2025 18:42
@kyrtapz
Contributor Author

kyrtapz commented Oct 10, 2025

/test ?

@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

@kyrtapz: The following commands are available to trigger required jobs:

/test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade
/test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade
/test 4.21-upgrade-from-stable-4.20-images
/test e2e-aws-ovn
/test e2e-aws-ovn-edge-zones
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-local-gateway
/test e2e-aws-ovn-local-to-shared-gateway-mode-migration
/test e2e-aws-ovn-serial
/test e2e-aws-ovn-shared-to-local-gateway-mode-migration
/test e2e-aws-ovn-upgrade
/test e2e-aws-ovn-upgrade-local-gateway
/test e2e-aws-ovn-windows
/test e2e-azure-ovn-upgrade
/test e2e-gcp-ovn
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-dualstack-bgp
/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw
/test e2e-metal-ipi-ovn-ipv6
/test gofmt
/test images
/test lint
/test okd-scos-images
/test qe-perfscale-payload-control-plane-6nodes
/test unit

The following commands are available to trigger optional jobs:

/test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-ipsec
/test e2e-agent-compact-ipv4
/test e2e-aws-ovn-clusternetwork-cidr-expansion
/test e2e-aws-ovn-fdp-qe
/test e2e-aws-ovn-serial-ipsec
/test e2e-aws-ovn-single-node-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-upgrade-ipsec
/test e2e-azure-ovn
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-bgp-virt-dualstack
/test e2e-metal-ipi-ovn-bgp-virt-dualstack-techpreview
/test e2e-metal-ipi-ovn-dualstack-local-gateway
/test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-metal-ipi-ovn-ipv4
/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test e2e-openstack-ovn
/test e2e-ovn-hybrid-step-registry
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-techpreview
/test e2e-vsphere-windows
/test okd-scos-e2e-aws-ovn
/test openshift-e2e-gcp-ovn-techpreview-upgrade
/test ovncore-perfscale-aws-ovn-large-cluster-density-v2
/test ovncore-perfscale-aws-ovn-large-node-density-cni
/test ovncore-perfscale-aws-ovn-xlarge-cluster-density-v2
/test ovncore-perfscale-aws-ovn-xlarge-node-density-cni
/test perfscale-aws-ovn-medium-cluster-density-v2
/test perfscale-aws-ovn-medium-node-density-cni
/test perfscale-aws-ovn-small-cluster-density-v2
/test perfscale-aws-ovn-small-node-density-cni
/test qe-perfscale-aws-ovn-small-udn-density-churn-l3
/test qe-perfscale-aws-ovn-small-udn-density-l2
/test qe-perfscale-aws-ovn-small-udn-density-l3
/test security

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-ovn-kubernetes-master-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade
pull-ci-openshift-ovn-kubernetes-master-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade-ipsec
pull-ci-openshift-ovn-kubernetes-master-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade
pull-ci-openshift-ovn-kubernetes-master-4.21-upgrade-from-stable-4.20-images
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-edge-zones
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-hypershift
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-gateway
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-serial
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-shared-to-local-gateway-mode-migration
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade-local-gateway
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows
pull-ci-openshift-ovn-kubernetes-master-e2e-azure-ovn-upgrade
pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn
pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn-techpreview
pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack
pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp
pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp-local-gw
pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-ipv6
pull-ci-openshift-ovn-kubernetes-master-gofmt
pull-ci-openshift-ovn-kubernetes-master-images
pull-ci-openshift-ovn-kubernetes-master-lint
pull-ci-openshift-ovn-kubernetes-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-ovn-kubernetes-master-okd-scos-images
pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-small-udn-density-churn-l3
pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-small-udn-density-l3
pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-payload-control-plane-6nodes
pull-ci-openshift-ovn-kubernetes-master-security
pull-ci-openshift-ovn-kubernetes-master-unit

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kyrtapz
Contributor Author

kyrtapz commented Oct 10, 2025

/test e2e-metal-ipi-ovn-bgp-virt-dualstack-techpreview
/test e2e-metal-ipi-ovn-bgp-virt-dualstack

npinaeva and others added 19 commits October 13, 2025 17:20
Make sure it reserves already allocated ids on startup.

Signed-off-by: Nadia Pinaeva <[email protected]>
Add transit router info to use for layer2 interconnect.

Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
gateway: Remove old GW router to layer2 switch ports together with stale
routes, policies and NATs.
layer2_controller: Create an extra switch to transit router link with
MAC-only router port. Add fake join subnet IPs to the transit router
to switch port.

Signed-off-by: Nadia Pinaeva <[email protected]>
It is only triggered on restart now

Signed-off-by: Nadia Pinaeva <[email protected]>
The "UDN pod to the same node nodeport service in different UDN network"
test used to work on Layer2 UDN for IPv6 because of the SNAT on the
GR. The SNAT has now moved to the transit router and works the same
way as in Layer3 networks.

Signed-off-by: Nadia Pinaeva <[email protected]>
Previously the default gateway for layer2 was on the GR, so we had to use
its primary joinIP to evaluate the expected MAC and LLA. Now the default
gateway is on the transit router with the first subnet IP.

Signed-off-by: Nadia Pinaeva <[email protected]>
cni/NetNS is replaced with
github.com/containernetworking/plugins/pkg/ns/NetNS
node.ManagementPort was moved to its own package.

Signed-off-by: Nadia Pinaeva <[email protected]>
Fix unit tests for the introduced changes.

Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
Signed-off-by: Nadia Pinaeva <[email protected]>
After the topology upgrade, the new default gateway for layer2
VMs will be on the transit router, so we need to remove the
previously learned MAC.

Co-authored-by: Enrique Llorente <[email protected]>
Signed-off-by: Nadia Pinaeva <[email protected]>
Add a transitSubnet field similar to joinSubnet to the NetConf,
but only set it for Primary Layer2 networks.
Set transit subnets for NADs

Signed-off-by: Nadia Pinaeva <[email protected]>
@kyrtapz
Contributor Author

kyrtapz commented Oct 16, 2025

/test e2e-metal-ipi-ovn-bgp-virt-dualstack-techpreview
/test e2e-metal-ipi-ovn-bgp-virt-dualstack

@kyrtapz
Contributor Author

kyrtapz commented Oct 17, 2025

/retest

@kyrtapz
Contributor Author

kyrtapz commented Oct 17, 2025

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw
/test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Oct 17, 2025

@kyrtapz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | a72a98e | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |
| ci/prow/security | a72a98e | link | false | /test security |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade | a72a98e | link | true | /test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade |
| ci/prow/e2e-aws-ovn-hypershift | a72a98e | link | true | /test e2e-aws-ovn-hypershift |
| ci/prow/lint | a72a98e | link | true | /test lint |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

6 participants