@anuragthehatter

TOMOFUMI-KONDO and others added 30 commits September 1, 2025 09:30
Update the comment to reflect the actual CIDR configuration.
The podNetwork value uses /16 as the top level range and /24 for
node assignment, not /14 and /23 as stated in the outdated comment.

Fixes #5538

Signed-off-by: TOMOFUMI-KONDO <[email protected]>
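
The arithmetic behind the corrected comment can be checked with a small sketch. The helper name `splitPodNetwork` and the example CIDR `10.244.0.0/16` are assumptions made for illustration; they are not taken from the Helm chart. It only counts how many node subnets a top-level range yields and how many addresses each one holds.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitPodNetwork reports how many node subnets of length hostPrefix fit in
// clusterCIDR, and how many addresses each node subnet contains.
// Hypothetical helper; it rejects shifts that would overflow a small int,
// so it is only meant for IPv4-sized ranges.
func splitPodNetwork(clusterCIDR string, hostPrefix int) (int, int, error) {
	p, err := netip.ParsePrefix(clusterCIDR)
	if err != nil {
		return 0, 0, err
	}
	bitLen := p.Addr().BitLen()
	if hostPrefix < p.Bits() || hostPrefix > bitLen {
		return 0, 0, fmt.Errorf("host prefix /%d does not fit inside %s", hostPrefix, p)
	}
	subnetBits := hostPrefix - p.Bits()
	hostBits := bitLen - hostPrefix
	if subnetBits > 30 || hostBits > 30 {
		return 0, 0, fmt.Errorf("range too large for this sketch")
	}
	return 1 << subnetBits, 1 << hostBits, nil
}

func main() {
	// Example value only: a /16 top-level range with /24 per node.
	subnets, addrs, err := splitPodNetwork("10.244.0.0/16", 24)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d node subnets, %d addresses each\n", subnets, addrs)
}
```

With /16 and /24 this yields 256 node subnets of 256 addresses each, whereas the /14 and /23 pair named in the outdated comment would yield 512 of each.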
Adds a check to ensure that the status on an EIP matches the IP we have
allocated in our cache. If it doesn't, consider the status invalid and
attempt reallocation.

There is a race exposed by the unit test "should update invalid
assignments on duplicated node", where an EIP has 2 IPs assigned to
node1, and a bogus IP assigned to node2. The test expects that EIP
should rebalance the EIPs with one on node 1 and one on node 2. However
a race can happen where EIP node allocator cache becomes corrupt due to
lack of validation of status. This can happen:

1. EgressIP has node1: EIP1, EIP2
2. Test starts WatchEgressIP
3. EIP->Sync->Annotates 50000 mark on EIP
4. This triggers an update event with 50000 mark, node1: EIP1, EIP2
5. EIP initial add starts with original event of node1: EIP1, EIP2
6. EIP rebalances one EIP per node, patches EIP object, creates update
   event with 50000 mark, node1: EIP1, node2: EIP2
7. Next update event processed -> RACE happens, retry framework will
   grab "latest" object. If the informer was lagging, informer cache has
   object from step 4
8. EIP validation logic considers this a pass, because the cache has
   both nodes, each with 1 egress IP. No patch is done to the EIP
   object, but when validStatus is achieved the eNode cache is updated
9. eNode cache is updated so now node1 has 2 allocations for EIP
10. Update event from 6 is processed. Validation state fails because
   node1 has 2 EIPs. During rebalancing it won't be able to find a
   suitable node for the EIP due to cache corruption, so nothing will
   be updated.

Signed-off-by: Tim Rozet <[email protected]>
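
The kind of check this commit describes can be sketched as follows. This is a minimal illustration, not the project's code: `statusItem`, `allocatorCache`, and `statusMatchesCache` are hypothetical names, and the cache is reduced to a plain map. The idea is that a status read from a lagging informer is only accepted when every (node, IP) pair in it is backed by an allocation in the cache.

```go
package main

import "fmt"

// statusItem is a hypothetical stand-in for one EgressIP status entry:
// an egress IP reported as assigned to a node.
type statusItem struct {
	Node     string
	EgressIP string
}

// allocatorCache is a hypothetical, simplified allocator cache:
// node name -> set of egress IPs allocated to that node.
type allocatorCache map[string]map[string]bool

// statusMatchesCache reports whether every status entry corresponds to an
// allocation recorded in the cache. A false result means the status should
// be treated as invalid and reallocation attempted.
func statusMatchesCache(cache allocatorCache, status []statusItem) bool {
	for _, item := range status {
		// Indexing a missing node yields a nil map, which reads as false.
		if !cache[item.Node][item.EgressIP] {
			return false
		}
	}
	return true
}

func main() {
	// Cache state after step 6: one EIP per node.
	cache := allocatorCache{
		"node1": {"EIP1": true},
		"node2": {"EIP2": true},
	}
	// Stale object from step 4, as a lagging informer might return it.
	stale := []statusItem{{"node1", "EIP1"}, {"node1", "EIP2"}}
	// Object as patched in step 6.
	fresh := []statusItem{{"node1", "EIP1"}, {"node2", "EIP2"}}

	fmt.Println("stale status valid:", statusMatchesCache(cache, stale))
	fmt.Println("fresh status valid:", statusMatchesCache(cache, fresh))
}
```

In this sketch the stale status is rejected because the cache holds no allocation of EIP2 on node1, so it can no longer overwrite the cache in step 8.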
Aligning it to what we do for primary nic EIPv6 addresses. NODAD is
required since it is possible for an EIPv6 to be configured in two nodes
at the same time during a short failover time window.

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
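
For readers unfamiliar with the flag, the following sketch shows what adding an IPv6 address without duplicate address detection looks like when expressed as an iproute2 command line. It is an illustration only: the commit does not say how the address is programmed, and the helper name, address, and device below are made up. The `nodad` keyword itself is a standard option of `ip -6 addr add`.

```go
package main

import (
	"fmt"
	"strings"
)

// nodadAddrArgs builds the arguments for an `ip` invocation that adds an
// IPv6 address with duplicate address detection disabled.
// Hypothetical helper for illustration only.
func nodadAddrArgs(addr, dev string) []string {
	return []string{"-6", "addr", "add", addr, "dev", dev, "nodad"}
}

func main() {
	// Example address (documentation prefix) and device name, not from the source.
	args := nodadAddrArgs("2001:db8::10/128", "eth1")
	fmt.Println("ip " + strings.Join(args, " "))
}
```

Skipping DAD matters here because, during failover, the same address may briefly exist on two nodes, and DAD on the second node would otherwise mark it as a duplicate.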
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Fix incorrect comment for podNetwork in Helm values
Signed-off-by: Surya Seetharaman <[email protected]>
Signed-off-by: Surya Seetharaman <[email protected]>
Fixes: #5552

Signed-off-by: Tim Rozet <[email protected]>
OKEP-5552: Add support for dynamic UDN allocation per node
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 4dd3152)
This subnet is now also used for transit routers in the layer2 topology.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit f1b0ee0)
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit f3eff25)
Make sure it reserves already allocated ids on startup.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit b850172)
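
Reserving already allocated IDs on startup, as the commit above describes, can be sketched with a toy allocator. The type and method names are hypothetical and the real allocator is not shown in this conversation. The point is that IDs recovered from existing state are reserved before any new allocation, so a restart never hands out an ID that is already in use.

```go
package main

import (
	"errors"
	"fmt"
)

// idAllocator is a hypothetical allocator handing out IDs in [1, max].
type idAllocator struct {
	max  int
	used map[int]string // id -> owner
}

func newIDAllocator(max int) *idAllocator {
	return &idAllocator{max: max, used: map[int]string{}}
}

// Reserve records that owner already holds id. It is idempotent for the
// same owner and fails if the id is out of range or held by someone else.
func (a *idAllocator) Reserve(owner string, id int) error {
	if id < 1 || id > a.max {
		return fmt.Errorf("id %d out of range", id)
	}
	if cur, ok := a.used[id]; ok && cur != owner {
		return fmt.Errorf("id %d already reserved for %s", id, cur)
	}
	a.used[id] = owner
	return nil
}

// Allocate returns the owner's existing id, or the lowest free one.
func (a *idAllocator) Allocate(owner string) (int, error) {
	for id, cur := range a.used {
		if cur == owner {
			return id, nil
		}
	}
	for id := 1; id <= a.max; id++ {
		if _, taken := a.used[id]; !taken {
			a.used[id] = owner
			return id, nil
		}
	}
	return 0, errors.New("no free ids")
}

func main() {
	alloc := newIDAllocator(4)
	// On startup: reserve IDs found on existing objects (example data).
	for owner, id := range map[string]int{"netA": 3, "netB": 1} {
		if err := alloc.Reserve(owner, id); err != nil {
			panic(err)
		}
	}
	id, err := alloc.Allocate("netC")
	fmt.Println(id, err)
}
```

Without the reservation pass, the first allocation after a restart would return 1 and collide with the ID that netB already holds.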
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 0eaf304)
Add transit router info to use for layer2 interconnect.

Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
(cherry picked from commit 45e46dd)
Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
(cherry picked from commit a673981)
Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
(cherry picked from commit 5af0cc6)
Signed-off-by: Nadia Pinaeva <[email protected]>
Co-authored-by: Enrique Llorente <[email protected]>
(cherry picked from commit ac0d6a5)
gateway: Remove old GW router to layer2 switch ports together with stale
routes, policies and NATs.
layer2_controller: Create an extra switch to transit router link with
MAC-only router port. Add fake join subnet IPs to the transit router
to switch port.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 7a84ce8)
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit b3507c4)
It is only triggered on restart now

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 28be45c)
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 1fe049c)
"UDN pod to the same node nodeport service in different UDN network"
test used to work on Layer2 UDN for IPv6 because of the SNAT on the
GR. SNAT has since moved to the transit router, so Layer2 now works
the same way as Layer3 networks.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit db32068)
Previously the default gateway for layer2 was on the GR, so we had to use
its primary joinIP to evaluate the expected MAC and LLA. Now the default
gateway is on the transit router with the first subnet IP.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 3258d62)
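
As a rough illustration of the values involved in the commit above, the sketch below computes the first IP of a subnet and derives an IPv6 link-local address from a MAC using the standard modified EUI-64 rule. Both helper names and all example values are assumptions. How the project maps a gateway IP to its MAC is not shown in this conversation, so the MAC here is an arbitrary input.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// firstSubnetIP returns the address immediately after the network address
// of cidr, e.g. the ".1" of an IPv4 subnet. Hypothetical helper.
func firstSubnetIP(cidr string) (string, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", err
	}
	return p.Masked().Addr().Next().String(), nil
}

// linkLocalFromMAC derives the fe80::/64 address for a 48-bit MAC using
// modified EUI-64: flip the universal/local bit of the first octet and
// insert ff:fe between the third and fourth octets.
func linkLocalFromMAC(mac string) (string, error) {
	hw, err := net.ParseMAC(mac)
	if err != nil {
		return "", err
	}
	if len(hw) != 6 {
		return "", fmt.Errorf("expected a 48-bit MAC, got %d bytes", len(hw))
	}
	var b [16]byte
	b[0], b[1] = 0xfe, 0x80
	b[8] = hw[0] ^ 0x02
	b[9], b[10] = hw[1], hw[2]
	b[11], b[12] = 0xff, 0xfe
	b[13], b[14], b[15] = hw[3], hw[4], hw[5]
	return netip.AddrFrom16(b).String(), nil
}

func main() {
	// Example subnet and MAC only; neither comes from the source.
	gw, err := firstSubnetIP("10.128.0.0/16")
	if err != nil {
		panic(err)
	}
	lla, err := linkLocalFromMAC("0a:58:0a:80:00:01")
	if err != nil {
		panic(err)
	}
	fmt.Println(gw, lla)
}
```

A test that previously derived its expectations from the GR join IP would, under this change, start from the first subnet IP instead.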
cni/NetNS is replaced with
github.com/containernetworking/plugins/pkg/ns/NetNS
node.ManagementPort was moved to its own package.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit b34196c)
Fix unit tests for the introduced changes.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit cfd538e)
npinaeva and others added 4 commits October 2, 2025 18:05
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit f31d2d5)
Co-authored-by: Enrique Llorente <[email protected]>
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 4d02af3)
After the topology upgrade, the new default gateway for layer2
VMs will be on the transit router, so we need to remove the
previously learned MAC.

Co-authored-by: Enrique Llorente <[email protected]>
Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit c94039c)
Add a transitSubnet field, similar to joinSubnet, to the NetConf,
but only set it for Primary Layer2 networks.
Set transit subnets for NADs.

Signed-off-by: Nadia Pinaeva <[email protected]>
(cherry picked from commit 62bd1eb)
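
A minimal sketch of the conditional field described in the commit above follows. The struct is a simplified, hypothetical stand-in for the real NetConf: the field names, JSON tags, and the example subnet are assumptions. It shows the field being populated only for primary layer2 networks and omitted from the serialized config otherwise.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// netConf is a simplified, hypothetical stand-in for the network config.
type netConf struct {
	Name          string `json:"name"`
	Topology      string `json:"topology"`
	Role          string `json:"role,omitempty"`
	JoinSubnet    string `json:"joinSubnet,omitempty"`
	TransitSubnet string `json:"transitSubnet,omitempty"`
}

// withTransitSubnet sets the transit subnet only for primary layer2
// networks and leaves every other config untouched.
func withTransitSubnet(conf netConf, transit string) netConf {
	if conf.Topology == "layer2" && conf.Role == "primary" {
		conf.TransitSubnet = transit
	}
	return conf
}

// render serializes the config, returning an empty string on failure.
func render(conf netConf) string {
	out, err := json.Marshal(conf)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	// Example subnet value only.
	const transit = "100.88.0.0/16"
	l2 := netConf{Name: "blue", Topology: "layer2", Role: "primary"}
	l3 := netConf{Name: "red", Topology: "layer3", Role: "primary"}
	fmt.Println(render(withTransitSubnet(l2, transit)))
	fmt.Println(render(withTransitSubnet(l3, transit)))
}
```

Using `omitempty` keeps the serialized config for other topologies unchanged, which is one plausible way to avoid touching networks the field does not apply to.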
@openshift-ci openshift-ci bot requested review from jcaamano and kyrtapz October 2, 2025 22:10
@openshift-ci
Contributor

openshift-ci bot commented Oct 2, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anuragthehatter
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Oct 9, 2025

@anuragthehatter: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-techpreview | 4aecc87 | link | false | /test e2e-azure-ovn-techpreview |
| ci/prow/e2e-openstack-ovn | 4aecc87 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-upgrade-ipsec | 4aecc87 | link | false | /test e2e-aws-ovn-upgrade-ipsec |
| ci/prow/e2e-azure-ovn-upgrade | 4aecc87 | link | true | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-azure-ovn | 4aecc87 | link | false | /test e2e-azure-ovn |
| ci/prow/e2e-aws-ovn-hypershift-kubevirt | 4aecc87 | link | false | /test e2e-aws-ovn-hypershift-kubevirt |
| ci/prow/qe-perfscale-aws-ovn-small-udn-density-churn-l3 | 4aecc87 | link | false | /test qe-perfscale-aws-ovn-small-udn-density-churn-l3 |
| ci/prow/e2e-aws-ovn-techpreview | 4aecc87 | link | false | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview | 4aecc87 | link | false | /test e2e-aws-ovn-hypershift-conformance-techpreview |
| ci/prow/okd-scos-e2e-aws-ovn | 4aecc87 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-single-node-techpreview | 4aecc87 | link | false | /test e2e-aws-ovn-single-node-techpreview |
| ci/prow/e2e-aws-ovn-edge-zones | 4aecc87 | link | true | /test e2e-aws-ovn-edge-zones |
| ci/prow/e2e-ovn-hybrid-step-registry | 4aecc87 | link | false | /test e2e-ovn-hybrid-step-registry |
| ci/prow/security | 4aecc87 | link | false | /test security |
| ci/prow/lint | 4aecc87 | link | true | /test lint |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade | 4aecc87 | link | true | /test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kyrtapz
Contributor

kyrtapz commented Oct 10, 2025

/retest
/payload 4.21 ci blocking
/payload 4.21 nightly blocking

@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

@kyrtapz: trigger 5 job(s) of type blocking for the ci release of OCP 4.21

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.21-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.21-periodics-e2e-aks
  • periodic-ci-openshift-hypershift-release-4.21-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8bef5270-a5bb-11f0-8446-ba0b23e853d9-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.21

  • periodic-ci-openshift-release-master-ci-4.21-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.21-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.21-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-serial-1of2
  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-serial-2of2
  • periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-3of3
  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8bef5270-a5bb-11f0-8446-ba0b23e853d9-1

7 participants