[DNM] L2 Router Upgrade #2776
base: master
Update the comment to reflect the actual CIDR configuration. The podNetwork value uses /16 as the top level range and /24 for node assignment, not /14 and /23 as stated in the outdated comment. Fixes #5538 Signed-off-by: TOMOFUMI-KONDO <[email protected]>
Adds check to ensure that the status on an EIP matches the IP we have allocated in our cache. If it doesn't, consider the status invalid and attempt reallocation. There is a race exposed by the unit test "should update invalid assignments on duplicated node", where an EIP has 2 IPs assigned to node1, and a bogus IP assigned to node2. The test expects that EIP should rebalance the EIPs with one on node1 and one on node2. However, a race can happen where the EIP node allocator cache becomes corrupt due to lack of validation of status. This can happen:
1. EgressIP has node1: EIP1, EIP2
2. Test starts WatchEgressIP
3. EIP->Sync->Annotates 50000 mark on EIP
4. This triggers an update event with 50000 mark, node1: EIP1, EIP2
5. EIP initial add starts with original event of node1: EIP1, EIP2
6. EIP rebalances one EIP per node, patches EIP object, creates update event with 50000 mark, node1: EIP1, node2: EIP2
7. Next update event processed -> RACE happens, retry framework will grab "latest" object. If the informer was lagging, informer cache has object from step 4
8. EIP validation logic considers this a pass, because the cache has both nodes, each with 1 egress IP. No patch is done to the EIP object, but when validStatus is achieved the eNode cache is updated
9. eNode cache is updated, so now node1 has 2 allocations for EIP
10. Update event from step 6 is processed. Validation fails because node1 has 2 EIPs. During rebalancing it won't be able to find a suitable node for the EIP due to cache corruption, so nothing will be updated.
Signed-off-by: Tim Rozet <[email protected]>
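A minimal Go sketch of the added check, assuming a simplified cache keyed by node name. The `statusItem` and `allocatorCache` types and the `validateStatus` function are hypothetical stand-ins, not the ovn-kubernetes types, and the addresses are documentation examples.

```go
package main

import "fmt"

// statusItem is one entry of an EgressIP status: which node hosts which IP.
// Hypothetical type for illustration; not the ovn-kubernetes API type.
type statusItem struct {
	Node     string
	EgressIP string
}

// allocatorCache maps node name -> set of egress IPs allocated to that node.
// Hypothetical simplification of the node allocator cache.
type allocatorCache map[string]map[string]bool

// validateStatus splits status entries into those that agree with the cache
// and those that do not. An entry is valid only if the cache records that
// exact IP as allocated to that exact node.
func validateStatus(status []statusItem, cache allocatorCache) (valid, invalid []statusItem) {
	for _, s := range status {
		if cache[s.Node][s.EgressIP] { // lookup in a missing inner map yields false
			valid = append(valid, s)
		} else {
			invalid = append(invalid, s)
		}
	}
	return valid, invalid
}

func main() {
	// Cache after rebalancing (step 6): one egress IP per node.
	cache := allocatorCache{
		"node1": {"192.0.2.1": true},
		"node2": {"192.0.2.2": true},
	}
	// Stale object from the lagging informer (step 4): both IPs on node1.
	stale := []statusItem{
		{Node: "node1", EgressIP: "192.0.2.1"},
		{Node: "node1", EgressIP: "192.0.2.2"},
	}
	valid, invalid := validateStatus(stale, cache)
	fmt.Println("valid:", valid)
	fmt.Println("invalid:", invalid)
}
```

In this sketch the stale status from step 4 no longer passes, because the second IP is recorded under node2 in the cache. The entry is reported as invalid and can be reallocated instead of being written back into the cache.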
Aligning it to what we do for primary nic EIPv6 addresses. NODAD is required since it is possible for an EIPv6 to be configured in two nodes at the same time during a short failover time window. Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Fix incorrect comment for podNetwork in Helm values
Signed-off-by: Surya Seetharaman <[email protected]>
Signed-off-by: Surya Seetharaman <[email protected]>
Add OKEP for `ovn-kubernetes-mcp` repo
Enhances EIP status validation
Fixes: #5552 Signed-off-by: Tim Rozet <[email protected]>
OKEP-5552: Add support for dynamic UDN allocation per node
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 7dca930)
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 4dd3152)
This subnet is now also used for transit routers in the layer2 topology. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit f1b0ee0)
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit f3eff25)
Make sure it reserves already allocated ids on startup. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit b850172)
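The commit message does not name the allocator or its API, so the following Go sketch only illustrates the general idea of reserving previously assigned IDs before handing out new ones. `idAllocator`, `Reserve`, `Allocate` and the network names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// idAllocator hands out integer IDs from [lo, hi]. Hypothetical type for
// illustration; it is not the allocator touched by the commit.
type idAllocator struct {
	lo, hi int
	used   map[int]string // id -> owner
}

func newIDAllocator(lo, hi int) *idAllocator {
	return &idAllocator{lo: lo, hi: hi, used: map[int]string{}}
}

// Reserve records that owner already holds id, e.g. as read back from
// existing objects on startup. It fails if another owner holds the id.
func (a *idAllocator) Reserve(owner string, id int) error {
	if id < a.lo || id > a.hi {
		return fmt.Errorf("id %d outside range [%d, %d]", id, a.lo, a.hi)
	}
	if cur, ok := a.used[id]; ok && cur != owner {
		return fmt.Errorf("id %d already held by %q", id, cur)
	}
	a.used[id] = owner
	return nil
}

// Allocate returns the owner's existing id, or else the lowest free id.
func (a *idAllocator) Allocate(owner string) (int, error) {
	for id, cur := range a.used {
		if cur == owner {
			return id, nil
		}
	}
	for id := a.lo; id <= a.hi; id++ {
		if _, taken := a.used[id]; !taken {
			a.used[id] = owner
			return id, nil
		}
	}
	return 0, errors.New("no free ids")
}

func main() {
	a := newIDAllocator(1, 4)
	// On startup, reserve ids that were allocated before the restart.
	for owner, id := range map[string]int{"net-a": 2, "net-b": 1} {
		if err := a.Reserve(owner, id); err != nil {
			panic(err)
		}
	}
	id, err := a.Allocate("net-c")
	if err != nil {
		panic(err)
	}
	fmt.Println("net-c got id", id)
}
```

Without the reservation pass, a fresh allocator would give `net-c` the lowest ID and collide with an ID that is already in use.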
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 0eaf304)
Add transit router info to use for layer2 interconnect. Signed-off-by: Nadia Pinaeva <[email protected]> Co-authored-by: Enrique Llorente <[email protected]> (cherry picked from commit 45e46dd)
Signed-off-by: Nadia Pinaeva <[email protected]> Co-authored-by: Enrique Llorente <[email protected]> (cherry picked from commit a673981)
Signed-off-by: Nadia Pinaeva <[email protected]> Co-authored-by: Enrique Llorente <[email protected]> (cherry picked from commit 5af0cc6)
Signed-off-by: Nadia Pinaeva <[email protected]> Co-authored-by: Enrique Llorente <[email protected]> (cherry picked from commit ac0d6a5)
gateway: Remove old GW router to layer2 switch ports together with stale routes, policies and NATs. layer2_controller: Create an extra switch to transit router link with MAC-only router port. Add fake join subnet IPs to the transit router to switch port. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 7a84ce8)
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit b3507c4)
It is only triggered on restart now. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 28be45c)
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 1fe049c)
"UDN pod to the same node nodeport service in different UDN network" test used to work on Layer2 UDN for ipv6 because of the SNAT on the GR. Now SNAT was moved to the transit router and works the same way as Layer3 networks. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit db32068)
Previously the default gateway for layer2 was on the GR, so we had to use its primary joinIP to evaluate the expected MAC and LLA. Now the default gateway is on the transit router with the first subnet IP. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 3258d62)
cni/NetNS is replaced with github.com/containernetworking/plugins/pkg/ns/NetNS. node.ManagementPort was moved to its own package. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit b34196c)
Fix unit tests for the introduced changes. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit cfd538e)
Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit f31d2d5)
Co-authored-by: Enrique Llorente <[email protected]> Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 4d02af3)
After topology upgrade a new default gateway for layer2 VMs will be on the transit router, so we need to remove previously learned MAC. Co-authored-by: Enrique Llorente <[email protected]> Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit c94039c)
Add a transitSubnet field similar to joinSubnet to the NetConf, but only set it for Primary Layer2 networks. Set transit subnets for NADs. Signed-off-by: Nadia Pinaeva <[email protected]> (cherry picked from commit 62bd1eb)
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: anuragthehatter The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@anuragthehatter: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/retest
@kyrtapz: trigger 5 job(s) of type blocking for the ci release of OCP 4.21
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8bef5270-a5bb-11f0-8446-ba0b23e853d9-0
trigger 13 job(s) of type blocking for the nightly release of OCP 4.21
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8bef5270-a5bb-11f0-8446-ba0b23e853d9-1
D/S test for ovn-kubernetes/ovn-kubernetes#5561