OCPBUGS-63459: Branch Sync release-4.19 to release-4.18 [10-22-2025] #2825

openshift-pr-manager · 2025-10-22T19:39:19Z

Automated branch sync: release-4.19 to release-4.18.

Fixes regression from 1448d5a. The previous commit dropped matching on in_port so that localnet ports would also use table 1. This allows reply packets from a localnet pod towards the shared OVN/LOCAL IP to be sent to the correct port. However, it introduced a regression: traffic coming from these localnet ports to any destination was sent to table 1. Egress traffic from the localnet ports is not committed to conntrack, so sending it to table=1 via CT resulted in a CT miss. This is especially bad for hardware offload where a localnet port is used as the Geneve encap port: in that case all Geneve traffic misses in the CT lookup and is not offloaded. Table 1 is intended for handling IP traffic destined to the shared gateway IP/MAC that both the host and OVN use. It is also used to handle reply traffic for Egress IP. To fix this problem, we add a dl_dst match to this flow, ensuring that only traffic destined to the host/OVN goes to table 1. Even after this fix, localnet -> host/OVN egress traffic will still enter table 1 and miss in CT. This could potentially be fixed by always committing egress traffic, but that might carry a performance penalty, so that fix is deferred to a later date. Signed-off-by: Tim Rozet <[email protected]> (cherry picked from commit 318f8ce)
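The dispatch decision can be modeled as a predicate to make the before/after behavior concrete. This is a minimal sketch, not ovn-kubernetes code. The `packet` type, the two functions, and the MAC addresses are all hypothetical, and the real classification happens in OVS, not in Go.

```go
package main

import "fmt"

// packet is a hypothetical, simplified view of the fields that matter to
// the table=0, priority=50 dispatch rule.
type packet struct {
	isIP   bool
	dstMAC string
}

// toTable1Regressed models the rule after in_port was dropped but before
// the fix: every IP packet, including egress from localnet ports, was sent
// through CT to table 1.
func toTable1Regressed(p packet) bool {
	return p.isIP
}

// toTable1Fixed models the fix: only IP traffic whose destination MAC is
// the shared host/OVN gateway MAC goes to table 1; everything else skips
// the CT lookup.
func toTable1Fixed(p packet, gatewayMAC string) bool {
	return p.isIP && p.dstMAC == gatewayMAC
}

func main() {
	gatewayMAC := "0a:58:0a:00:00:01" // hypothetical breth0 MAC
	// Egress from a localnet port whose next hop is another machine,
	// e.g. Geneve traffic when the localnet port is the encap port.
	egress := packet{isIP: true, dstMAC: "52:54:00:aa:bb:cc"}
	// Reply traffic towards the shared gateway IP/MAC.
	reply := packet{isIP: true, dstMAC: gatewayMAC}

	fmt.Println(toTable1Regressed(egress), toTable1Fixed(egress, gatewayMAC)) // true false
	fmt.Println(toTable1Regressed(reply), toTable1Fixed(reply, gatewayMAC))   // true true
}
```

The fixed predicate still sends replies towards the shared gateway to table 1. It keeps offloadable egress traffic, whose next hop is not the host/OVN, out of the CT lookup.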

We did this for IPv4 in 1448d5a, but forgot about IPv6. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 66d8f14)

Add dl_dst=$breth0 to table=0, prio=50 for IPv6. We want to match in table=1 only conntrack'ed reply traffic whose next hop is either OVN or the host. As a consequence, localnet traffic whose next hop is an external router (whether or not it is destined to OVN/host) should bypass table=1 and just hit the NORMAL flow in table=0. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit ef1aa99)
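Together, these commits leave one priority-50 rule per IP family that matches on dl_dst and not on in_port. The sketch below renders such rules as ovs-ofctl flow strings for illustration. The field order, the absence of a cookie, and the conntrack zone value 64000 are assumptions, not the exact strings ovn-kubernetes programs.

```go
package main

import "fmt"

// dispatchFlow builds a sketch of the table=0, priority=50 rule for one IP
// family. ctZone is an assumed value; the real cookie, zone and field order
// used by ovn-kubernetes may differ.
func dispatchFlow(ipv6 bool, bridgeMAC string, ctZone int) string {
	proto := "ip"
	if ipv6 {
		proto = "ipv6"
	}
	// No in_port match: localnet ports must also be able to reach table 1.
	// dl_dst limits the rule to traffic whose next hop is the host or OVN.
	return fmt.Sprintf("table=0, priority=50, %s, dl_dst=%s, actions=ct(zone=%d, nat, table=1)",
		proto, bridgeMAC, ctZone)
}

func main() {
	mac := "0a:58:0a:00:00:01" // hypothetical breth0 MAC
	fmt.Println(dispatchFlow(false, mac, 64000))
	fmt.Println(dispatchFlow(true, mac, 64000))
}
```

Generating both families from a single function makes it harder for the IPv4 and IPv6 rules to drift apart. That kind of drift is what the IPv6 follow-up commits had to correct.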

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 4ce92a9)

We already tested localnet -> host; let's also cover connections initiated from the host. The localnet uses IPs in the same subnet as the host network. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit a5029f8)

We have two non-InterConnect (IC) CI lanes for multihoming but only one with IC enabled (and local gateway). We need coverage with IC enabled for both gateway modes, so let's make an existing non-IC lane IC-enabled and set it to dual-stack with gateway=shared for better coverage. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit bf6f9c1)

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 6de44ef)

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit c4cc25a)

This is needed because we will need to generate IPs from subnets other than the host subnet. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit eb5f3c1)
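One way to make the IP request extensible is to derive addresses from an arbitrary subnet instead of hard-coding the host subnet. `nthIPInSubnet` below is a hypothetical helper, not the e2e framework's API. It uses Go's standard `net/netip` package and handles IPv4 and IPv6 alike.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nthIPInSubnet returns the n-th address after the network address of
// subnet (n=1 is the first host address). Hypothetical helper for
// illustration; it errors out if the result falls outside the subnet.
func nthIPInSubnet(subnet string, n int) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(subnet)
	if err != nil {
		return netip.Addr{}, err
	}
	addr := prefix.Masked().Addr()
	for i := 0; i < n; i++ {
		addr = addr.Next()
		if !addr.IsValid() || !prefix.Contains(addr) {
			return netip.Addr{}, fmt.Errorf("index %d is outside %s", n, subnet)
		}
	}
	return addr, nil
}

func main() {
	// Host subnet, a separate localnet subnet, and an IPv6 subnet (all hypothetical).
	for _, s := range []string{"172.18.0.0/16", "192.168.100.0/24", "fd00:10::/64"} {
		ip, err := nthIPInSubnet(s, 10)
		fmt.Println(s, ip, err)
	}
}
```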

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit f82e101)

The localnet is on a subnet different from the host subnet, the corresponding NAD is configured with a VLAN ID, and the localnet pod uses an external router to communicate with cluster pods. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 69ec569)

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 51eae7a)

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit dea42b4)

In testing, we saw that an invalid conntrack state would drop all echo requests after the first one, so let's send three pings in each test. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit b004ed0)
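Sending three pings only helps if the test also checks that every one was answered. The sketch below does that by parsing the ping statistics line. `allRepliesReceived` is a hypothetical helper; the actual e2e test may check results differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// summaryRE matches the statistics line printed by iputils and BusyBox ping,
// e.g. "3 packets transmitted, 3 received, 0% packet loss".
var summaryRE = regexp.MustCompile(`(\d+) packets transmitted, (\d+) (?:packets )?received`)

// allRepliesReceived reports whether ping sent at least want echo requests
// and got a reply to every one. With a single ping, a conntrack state that
// drops every request after the first would go unnoticed.
func allRepliesReceived(output string, want int) (bool, error) {
	m := summaryRE.FindStringSubmatch(output)
	if m == nil {
		return false, fmt.Errorf("no ping summary found")
	}
	// The regexp guarantees digits, so Atoi only fails on overflow.
	sent, _ := strconv.Atoi(m[1])
	recv, _ := strconv.Atoi(m[2])
	return sent >= want && recv == sent, nil
}

func main() {
	ok := "3 packets transmitted, 3 received, 0% packet loss, time 2003ms"
	bad := "3 packets transmitted, 1 received, 66.6667% packet loss, time 2040ms"
	for _, out := range []string{ok, bad} {
		res, err := allRepliesReceived(out, 3)
		fmt.Println(res, err)
	}
}
```

The exit code alone may not be enough. With iputils, `ping -c 3` can still exit 0 when at least one reply arrives, unless a deadline is also given. Counting replies catches the "first one only" failure directly.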

Currently, the trap force-exits before the background processes can finish: the container is removed and the orphaned processes end early, leaving our config in an unknown state because we don't shut down in an orderly manner. Wait until the pid file for ovnkube-controller-with-node is removed, which shows that the process has completed. Signed-off-by: Martin Kennelly <[email protected]> (cherry picked from commit 8b29419)

Prevent ovn-controller from sending stale GARPs by adding drop flows on the external bridge patch ports until ovnkube-controller synchronizes the southbound database (henceforth known as "drop flows"). This addresses race conditions where ovn-controller processes outdated SB DB state before ovnkube-controller updates it, particularly affecting EIP SNAT configurations attached to logical router ports.

Fixes: https://issues.redhat.com/browse/FDP-1537

ovnkube-controller controls the lifecycle of the drop flows. OVS / ovn-controller must be running to configure the external bridge. Downstream, the external bridge may be pre-created, and ovn-controller will use it. This fix considers three primary scenarios: node, container and pod restart.

Node restart: the OVS flows installed on the node prior to the reboot are cleared, but the external bridge still exists. Add the flows before ovnkube-controller-with-node starts. The reason to add them here is that our gateway code depends on ovn-controller being started and running. There is now a race between ovn-controller starting (and GARPing) and us setting this flow; I think the risk is low, but it needs serious testing. The reason I did not add the drop flows before ovn-controller starts is that I have no way to detect whether it is a node reboot or a pod restart, and I don't want to inject drop flows for a simple ovn-controller container restart, which could disrupt traffic. When ovnkube-controller starts, we create a new gateway and apply the same flows to ensure we always drop GARPs while ovnkube-controller hasn't synced. Remove the flows once ovnkube-controller has synced. There is also a race here between ovnkube-controller removing the flows and ovn-controller GARPing with stale SB DB info; there is no easy way to detect which SB DB data ovn-controller has consumed.

Pod restart: we add the drop flows before exit. ovnkube-controller-with-node will also add them before it starts the Go code.

Container restart:
- ovnkube-controller: adds flows upon start and exit
- ovn-controller: no changes

While the drop flows are set, OVN may not be able to resolve IPs it doesn't know about when generating its logical router pipelines. Following removal of the drop flows, OVN may resolve the IPs using GARP requests. OVN-Controller always sends out GARPs with op code 1 on startup. Signed-off-by: Martin Kennelly <[email protected]> (cherry picked from commit 82fc3bf)

PR 5373, which added the GARP drop flows, didn't consider that we set the default network controller first and only later set the gateway object. In between, ovnkube-node may receive a stop signal, and we do not guard against accessing the gateway if it's not yet set. OVNKube controller may also have synced before the gateway object is set. There is nothing to reconcile if the gateway is not set. Signed-off-by: Martin Kennelly <[email protected]> (cherry picked from commit e60220a)
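The fix amounts to a nil guard on a field that is populated after the controller already exists. In this sketch, the `gateway` and `nodeController` types and their methods are hypothetical stand-ins, not the ovn-kubernetes types. One method serves both the stop path (add drop flows) and the synced path (remove them).

```go
package main

import (
	"fmt"
	"sync"
)

// gateway is a hypothetical stand-in for the object owning the drop flows.
type gateway struct{ dropFlowsActive bool }

func (g *gateway) setGARPDropFlows(active bool) { g.dropFlowsActive = active }

// nodeController exists before its gateway field is set.
type nodeController struct {
	mu sync.Mutex
	gw *gateway
}

func (n *nodeController) setGateway(g *gateway) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.gw = g
}

// reconcileGARPDropFlows adds (on stop) or removes (after sync) the drop
// flows. If the gateway is not set yet there is nothing to reconcile, so it
// returns false instead of dereferencing a nil pointer.
func (n *nodeController) reconcileGARPDropFlows(drop bool) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.gw == nil {
		return false
	}
	n.gw.setGARPDropFlows(drop)
	return true
}

func main() {
	nc := &nodeController{}
	fmt.Println(nc.reconcileGARPDropFlows(true)) // stop signal before gateway set: no-op
	gw := &gateway{dropFlowsActive: true}
	nc.setGateway(gw)
	fmt.Println(nc.reconcileGARPDropFlows(false), gw.dropFlowsActive) // synced: flows removed
}
```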

Ensure ovn-controller has processed the SB DB updates before removing the GARP drop flows by using the hv_cfg field in NB_Global [1]. OVNKube controller increments the nb_cfg value post sync, and northd copies it to the SB DB. Each ovn-controller copies this nb_cfg value from the SB DB and writes it to the nb_cfg field of its Chassis_Private table after it has processed the SB DB changes. Northd then looks at the nb_cfg values in all Chassis_Private rows and sets hv_cfg in the NB DB's NB_Global table to the minimum value found. Since IC currently supports only one node per zone, we can be sure ovn-controller is running locally, and therefore it's OK for the removal of the GARP drop flows to block on this. [1] https://man7.org/linux/man-pages/man5/ovn-nb.5.html Signed-off-by: Martin Kennelly <[email protected]> (cherry picked from commit 3b5da01)
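The nb_cfg/hv_cfg handshake can be simulated without an OVSDB connection. In this sketch, a hypothetical map stands in for the Chassis_Private rows. Real code would read NB_Global.nb_cfg and hv_cfg through an OVSDB client. In the one-node-per-zone IC case the map would hold only the local chassis.

```go
package main

import "fmt"

// hvCfg models what northd does: NB_Global.hv_cfg becomes the minimum
// nb_cfg that any chassis has written to its Chassis_Private row.
func hvCfg(chassisNbCfg map[string]int64) int64 {
	first := true
	var lowest int64
	for _, v := range chassisNbCfg {
		if first || v < lowest {
			lowest, first = v, false
		}
	}
	return lowest
}

// canRemoveDropFlows reports whether hv_cfg has caught up with the nb_cfg
// value ovnkube-controller set after its sync, i.e. every ovn-controller has
// processed the corresponding SB DB contents.
func canRemoveDropFlows(target int64, chassisNbCfg map[string]int64) bool {
	return len(chassisNbCfg) > 0 && hvCfg(chassisNbCfg) >= target
}

func main() {
	target := int64(5) // nb_cfg value set by ovnkube-controller post sync
	local := map[string]int64{"node-a": 4}
	fmt.Println(canRemoveDropFlows(target, local)) // false: ovn-controller still behind
	local["node-a"] = 5
	fmt.Println(canRemoveDropFlows(target, local)) // true: safe to remove drop flows
}
```

Comparing with `>=` rather than `==` tolerates a later nb_cfg bump landing before the check runs.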

OCPBUGS-61453: [4.20] allow default network -> localnet on the same node for any localnet subnet

Aligning it with what we do for primary NIC EIPv6 addresses. NODAD is required since an EIPv6 may be configured on two nodes at the same time during a short failover window. Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit b828680)

[release-4.20] OCPBUGS-62273: Fix EgressIP stale GARP post reboot + pod restart

OCPBUGS-62913: Configure sec nic EIPv6 address with NODAD and maximum lifetime

…rom-4.20-10-16-2025

OCPBUGS-63234: [release-4.19] DownStream Merge Sync from 4.20 [10-16-2025]

…4.19-to-release-4.18-10-22-2025

openshift-pr-manager · 2025-10-22T19:39:20Z

/ok-to-test
/payload 4.18 ci blocking
/payload 4.18 nightly blocking

openshift-ci-robot · 2025-10-22T19:39:23Z

@openshift-pr-manager[bot]: This pull request explicitly references no jira issue.

In response to this:

Automated branch sync: release-4.19 to release-4.18.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-10-22T19:39:38Z

@openshift-pr-manager[bot]: trigger 4 job(s) of type blocking for the ci release of OCP 4.18

periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-upgrade
periodic-ci-openshift-hypershift-release-4.18-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ced74580-af7e-11f0-9c1c-4e5efcf8a0f3-0

trigger 10 job(s) of type blocking for the nightly release of OCP 4.18

periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-hypershift-release-4.18-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ced74580-af7e-11f0-9c1c-4e5efcf8a0f3-1

openshift-ci · 2025-10-22T19:40:14Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: openshift-pr-manager[bot]
Once this PR has been reviewed and has the lgtm label, please assign tssurya for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jcaamano · 2025-10-23T10:21:33Z

/payload-job periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

openshift-ci · 2025-10-23T10:21:37Z

@jcaamano: trigger 6 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0cfc1fe0-affa-11f0-9c10-bbabe6ad74f1-0

jcaamano · 2025-10-23T10:22:24Z

/override ci/prow/lint

openshift-ci · 2025-10-23T10:23:22Z

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint

In response to this:

/override ci/prow/lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jcaamano · 2025-10-23T10:23:31Z

/retest

jcaamano · 2025-10-23T10:25:33Z

/jira cherrypick OCPBUGS-63234

openshift-ci-robot · 2025-10-23T10:25:46Z

@jcaamano: Jira Issue OCPBUGS-63234 has been cloned as Jira Issue OCPBUGS-63459. Will retitle bug to link to clone.
/retitle OCPBUGS-63459: NO-JIRA: Branch Sync release-4.19 to release-4.18 [10-22-2025]

In response to this:

/jira cherrypick OCPBUGS-63234

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2025-10-23T10:25:59Z

@openshift-pr-manager[bot]: This pull request references Jira Issue OCPBUGS-63459, which is invalid:

expected dependent Jira Issue OCPBUGS-63234 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Automated branch sync: release-4.19 to release-4.18.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jcaamano · 2025-10-23T10:28:14Z

/retitle OCPBUGS-63459: Branch Sync release-4.19 to release-4.18 [10-22-2025]

openshift-ci · 2025-10-23T15:08:41Z

@openshift-pr-manager[bot]: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/okd-scos-e2e-aws-ovn	`1551a11`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/security	`1551a11`	link	false	`/test security`
ci/prow/lint	`1551a11`	link	true	`/test lint`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jluhrsen · 2025-10-23T16:35:24Z

/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

openshift-ci · 2025-10-23T16:35:29Z

@jluhrsen: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/470f88f0-b02e-11f0-9769-26097345b319-0

jluhrsen · 2025-10-23T16:38:58Z

/retest

jcaamano · 2025-10-24T09:21:45Z

/payload-job periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
/payload-job periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

openshift-ci · 2025-10-24T09:21:48Z

@jcaamano: trigger 6 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/dcd15f80-b0ba-11f0-9a3f-bf9ea5c58397-0

trozet and others added 25 commits September 9, 2025 16:32

Openflow: drop in_port from IPv6 dispatch OF rule at prio=50

79ef291

We did this for IPv4 in 1448d5a, but forgot about IPv6. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 66d8f14)

E2E localnet: remove double import of ginkgo

1bf08b9

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 4ce92a9)

E2E localnet: remove references to downstream bugs and stories

c317be2

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 6de44ef)

E2E localnet: specify that the localnet uses IPs from host subnet

878d540

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit c4cc25a)

E2E localnet: make IP request for localnet pod extensible

35faf85

This is needed because we will need to generate IPs from subnets other than the host subnet. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit eb5f3c1)

E2E localnet: Fix requirement on number of schedulable nodes

2118ba6

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit f82e101)

E2E localnet: host network -> localnet on VLAN with external router

164d9f8

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit 51eae7a)

E2E localnet: localnet -> host network on VLAN with external router

63bb48f

Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit dea42b4)

E2E localnet: send three pings instead of just one

5611e7b

In testing, we saw that an invalid conntrack state would drop all echo requests after the first one, so let's send three pings in each test. Signed-off-by: Riccardo Ravaioli <[email protected]> (cherry picked from commit b004ed0)

Merge pull request #2751 from ricky-rav/OCPBUGS-59657_420

92bb5ee

OCPBUGS-61453: [4.20] allow default network -> localnet on the same node for any localnet subnet

Merge pull request #2767 from martinkennelly/420-garp

84cdb99

[release-4.20] OCPBUGS-62273: Fix EgressIP stale GARP post reboot + pod restart

Merge pull request #2797 from jcaamano/ocpbugs-56783-4.20

050ed2c

OCPBUGS-62913: Configure sec nic EIPv6 address with NODAD and maximum lifetime

Merge remote-tracking branch 'upstream/release-4.20' into 4.19-sync-f…

a20ecf4

…rom-4.20-10-16-2025

Merge pull request #2810 from jluhrsen/4.19-sync-from-4.20-10-16-2025

f573d53

OCPBUGS-63234: [release-4.19] DownStream Merge Sync from 4.20 [10-16-2025]

Merge remote-tracking branch 'origin/release-4.19' into sync-release-…

1551a11

…4.19-to-release-4.18-10-22-2025

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 22, 2025

openshift-ci bot requested a review from kyrtapz October 22, 2025 19:40

openshift-ci bot requested a review from tssurya October 22, 2025 19:40

openshift-ci bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 22, 2025

openshift-ci bot changed the title ~~NO-JIRA: Branch Sync release-4.19 to release-4.18 [10-22-2025]~~ OCPBUGS-63459: NO-JIRA: Branch Sync release-4.19 to release-4.18 [10-22-2025] Oct 23, 2025

openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 23, 2025

openshift-ci bot changed the title ~~OCPBUGS-63459: NO-JIRA: Branch Sync release-4.19 to release-4.18 [10-22-2025]~~ OCPBUGS-63459: Branch Sync release-4.19 to release-4.18 [10-22-2025] Oct 23, 2025


6 participants