OCPBUGS-61865, OCPBUGS-62636, OCPBUGS-59552: DownStream Merge [10-19-2025] #2817
Conversation
It had not been doing anything since 2020. Signed-off-by: Ihar Hrachyshka <[email protected]>
Signed-off-by: Ihar Hrachyshka <[email protected]>
Signed-off-by: Ihar Hrachyshka <[email protected]> Assisted-By: Claude Code; claude-sonnet-4-20250514
Replace master with base branch to make it work on release branches. Signed-off-by: Nadia Pinaeva <[email protected]>
Signed-off-by: zhaozhanqi <[email protected]>
…tack cluster Signed-off-by: zhaozhanqi <[email protected]>
Addresses incorrect DNAT rules with <proto>/0 target port when using services with externalTrafficPolicy: Local and named ports. The issue occurred when allocateLoadBalancerNodePorts was false and services referenced pod named ports. The previous implementation used svcPort.TargetPort.IntValue(), which returns 0 for named ports, causing invalid DNAT rules.

This refactoring introduces and uses structured endpoint types that properly handle port mapping from endpoint slices, ensuring the actual pod port numbers are used instead of attempting to convert named ports to integers.

This change unifies endpoint processing logic by having both the services controller and nodePortWatcher use the same GetEndpointsForService function. This ensures consistent endpoint resolution and port-mapping behavior across all service-related components, preventing divergence in logic and similar unnoticed port-handling issues in the future.

Signed-off-by: Andreas Karis <[email protected]>
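The core of the bug and fix can be sketched with a simplified model. The `IntOrString` type and `resolveTargetPort` helper below are illustrative stand-ins (not the actual ovn-kubernetes or `k8s.io/apimachinery` code): a named target port has no numeric value of its own, so it must be resolved through the endpoint-slice port entries.

```go
package main

import "fmt"

// IntOrString is a simplified stand-in for Kubernetes' intstr.IntOrString:
// a service targetPort is either a number or a name.
type IntOrString struct {
	IntVal int32
	StrVal string
	IsInt  bool
}

// IntValue mirrors the behavior that caused the bug: for a named
// (string) port it returns 0, which produced the bad "tcp/0" DNAT targets.
func (p IntOrString) IntValue() int {
	if p.IsInt {
		return int(p.IntVal)
	}
	return 0 // named port: no numeric value available here
}

// endpointPort models an EndpointSlice port entry, which carries the
// resolved pod port number for a named port.
type endpointPort struct {
	Name string
	Port int32
}

// resolveTargetPort returns the numeric pod port for a service target
// port, consulting the endpoint slice entries when the port is named.
func resolveTargetPort(target IntOrString, eps []endpointPort) (int32, bool) {
	if target.IsInt {
		return target.IntVal, true
	}
	for _, ep := range eps {
		if ep.Name == target.StrVal {
			return ep.Port, true
		}
	}
	return 0, false // unresolved named port: caller must not emit a DNAT rule
}

func main() {
	named := IntOrString{StrVal: "http"}
	eps := []endpointPort{{Name: "http", Port: 8080}}

	fmt.Println(named.IntValue()) // 0: the buggy path
	port, ok := resolveTargetPort(named, eps)
	fmt.Println(port, ok) // 8080 true: the fixed path
}
```

Routing both the services controller and nodePortWatcher through one resolution function of this shape is what prevents the two code paths from diverging again.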
Adds tests for loadBalancer services with named ports and AllocateLoadBalancerNodePorts=False. Add new test cases in Test_getEndpointsForService. Signed-off-by: Andreas Karis <[email protected]>
Signed-off-by: Andreas Karis <[email protected]>
E2E test "Allow connection to an external IP using a source port that is equal to a node port" might flake if a service is already created with the same nodePort number. Give it a chance to recover by selecting a different port. Signed-off-by: Andreas Karis <[email protected]>
Node taints for too-small MTU were removed in #3004. Taints for NoSchedule were removed in openshift#2459. In general, it is not the CNI plugin's responsibility to set node taints; that is for the kubelet/container runtime to figure out. Therefore, it is safe to remove this unused code, since it won't be required in the future. Signed-off-by: Dave Tucker <[email protected]>
While trying to reproduce flakes with these tests, this is the thing I could reproduce easily. In the tests we add 20 target IPs to each gateway, then we ping them to make sure they go to each gateway and get resolved. However, for the TCP/UDP tests, we only run a listener on one of the target IPs. Then we would attempt to contact the listener from a source pod 20 times and check that both gateways were hit. In my testing, I can easily run these tests in a loop and see them fail, due to all 20 attempts hashing to the same gateway and never hitting the other gateway. I bumped the attempt count to 50, ran it all night, and no longer see the issue. Not sure if this fixes all of the flakes we see with these tests, as the logs have gone stale for other runs, but I will consider this closed for now and reopen if we see more flakes. Closes: #4432 Signed-off-by: Tim Rozet <[email protected]>
Unskip skipped cases, as the bug is verified
When OVS runs as a system service on the node, /run/openvswitch/ovs-vswitchd.pid is locked by ovs-vswitchd with its PID in the host process ID namespace:

```
$ lslocks | grep ovs-vswitchd.pid
COMMAND PID TYPE SIZE MODE M START END PATH
ovs-vswitchd 1615 POSIX 5B WRITE 0 0 0 /run/openvswitch/ovs-vswitchd.pid
$ stat -Lc '%d:%i %n' /run/openvswitch/ovs-vswitchd.pid
25:5398 /run/openvswitch/ovs-vswitchd.pid
```

In the ovnkube-node Pod, if hostPID is false, ovs-vswitchd's PID is not visible inside the Pod's process ID namespace, so the file lock becomes invisible as well, which causes ovs-appctl to fail:

```
$ ovs-appctl fdb/show br-int
2025-10-14T19:18:36Z|00001|daemon_unix|WARN|/var/run/openvswitch/ovs-vswitchd.pid: stale pidfile for pid 1615 being deleted by pid 0
ovs-appctl: cannot read pidfile "/var/run/openvswitch/ovs-vswitchd.pid" (No such process)
command terminated with exit code 1
$ stat -Lc '%d:%i %n' /run/openvswitch/ovs-vswitchd.pid
25:5398 /run/openvswitch/ovs-vswitchd.pid
```

This change replaces RunOVSAppctl() with RunOvsVswitchdAppCtl(), which uses the `-t /var/run/openvswitch/ovs-vswitchd.1234.ctl` option to skip reading the pid file.

Signed-off-by: Lei Huang <[email protected]>
The external gateway tests use default BFD timers, which in OVN means a send frequency of every 1 second with a maximum of 3 failures, or 3 seconds total. The tests would remove an external gateway, wait 3 seconds, and then send a packet from a pod client. We notice in upstream CI that this sometimes flakes on the first attempt and causes the test case to fail. I cannot reproduce this locally, but we can see that the math is wrong here: if the external gateway was deleted at the same time that a heartbeat was sent and acked by OVN, then it would require almost 4 seconds to detect 3 more failures and transition BFD down. Therefore, make the timeout a constant and bump it to 4 seconds. Signed-off-by: Tim Rozet <[email protected]>
Get the latest changes from [1]. There are some improvements, but it is supposed to work the same (if not better). [1] ovn-kubernetes/kubernetes-traffic-flow-tests@ce924ee Signed-off-by: Thomas Haller <[email protected]>
The test validates LoadBalancer services with:
- Named targetPorts (http/udp) instead of numeric ports
- AllocateLoadBalancerNodePorts=false configuration
- ExternalTrafficPolicy=Local behavior

Signed-off-by: Andreas Karis <[email protected]>
[th/tft-update] traffic-flow-tests: update to latest version of k8s-tft
RunOVSAppctl() doesn't work when OVS runs on the host and hostPID is false
External Gateway E2E: Increase single target attempts
fix: --logfile-maxsize is in megabytes, not bytes
chore: Remove SetTaintOnNode
chore: Remove --pod-ip option
I accidentally removed the check in a recent PR [1], which could have performance consequences, as checking against other pods has a cost. Reintroduce the check with a hopefully useful comment to prevent it from happening again. [1] ovn-kubernetes/ovn-kubernetes#5626 Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Enable ovn-ci workflow on release branches
OCPBUGS-59552: Referencing pod named ports within a service results in bad DNAT rules containing tcp/0 target port
fix: list allowed values for --platform-type option
When processing pods during an EgressIP status update, the controller used to stop iterating as soon as it encountered a pod in the Pending state (in my case, pod IPs are not found when the pod is Pending with a ContainerCreating status). This caused any subsequent Running pods to be skipped, leaving their SNAT entries unprogrammed on the egress node. With this change, only Pending pods are skipped, while iteration continues for the rest. This ensures that Running pods are properly processed and their SNAT entries are programmed. This change also skips pods that are unscheduled or use host networking. Signed-off-by: Periyasamy Palanisamy <[email protected]>
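The change in loop behavior can be sketched as follows (the pod type and field names are illustrative, not the controller's actual code): skippable pods get a `continue` instead of aborting the whole iteration.

```go
package main

import "fmt"

// pod is a simplified stand-in for the controller's pod view.
type pod struct {
	Name        string
	Phase       string // "Pending", "Running", ...
	HostNetwork bool
	NodeName    string // empty if unscheduled
}

// processEgressIPPods illustrates the fix: Pending, unscheduled, and
// host-network pods are skipped with "continue" so that later Running
// pods still get their SNAT entries programmed. The previous behavior
// effectively stopped iterating at the first Pending pod.
func processEgressIPPods(pods []pod) []string {
	var programmed []string
	for _, p := range pods {
		if p.Phase == "Pending" || p.NodeName == "" || p.HostNetwork {
			continue // skip this pod only; keep processing the rest
		}
		programmed = append(programmed, p.Name)
	}
	return programmed
}

func main() {
	pods := []pod{
		{Name: "a", Phase: "Pending"}, // previously this stopped the loop
		{Name: "b", Phase: "Running", NodeName: "node1"},
		{Name: "c", Phase: "Running", NodeName: "node1", HostNetwork: true},
	}
	fmt.Println(processEgressIPPods(pods)) // only "b" is programmed
}
```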
Signed-off-by: Nadia Pinaeva <[email protected]>
[okep: layer2 router topology] Add clarification for joinIP routes.
@jluhrsen: trigger 5 job(s) of type blocking for the ci release of OCP 4.20
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/82955dc0-aea1-11f0-807e-8628fab62aec-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.20
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/82955dc0-aea1-11f0-807e-8628fab62aec-1
/retest

/retest
@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b35ce7c0-aefb-11f0-88e8-89ab45ab2f04-0
/verified by @Meina-rh

@Meina-rh: This PR has been marked as verified.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw broken due to https://issues.redhat.com/browse/OCPBUGS-63027

/override ci/prow/lint
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint
/retest
/retest-required
4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade fails with the below error. Running it again.
Same issue on retry. Will retest one more time: /retest

This time the e2e got off the ground, but something weird with OAuth failed. Assuming it's not related to us, so, sigh, will retry again: /retest

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, pperiyasamy

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
/override ci/prow/lint

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw, ci/prow/lint
/override ci/prow/lint

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint
@pperiyasamy: The following tests failed, say
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw
Merged 7dd6e74 into openshift:master
@pperiyasamy: Jira Issue Verification Checks:
- Jira Issue OCPBUGS-61865 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
- Jira Issue OCPBUGS-62636 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
- Jira Issue OCPBUGS-59552 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
No description provided.