@pperiyasamy
Member

No description provided.

booxter and others added 30 commits September 30, 2025 18:03
It was not doing anything since 2020.

Signed-off-by: Ihar Hrachyshka <[email protected]>
Signed-off-by: Ihar Hrachyshka <[email protected]>
Assisted-By: Claude Code; claude-sonnet-4-20250514
Replace master with base branch to make it work on release branches.

Signed-off-by: Nadia Pinaeva <[email protected]>
Addresses incorrect DNAT rules with <proto>/0 target port when using
services with externalTrafficPolicy: Local and named ports.

The issue occurred when allocateLoadBalancerNodePorts was false and
services referenced pod named ports. The previous implementation
used svcPort.TargetPort.IntValue() which returns 0 for named ports,
causing invalid DNAT rules.

This refactoring introduces/uses structured endpoint types that
properly handle port mapping from endpoint slices, ensuring the
actual pod port numbers are used instead of attempting to convert
named ports to integers.

This change unifies endpoint processing logic by having both the
services controller and nodePortWatcher use the same
GetEndpointsForService function. This ensures consistent endpoint
resolution and port mapping behavior across all service-related
components, preventing divergence in logic and similar unnoticed
port handling issues in the future.

Signed-off-by: Andreas Karis <[email protected]>
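The named-port problem described in the commit above can be illustrated with a small, self-contained sketch. Everything here is a simplified stand-in: `targetPort`, `endpointPort`, and `podPortFor` are hypothetical names for illustration, not the Kubernetes `IntOrString` type and not the actual `GetEndpointsForService` implementation. The sketch assumes endpoint ports are keyed by the service port name.

```go
package main

import "fmt"

// targetPort is a simplified, hypothetical stand-in for a Service targetPort,
// which can hold either a number or a name.
type targetPort struct {
	isName bool
	num    int
	name   string
}

// intValue mirrors the behavior described in the commit message: a named
// port has no numeric value, so the conversion yields 0.
func (t targetPort) intValue() int {
	if t.isName {
		return 0
	}
	return t.num
}

// endpointPort is a hypothetical, reduced view of a port published by an
// endpoint slice: the service port name plus the actual pod port number.
type endpointPort struct {
	name string
	port int
}

// podPortFor looks up the real pod port by service port name instead of
// converting the target port to an integer.
func podPortFor(svcPortName string, eps []endpointPort) (int, bool) {
	for _, ep := range eps {
		if ep.name == svcPortName {
			return ep.port, true
		}
	}
	return 0, false
}

func main() {
	tp := targetPort{isName: true, name: "http"}
	eps := []endpointPort{{name: "web", port: 8080}}

	// The naive conversion loses the port number for a named port.
	fmt.Println("naive target port:", tp.intValue())

	// Resolving through the endpoint data recovers the real pod port.
	if port, ok := podPortFor("web", eps); ok {
		fmt.Println("resolved pod port:", port)
	}
}
```

The design point is that the port number comes from the resolved endpoint data rather than from the service spec, so a named port never degrades into a `<proto>/0` target.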
Adds tests for loadBalancer services with named ports and
AllocateLoadBalancerNodePorts=False. Add new test cases in
Test_getEndpointsForService.

Signed-off-by: Andreas Karis <[email protected]>
E2E test "Allow connection to an external IP using a source port that
is equal to a node port" might flake if a service is already created
with the same nodePort number. Give it a chance to recover by selecting
a different port.

Signed-off-by: Andreas Karis <[email protected]>
Node-taints for too-small MTU were removed in #3004.
Taints for NoSchedule were removed in openshift#2459.
In general, it's not the CNI plugin's responsibility to set node taints.
This is for the kubelet/container runtime to figure out.

Therefore, it's safe to remove this unused code, since it won't
be required in the future.

Signed-off-by: Dave Tucker <[email protected]>
While trying to reproduce flakes with these tests, this is the thing I
could reproduce easily. In the tests we add 20 target IPs to each
gateway, then we ping them to make sure they go to each gateway and get
resolved. However, for TCP/UDP tests, we only run a listener on one of
the target IPs. Then we would attempt to contact the listener from a
source pod 20 times, and check that it hit both gateways.

In my testing, I can easily run these tests in a loop and see it fail,
due to all 20 of the attempts hashing to the same gateway, and never
hitting the other gateway. I bumped it to 50, ran it all night, and
no longer see the issue.

Not sure if this fixes all of the flakes we see with these tests, as the
logs have gone stale for other runs, but will consider this closed for
now and then if we see more flakes reopen it.

Closes: #4432

Signed-off-by: Tim Rozet <[email protected]>
Unskip the skipped test cases now that the bug is verified
When OVS runs as a system service on the node, /run/openvswitch/ovs-vswitchd.pid is locked by
ovs-vswitchd with its PID in the host process ID namespace:

```
$ lslocks  | grep ovs-vswitchd.pid
COMMAND             PID   TYPE SIZE MODE  M      START        END PATH
ovs-vswitchd       1615  POSIX     5B WRITE 0          0          0 /run/openvswitch/ovs-vswitchd.pid

$ stat -Lc '%d:%i %n' /run/openvswitch/ovs-vswitchd.pid
25:5398 /run/openvswitch/ovs-vswitchd.pid
```

In the ovnkube-node Pod, if hostPID is false, the ovs-vswitchd PID is not visible inside the Pod's process
ID namespace, so the file lock is not visible either, which causes ovs-appctl to fail:

```
$ ovs-appctl fdb/show br-int
2025-10-14T19:18:36Z|00001|daemon_unix|WARN|/var/run/openvswitch/ovs-vswitchd.pid: stale pidfile for pid 1615 being deleted by pid 0
ovs-appctl: cannot read pidfile "/var/run/openvswitch/ovs-vswitchd.pid" (No such process)
command terminated with exit code 1

$ stat -Lc '%d:%i %n' /run/openvswitch/ovs-vswitchd.pid
25:5398 /run/openvswitch/ovs-vswitchd.pid
```

This change replaces RunOVSAppctl() with RunOvsVswitchdAppCtl(), which uses the
`-t /var/run/openvswitch/ovs-vswitchd.1234.ctl` option to skip reading the pid file.

Signed-off-by: Lei Huang <[email protected]>
The external gateway tests use default BFD timers, which in OVN is a
send frequency of every 1 second, with a max of 3 failures - or 3
seconds total. The tests would remove an external gateway, wait 3
seconds, and then send a packet from a pod client.

We notice that in upstream CI this sometimes flakes on the first attempt and
causes the test case to fail. I cannot reproduce this locally, but we
can see that the math is wrong here. If the external gateway was
deleted at the same time that a heartbeat was sent and ack'ed by OVN,
then it would require almost 4 seconds to detect 3 more failures and
transition BFD down.

Therefore make the timeout a constant and bump it to 4 seconds.

Signed-off-by: Tim Rozet <[email protected]>
Get the latest changes from [1]. There are some improvements, but it
is supposed to work the same (if not better).

[1] ovn-kubernetes/kubernetes-traffic-flow-tests@ce924ee

Signed-off-by: Thomas Haller <[email protected]>
The test validates LoadBalancer services with:
- Named targetPorts (http/udp) instead of numeric ports
- AllocateLoadBalancerNodePorts=false configuration
- ExternalTrafficPolicy=Local behavior

Signed-off-by: Andreas Karis <[email protected]>
[th/tft-update] traffic-flow-tests: update to latest version of k8s-tft
RunOVSAppctl() doesn't work when ovs is run on host and hostPID is false
External Gateway E2E: Increase single target attempts
fix: --logfile-maxsize is in megabytes, not bytes
I accidentally removed the check in a recent PR [1], which could have
performance consequences, as checking against other pods has a cost.
Reintroduce the check with a hopefully useful comment to prevent it from
happening again.

[1] ovn-kubernetes/ovn-kubernetes#5626

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Enable ovn-ci workflow on release branches
OCPBUGS-59552: Referencing pod named ports within a service results in bad DNAT rules containing tcp/0 target port
fix: list allowed values for --platform-type option
When processing pods during an EgressIP status update, the controller used to stop
iterating as soon as it encountered a pod in Pending state (in my case, pod IPs were
not found because the pod was Pending with ContainerCreating status).
This caused any subsequent Running pods to be skipped, leaving their SNAT entries
unprogrammed on the egress node.

With this change, only Pending pods are skipped, while iteration continues for the
rest. This ensures that Running pods are properly processed and their SNAT entries
are programmed.

This change also skips pods that are unscheduled or use host networking.

Signed-off-by: Periyasamy Palanisamy <[email protected]>
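The behavior change in the commit above comes down to skipping a pod instead of ending the loop. A minimal sketch follows, using a hypothetical reduced `pod` struct and a hypothetical `podsToProgram` helper rather than the real Kubernetes types or controller code.

```go
package main

import "fmt"

// pod is a hypothetical, reduced view of the fields that matter here.
type pod struct {
	name        string
	phase       string // e.g. "Pending", "Running"
	nodeName    string // empty when the pod is not scheduled yet
	hostNetwork bool
}

// podsToProgram returns the names of pods whose SNAT entries should be
// programmed. Pods that cannot be handled are skipped with continue, so a
// Pending pod early in the list no longer hides later Running pods.
func podsToProgram(pods []pod) []string {
	var out []string
	for _, p := range pods {
		if p.nodeName == "" || p.hostNetwork {
			continue // unscheduled or host-networked: nothing to program
		}
		if p.phase == "Pending" {
			continue // skip this pod only; keep iterating over the rest
		}
		out = append(out, p.name)
	}
	return out
}

func main() {
	pods := []pod{
		{name: "a", phase: "Running", nodeName: "node1"},
		{name: "b", phase: "Pending", nodeName: "node1"},
		{name: "c", phase: "Running", nodeName: "node1"},
	}
	fmt.Println(podsToProgram(pods))
}
```

With the previous behavior as described, the loop would have stopped at pod `b` and never reached pod `c`.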
[okep: layer2 router topology] Add clarification for joinIP routes.
@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2025

@jluhrsen: trigger 5 job(s) of type blocking for the ci release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aks
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/82955dc0-aea1-11f0-807e-8628fab62aec-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.20

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial-1of2
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-serial-2of2
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-3of3
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/82955dc0-aea1-11f0-807e-8628fab62aec-1

@jluhrsen
Contributor

/retest

@jluhrsen
Contributor

/retest
payload looking really good. just one job to re-try:
/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-3of3

@openshift-ci
Contributor

openshift-ci bot commented Oct 22, 2025

@jluhrsen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b35ce7c0-aefb-11f0-88e8-89ab45ab2f04-0

@Meina-rh

/verified by @Meina-rh

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 22, 2025
@openshift-ci-robot
Contributor

@Meina-rh: This PR has been marked as verified by @Meina-rh.

In response to this:

/verified by @Meina-rh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jcaamano
Contributor

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

broken due to https://issues.redhat.com/browse/OCPBUGS-63027

@jcaamano
Contributor

/override ci/prow/lint

@openshift-ci
Contributor

openshift-ci bot commented Oct 22, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

broken due to https://issues.redhat.com/browse/OCPBUGS-63027

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Oct 22, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint

In response to this:

/override ci/prow/lint


@jcaamano
Contributor

/retest

@pperiyasamy
Member Author

/retest-required

4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade fails with below error. running it again.

error: image "quay-proxy.ci.openshift.org/openshift/ci@sha256:de2dcfaa5ae91f8a472a7d7b9d9d23da247654f0ddf954a7a21d358783ef0519" not found: manifest unknown: manifest unknown

@jluhrsen
Contributor

/retest-required

4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade fails with below error. running it again.

error: image "quay-proxy.ci.openshift.org/openshift/ci@sha256:de2dcfaa5ae91f8a472a7d7b9d9d23da247654f0ddf954a7a21d358783ef0519" not found: manifest unknown: manifest unknown

same issue on retry again. will retest one more time:

/retest

@jluhrsen
Contributor

/retest-required
4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade fails with below error. running it again.
error: image "quay-proxy.ci.openshift.org/openshift/ci@sha256:de2dcfaa5ae91f8a472a7d7b9d9d23da247654f0ddf954a7a21d358783ef0519" not found: manifest unknown: manifest unknown

same issue on retry again. will retest one more time:

/retest

this time the e2e got off the ground, but something weird with OAUTH failed. assuming it's not related to us, so sigh will retry again:

/retest

@jcaamano
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, pperiyasamy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 23, 2025
@jcaamano
Contributor

/override ci/prow/lint
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@openshift-ci
Contributor

openshift-ci bot commented Oct 23, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw, ci/prow/lint

In response to this:

/override ci/prow/lint
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw


@jcaamano
Contributor

/override ci/prow/lint

@openshift-ci
Contributor

openshift-ci bot commented Oct 23, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/lint

In response to this:

/override ci/prow/lint


@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD a573f44 and 2 for PR HEAD fed1225 in total

@openshift-ci
Contributor

openshift-ci bot commented Oct 23, 2025

@pperiyasamy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/security | fed1225 | link | false | /test security
ci/prow/okd-scos-e2e-aws-ovn | fed1225 | link | false | /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.


@jcaamano
Contributor

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw


@openshift-merge-bot openshift-merge-bot bot merged commit 7dd6e74 into openshift:master Oct 24, 2025
30 of 32 checks passed
@openshift-ci-robot
Contributor

@pperiyasamy: Jira Issue Verification Checks: Jira Issue OCPBUGS-61865
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-61865 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Jira Issue Verification Checks: Jira Issue OCPBUGS-62636
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-62636 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Jira Issue Verification Checks: Jira Issue OCPBUGS-59552
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-59552 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓



Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
verified: Signifies that the PR passed pre-merge verification criteria
