Summary
Running Cilium in kube-proxy replacement (KPR) mode causes a small number of upstream SIG-Network and KaaS NetworkPolicy conformance tests to fail, even though user-visible networking in the cluster works correctly. This is not unique to us: several vendors and upstream projects have had to skip or adjust these tests when using Cilium's eBPF datapath instead of kube-proxy. I propose that SCS adopt a CNI-aware conformance policy: certify a list of known-good CNIs (including Cilium KPR) and, for each one, either exclude a narrow, documented set of tests that are not portable across datapaths or clearly mark them as optional. This mirrors how the Cilium project itself handles certain upstream tests.
See: cilium/cilium#29524
Failures
The tests that fail are those tied to kube-proxy’s implementation details (e.g. checking kube-proxy’s own metrics endpoint or making timing-sensitive assumptions about Linux conntrack). These details do not affect everyday application connectivity or isolation.
Note: With KPR, Cilium replaces kube-proxy and implements Service handling with eBPF programs in the kernel. This changes how traffic is steered internally, not what users observe through Kubernetes networking.
Examples of non-portable tests
- kube-proxy metrics/health endpoints: Tests that query kube-proxy's /metrics or health endpoints fail under KPR because kube-proxy is not running. Cilium exposes comparable information through its own agent metrics and health checks.
- NodePort session-affinity 'flip' tests: These tests rapidly toggle session affinity on a Service. kube-proxy implements affinity with iptables rules, whereas Cilium tracks it in BPF maps and uses NodePort modes with different reply paths (e.g. SNAT and DSR). The semantics users rely on remain correct, but the tests' timing-specific expectations do not hold.
- HostPort conflict predicate tests: Under KPR, HostPort is implemented in Cilium's eBPF datapath rather than by the portmap plugin, so host socket binding can behave differently without affecting HostPort functionality.
- SCTP NetworkPolicy tests: SCTP is optional and must be explicitly enabled in Cilium. If it is not enabled cluster-wide, SCTP tests should be skipped by design.
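The applicability rules in the list above can be condensed into a small decision function. This is a hypothetical Go sketch, not part of any SCS or Cilium tooling: `ClusterProfile`, its fields, `Decision` and `Decide` are invented names, and the four categories simply restate the bullets.

```go
package main

import "fmt"

// ClusterProfile captures the datapath facts that decide whether a
// kube-proxy-specific test is meaningful. All names are hypothetical.
type ClusterProfile struct {
	KubeProxyRunning   bool // false under Cilium KPR
	HostPortViaPortmap bool // false when Cilium's eBPF datapath handles HostPort
	SCTPEnabled        bool // SCTP must be explicitly enabled in Cilium
}

// Decision records whether a test category should run and, if not, why.
type Decision struct {
	Category   string
	Run        bool
	SkipReason string
}

// Decide maps a cluster profile to run/skip decisions for the four
// non-portable categories described in this issue.
func Decide(p ClusterProfile) []Decision {
	return []Decision{
		{"kube-proxy metrics/health", p.KubeProxyRunning, "kube-proxy is not running"},
		{"NodePort session-affinity flip", p.KubeProxyRunning, "timing/conntrack assumptions of kube-proxy"},
		{"HostPort conflict predicate", p.HostPortViaPortmap, "HostPort handled by eBPF, not portmap"},
		{"SCTP", p.SCTPEnabled, "SCTP not enabled cluster-wide"},
	}
}

func main() {
	// A typical Cilium KPR profile: no kube-proxy, eBPF HostPort, SCTP off.
	kpr := ClusterProfile{KubeProxyRunning: false, HostPortViaPortmap: false, SCTPEnabled: false}
	for _, d := range Decide(kpr) {
		status := "run"
		if !d.Run {
			status = "skip: " + d.SkipReason
		}
		fmt.Printf("%-32s %s\n", d.Category, status)
	}
}
```

With kube-proxy running, portmap handling HostPort and SCTP enabled, every category evaluates to "run", which matches the claim that these tests only become non-portable when the datapath changes.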
See:
- https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#limitations
- KubeProxy metric tests fails with cilium + kubeproxy replacement kubernetes/kubernetes#126903
Common Issue in the Ecosystem
- The Cilium project skips specific upstream conformance tests while fixes are implemented.
- Vendors report identical failures when using Cilium KPR (e.g. Giant Swarm tracks CNCF test failures attributable to Cilium behaviour).
- Conformance submissions adapt accordingly (e.g. the EKS Anywhere K8s 1.29 conformance PR, which includes discussion of and adjustments to network conformance details).
Proposal (CNI-aware conformance)
- Publish an 'Approved CNI' list for SCS, including Cilium KPR.
- Maintain a Skip/Optional list for each CNI, giving a reason and an upstream reference for every entry. Initial list for Cilium KPR:
  - NodePort session-affinity flip tests (timing/conntrack assumptions).
  - HostPort conflict predicate tests (portmap versus datapath implementation).
  - SCTP-specific tests, unless SCTP is enabled in the product profile.
  - kube-proxy metric/health endpoint tests (component absent under KPR).
- Document that user-visible semantics remain in scope, including pod-to-pod connectivity, DNS, ClusterIP/NodePort/LoadBalancer reachability and L3/L4 NetworkPolicy.