Description
Hi! It looks like the ingress uses the first port number encountered for a given Service port for all addresses. This can result in an incorrect haproxy configuration when there are multiple EndpointSlices with different port numbers but the same port name - the wrong port is used for some servers. Such a configuration can be created by, e.g., targeting Pods from multiple Deployments with one Service (I attached an example below). It is handled correctly by the other k8s components, so I think this can be regarded as a bug in the ingress?
It looks like the information about the different ports is lost in getEndpoints, where only the Port from the first portEndpoints collected for a given Service port is copied to PortEndpoints together with all of the addresses.
I briefly looked at the code and it seems the fix could be to change getEndpoints to return a list of (address, port) 'tuples', change RuntimeBackend.Endpoints to keep this list, and then refactor all the places that consume it to handle the different ports (a rough sketch is below). Unfortunately it looks like SyncBackendSrvs would have to be basically rewritten (that's where I gave up for the moment; I can try to prepare a PR if you like the idea) :/
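Here is a minimal sketch of what I mean. The type and function shapes are hypothetical (they are not the existing controller API); the point is just that every address keeps the port taken from its own EndpointSlice instead of reusing the first port encountered:

// Hypothetical sketch, not the current controller types: pair each address
// with the port from the EndpointSlice it came from.
package store

import discoveryv1 "k8s.io/api/discovery/v1"

// Endpoint keeps the port next to the address it belongs to.
type Endpoint struct {
    Address string
    Port    int64
}

// getEndpoints collects (address, port) pairs per Service port name from all
// EndpointSlices, so slices sharing a port name but exposing different port
// numbers are no longer merged under a single number.
func getEndpoints(slices []discoveryv1.EndpointSlice) map[string][]Endpoint {
    result := map[string][]Endpoint{}
    for _, slice := range slices {
        for _, p := range slice.Ports {
            if p.Name == nil || p.Port == nil {
                continue
            }
            for _, ep := range slice.Endpoints {
                for _, addr := range ep.Addresses {
                    result[*p.Name] = append(result[*p.Name], Endpoint{Address: addr, Port: int64(*p.Port)})
                }
            }
        }
    }
    return result
}

RuntimeBackend.Endpoints would then hold this list, and SyncBackendSrvs would emit one server line per (address, port) pair.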
WDYT?
With the following manifests:
---
apiVersion: v1
kind: Pod
metadata:
  name: test-1
  labels:
    app: test
spec:
  containers:
  - name: app
    image: python:3.12
    command:
    - python3
    - -m
    - http.server
    - "8081"
    ports:
    - containerPort: 8081
      name: http
---
apiVersion: v1
kind: Pod
metadata:
  name: test-2
  labels:
    app: test
spec:
  containers:
  - name: app
    image: python:3.12
    command:
    - python3
    - -m
    - http.server
    - "8082"
    ports:
    - containerPort: 8082
      name: http
---
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: test
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test
spec:
  ingressClassName: default-ingress
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              name: http
the following haproxy backend is generated:
backend default_svc_test_http
  mode http
  balance roundrobin
  option forwardfor
  no option abortonclose
  timeout server 3600000
  default-server check
  server SRV_1 10.163.26.102:8082 enabled
  server SRV_2 10.163.26.101:8082 enabled
  ...
The port from one Pod (8082) is used for both backend servers, so SRV_2, which should point at 10.163.26.101:8081, is broken.
At the same time the generated EndpointSlices are ok (two were created):
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: test-8s4dt
  ...
addressType: IPv4
endpoints:
- addresses:
  - 10.163.26.101
  targetRef:
    kind: Pod
    name: test-1
    namespace: default
    uid: da040989-401b-4a60-a79c-ffe7c0e4685c
  ...
ports:
- name: http
  port: 8081
  protocol: TCP
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: test-pqm26
  ...
addressType: IPv4
endpoints:
- addresses:
  - 10.163.26.102
  targetRef:
    kind: Pod
    name: test-2
    namespace: ...
    uid: 641c5141-9f2e-4804-b727-6e05a93bb7a3
  ...
ports:
- name: http
  port: 8082
  protocol: TCP
and the kube-proxy iptables rules look fine:
-A KUBE-SERVICES -d 10.163.28.111/32 -p tcp -m comment --comment "default/test:http cluster IP" -j KUBE-SVC-5EWQR67FSWUYMNBI
(...)
-A KUBE-SVC-5EWQR67FSWUYMNBI -m comment --comment "default/test:http -> 10.163.26.101:8081" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZHOO3EYEF6EQWV7J
-A KUBE-SVC-5EWQR67FSWUYMNBI -m comment --comment "default/test:http -> 10.163.26.102:8082" -j KUBE-SEP-BFXD3WDMZJ76QJPQ
(...)
-A KUBE-SEP-ZHOO3EYEF6EQWV7J -p tcp -m comment --comment "default/test:http" -m tcp -j DNAT --to-destination 10.163.26.101:8081
-A KUBE-SEP-BFXD3WDMZJ76QJPQ -p tcp -m comment --comment "default/test:http" -m tcp -j DNAT --to-destination 10.163.26.102:8082