Commit 5e1790c (parent e8c159f)

docs: Add Cilium eBPF L4 load balancer solution document

Describes how to deploy Cilium CNI in an ACP 4.2+ cluster and use eBPF for high-performance Layer 4 load balancing with source IP preservation.
---
id: KB260300001
products:
  - Alauda Container Platform
kind:
  - Solution
sourceSHA: pending
---

# High-Performance Container Networking with Cilium CNI and eBPF-based L4 Load Balancer (Source IP Preservation)

This document describes how to deploy Cilium CNI in an ACP 4.2+ cluster and leverage eBPF to implement high-performance Layer 4 load balancing with source IP preservation.

## Prerequisites

| Item | Requirement |
|------|-------------|
| ACP Version | 4.2+ |
| Network Mode | Custom Mode |
| Architecture | x86_64 / amd64 |

> **Note**: Cilium/eBPF requires Linux kernel 4.19+ (5.10+ recommended). The following operating systems are **NOT supported**:
>
> - CentOS 7.x (kernel 3.10.x)
> - RHEL 7.x (kernel 3.10.x - 4.18.x)
>
> Supported operating systems:
>
> - Ubuntu 22.04
> - RHEL 8.x
> - Kylin V10-SP3
> - openEuler 22.03
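The kernel requirement can be checked quickly on each node. The sketch below compares the running kernel against the 4.19 minimum; it assumes GNU `sort -V` is available (standard on the supported distributions):

```shell
# Sketch: confirm the node kernel meets Cilium's 4.19+ minimum (5.10+ recommended).
required="4.19"
current="$(uname -r | cut -d- -f1)"   # e.g. "5.15.0"
# sort -V orders version strings; if the required version sorts first,
# the running kernel is new enough.
if [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "kernel ${current}: OK"
else
  echo "kernel ${current}: too old, Cilium requires ${required}+"
fi
```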

### Node Port Requirements

| Port | Component | Description |
|------|-----------|-------------|
| 4240 | cilium-agent | Health API |
| 9962 | cilium-agent | Prometheus Metrics |
| 9879 | cilium-agent | Envoy Metrics |
| 9890 | cilium-agent | Agent Metrics |
| 9963 | cilium-operator | Prometheus Metrics |
| 9891 | cilium-operator | Operator Metrics |
| 9234 | cilium-operator | Metrics |

### Kernel Configuration Requirements

Ensure the following kernel options are enabled on the nodes (check with `grep` against `/boot/config-$(uname -r)`):

- `CONFIG_BPF=y` or `=m`
- `CONFIG_BPF_SYSCALL=y` or `=m`
- `CONFIG_NET_CLS_BPF=y` or `=m`
- `CONFIG_BPF_JIT=y` or `=m`
- `CONFIG_NET_SCH_INGRESS=y` or `=m`
- `CONFIG_CRYPTO_USER_API_HASH=y` or `=m`
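These checks can be scripted. The sketch below assumes the config lives at `/boot/config-$(uname -r)`; some distributions expose `/proc/config.gz` instead, in which case adjust `CONFIG_FILE` accordingly:

```shell
# Sketch: report which Cilium-required kernel options are enabled (=y or =m).
CONFIG_FILE="/boot/config-$(uname -r)"
for opt in CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_NET_CLS_BPF \
           CONFIG_BPF_JIT CONFIG_NET_SCH_INGRESS CONFIG_CRYPTO_USER_API_HASH; do
  if grep -Eqs "^${opt}=(y|m)" "$CONFIG_FILE"; then
    echo "OK      ${opt}"
  else
    echo "MISSING ${opt}"   # also printed when CONFIG_FILE does not exist
  fi
done
```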

## ACP 4.x Cilium Deployment Steps

### Step 1: Create Cluster

On the cluster creation page, set **Network Mode** to **Custom**. Wait until the cluster reaches the `EnsureWaitClusterModuleReady` status before deploying Cilium.

### Step 2: Install Cilium

1. Download the latest Cilium image package (v4.2.x) from the ACP marketplace.

2. Upload it to the platform using violet:

```bash
export PLATFORM_URL=""
export USERNAME=''
export PASSWORD=''
export CLUSTER_NAME=''

violet push cilium-v4.2.17.tgz \
  --platform-address "$PLATFORM_URL" \
  --platform-username "$USERNAME" \
  --platform-password "$PASSWORD" \
  --clusters "$CLUSTER_NAME"
```

3. Create a temporary RBAC configuration on the business cluster where Cilium will be installed (this RBAC permission is not yet configured before the cluster is fully deployed):

Create the temporary RBAC configuration file:

```bash
cat > tmp.yaml << 'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium-clusterplugininstance-admin
  labels:
    app.kubernetes.io/name: cilium
rules:
  - apiGroups: ["cluster.alauda.io"]
    resources: ["clusterplugininstances"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cilium-admin-clusterplugininstance
  labels:
    app.kubernetes.io/name: cilium
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cilium-clusterplugininstance-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: admin
EOF
```

Apply the temporary RBAC configuration:

```bash
kubectl apply -f tmp.yaml
```

4. Navigate to **Administrator → Marketplace → Cluster Plugins** and install Cilium.

5. After Cilium is successfully installed, delete the temporary RBAC configuration:

```bash
kubectl delete -f tmp.yaml
rm tmp.yaml
```

## Create L4 Load Balancer with Source IP Preservation

Run the following operations from a shell on a master node.

### Step 1: Remove kube-proxy and Clean Up Rules

1. Get the current kube-proxy image:

```bash
kubectl -n kube-system get ds kube-proxy -oyaml | grep image
```

2. Back up and delete the kube-proxy DaemonSet:

```bash
kubectl -n kube-system get ds kube-proxy -oyaml > kube-proxy-backup.yaml

kubectl -n kube-system delete ds kube-proxy
```

3. Create a BroadcastJob to clean up kube-proxy rules:

```yaml
apiVersion: operator.alauda.io/v1alpha1
kind: BroadcastJob
metadata:
  name: kube-proxy-cleanup
  namespace: kube-system
spec:
  completionPolicy:
    ttlSecondsAfterFinished: 300
    type: Always
  failurePolicy:
    type: FailFast
  template:
    metadata:
      labels:
        k8s-app: kube-proxy-cleanup
    spec:
      serviceAccountName: kube-proxy
      hostNetwork: true
      restartPolicy: Never
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-node-critical
      tolerations:
        - operator: Exists
      containers:
        - name: kube-proxy-cleanup
          image: registry.alauda.cn:60070/tkestack/kube-proxy:v1.33.5 # Replace with the kube-proxy image from Step 1
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - -c
            - "/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=$(NODE_NAME) --cleanup || true"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/kube-proxy
              name: kube-proxy
            - mountPath: /lib/modules
              name: lib-modules
              readOnly: true
            - mountPath: /run/xtables.lock
              name: xtables-lock
      volumes:
        - name: kube-proxy
          configMap:
            name: kube-proxy
        - name: lib-modules
          hostPath:
            path: /lib/modules
            type: ""
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
```

Save as `kube-proxy-cleanup.yaml` and apply:

```bash
kubectl apply -f kube-proxy-cleanup.yaml
```

The BroadcastJob is configured with `ttlSecondsAfterFinished: 300`, so it is automatically cleaned up within 5 minutes after completion.

### Step 2: Create Address Pool

> **VIP Address Requirement**: Cilium L2 Announcement implements IP failover through ARP broadcasting, so the VIP must be in the **same Layer 2 network** as the cluster nodes to ensure ARP requests can be properly broadcast and answered.

Save as `lb-resources.yaml`:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lb-pool
spec:
  blocks:
    - cidr: "192.168.132.192/32" # Replace with the actual VIP segment
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-policy
spec:
  interfaces:
    - eth0 # Replace with the actual network interface name
  externalIPs: true
  loadBalancerIPs: true
```

Apply the configuration:

```bash
kubectl apply -f lb-resources.yaml
```

### Step 3: Verification

Create a LoadBalancer Service to verify IP allocation and test connectivity.
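A minimal Service for this check might look like the following sketch; the namespace, name, and selector are illustrative and must match an existing backend workload in your cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: cilium-123-1
spec:
  type: LoadBalancer
  selector:
    app: test        # assumption: a backend workload labeled app=test exists
  ports:
    - port: 80
      targetPort: 80
```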

**Verification 1: Check that the LoadBalancer Service has been assigned an external IP**

```bash
kubectl get svc -A
```

Expected output example:

```text
NAMESPACE      NAME   TYPE           CLUSTER-IP   EXTERNAL-IP       PORT(S)        AGE
cilium-123-1   test   LoadBalancer   10.4.98.81   192.168.132.192   80:31447/TCP   35s
```

**Verification 2: Check which leader node is answering ARP requests**

```bash
kubectl get leases -A | grep cilium
```

Expected output example:

```text
cpaas-system   cilium-l2announce-cilium-123-1-test   192.168.141.196   24s
```

**Verification 3: Test external access**

From an external client, access the LoadBalancer Service. A packet capture inside the Pod should show the client's IP as the source address, confirming that source IP preservation works.
