Commit 6afbc1d

add kube-proxy iptables performance optimization notes

1 parent fa72a2a commit 6afbc1d

File tree: 1 file changed, +85 −0

content/en/docs/reference/networking/virtual-ips.md
@@ -111,6 +111,91 @@ redirected to the backend without rewriting the client IP address.

This same basic flow executes when traffic comes in through a node-port or
through a load-balancer, though in those cases the client IP address does get altered.

#### Optimizing iptables mode performance

In large clusters (with tens of thousands of Pods and Services), the
iptables mode of kube-proxy may take a long time to update the rules
in the kernel when Services (or their EndpointSlices) change. You can adjust the syncing
behavior of kube-proxy via options in the [`iptables` section](/docs/reference/config-api/kube-proxy-config.v1alpha1/#kubeproxy-config-k8s-io-v1alpha1-KubeProxyIPTablesConfiguration)
of the
kube-proxy [configuration file](/docs/reference/config-api/kube-proxy-config.v1alpha1/)
(which you specify via `kube-proxy --config <path>`):

```yaml
...
iptables:
  minSyncPeriod: 1s
  syncPeriod: 30s
...
```

##### `minSyncPeriod`

The `minSyncPeriod` parameter sets the minimum duration between
attempts to resynchronize iptables rules with the kernel. If it is
`0s`, then kube-proxy will always immediately synchronize the rules
every time any Service or Endpoint changes. This works fine in very
small clusters, but it results in a lot of redundant work when lots of
things change in a short period of time. For example, if you have a
Service backed by a Deployment with 100 pods, and you delete the
Deployment, then with `minSyncPeriod: 0s`, kube-proxy would end up
removing the Service's Endpoints from the iptables rules one by one,
for a total of 100 updates. With a larger `minSyncPeriod`, multiple
Pod deletion events would get aggregated together, so kube-proxy might
instead end up making, say, 5 updates, each removing 20 endpoints,
which is much more efficient in terms of CPU and results in the
full set of changes being synchronized faster.

The larger the value of `minSyncPeriod`, the more work that can be
aggregated, but the downside is that each individual change may end up
waiting up to the full `minSyncPeriod` before being processed, meaning
that the iptables rules spend more time being out-of-sync with the
current apiserver state.

The default value of `1s` is a good compromise for small and medium
clusters. In large clusters, it may be necessary to set it to a larger
value. (In particular, if kube-proxy's
`sync_proxy_rules_duration_seconds` metric indicates an average
time much larger than 1 second, then bumping up `minSyncPeriod` may
make updates more efficient.)

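As a sketch of what such a large-cluster override might look like, using the same `iptables` section shown above (the `10s` value here is purely illustrative, not a recommendation from this page; the right value depends on your cluster size and on what the metric shows):

```yaml
# Illustrative tuning sketch for a large cluster; 10s is an example
# value only. Adjust based on the sync_proxy_rules_duration_seconds
# metric observed in your own cluster.
...
iptables:
  minSyncPeriod: 10s
  syncPeriod: 30s
...
```
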
##### `syncPeriod`

The `syncPeriod` parameter controls a handful of synchronization
operations that are not directly related to changes in individual
Services and Endpoints. In particular, it controls how quickly
kube-proxy notices if an external component has interfered with
kube-proxy's iptables rules. In large clusters, kube-proxy also only
performs certain cleanup operations once every `syncPeriod`, to avoid
unnecessary work.

For the most part, increasing `syncPeriod` is not expected to have much
impact on performance, but in the past, it was sometimes useful to set
it to a very large value (e.g., `1h`). This is no longer recommended,
and is likely to hurt functionality more than it improves performance.

##### Experimental performance improvements {#minimize-iptables-restore}

{{< feature-state for_k8s_version="v1.26" state="alpha" >}}

In Kubernetes 1.26, some new performance improvements were made to the
iptables proxy mode, but they are not enabled by default (and should
probably not be enabled in production clusters yet). To try them out,
enable the `MinimizeIPTablesRestore` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) for
kube-proxy with `--feature-gates=MinimizeIPTablesRestore=true,…`.

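If you configure kube-proxy through its configuration file rather than only via flags, the gate can also be expressed there; a minimal sketch, assuming your kube-proxy version exposes the `featureGates` field of `KubeProxyConfiguration`:

```yaml
# Sketch only: assumes the featureGates field is available in your
# kube-proxy version's KubeProxyConfiguration; verify against the
# kube-proxy configuration API reference before relying on it.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  MinimizeIPTablesRestore: true
```
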
If you enable that feature gate and you were previously overriding
`minSyncPeriod`, you should try removing that override and letting
kube-proxy use the default value (`1s`) or at least a smaller value
than you were using before.

If you notice kube-proxy's
`sync_proxy_rules_iptables_restore_failures_total` or
`sync_proxy_rules_iptables_partial_restore_failures_total` metrics
increasing after enabling this feature, that likely indicates you are
encountering bugs in the feature and you should file a bug report.

### IPVS proxy mode {#proxy-mode-ipvs}

In `ipvs` mode, kube-proxy watches Kubernetes Services and EndpointSlices,
