Commit 777c6e3

Merge pull request #46541 from danwinship/nftables-beta
Update nftables kube-proxy docs for 1.31 beta
2 parents ccaaebf + e6de84d commit 777c6e3

2 files changed: +67 -8 lines changed

content/en/docs/reference/command-line-tools-reference/feature-gates/nftables-proxy-mode.md

Lines changed: 5 additions & 1 deletion
@@ -9,5 +9,9 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.29"
+    toVersion: "1.30"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.31"
 ---
-Allow running kube-proxy with in [nftables mode](/docs/reference/networking/virtual-ips/#proxy-mode-nftables).
+Allow running kube-proxy in [nftables mode](/docs/reference/networking/virtual-ips/#proxy-mode-nftables).
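
The stage metadata in this front matter can be read as a lookup from a Kubernetes minor version to the gate's stage and default value. The sketch below is illustrative only: `STAGES`, `_minor`, and `default_for` are hypothetical names, the data is transcribed by hand from the hunk above rather than parsed from the file, and treating a stage without `toVersion` as open-ended is an assumption.

```python
# Hypothetical illustration: stage data copied by hand from the front matter above.
STAGES = [
    {"stage": "alpha", "defaultValue": False, "fromVersion": "1.29", "toVersion": "1.30"},
    {"stage": "beta", "defaultValue": True, "fromVersion": "1.31"},
]


def _minor(version):
    """Turn a string such as "1.31" into a comparable tuple (1, 31)."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor))


def default_for(version, stages=STAGES):
    """Return (stage, defaultValue) for a Kubernetes minor version, or None.

    Assumption: a stage with no "toVersion" is treated as open-ended.
    """
    v = _minor(version)
    for s in stages:
        lo = _minor(s["fromVersion"])
        hi = _minor(s["toVersion"]) if "toVersion" in s else None
        if v >= lo and (hi is None or v <= hi):
            return s["stage"], s["defaultValue"]
    return None
```

Under these assumptions, `default_for("1.30")` falls in the alpha range and `default_for("1.31")` in the beta range, while a version before 1.29 yields `None`.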

content/en/docs/reference/networking/virtual-ips.md

Lines changed: 62 additions & 7 deletions
@@ -262,20 +262,75 @@ exits with an error.
 
 ### `nftables` proxy mode {#proxy-mode-nftables}
 
-{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
+{{< feature-state feature_gate_name="NFTablesProxyMode" >}}
 
-_This proxy mode is only available on Linux nodes._
+_This proxy mode is only available on Linux nodes, and requires kernel
+5.13 or later._
 
 In this mode, kube-proxy configures packet forwarding rules using the
 nftables API of the kernel netfilter subsystem. For each endpoint, it
 installs nftables rules which, by default, select a backend Pod at
 random.
 
-The nftables API is the successor to the iptables API, and although it
-is designed to provide better performance and scalability than
-iptables, the kube-proxy nftables mode is still under heavy
-development as of {{< skew currentVersion >}} and is not necessarily
-expected to outperform the other Linux modes at this time.
+The nftables API is the successor to the iptables API and is designed
+to provide better performance and scalability than iptables. The
+`nftables` proxy mode is able to process changes to service endpoints
+faster and more efficiently than the `iptables` mode, and is also able
+to more efficiently process packets in the kernel (though this only
+becomes noticeable in clusters with tens of thousands of services).
+
+As of Kubernetes {{< skew currentVersion >}}, the `nftables` mode is
+still relatively new, and may not be compatible with all network
+plugins; consult the documentation for your network plugin.
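
The kernel requirement added in this hunk (5.13 or later) can be checked on a node before switching modes. The following is a minimal sketch, not part of the commit: `MIN_KERNEL`, `parse_kernel_release`, and `kernel_supports_nftables_mode` are hypothetical names, and the check only compares the major and minor numbers of the release string, so it cannot account for distribution kernels that backport features.

```python
import platform
import re

# Minimum kernel version stated in the documentation text above.
MIN_KERNEL = (5, 13)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as "5.15.0-91-generic"."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_supports_nftables_mode(release=None):
    """Return True if the release is at least MIN_KERNEL.

    With no argument, the local release from platform.release() is used;
    that value is only meaningful on a Linux node.
    """
    if release is None:
        release = platform.release()
    return parse_kernel_release(release) >= MIN_KERNEL
```

Comparing integer tuples rather than strings matters here: as text, "5.9" sorts after "5.13", but `(5, 9) < (5, 13)` gives the intended ordering.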
+
+#### Migrating from `iptables` mode to `nftables`
+
+Users who want to switch from the default `iptables` mode to the
+`nftables` mode should be aware that some features work slightly
+differently in the `nftables` mode:
+
+- **NodePort interfaces**: In `iptables` mode, by default,
+  [NodePort services](/docs/concepts/services-networking/service/#type-nodeport)
+  are reachable on all local IP addresses. This is usually not what
+  users want, so the `nftables` mode defaults to
+  `--nodeport-addresses primary`, meaning NodePort services are only
+  reachable on the node's primary IPv4 and/or IPv6 addresses. You can
+  override this by specifying an explicit value for that option:
+  e.g., `--nodeport-addresses 0.0.0.0/0` to listen on all (local)
+  IPv4 IPs.
+
+- **NodePort services on `127.0.0.1`**: In `iptables` mode, if the
+  `--nodeport-addresses` range includes `127.0.0.1` (and the
+  `--iptables-localhost-nodeports false` option is not passed), then
+  NodePort services are reachable even on "localhost" (`127.0.0.1`).
+  In `nftables` mode (and `ipvs` mode), this will not work. If you
+  are not sure if you are depending on this functionality, you can
+  check kube-proxy's
+  `iptables_localhost_nodeports_accepted_packets_total` metric; if it
+  is nonzero, that means that some client has connected to a NodePort
+  service via `127.0.0.1`.
+
+- **NodePort interaction with firewalls**: The `iptables` mode of
+  kube-proxy tries to be compatible with overly aggressive firewalls;
+  for each NodePort service, it will add rules to accept inbound
+  traffic on that port, in case that traffic would otherwise be
+  blocked by a firewall. This approach will not work with firewalls
+  based on nftables, so kube-proxy's `nftables` mode does not do
+  anything here; if you have a local firewall, you must ensure that
+  it is properly configured to allow Kubernetes traffic through
+  (e.g., by allowing inbound traffic on the entire NodePort range).
+
+- **Conntrack bug workarounds**: Linux kernels prior to 6.1 have a
+  bug that can result in long-lived TCP connections to service IPs
+  being closed with the error "Connection reset by peer". The
+  `iptables` mode of kube-proxy installs a workaround for this bug,
+  but this workaround was later found to cause other problems in some
+  clusters. The `nftables` mode does not install any workaround by
+  default, but you can check kube-proxy's
+  `iptables_ct_state_invalid_dropped_packets_total` metric to see if
+  your cluster is depending on the workaround, and if so, you can run
+  kube-proxy with the option `--conntrack-tcp-be-liberal` to work
+  around the problem in `nftables` mode.
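
Two of the migration notes above suggest checking kube-proxy metrics before switching modes. The sketch below is one way to do that check, assuming the metrics have already been fetched as Prometheus text exposition output; how the text is obtained is left out. `WATCHED` and `nonzero_watched_metrics` are hypothetical names, and matching metric names by suffix is an assumption made in case the exported names carry a prefix.

```python
# Metric names as given in the migration notes above.
WATCHED = (
    "iptables_localhost_nodeports_accepted_packets_total",
    "iptables_ct_state_invalid_dropped_packets_total",
)


def nonzero_watched_metrics(metrics_text, watched=WATCHED):
    """Return {metric: total} for watched metrics whose summed value is above zero.

    Assumption: names are matched by suffix, so a prefixed name still matches.
    """
    totals = {}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split()[0]
        # The value follows the closing brace when labels are present.
        rest = line.rsplit("}", 1)[1].split() if "}" in line else line.split()[1:]
        if not rest:
            continue
        try:
            value = float(rest[0])
        except ValueError:
            continue  # not a numeric sample; ignore it
        for w in watched:
            if name.endswith(w):
                totals[w] = totals.get(w, 0.0) + value
    return {k: v for k, v in totals.items() if v > 0}
```

An empty result suggests that, at the time of sampling, nothing on the node relied on localhost NodePorts or on the conntrack workaround. Counters reset when kube-proxy restarts, so a single sample is weak evidence on its own.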
 
 ### `kernelspace` proxy mode {#proxy-mode-kernelspace}
 