Commit 99e38d1

Merge pull request #5044 from danwinship/nftables-ga
KEP-3866 nftables kube-proxy to GA
2 parents b2b3a1e + 9c3c332

4 files changed (+50, −57 lines)


keps/prod-readiness/sig-network/3866.yaml

Lines changed: 2 additions & 0 deletions
@@ -6,3 +6,5 @@ alpha:
   approver: "@wojtek-t"
 beta:
   approver: "@wojtek-t"
+stable:
+  approver: "@wojtek-t"

keps/sig-network/3866-nftables-proxy/README.md

Lines changed: 44 additions & 54 deletions
@@ -1206,7 +1206,8 @@ create their own table, and not interfere with anyone else's tables.
 If we document the `priority` values we use to connect to each
 nftables hook, then admins and third party developers should be able
 to reliably process packets before or after kube-proxy, without
-needing to modify kube-proxy's chains/rules.
+needing to modify kube-proxy's chains/rules. (As of 1.33, this is now
+documented.)
 
 In cases where administrators want to insert rules into the middle of
 particular service or endpoint chains, we would have the same problem
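
As a hypothetical illustration of the kind of third-party integration the hunk above refers to: an administrator can process Service traffic from their own table, without touching kube-proxy's chains or rules, by hooking the same netfilter stage at a higher-precedence priority. The table and chain names below are invented, and the `-110` priority assumes kube-proxy's service DNAT is hooked at the standard `dstnat` priority of `-100`; consult the documented values rather than relying on this sketch.

```sh
# Observe Service traffic *before* kube-proxy's DNAT, from a separate table.
# The -110 priority assumes kube-proxy hooks prerouting at the standard
# dstnat priority (-100); use the documented value for your release.
nft add table ip my-observability
nft add chain ip my-observability pre-kube-proxy \
    '{ type filter hook prerouting priority -110 ; }'
# 10.96.0.10 is a placeholder ClusterIP.
nft add rule ip my-observability pre-kube-proxy \
    ip daddr 10.96.0.10 tcp dport 443 counter
```
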
@@ -1224,16 +1225,6 @@ probably makes sense to leave these out initially and see if people
 actually do need them, or if creating rules in another table is
 sufficient.
 
-```
-<<[UNRESOLVED external rule integration API ]>>
-
-Tigera is currently working on implementing nftables support in
-Calico, so hopefully by 1.32 we should have a good idea of what
-guarantees it needs from nftables kube-proxy.
-
-<<[/UNRESOLVED]>>
-```
-
 #### Rule monitoring
 
 Given the constraints of the iptables API, it would be extremely
@@ -1425,18 +1416,6 @@ We will eventually need e2e tests for switching between `iptables` and
 [It should recreate its iptables rules if they are deleted]: https://github.com/kubernetes/kubernetes/blob/v1.27.0/test/e2e/network/networking.go#L550
 [`TestUnderTemporaryNetworkFailure`]: https://github.com/kubernetes/kubernetes/blob/v1.27.0-alpha.2/test/e2e/framework/network/utils.go#L1078
 
-<!--
-This question should be filled when targeting a release.
-For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
-
-For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
-https://storage.googleapis.com/k8s-triage/index.html
-
-We expect no non-infra related flakes in the last month as a GA graduation criteria.
--->
-
-- <test>: <link to test coverage>
-
 #### Scalability & Performance tests
 
 We have an [nftables scalability job]. Initial performance is fine; we
@@ -1577,14 +1556,14 @@ Yes, though it is necessary to clean up the nftables rules that were
 created, or they will continue to intercept service traffic. In any
 normal case, this should happen automatically when restarting
 kube-proxy in `iptables` or `ipvs` mode, however, that assumes the
-user is rolling back to a still-new-enough version of kube-proxy. If
-the user wants to roll back the cluster to a version of Kubernetes
-that doesn't have the nftables kube-proxy code (i.e., rolling back
-from Alpha to Pre-Alpha), or if they are rolling back to an external
-service proxy implementation (e.g., kpng), then they would need to
-make sure that the nftables rules got cleaned up _before_ they rolled
-back, or else clean them up manually. (We can document how to do
-this.)
+user is rolling back to a version of kube-proxy that has at least the
+Alpha nftables code (1.29+). If the user wants to roll back the
+cluster to a version of Kubernetes that doesn't have the nftables
+kube-proxy code (i.e., rolling back from Alpha to Pre-Alpha), or if
+they are rolling back to an external service proxy implementation
+(e.g., kpng), then they would need to make sure that the nftables
+rules got cleaned up _before_ they rolled back, or else clean them up
+manually. (We document how to do this.)
 
 (By the time we are considering making the `nftables` backend the
 default in the future, the feature will have existed and been GA for
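
As a sketch of the manual cleanup mentioned in the hunk above (it assumes the backend's default `kube-proxy` table name and both IP families on a dual-stack node; confirm the actual table names on the node before deleting anything):

```sh
# See which kube-proxy nftables tables are still present after rollback.
nft list tables | grep kube-proxy

# Delete them so they stop intercepting Service traffic.
nft delete table ip kube-proxy
nft delete table ip6 kube-proxy   # only present on dual-stack nodes
```
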
@@ -1635,14 +1614,23 @@ provide more information.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
-TBD; we plan to add an e2e job to test switching from `iptables` mode
-to `nftables` mode in 1.31.
-
-<!--
-Describe manual testing that was done and the outcomes.
-Longer term, we may want to require automated upgrade/rollback tests, but we
-are missing a bunch of machinery and tooling and can't do that now.
--->
+Tested by hand:
+
+1. Start kube-proxy in `iptables` mode.
+2. Confirm (via `iptables-save`) that iptables rules exist for Services.
+3. Kill kube-proxy.
+4. Start kube-proxy in `nftables` mode.
+5. Confirm (via `iptables-save`) that iptables rules for Services no
+   longer exist. (There will still be a handful of iptables chains
+   left over, but nothing that actually affects the behavior of
+   services.)
+6. Confirm (via `nft list ruleset`) that nftables rules for Services
+   exist.
+7. Kill kube-proxy.
+8. Start kube-proxy in `iptables` mode again.
+9. Confirm (via `iptables-save`) that iptables rules exist for Services.
+10. Confirm (via `nft list ruleset`) that the `kube-proxy` table (or
+    tables, if dual-stack) has been deleted.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
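
A rough shell version of the checks in the manual test above (illustrative only: the `KUBE-SVC-` chain prefix is the iptables backend's usual naming, and `kube-proxy` is the nftables table name referenced in step 10):

```sh
# Steps 2 and 9: in iptables mode, per-Service chains should be present.
iptables-save | grep -c 'KUBE-SVC-'

# Steps 5 and 6: after switching to nftables mode, the per-Service iptables
# chains should be gone (a few empty leftover chains are expected) and the
# nftables rules should exist.
iptables-save | grep -c 'KUBE-SVC-' || echo "no per-Service iptables chains"
nft list table ip kube-proxy | head

# Step 10: after switching back to iptables mode, the kube-proxy nftables
# table(s) should have been deleted.
nft list table ip kube-proxy 2>/dev/null || echo "kube-proxy table deleted"
```
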
@@ -1670,9 +1658,21 @@ For Beta, the goal is for the [network programming latency] to
 be equivalent to the _old_, pre-[KEP-3453] iptables performance
 (because the current code is not yet heavily optimized).
 
-For GA, the goal is for it to be at least as good as the current
+For GA, the goal was for it to be at least as good as the current
 iptables performance.
 
+In fact, we never got entirely clear measurements of this, because the
+iptables-based 1000 node perf/scale test still uses `minSyncPeriod:
+10s`, while the nftables-based one does not. However, the nftables
+performance is quite satisfactory (and the fact that it is able to
+have satisfactory performance without using `minSyncPeriod` is also a
+major win).
+
+Meanwhile, nftables data plane performance is _substantially_ better
+than iptables:
+
+![iptables-vs-nftables kube-proxy data plane performance](iptables-vs-nftables.svg)
+
 [network programming latency]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/network_programming_latency.md
 [KEP-3453]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3453-minimize-iptables-restore
 
@@ -1701,7 +1701,9 @@ more-or-less **O(1)** behavior, so knowing the number of elements is
 not going to give you much information about how well the system is
 likely to be performing.
 
-(Update while going to Beta: it's still not clear.)
+(Update while going to GA: it's still not clear. We have not found
+ourselves wanting any additional metrics, nor have we received any
+requests for additional metrics.)
 
 - [X] Metrics
   - Metric names:
@@ -1774,26 +1776,14 @@ processed until the apiserver is available again.
 
 ###### What are other known failure modes?
 
-<!--
-For each of them, fill in the following information by copying the below template:
-  - [Failure mode brief description]
-    - Detection: How can it be detected via metrics? Stated another way:
-      how can an operator troubleshoot without logging into a master or worker node?
-    - Mitigations: What can be done to stop the bleeding, especially for already
-      running user workloads?
-    - Diagnostics: What are the useful log messages and their required logging
-      levels that could help debug the issue?
-      Not required until feature graduated to beta.
-    - Testing: Are there any tests for failure mode? If not, describe why.
--->
-
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
 ## Implementation History
 
 - Initial proposal: 2023-02-01
 - Merged: 2023-10-06
 - Updates for beta: 2024-05-24
+- Updates for GA: 2025-01-15
 
 ## Drawbacks
 
keps/sig-network/3866-nftables-proxy/iptables-vs-nftables.svg

Lines changed: 1 addition & 0 deletions

keps/sig-network/3866-nftables-proxy/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -3,7 +3,7 @@ kep-number: 3866
 authors:
   - "@danwinship"
 owning-sig: sig-network
-status: implementable
+status: implemented
 creation-date: 2023-02-01
 reviewers:
   - "@thockin"
@@ -13,12 +13,12 @@ approvers:
   - "@thockin"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: beta
+stage: stable
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.31"
+latest-milestone: "v1.33"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
