@@ -1206,7 +1206,8 @@ create their own table, and not interfere with anyone else's tables.
If we document the `priority` values we use to connect to each
nftables hook, then admins and third party developers should be able
to reliably process packets before or after kube-proxy, without
- needing to modify kube-proxy's chains/rules.
+ needing to modify kube-proxy's chains/rules. (As of 1.33, this is now
+ documented.)
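
For example, here is a hypothetical sketch of a third-party table that
processes packets before service DNAT happens. It assumes kube-proxy
registers its DNAT chains at the standard `dnat` (-100) priority; the
actual documented priority values are what should be consulted:

```
# A separate table: nothing here modifies kube-proxy's chains/rules.
nft add table ip example
# Hooking prerouting at priority -110 runs this chain before any chain
# registered at the standard "dnat" (-100) priority, and therefore
# before kube-proxy's service DNAT is applied.
nft add chain ip example prerouting '{ type filter hook prerouting priority -110 ; policy accept ; }'
```
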
In cases where administrators want to insert rules into the middle of
particular service or endpoint chains, we would have the same problem
@@ -1224,16 +1225,6 @@ probably makes sense to leave these out initially and see if people
actually do need them, or if creating rules in another table is
sufficient.

- ```
- <<[UNRESOLVED external rule integration API ]>>
-
- Tigera is currently working on implementing nftables support in
- Calico, so hopefully by 1.32 we should have a good idea of what
- guarantees it needs from nftables kube-proxy.
-
- <<[/UNRESOLVED]>>
- ```
-
#### Rule monitoring

Given the constraints of the iptables API, it would be extremely
@@ -1425,18 +1416,6 @@ We will eventually need e2e tests for switching between `iptables` and
[It should recreate its iptables rules if they are deleted]: https://github.com/kubernetes/kubernetes/blob/v1.27.0/test/e2e/network/networking.go#L550
[`TestUnderTemporaryNetworkFailure`]: https://github.com/kubernetes/kubernetes/blob/v1.27.0-alpha.2/test/e2e/framework/network/utils.go#L1078

- <!--
- This question should be filled when targeting a release.
- For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
-
- For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
- https://storage.googleapis.com/k8s-triage/index.html
-
- We expect no non-infra related flakes in the last month as a GA graduation criteria.
- -->
-
- - <test>: <link to test coverage>
-
#### Scalability & Performance tests

We have an [nftables scalability job]. Initial performance is fine; we
@@ -1577,14 +1556,14 @@ Yes, though it is necessary to clean up the nftables rules that were
created, or they will continue to intercept service traffic. In any
normal case, this should happen automatically when restarting
kube-proxy in `iptables` or `ipvs` mode; however, that assumes the
- user is rolling back to a still-new-enough version of kube-proxy. If
- the user wants to roll back the cluster to a version of Kubernetes
- that doesn't have the nftables kube-proxy code (i.e., rolling back
- from Alpha to Pre-Alpha), or if they are rolling back to an external
- service proxy implementation (e.g., kpng), then they would need to
- make sure that the nftables rules got cleaned up _before_ they rolled
- back, or else clean them up manually. (We can document how to do
- this.)
+ user is rolling back to a version of kube-proxy that has at least the
+ Alpha nftables code (1.29+). If the user wants to roll back the
+ cluster to a version of Kubernetes that doesn't have the nftables
+ kube-proxy code (i.e., rolling back from Alpha to Pre-Alpha), or if
+ they are rolling back to an external service proxy implementation
+ (e.g., kpng), then they would need to make sure that the nftables
+ rules got cleaned up _before_ they rolled back, or else clean them up
+ manually. (We document how to do this.)
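
For illustration, a sketch of that manual cleanup (assuming the
`kube-proxy` table names described later in this document; the `ip6`
table exists only on clusters with IPv6 enabled):

```
# Delete kube-proxy's nftables tables, which removes all of the
# service chains, rules, sets, and maps they contain.
nft delete table ip kube-proxy
nft delete table ip6 kube-proxy
```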

(By the time we are considering making the `nftables` backend the
default in the future, the feature will have existed and been GA for
@@ -1635,14 +1614,23 @@ provide more information.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

- TBD; we plan to add an e2e job to test switching from `iptables` mode
- to `nftables` mode in 1.31.
-
- <!--
- Describe manual testing that was done and the outcomes.
- Longer term, we may want to require automated upgrade/rollback tests, but we
- are missing a bunch of machinery and tooling and can't do that now.
- -->
+ Tested by hand (see the sketch of the nft checks after this list):
+
+ 1. Start kube-proxy in `iptables` mode.
+ 2. Confirm (via `iptables-save`) that iptables rules exist for Services.
+ 3. Kill kube-proxy.
+ 4. Start kube-proxy in `nftables` mode.
+ 5. Confirm (via `iptables-save`) that iptables rules for Services no
+    longer exist. (There will still be a handful of iptables chains
+    left over, but nothing that actually affects the behavior of
+    services.)
+ 6. Confirm (via `nft list ruleset`) that nftables rules for Services
+    exist.
+ 7. Kill kube-proxy.
+ 8. Start kube-proxy in `iptables` mode again.
+ 9. Confirm (via `iptables-save`) that iptables rules exist for Services.
+ 10. Confirm (via `nft list ruleset`) that the `kube-proxy` table (or
+     tables, if dual-stack) has been deleted.
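
A minimal sketch of the nftables checks in steps 6 and 10, assuming a
single-stack IPv4 cluster (a dual-stack cluster would check the `ip6`
table as well):

```
# Step 6: kube-proxy's table should exist and contain Service rules.
nft list table ip kube-proxy
# Step 10: after switching back to iptables mode, the same command
# should now fail, because the table has been deleted.
nft list table ip kube-proxy
```
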
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
@@ -1670,9 +1658,21 @@ For Beta, the goal is for the [network programming latency] to
be equivalent to the _old_, pre-[KEP-3453] iptables performance
(because the current code is not yet heavily optimized).

- For GA, the goal is for it to be at least as good as the current
+ For GA, the goal was for it to be at least as good as the current
iptables performance.

+ In fact, we never got entirely clear measurements of this, because the
+ iptables-based 1000 node perf/scale test still uses `minSyncPeriod:
+ 10s`, while the nftables-based one does not. However, the nftables
+ performance is quite satisfactory (and the fact that it is able to
+ have satisfactory performance without using `minSyncPeriod` is also a
+ major win).
+
+ Meanwhile, nftables data plane performance is _substantially_ better
+ than iptables:
+
+ ![iptables-vs-nftables kube-proxy data plane performance](iptables-vs-nftables.svg)
+
[network programming latency]: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/network_programming_latency.md
[KEP-3453]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/3453-minimize-iptables-restore
@@ -1701,7 +1701,9 @@ more-or-less **O(1)** behavior, so knowing the number of elements is
not going to give you much information about how well the system is
likely to be performing.

- (Update while going to Beta: it's still not clear.)
+ (Update while going to GA: it's still not clear. We have not found
+ ourselves wanting any additional metrics, nor have we received any
+ requests for additional metrics.)

- [X] Metrics
  - Metric names:
@@ -1774,26 +1776,14 @@ processed until the apiserver is available again.

###### What are other known failure modes?

- <!--
- For each of them, fill in the following information by copying the below template:
-   - [Failure mode brief description]
-     - Detection: How can it be detected via metrics? Stated another way:
-       how can an operator troubleshoot without logging into a master or worker node?
-     - Mitigations: What can be done to stop the bleeding, especially for already
-       running user workloads?
-     - Diagnostics: What are the useful log messages and their required logging
-       levels that could help debug the issue?
-       Not required until feature graduated to beta.
-     - Testing: Are there any tests for failure mode? If not, describe why.
- -->
-

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

- Initial proposal: 2023-02-01
- Merged: 2023-10-06
- Updates for beta: 2024-05-24
+ - Updates for GA: 2025-01-15

## Drawbacks