
Commit 354dcf2

KEP-3178: updates from alpha PRR comments
1 parent d798b36 commit 354dcf2

File tree

  • keps/sig-network/3178-iptables-cleanup

1 file changed: +27 -11 lines changed
keps/sig-network/3178-iptables-cleanup/README.md

Lines changed: 27 additions & 11 deletions
@@ -552,15 +552,33 @@ This section must be completed when targeting beta to a release.
 
 The most likely cause of a rollout failure would be a third-party
 component that depended on one of the no-longer-existing IPTables
-chains. It is impossible to predict exactly how this third-party
-component would fail in this case, but it would likely impact already
-running workloads.
+chains; most likely this would be a CNI plugin (either the default
+network plugin or a chained plugin) or some other networking-related
+component (NetworkPolicy implementation, service mesh, etc).
+
+It is impossible to predict exactly how this third-party component
+would fail in this case, but it would likely impact already running
+workloads.
 
 ###### What specific metrics should inform a rollback?
 
-Any failures would be the result of third-party components being
-incompatible with the change, so no core Kubernetes metrics are likely
-to be relevant.
+If the default network plugin (or plugin chain) depends on the missing
+iptables chains, it is possible that all `CNI_ADD` calls would fail
+and it would become impossible to start new pods, in which case
+kubelet's `started_pods_errors_total` would start to climb. However,
+"impossible to start new pods" would likely be noticed quickly without
+metrics anyway...
+
+For the most part, since failures would likely be in third-party
+components, it would be the metrics of those third-party components
+that would be relevant to diagnosing the problem. Since the problem is
+likely to manifest in the form of iptables calls failing because they
+reference non-existent chains, a metric for "number of iptables
+errors" or "time since last successful iptables update" might be
+useful in diagnosing problems related to this feature. (However, it is
+also quite possible that the third-party components in question would
+have no relevant metrics, and errors would be exposed only via log
+messages.)
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
@@ -579,16 +597,14 @@ This section must be completed when targeting beta to a release.
 
 ###### How can an operator determine if the feature is in use by workloads?
 
-There is no simple way to do this because if the feature is working
-correctly there will be no difference in externally-visible behavior.
-(The generated iptables rules will be different, but the _effect_ of
-the generated iptables rules will be the same.)
+The feature is not "used by workloads"; when enabled, it is always in
+effect and affects the cluster as a whole.
 
 ###### How can someone using this feature know that it is working for their instance?
 
 - [X] Other (treat as last resort)
 
-- Details: As above, the feature is not supposed to have any
+- Details: The feature is not supposed to have any
   externally-visible effect. If anything is not working, it is
   likely to be a third-party component, so it is impossible to say
   what a failure might look like.
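The rollout-failure answer above concerns third-party components that still reference chains kube-proxy or kubelet will no longer create. As a rough sketch of a pre-upgrade check on a node (not part of the KEP; `KUBE-MARK-DROP` is used only as an illustrative chain name, so substitute the chains actually being removed):

```sh
# Count rules on this node that jump to a soon-to-be-removed chain.
# KUBE-MARK-DROP is only an example; check against the KEP's actual list.
# Matches coming from anything other than kubelet's or kube-proxy's own
# rules point at a third-party component that could break.
iptables-save | grep -c -- '-j KUBE-MARK-DROP' || true
```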

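The metrics answer mentions kubelet's `started_pods_errors_total` counter climbing if pod starts begin to fail. A minimal sketch of how an operator might watch it, assuming the counter is exported from the kubelet's `/metrics` endpoint with the usual `kubelet_` prefix:

```sh
# Scrape one node's kubelet metrics through the API server proxy and
# pull out the pod-start error counter referenced in the KEP text.
# Replace <node-name> with a real node name.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" \
  | grep started_pods_errors_total
```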