@@ -552,15 +552,33 @@ This section must be completed when targeting beta to a release.
 
 The most likely cause of a rollout failure would be a third-party
 component that depended on one of the no-longer-existing IPTables
-chains. It is impossible to predict exactly how this third-party
-component would fail in this case, but it would likely impact already
-running workloads.
+chains; most likely this would be a CNI plugin (either the default
+network plugin or a chained plugin) or some other networking-related
+component (NetworkPolicy implementation, service mesh, etc.).
+
+It is impossible to predict exactly how this third-party component
+would fail in this case, but it would likely impact already running
+workloads.
 
 ###### What specific metrics should inform a rollback?
 
-Any failures would be the result of third-party components being
-incompatible with the change, so no core Kubernetes metrics are likely
-to be relevant.
+If the default network plugin (or plugin chain) depends on the missing
+iptables chains, it is possible that all `CNI_ADD` calls would fail
+and it would become impossible to start new pods, in which case
+kubelet's `started_pods_errors_total` would start to climb. However,
+"impossible to start new pods" would likely be noticed quickly without
+metrics anyway...
+
+For the most part, since failures would likely be in third-party
+components, it would be the metrics of those third-party components
+that would be relevant to diagnosing the problem. Since the problem is
+likely to manifest in the form of iptables calls failing because they
+reference non-existent chains, a metric for "number of iptables
+errors" or "time since last successful iptables update" might be
+useful in diagnosing problems related to this feature. (However, it is
+also quite possible that the third-party components in question would
+have no relevant metrics, and errors would be exposed only via log
+messages.)
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
@@ -579,16 +597,14 @@ This section must be completed when targeting beta to a release.
 
 ###### How can an operator determine if the feature is in use by workloads?
 
-There is no simple way to do this because if the feature is working
-correctly there will be no difference in externally-visible behavior.
-(The generated iptables rules will be different, but the _effect_ of
-the generated iptables rules will be the same.)
+The feature is not "used by workloads"; when enabled, it is always in
+effect and affects the cluster as a whole.
 
 ###### How can someone using this feature know that it is working for their instance?
 
 - [X] Other (treat as last resort)
 
-  - Details: As above, the feature is not supposed to have any
+  - Details: The feature is not supposed to have any
     externally-visible effect. If anything is not working, it is
     likely to be a third-party component, so it is impossible to say
     what a failure might look like.
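
The rollback signal described in the metrics answer (kubelet's `started_pods_errors_total` starting to climb) could be checked with something like the sketch below. It is only an illustration: it assumes the counter is exported as `kubelet_started_pods_errors_total`, that two scrapes of the kubelet metrics endpoint are already available as Prometheus text-format strings, and that the input is well formed. The helper names `counter_total` and `pod_start_errors_climbing` are hypothetical and are not part of Kubernetes.

```python
# Assumed exported name of the kubelet counter mentioned in the text.
DEFAULT_METRIC = "kubelet_started_pods_errors_total"


def counter_total(metrics_text, name):
    """Sum every sample of `name` in Prometheus text-format output."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labelled sample: name{labels} value [timestamp]
            sample_name = line.split("{", 1)[0]
            rest = line.rsplit("}", 1)[1]
        else:
            # Unlabelled sample: name value [timestamp]
            sample_name, _, rest = line.partition(" ")
        if sample_name != name:
            continue
        fields = rest.split()
        if fields:
            total += float(fields[0])
    return total


def pod_start_errors_climbing(before, after, name=DEFAULT_METRIC):
    """Return True if the counter increased between two scrapes."""
    return counter_total(after, name) > counter_total(before, name)
```

A rising counter only shows that pod starts are failing; as the text notes, the underlying iptables errors would most likely surface in the third-party component's own metrics or logs.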