This section contains ideas that were originally discussed but then dismissed in favour of the current design.
It also includes links to related discussion on each topic for extra context; however, not all decisions are documented in GitHub PRs, as some were discussed in SIG meetings, on Slack, etc.
#### Add a pod.spec.SidecarContainers array
An early idea was to have a separate list of containers, in a similar style to init containers, that would have behaved in the same way the current KEP details. This was dismissed because it was considered too large a change to the API, requiring many updates to tooling, for a feature that in most respects would act the same as a normal container.
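For illustration, a pod spec under that dismissed design might have looked like the following sketch (the `sidecarContainers` field is hypothetical and was never added to the API):

```yaml
# Hypothetical sketch only: a separate sidecarContainers list in the
# style of initContainers. This field does not exist in Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  sidecarContainers:   # dismissed: too large an API change
  - name: proxy
    image: envoy
  containers:
  - name: myapp
    image: myapp
```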
#### Mark one container as the primary container

The primary container idea was specific to solving the issue of Jobs that don't complete with a sidecar: the suggestion was to mark one container as the primary so that the Job would complete when that container finished. This was dismissed as too specific to Jobs, whereas the more generic treatment of sidecars could be useful in other places.
#### Add a sidecar: true flag to containers

A boolean flag of `sidecar: true` could be used to indicate which containers are sidecars. This was dismissed as too specific, since other types of container lifecycle may want to be added in the future.
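As a sketch, such a flag would have sat directly on the container (hypothetical; never implemented):

```yaml
# Hypothetical sketch only: a boolean flag marking a container as a
# sidecar. Dismissed in favour of a more extensible lifecycle concept.
containers:
- name: myapp
  image: myapp
- name: proxy
  image: envoy
  sidecar: true
```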
#### Mark containers whose termination kills the pod, terminationFatalToPod: true
This suggestion was to have the ability to mark certain containers as critical to the pod; if they exited, the other containers would also exit. While this helped solve things like Jobs, it didn't solve the wider issue of ordering startup and shutdown.
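A sketch of how this might have looked (hypothetical field; never implemented):

```yaml
# Hypothetical sketch only: if "myapp" exits, its termination is fatal
# to the pod and the remaining containers are also terminated.
containers:
- name: myapp
  image: myapp
  terminationFatalToPod: true
- name: proxy
  image: envoy
```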
#### Add "Depends On" semantics to containers

Similar to [systemd](https://www.freedesktop.org/wiki/Software/systemd/), this would allow you to specify that a container depends on another container, preventing it from starting until the container it depends on has started. This could also be used during shutdown to ensure that containers with dependents are only terminated after all their dependents have safely shut down.
```yaml
containers:
- name: myapp
  image: myapp
  dependsOn: ["sidecar"]   # hypothetical field illustrating the idea
- name: sidecar
  image: envoy
```

This was rejected as the UX was considered to be overly complicated for the use case.
#### Pre-defined phases for container startup/shutdown or arbitrary numbers for ordering
There were a few variations of this, but they all shared a similar idea: the ability to order both the startup and shutdown of containers using phases or numbers to determine the ordering.
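For example, one variation might have used a number on each container (hypothetical sketch; none of these variations were implemented):

```yaml
# Hypothetical sketch only: lower phases start first and stop last.
containers:
- name: proxy
  image: envoy
  startupPhase: 1
- name: myapp
  image: myapp
  startupPhase: 2
```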
This section shows the alternatives and workarounds app developers need to use today.
#### Jobs with sidecar containers
The problem is described in the [Motivation section](#problems-jobs-with-sidecar-containers). Here, we present some alternatives and the pain points that affect users today.
Most known work-arounds for this are achieved by building an ad-hoc signalling mechanism to communicate completion status between containers. Common implementations use a shared scratch volume mounted into all containers, where lifecycle status can be communicated by creating and watching for the presence of files. This comes with the following disadvantages:
* Repetitive lifecycle logic must be incorporated into each sidecar (code might be shared, but is usually language dependent)
* Wrappers can alleviate that, but it is still quite complex when there are several sidecars to wait for. When more than one sidecar is used, some questions arise: how many sidecars does a wrapper have to wait for? How can that be configured in a non-error-prone way? How can wrappers be used while still injecting sidecars automatically and reliably via a mutating webhook or programmatically?
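A minimal sketch of the shared-scratch-volume workaround, assuming illustrative image names and a placeholder `do-work` command:

```yaml
# Workaround sketch: an emptyDir volume shared by the containers is used
# as an ad-hoc signalling channel. The main container touches a file when
# it finishes; the sidecar's wrapper polls for that file and then exits.
apiVersion: v1
kind: Pod
metadata:
  name: job-pod-with-sidecar
spec:
  restartPolicy: Never
  volumes:
  - name: lifecycle
    emptyDir: {}
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "do-work; touch /lifecycle/main-done"]
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
  - name: sidecar
    image: example-proxy   # placeholder image
    command: ["sh", "-c", "start-proxy & while [ ! -f /lifecycle/main-done ]; do sleep 1; done; kill %1"]
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
```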
Another possible workaround is using a shared PID namespace and checking whether the other containers are running or not. This also comes with several disadvantages:
* Security concerns around sharing the PID namespace (containers might be able to see other containers' env vars via /proc, or even the root filesystem, depending on the permissions used)
* Restricts the possibility of changing the container runtime until all runtimes support a shared PID namespace
* Several applications might need re-work, as PID 1 is not the container entrypoint anymore.
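The PID-namespace variant relies on the real `shareProcessNamespace` pod field; the polling wrapper itself is still ad hoc (sketch with placeholder images):

```yaml
# Workaround sketch: with a shared PID namespace, the sidecar's wrapper
# can poll for the main container's process and exit once it is gone.
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-pod
spec:
  shareProcessNamespace: true
  restartPolicy: Never
  containers:
  - name: main
    image: myapp           # placeholder image
  - name: sidecar
    image: example-proxy   # placeholder image
    # wrapper: run the proxy, then poll the process list (e.g. pgrep)
    # until the main process disappears, then shut down
```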
Using a wrapper with this approach might sound viable for _some_ use cases, but when you add to the mix that pods can have more than one sidecar, then each container has to know which other containers are sidecars to know if it is safe to proceed. This becomes especially tricky when combined with auto-injection of sidecars, and even more complicated if auto-injection is done by a third party or independent team.
Furthermore, wrappers have several pain points if you want to use them for startup, as explained in the next section.
#### Service mesh or metrics sidecars
Let the app container be the main application, with the service mesh as the only extra container in the pod.
Service meshes today have to resort to the following workarounds due to the lack of startup ordering:

* Blackhole all traffic until the service mesh container is up (usually using an initContainer for this)
* Some trickery (sleep in preStop hooks or some alternative) to avoid being killed before other containers that need to use the network; otherwise, traffic for those containers will be blackholed
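The preStop part of that trickery can be sketched with the real `lifecycle.preStop` hook; the sleep duration is necessarily a guess:

```yaml
# Workaround sketch: the mesh container sleeps in preStop so it keeps
# proxying traffic while the other containers shut down first.
containers:
- name: mesh-proxy
  image: example-proxy   # placeholder image
  lifecycle:
    preStop:
      exec:
        command: ["sleep", "20"]   # arbitrary guess at the grace needed
```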
This means that if the app container is started before the service mesh is started and ready, all traffic will be blackholed and the app needs to retry. Once the service mesh container is ready, traffic will be allowed.
640
+
641
+
This has another major disadvantage: several apps crash if traffic is blackholed during startup (common in some Rails middleware, for example) and have to resort to some kind of workaround, like [this one][linkerd-wait], to wait. This also makes service meshes miss their goal of augmenting container functionality without modifying the main application.
Istio has an alternative to the initContainer hack. Istio [has an option][istio-cni-opt] to integrate with CNI and inject the blackhole from there instead of using the initContainer. In that case, it will do the following (copied verbatim from the link, in case it breaks in the future):
651
+
652
+
> By default Istio injects an initContainer, istio-init, in pods deployed in the mesh. The istio-init container sets up the pod network traffic redirection to/from the Istio sidecar proxy. This requires the user or service-account deploying pods to the mesh to have sufficient Kubernetes RBAC permissions to deploy containers with the NET_ADMIN and NET_RAW capabilities. Requiring Istio users to have elevated Kubernetes RBAC permissions is problematic for some organizations’ security compliance
> ...
> The Istio CNI plugin performs the Istio mesh pod traffic redirection in the Kubernetes pod lifecycle’s network setup phase, thereby removing the requirement for the NET_ADMIN and NET_RAW capabilities for users deploying pods into the Istio mesh. The Istio CNI plugin replaces the functionality provided by the istio-init container.
In other words, Istio has an alternative to configure the traffic blackhole without an initContainer. But the other problems and hacks mentioned remain.