Commit f653be8

sidecar: Add alternatives/hacks used today

Reviewed-by: Joseph-Irving <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>

1 parent 1b573ef

1 file changed

keps/sig-node/0753-sidecarcontainers.md

Lines changed: 125 additions & 6 deletions
@@ -477,9 +477,13 @@ Older Kubelets should still be able to schedule Pods that have sidecar container

[stalled]: https://github.com/kubernetes/enhancements/issues/753#issuecomment-597372056

## Alternatives

### Alternative designs considered

This section contains ideas that were originally discussed but then dismissed in favour of the current design.
It also includes links to related discussion on each topic to give some extra context; however, not all decisions are documented in GitHub PRs, as some may have been discussed in SIG meetings, on Slack, etc.

#### Add a pod.spec.SidecarContainers array

An early idea was to have a separate list of containers in a similar style to init containers; they would have behaved in the same way that the current KEP details. This was dismissed because it was considered too large a change to the API, requiring a lot of updates to tooling, for a feature that in most respects would act the same as a normal container.

```yaml
@@ -494,7 +498,7 @@ Discussion links:
https://github.com/kubernetes/community/pull/2148#issuecomment-388813902
https://github.com/kubernetes/community/pull/2148#discussion_r221103216

#### Mark one container as the Primary Container

The primary container idea was specific to solving the issue of Jobs that don't complete with a sidecar; the suggestion was to have one container marked as the primary so that the Job would be completed when that container finished. This was dismissed as too specific to Jobs, whereas a more generic treatment of sidecars could be useful in other places.
```yaml
kind: Job
@@ -512,7 +516,7 @@ spec:
Discussion links:
https://github.com/kubernetes/community/pull/2148#discussion_r192846570

#### Boolean flag on container, Sidecar: true
```yaml
containers:
- name: myApp
@@ -521,7 +525,7 @@ containers:
```
A boolean flag of `sidecar: true` could be used to indicate which containers are sidecars. This was dismissed as it was considered too specific, and other types of container lifecycle may want to be added in the future.

#### Mark containers whose termination kills the pod, terminationFatalToPod: true

This suggestion was to have the ability to mark certain containers as critical to the pod; if they exited, it would cause the other containers to exit. While this helped solve things like Jobs, it didn't solve the wider issue of ordering startup and shutdown.

```yaml
@@ -533,7 +537,7 @@ containers:
Discussion links:
https://github.com/kubernetes/community/pull/2148#issuecomment-414806613

#### Add "Depends On" semantics to containers

Similar to [systemd](https://www.freedesktop.org/wiki/Software/systemd/), this would allow you to specify that a container depends on another container, preventing that container from starting until the container it depends on has also started. This could also be used in shutdown to ensure that containers which have dependent containers are only terminated after their dependents have all safely shut down.
```yaml
containers:
@@ -545,7 +549,7 @@ This was rejected as the UX was considered to be overly complicated for the use
Discussion links:
https://github.com/kubernetes/community/pull/2148#discussion_r203071377

#### Pre-defined phases for container startup/shutdown or arbitrary numbers for ordering

There were a few variations of this, but they all shared a similar idea: the ability to order both the startup and shutdown of containers using phases or numbers to determine the ordering.

Examples:
```yaml
@@ -570,3 +574,118 @@ Discussion links:
https://github.com/kubernetes/community/pull/2148#issuecomment-424494976
https://github.com/kubernetes/community/pull/2148#discussion_r221094552
https://github.com/kubernetes/enhancements/pull/841#discussion_r257906512

### Workarounds sidecars need to do today

This section shows the alternatives and workarounds app developers need to use
today.

#### Jobs with sidecar containers

The problem is described in the [Motivation
section](#problems-jobs-with-sidecar-containers). Here, we present some
alternatives and the pain points that affect users today.
Most known workarounds for this are achieved by building an ad-hoc signalling
mechanism to communicate completion status between containers. Common
implementations use a shared scratch volume mounted into all containers, where
lifecycle status can be communicated by creating and watching for the presence
of files. This comes with the following disadvantages:

* Repetitive lifecycle logic must be incorporated into each sidecar (code might
  be shared, but is usually language dependent)
* Wrappers can alleviate that, but it is still quite complex when there are
  several sidecars to wait for. When more than one sidecar is used, some
  questions arise: how many sidecars does a wrapper have to wait for? How can
  that be configured in a non-error-prone way? How can wrappers be used while
  still injecting sidecars automatically and reliably in a mutating webhook or
  programmatically?
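
The file-signalling pattern described above can be sketched as a Job manifest. This is an illustrative sketch only, not an endorsed implementation: all container names, images, commands, and the signal-file path are hypothetical.

```yaml
# Illustrative sketch of the shared-volume signalling workaround.
# Names, images, commands, and paths are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-sidecar
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: lifecycle            # shared scratch volume for signalling
          emptyDir: {}
      containers:
        - name: main
          image: my-batch-image      # hypothetical
          command: ["sh", "-c"]
          # Run the real workload, then signal completion by creating a file.
          args: ["do-work; touch /lifecycle/main-done"]
          volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
        - name: sidecar
          image: my-proxy-image      # hypothetical
          command: ["sh", "-c"]
          # Wrapper: start the sidecar process, poll for the signal file,
          # then stop the process and exit 0 so the Job can complete.
          args:
            - |
              proxy & PROXY_PID=$!
              until [ -f /lifecycle/main-done ]; do sleep 1; done
              kill "$PROXY_PID"
          volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
```

Note how the polling wrapper must be replicated, and kept in sync, in every sidecar, which is exactly the first pain point listed above.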

Other possible workarounds include using a shared PID namespace and checking
whether the other containers are still running. This also comes with several
disadvantages, like:

* Security concerns around sharing the PID namespace (a container might be able
  to see other containers' environment variables via /proc, or even their root
  filesystems, depending on the permissions used)
* It restricts the possibility of changing the container runtime, until all
  runtimes support a shared PID namespace
* Several applications might need rework, as PID 1 is no longer the container
  entrypoint.
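
A rough sketch of the shared-PID-namespace approach, using the `shareProcessNamespace` pod field; the images, commands, and the process name being watched are hypothetical:

```yaml
# Illustrative sketch of the shared-PID-namespace workaround.
# Images, commands, and the watched process name are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: job-pod-shared-pid
spec:
  shareProcessNamespace: true   # all containers see each other's processes
  restartPolicy: Never
  containers:
    - name: main
      image: my-batch-image     # hypothetical
      command: ["do-work"]
    - name: sidecar
      image: my-proxy-image     # hypothetical
      command: ["sh", "-c"]
      # Wrapper: keep running while the main process is visible, then exit.
      # Note that PID 1 is no longer this container's entrypoint.
      args:
        - |
          proxy & PROXY_PID=$!
          while pgrep -x do-work > /dev/null; do sleep 1; done
          kill "$PROXY_PID"
```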

Using a wrapper with this approach might sound viable for _some_ use cases, but
once you add to the mix that pods can have more than one sidecar, each container
has to know which other containers are sidecars in order to know whether it is
safe to proceed. This becomes especially tricky when combined with
auto-injection of sidecars, and even more complicated if auto-injection is done
by a third party or an independent team.

Furthermore, wrappers have several pain points if you want to use them for
startup, as explained in the next section.

#### Service mesh or metrics sidecars

Let the app container be the main app, with just the service mesh extra
container in the pod.

Service meshes today have to use the following workarounds due to the lack of
startup ordering:

* Blackhole all the traffic until the service mesh container is up (usually
  using an initContainer for this)
* Some trickery (sleep preStop hooks or some alternative) to avoid being killed
  before the other containers that need to use the network; otherwise, traffic
  for those containers will be blackholed
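
Both workarounds can be sketched in a single pod manifest. The images, commands, and sleep duration below are hypothetical (real meshes such as Istio use an `istio-init` container that installs iptables redirect rules):

```yaml
# Illustrative sketch of the two service-mesh workarounds.
# Images, commands, and the sleep duration are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-mesh-proxy
spec:
  initContainers:
    # Workaround 1: blackhole traffic until the proxy is up, e.g. by
    # installing iptables rules redirecting everything to the proxy port.
    - name: proxy-init
      image: my-proxy-init-image   # hypothetical
      command: ["setup-iptables-redirect"]
      securityContext:
        capabilities:
          add: ["NET_ADMIN", "NET_RAW"]
  containers:
    - name: app
      image: my-app-image          # hypothetical
    - name: proxy
      image: my-proxy-image        # hypothetical
      lifecycle:
        preStop:
          exec:
            # Workaround 2: delay proxy shutdown so the other containers
            # can still use the network while they terminate.
            command: ["sh", "-c", "sleep 30"]
```

The `sleep 30` is a guess at how long the other containers need to shut down; there is no way to know the right value, which is part of why this counts as trickery.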

This means that if the app container starts before the service mesh is up and
ready, all traffic will be blackholed and the app needs to retry. Once the
service mesh container is ready, traffic is allowed.

This has another major disadvantage: several apps crash if traffic is blackholed
during startup (common in some Rails middleware, for example) and have to resort
to some kind of workaround, like [this one][linkerd-wait], to wait. This also
makes service meshes miss their goal of augmenting container functionality
without modifying the main application.
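
Such wait workarounds amount to wrapping the app's entrypoint so it blocks until the proxy reports ready (this is the idea the linkerd-await wrapper linked above implements as a small binary). A hedged sketch; the readiness URL, images, and commands are assumptions:

```yaml
# Illustrative sketch of the "wait for the proxy" wrapper.
# The readiness endpoint, images, and commands are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: app-waiting-for-proxy
spec:
  containers:
    - name: app
      image: my-app-image   # hypothetical
      command: ["sh", "-c"]
      # Block until the proxy readiness endpoint answers, then start the app.
      args:
        - |
          until wget -q -O /dev/null http://localhost:4191/ready; do
            sleep 1
          done
          exec my-app
    - name: proxy
      image: my-proxy-image # hypothetical
```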

Istio has an alternative to the initContainer hack. Istio [has an
option][istio-cni-opt] to integrate with CNI and inject the blackhole from there
instead of using the initContainer. In that case, it does the following (quoted
from the link, in case it breaks in the future):

> By default Istio injects an initContainer, istio-init, in pods deployed in the mesh. The istio-init container sets up the pod network traffic redirection to/from the Istio sidecar proxy. This requires the user or service-account deploying pods to the mesh to have sufficient Kubernetes RBAC permissions to deploy containers with the NET_ADMIN and NET_RAW capabilities. Requiring Istio users to have elevated Kubernetes RBAC permissions is problematic for some organizations’ security compliance
> ...
> The Istio CNI plugin performs the Istio mesh pod traffic redirection in the Kubernetes pod lifecycle’s network setup phase, thereby removing the requirement for the NET_ADMIN and NET_RAW capabilities for users deploying pods into the Istio mesh. The Istio CNI plugin replaces the functionality provided by the istio-init container.

In other words, Istio has an alternative way to configure the traffic blackhole
without an initContainer. But the other problems and hacks mentioned remain.

[linkerd-last-container]: https://github.com/linkerd/linkerd2/issues/4758#issuecomment-658457737
[istio-cni-opt]: https://istio.io/latest/docs/setup/additional-setup/cni/
[linkerd-wait]: https://github.com/olix0r/linkerd-await
##### Istio bug report

There is also a [2-year-old bug][istio-bug-report] from Istio devs that this
KEP will fix. In addition, a similar benefit is expected for Linkerd, as we
discussed with Linkerd devs.

One of the things mentioned there is that, at least in 2018, a workaround used
was to tell users to run a script in their containers to wait for the service
mesh to start.

Rodrigo will reach out to Istio devs to see if the situation has changed since
2018.

[istio-bug-report]: https://github.com/kubernetes/kubernetes/issues/65502

#### Move containers out of the pod

Due to the ordering problems of having these containers in the same pod,
another option is to move them out of the pod. This would, for example, remove
the problem of shutdown order. Furthermore, Rodrigo will argue that in many
cases this is better or more elegant in Kubernetes.

While some things might be possible to move to a DaemonSet or elsewhere, this
is not possible for all applications. For example, some apps are not
multi-tenant, so this is not an option security-wise. And some apps would still
like to have a sidecar that adds some metadata, for example.

So while this is an option, it is not possible, or is extremely costly, for
several use cases.
