Commit 1b573ef

sidecar: Improve motivation

Reviewed-by: Joseph-Irving <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>

1 parent 145f0b5

1 file changed (+138, -20 lines)

keps/sig-node/0753-sidecarcontainers.md
@@ -98,7 +98,7 @@ container is a sidecar container. The only valid value for now is `sidecar`, but
other values can be added in the future if needed.

Pods with sidecar containers only change the behavior of the startup and
-shutdown sequence of a pod: sidecar container are started before non-sidecars
+shutdown sequence of a pod: sidecar containers are started before non-sidecars
and stopped after non-sidecars.

A pod that has sidecar containers guarantees that non-sidecar containers are
@@ -133,25 +133,143 @@ section](#graduation-criteria).

## Motivation

-SideCar containers have always been used in some ways but just not formally identified as such, they are becoming more common in a lot of applications and as more people have used them, more issues have cropped up.
-
-Here are some examples of the main problems:
-
-### Jobs
-If you have a Job with two containers one of which is actually doing the main processing of the job and the other is just facilitating it, you encounter a problem when the main process finishes; your sidecar container will carry on running so the job will never finish.
-
-The only way around this problem is to manage the sidecar container's lifecycle manually and arrange for it to exit when the main container exits. This is typically achieved by building an ad-hoc signalling mechanism to communicate completion status between containers. Common implementations use a shared scratch volume mounted into all pods, where lifecycle status can be communicated by creating and watching for the presence of files. This pattern has several disadvantages:
-
-* Repetitive lifecycle logic must be rewritten in each instance a sidecar is deployed.
-* Third-party containers typically require a wrapper to add this behaviour, normally provided via an entrypoint wrapper script implemented in the k8s container spec. This adds undesirable overhead and introduces repetition between the k8s and upstream container image specs.
-* The wrapping typically requires the presence of a shell in the container image, so this pattern does not work for minimal containers which ship without a toolchain.
-
-### Startup
-An application that has a proxy container acting as a sidecar may fail when it starts up as it's unable to communicate until its proxy has started up successfully. Readiness probes don't help if the application is trying to talk outbound.
-
-### Shutdown
-Applications that rely on sidecars may experience a high amount of errors when shutting down as the sidecar may terminate before the application has finished what it's doing.
-
+The concept of sidecar containers has been around almost since Kubernetes'
+early days. A clear example is [this Kubernetes blog post][sidecar-blog-post]
+from 2015 mentioning the sidecar pattern.
+
+Over the years the sidecar pattern has become more common in applications, it
+has gained popularity, and its use cases are getting more diverse. The current
+Kubernetes primitives handled that well, but they are starting to fall short
+for several use cases and force weird work-arounds in the applications.
+
+This proposal aims to remedy that by adding a simple set of guarantees for
+sidecar containers, while trying not to do a complete re-implementation of an
+init system. Those alternatives are interesting and were considered, but in
+the end the community decided to go for something simpler that covers most of
+the use cases. Those options are explored in the [alternatives](#alternatives)
+section.
+
+The next section expands on what the current problems are. But, to give more
+context, it is important to highlight that some companies are already using a
+fork of Kubernetes with this sidecar functionality added (not all
+implementations are the same, but more than one company has a fork for this).
+
+[sidecar-blog-post]: https://kubernetes.io/blog/2015/06/the-distributed-system-toolkit-patterns/#example-1-sidecar-containers
+
+### Problems: jobs with sidecar containers
+
+Imagine you have a Job with two containers: one does the main processing of
+the job and the other is just a sidecar facilitating it. This sidecar can be a
+service mesh, a metrics-gathering statsd server, etc.
+
+When the main processing finishes, the pod won't terminate until the sidecar
+container finishes too. This is problematic for sidecar containers that run
+continuously, like a service mesh, a statsd metrics server, or servers in
+general.
+
+There is no simple way to handle this in Kubernetes today. There are
+work-arounds for this problem; most of them consist of some form of coupling
+between the containers, adding logic so that a container that finishes
+communicates it and the other containers can react, as sketched below. But it
+gets tricky when you have more than one sidecar container or want to
+auto-inject sidecars. Some alternatives to achieve this today, and their pain
+points, are discussed in detail in the [alternatives](#alternatives) section.
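
For illustration, a minimal sketch of such a work-around follows: the main
container signals completion through a shared volume, and a wrapper around the
sidecar polls for the signal file and exits. The container names, images, and
commands are hypothetical, not part of this KEP:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-sidecar
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: lifecycle            # shared scratch volume used for signalling
          emptyDir: {}
      containers:
        - name: main
          image: busybox
          command: ["sh", "-c", "run-the-job; touch /lifecycle/done"]
          volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
        - name: sidecar
          image: example-sidecar     # e.g. a proxy or metrics agent
          # wrapper: run the sidecar, exit once the main container signals completion
          command: ["sh", "-c", "start-sidecar & while [ ! -f /lifecycle/done ]; do sleep 1; done"]
          volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
```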
+
+### Problems: service mesh or metrics sidecars
+
+While this problem is general to any sidecar container that might need to
+start before others, or stop after others with special handling on shutdown,
+we use a service mesh, metrics gathering and logging sidecars for the
+examples here.
+
+#### Logging/Metrics sidecar
+
+A logging sidecar should start before several other containers, so logs from
+the startup of the other applications are not lost. Let's call _main
+container_ the app that will log and _logging container_ the sidecar that
+will facilitate it.
+
+If the logging container starts after the main app, some logs can be lost.
+Furthermore, if the logging container is not yet started and the main app
+crashes on startup, those logs can be lost (depending on whether it logs to a
+shared volume, over the network on localhost, etc.). Startup ordering is
+usually not that critical, because the case where the other container is not
+running (like after a restart), when it matters, is usually handled correctly
+already.
+
+On shutdown the ordering behavior is arguably more important: if the logging
+container is stopped first, logs from the other containers are lost. It does
+not matter whether those containers queue the logs and retry sending them to
+the logging container, or whether they persist them to a shared volume: the
+logging container is already killed and will not be restarted, as the pod is
+shutting down. In those cases, logs are lost.
+
+The same startup and shutdown considerations apply to a metrics container.
+
+Some work-arounds can be used to alleviate the symptoms:
+* Ignore SIGTERM in the sidecar container, so it stays alive until the pod is
+killed. This is not ideal and increases _a lot_ the time a pod takes to
+terminate. For example, if the P99 response time is 2 minutes and the
+TerminationGracePeriodSeconds is therefore set to that, the main container
+may finish in 2 seconds (which might be the average), but ignoring SIGTERM in
+the sidecar container will force the pod to live for 2 minutes anyway.
+* Use a preStop hook that just runs a "sleep X" command, as sketched below.
+This is very similar to the previous item and has the same problems.
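
A minimal sketch of the preStop "sleep" work-around follows. The container
names, images, and the 120-second values are assumptions for illustration
only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar
spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: main
      image: example-app
    - name: logging
      image: example-logging-agent
      lifecycle:
        preStop:
          exec:
            # keep the logging sidecar around for the whole grace period,
            # even if the main container exits after a few seconds
            command: ["sh", "-c", "sleep 120"]
```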
+
+#### Service mesh
+
+Service meshes present a similar problem: you want the mesh proxy to be set
+up and ready before other containers start, so any inbound/outbound
+connection that any container initiates goes through the service mesh.
+
+A similar problem happens on shutdown: if the service mesh container is
+terminated before the other containers, outgoing traffic from the other apps
+will be blackholed or will not go through the service mesh.
+
+However, as none of this can be guaranteed today, most service meshes (like
+Linkerd and Istio) need to resort to several hacks to provide this basic
+functionality. These are explained in detail in the
+[alternatives](#alternatives) section. Nonetheless, here is a quick highlight
+of some of the things service meshes currently need to do:
+
+* Recommend that users delay starting their apps by using a script that waits
+for the service mesh to be ready. The service mesh goal of augmenting the
+app's functionality without modifying it is lost in this case.
+* To guarantee that traffic goes via the service mesh, an initContainer is
+added to blackhole traffic until the service mesh container is up (a sketch
+is shown below). This way, other containers that might be started before the
+service mesh container can't use the network until the service mesh container
+is started and ready. A side effect is that traffic is blackholed until the
+service mesh is up and in a ready state.
+* Use a preStop hook with a "sleep infinity" to make sure the service mesh
+doesn't terminate before other containers that might still be serving
+requests.
+
+Auto-injecting the initContainer [has caused bugs][linkerd-bug-init-last], as
+it competes with other tools that also auto-inject containers meant to run
+last.
+
+[linkerd-bug-init-last]: https://github.com/linkerd/linkerd2/issues/4758#issuecomment-658457737
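
A minimal sketch of the initContainer approach follows. The image names and
port are assumptions, and real meshes install a more elaborate iptables rule
set, but the effect is the same: traffic is redirected to the proxy port, and
therefore blackholed until the proxy is up and ready:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-mesh-proxy
spec:
  initContainers:
    - name: proxy-init
      image: example-proxy-init
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]       # needed to modify iptables rules
      command:
        - sh
        - -c
        - iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
  containers:
    - name: proxy
      image: example-mesh-proxy    # the mesh sidecar, listening on port 15001
    - name: main
      image: example-app
```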
+
+### Problems: Coupling infrastructure with applications
+
+The lack of ordering guarantees on startup and shutdown sometimes forces
+applications to couple their code with the infrastructure. This is not nice,
+as the infrastructure is ideally handled independently, and it forces more
+coordination between multiple, possibly independent, teams.
+
+An example of this in the open source world is the [Istio CNI
+plugin][istio-cni-plugin]. It was created as an alternative to the
+initContainer hack that service meshes need to do. The initContainer
+blackholes all traffic until the Istio container is up. But this alternative
+requires nodes to have the CNI plugin installed, effectively coupling the
+service mesh app with the infrastructure.
+
+This KEP proposal removes the need for service meshes to use either an
+initContainer or a CNI plugin: it just guarantees that the sidecar container
+can be started first.
+
+While in this specific example the CNI plugin has some benefits too (it
+removes the need for some capabilities in the pod) and might be worth
+pursuing, it is used here just as an example of coupling apps with
+infrastructure. Similar examples also exist in-house for applications that
+are not open source.
+
+[istio-cni-plugin]: https://istio.io/latest/docs/setup/additional-setup/cni/

## Goals
