Commit 9e1fd64

Merge pull request #45821 from saschagrunert/blog-cri-streaming-explained
Add blog post about: CRI streaming explained
2 parents 5a9c5f2 + 8c9c60c commit 9e1fd64

1 file changed: 338 additions, 0 deletions

---
layout: blog
title: "Container Runtime Interface streaming explained"
date: 2024-05-01
slug: cri-streaming-explained
author: Sascha Grunert
---

The Kubernetes [Container Runtime Interface (CRI)](/docs/concepts/architecture/cri)
acts as the main connection between the [kubelet](/docs/reference/command-line-tools-reference/kubelet)
and the [Container Runtime](/docs/setup/production-environment/container-runtimes).
Those runtimes have to provide a [gRPC](https://grpc.io) server that implements
a Kubernetes-defined [Protocol Buffer](https://protobuf.dev) interface.
[This API definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto)
evolves over time, for example when contributors add new features or deprecate
existing fields.

In this blog post, I'd like to dive into the functionality and history of three
Remote Procedure Calls (RPCs) that stand out in terms of how they work:
`Exec`, `Attach` and `PortForward`.

**Exec** can be used to run dedicated commands within the container and stream
the output to a client like [kubectl](/docs/reference/kubectl) or
[crictl](/docs/tasks/debug/debug-cluster/crictl). It also allows interaction with
that process using standard input (stdin), for example if users want to run a
new shell instance within an existing workload.

**Attach** streams the output of the currently running process via [standard I/O](https://en.wikipedia.org/wiki/Standard_streams)
from the container to the client and also allows interaction with it. This is
particularly useful if users want to see what is going on in the container and
be able to interact with the process.

**PortForward** can be utilized to forward a port from the host to the container
to be able to interact with it using third party network tools. This allows
users to bypass [Kubernetes services](/docs/concepts/services-networking/service)
for a certain workload and interact with its network interface directly.

## What is so special about them?

All RPCs of the CRI either use [gRPC unary calls](https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc)
for communication or the [server side streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc)
feature (only `GetContainerEvents` right now). This means that nearly all RPCs
receive a single client request and return a single server response.
The same applies to `Exec`, `Attach`, and `PortForward`, whose [protocol definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto#L94-L99)
looks like this:

```protobuf
// Exec prepares a streaming endpoint to execute a command in the container.
rpc Exec(ExecRequest) returns (ExecResponse) {}
```

```protobuf
// Attach prepares a streaming endpoint to attach to a running container.
rpc Attach(AttachRequest) returns (AttachResponse) {}
```

```protobuf
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}
```

The requests carry everything required to allow the server to do the work,
for example, the `ContainerId` or command (`Cmd`) to be run in case of `Exec`.
More interestingly, all of their responses only contain a `url`:

```protobuf
message ExecResponse {
    // Fully qualified URL of the exec streaming server.
    string url = 1;
}
```

```protobuf
message AttachResponse {
    // Fully qualified URL of the attach streaming server.
    string url = 1;
}
```

```protobuf
message PortForwardResponse {
    // Fully qualified URL of the port-forward streaming server.
    string url = 1;
}
```

Why is it implemented like that? Well, [the original design document](https://docs.google.com/document/d/1MreuHzNvkBW6q7o_zehm1CBOBof3shbtMTGtUpjpRmY)
for those RPCs even predates [Kubernetes Enhancement Proposals (KEPs)](https://github.com/kubernetes/enhancements)
and was originally outlined back in 2016. The kubelet had a native
implementation for `Exec`, `Attach`, and `PortForward` before the
initiative to bring the functionality to the CRI started. Before that,
everything was bound to [Docker](https://www.docker.com) or the later abandoned
container runtime [rkt](https://github.com/rkt/rkt).

The CRI related design document also elaborates on the option to use native RPC
streaming for exec, attach, and port forward. The downsides of this approach
outweighed its benefits: the kubelet would still create a network bottleneck and
future runtimes would not be free in choosing the server implementation details.
Another option, in which the kubelet implements a portable, runtime-agnostic
solution, was also abandoned in favor of the final one, because it would mean
another project to maintain which would nevertheless be runtime dependent.

This means that the basic flow for `Exec`, `Attach` and `PortForward`
was proposed to look like this:

{{< mermaid >}}
sequenceDiagram
    participant crictl
    participant kubectl
    participant API as API Server
    participant kubelet
    participant runtime as Container Runtime
    participant streaming as Streaming Server
    alt Client alternatives
        Note over kubelet,runtime: Container Runtime Interface (CRI)
        kubectl->>API: exec, attach, port-forward
        API->>kubelet:
        kubelet->>runtime: Exec, Attach, PortForward
    else
        Note over crictl,runtime: Container Runtime Interface (CRI)
        crictl->>runtime: Exec, Attach, PortForward
    end
    runtime->>streaming: New Session
    streaming->>runtime: HTTP endpoint (URL)
    alt Client alternatives
        runtime->>kubelet: Response URL
        kubelet->>API:
        API-->>streaming: Connection upgrade (SPDY or WebSocket)
        streaming-)API: Stream data
        API-)kubectl: Stream data
    else
        runtime->>crictl: Response URL
        crictl-->>streaming: Connection upgrade (SPDY or WebSocket)
        streaming-)crictl: Stream data
    end
{{< /mermaid >}}

Clients like crictl or the kubelet (via kubectl) request a new exec, attach or
port forward session from the runtime using the gRPC interface. The runtime
implements a streaming server that also manages the active sessions. This
streaming server provides an HTTP endpoint for the client to connect to. The
client upgrades the connection to use the [SPDY](https://en.wikipedia.org/wiki/SPDY)
streaming protocol or (in the future) to a [WebSocket](https://en.wikipedia.org/wiki/WebSocket)
connection and starts to stream the data back and forth.

This implementation gives runtimes the flexibility to implement
`Exec`, `Attach` and `PortForward` the way they want, and also allows a
simple test path. Runtimes can change the underlying implementation to support
any kind of feature without having to modify the CRI at all.

Many smaller enhancements to this overall approach have been merged into
Kubernetes in the past years, but the general pattern has always stayed the
same. The kubelet source code was transformed into [a reusable library](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming),
which container runtimes can nowadays use to implement the basic
streaming capability.

## How does the streaming actually work?

At first glance, it looks like all three RPCs work the same way, but that's
not the case. It's possible to group the functionality of **Exec** and
**Attach**, while **PortForward** follows a distinct internal protocol
definition.

### Exec and Attach

Kubernetes defines **Exec** and **Attach** as _remote commands_, whose
protocol definition exists in [five different versions](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L28-L52):

| #   | Version             | Note                                                                                                                   |
| --- | ------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| 1   | `channel.k8s.io`    | Initial (unversioned) SPDY sub protocol ([#13394](https://issues.k8s.io/13394), [#13395](https://issues.k8s.io/13395)) |
| 2   | `v2.channel.k8s.io` | Resolves the issues present in the first version ([#15961](https://github.com/kubernetes/kubernetes/pull/15961))       |
| 3   | `v3.channel.k8s.io` | Adds support for resizing container terminals ([#25273](https://github.com/kubernetes/kubernetes/pull/25273))          |
| 4   | `v4.channel.k8s.io` | Adds support for exit codes using JSON errors ([#26541](https://github.com/kubernetes/kubernetes/pull/26541))          |
| 5   | `v5.channel.k8s.io` | Adds support for a CLOSE signal ([#119157](https://github.com/kubernetes/kubernetes/pull/119157))                      |

On top of that, there is an overall effort to replace the SPDY transport
protocol with WebSockets as part of [KEP #4006](https://github.com/kubernetes/enhancements/issues/4006).
Runtimes have to satisfy those protocols over their life cycle to stay up to
date with the Kubernetes implementation.

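To illustrate what supporting several protocol versions can look like, here is a
minimal sketch of a version selection step. It is not the negotiation code of the
kubelet library: the `negotiate` function, the `serverPreference` slice and the
"newest common version wins" policy are assumptions made for this example. Only
the version strings are taken from the table above.

```go
package main

import "fmt"

// serverPreference lists the remote command sub protocols from the table
// above, newest first. The slice is a hypothetical helper for this sketch.
var serverPreference = []string{
	"v5.channel.k8s.io",
	"v4.channel.k8s.io",
	"v3.channel.k8s.io",
	"v2.channel.k8s.io",
	"channel.k8s.io",
}

// negotiate returns the newest protocol version that the server knows and
// the client offered. The boolean is false if there is no common version.
func negotiate(clientOffers []string) (string, bool) {
	offered := make(map[string]bool, len(clientOffers))
	for _, p := range clientOffers {
		offered[p] = true
	}
	for _, p := range serverPreference {
		if offered[p] {
			return p, true
		}
	}
	return "", false
}

func main() {
	proto, ok := negotiate([]string{"v4.channel.k8s.io", "v5.channel.k8s.io"})
	fmt.Println(proto, ok)
}
```

A runtime that cannot find a common version would have to reject the connection
upgrade, which is why staying current with newly added versions matters.
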
Let's assume that a client uses the latest (`v5`) version of the protocol and
communicates over WebSockets. In that case, the general flow would be:

1. The client requests a URL endpoint for **Exec** or **Attach** using the CRI.

   - The server (runtime) validates the request, inserts it into a connection
     tracking cache, and provides the HTTP endpoint URL for that request.

1. The client connects to that URL, upgrades the connection to establish
   a WebSocket, and starts to stream data.

   - In the case of **Attach**, the server has to stream the main container process
     data to the client.
   - In the case of **Exec**, the server has to create the subprocess command within
     the container and then stream the output to the client.

If stdin is required, then the server needs to listen for that as well and
redirect it to the corresponding process.

Interpreting data for the defined protocol is fairly simple: the first
byte of every input and output packet [defines](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L57-L64)
the actual stream:

| First Byte | Type            | Description                              |
| ---------- | --------------- | ---------------------------------------- |
| `0`        | standard input  | Data streamed from stdin                 |
| `1`        | standard output | Data streamed to stdout                  |
| `2`        | standard error  | Data streamed to stderr                  |
| `3`        | stream error    | A streaming error occurred               |
| `4`        | stream resize   | A terminal resize event                  |
| `255`      | stream close    | Stream should be closed (for WebSockets) |

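The table above can be turned into a small demultiplexing helper. The following
is a sketch for illustration only: the constant names, `streamName` and
`splitPacket` are hypothetical and do not mirror the identifiers used in the
Kubernetes source; only the byte values come from the table.

```go
package main

import (
	"errors"
	"fmt"
)

// Stream identifiers as listed in the table above (names are hypothetical).
const (
	streamStdin  byte = 0
	streamStdout byte = 1
	streamStderr byte = 2
	streamErr    byte = 3
	streamResize byte = 4
	streamClose  byte = 255
)

// streamName maps a stream identifier to a readable name.
func streamName(id byte) (string, bool) {
	switch id {
	case streamStdin:
		return "standard input", true
	case streamStdout:
		return "standard output", true
	case streamStderr:
		return "standard error", true
	case streamErr:
		return "stream error", true
	case streamResize:
		return "stream resize", true
	case streamClose:
		return "stream close", true
	}
	return "", false
}

// splitPacket separates a packet into its stream identifier (first byte)
// and the remaining payload.
func splitPacket(packet []byte) (byte, []byte, error) {
	if len(packet) == 0 {
		return 0, nil, errors.New("empty packet")
	}
	return packet[0], packet[1:], nil
}

func main() {
	packet := append([]byte{streamStdout}, []byte("hello\n")...)
	id, payload, err := splitPacket(packet)
	if err != nil {
		panic(err)
	}
	name, _ := streamName(id)
	fmt.Printf("%s: %q\n", name, payload)
}
```
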
How should runtimes now implement the streaming server methods for **Exec** and
**Attach** by using the provided kubelet library? The key is that the streaming
server implementation in the kubelet [outlines an interface](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming/server.go#L63-L68)
called `Runtime` which has to be fulfilled by the actual container runtime if it
wants to use that library:

```go
// Runtime is the interface to execute the commands and provide the streams.
type Runtime interface {
	Exec(ctx context.Context, containerID string, cmd []string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
	Attach(ctx context.Context, containerID string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
	PortForward(ctx context.Context, podSandboxID string, port int32, stream io.ReadWriteCloser) error
}
```

Everything related to the protocol interpretation is
already in place and runtimes only have to implement the actual `Exec` and
`Attach` logic. For example, the container runtime [CRI-O](https://github.com/cri-o/cri-o)
does it [like this (pseudocode)](https://github.com/cri-o/cri-o/blob/2a0867/server/container_exec.go#L27-L46):

```go
func (s StreamService) Exec(
	ctx context.Context,
	containerID string,
	cmd []string,
	stdin io.Reader, stdout, stderr io.WriteCloser,
	tty bool,
	resizeChan <-chan remotecommand.TerminalSize,
) error {
	// Retrieve the container by the provided containerID
	//

	// Update the container status and verify that the workload is running
	//

	// Execute the command and stream the data
	// (c is the container retrieved in the elided lookup above)
	return s.runtimeServer.Runtime().ExecContainer(
		s.ctx, c, cmd, stdin, stdout, stderr, tty, resizeChan,
	)
}
```

### PortForward

Forwarding ports to a container works a bit differently compared to
streaming IO data from a workload. The server still has to provide a URL
endpoint for the client to connect to, but then the container runtime has to
enter the network namespace of the container, allocate the port, and
stream the data back and forth. There is no simple protocol definition available
like for **Exec** or **Attach**. This means that the client will stream the
plain SPDY frames (with or without an additional WebSocket connection) which can
be interpreted using libraries like [moby/spdystream](https://github.com/moby/spdystream).

Luckily, the kubelet library already provides the `PortForward` interface method
which has to be implemented by the runtime. [CRI-O does that]() by (simplified):

```go
func (s StreamService) PortForward(
	ctx context.Context,
	podSandboxID string,
	port int32,
	stream io.ReadWriteCloser,
) error {
	// Retrieve the pod sandbox by the provided podSandboxID
	sandboxID, err := s.runtimeServer.PodIDIndex().Get(podSandboxID)
	if err != nil {
		return err
	}
	sb := s.runtimeServer.GetSandbox(sandboxID)
	//

	// Get the network namespace path on disk for that sandbox
	netNsPath := sb.NetNsPath()
	//

	// Enter the network namespace and stream the data
	return s.runtimeServer.Runtime().PortForwardContainer(
		ctx, sb.InfraContainer(), netNsPath, port, stream,
	)
}
```

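The "stream the data back and forth" part can be sketched with the Go standard
library alone. This is an assumption-laden illustration, not what CRI-O's
`PortForwardContainer` does: the `forward` and `echoThroughForward` functions are
hypothetical, the step of entering the network namespace is left out entirely,
and a local TCP echo server stands in for the workload.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// forward connects to addr and copies data between the client stream and
// the TCP connection until the client side is done. A real runtime would
// dial from inside the pod's network namespace, which is omitted here.
func forward(stream io.ReadWriteCloser, addr string) error {
	defer stream.Close()
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return fmt.Errorf("dial %s: %w", addr, err)
	}
	defer conn.Close()

	// Copy responses from the target back to the client in the background.
	done := make(chan error, 1)
	go func() {
		_, err := io.Copy(stream, conn)
		done <- err
	}()

	// Copy client data to the target; this returns when the client is done.
	if _, err := io.Copy(conn, stream); err != nil {
		return err
	}
	// Signal end of input to the target, then wait for remaining responses.
	if tcp, ok := conn.(*net.TCPConn); ok {
		_ = tcp.CloseWrite()
	}
	return <-done
}

// echoThroughForward is a demo helper: it forwards an in-memory stream to
// a local TCP echo server and returns what comes back.
func echoThroughForward(msg []byte) ([]byte, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // echo everything back
	}()

	client, server := net.Pipe()
	result := make(chan error, 1)
	go func() { result <- forward(server, ln.Addr().String()) }()

	if _, err := client.Write(msg); err != nil {
		return nil, err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(client, buf); err != nil {
		return nil, err
	}
	client.Close()
	return buf, <-result
}

func main() {
	got, err := echoThroughForward([]byte("ping"))
	fmt.Printf("%q %v\n", got, err)
}
```

In this sketch `net.Pipe` plays the role of the `stream` argument that the
kubelet library hands to the runtime's `PortForward` method.
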
## Future work

The flexibility Kubernetes provides for the RPCs `Exec`, `Attach` and
`PortForward` is truly outstanding compared to other methods. Nevertheless,
container runtimes have to keep up with the latest and greatest implementations
to support those features in a meaningful way. The general effort to support
WebSockets is not only a Kubernetes matter; it also has to be supported by
container runtimes as well as clients like `crictl`.

For example, `crictl` v1.30 features a new `--transport` flag for the
subcommands `exec`, `attach` and `port-forward`
([#1383](https://github.com/kubernetes-sigs/cri-tools/pull/1383),
[#1385](https://github.com/kubernetes-sigs/cri-tools/pull/1385))
to allow choosing between `websocket` and `spdy`.

CRI-O is taking an experimental path by moving the streaming server
implementation into [conmon-rs](https://github.com/containers/conmon-rs)
(a substitute for the container monitor [conmon](https://github.com/containers/conmon)). conmon-rs is
a [Rust](https://www.rust-lang.org) implementation of the original container
monitor and allows streaming WebSockets directly using supported libraries
([#2070](https://github.com/containers/conmon-rs/pull/2070)). The major benefit
of this approach is that CRI-O does not even have to be running while conmon-rs
keeps active **Exec**, **Attach** and **PortForward** sessions open. The
simplified flow when using crictl directly will then look like this:

{{< mermaid >}}
sequenceDiagram
    autonumber
    participant crictl
    participant runtime as Container Runtime
    participant conmon-rs
    Note over crictl,runtime: Container Runtime Interface (CRI)
    crictl->>runtime: Exec, Attach, PortForward
    Note over runtime,conmon-rs: Cap’n Proto
    runtime->>conmon-rs: Serve Exec, Attach, PortForward
    conmon-rs->>runtime: HTTP endpoint (URL)
    runtime->>crictl: Response URL
    crictl-->>conmon-rs: Connection upgrade to WebSocket
    conmon-rs-)crictl: Stream data
{{< /mermaid >}}

All of those enhancements require iterative design decisions, while the original
well-conceived implementation acts as the foundation for those. I really hope
you've enjoyed this compact journey through the history of CRI RPCs. Feel free
to reach out to me anytime for suggestions or feedback using the
[official Kubernetes Slack](https://kubernetes.slack.com/team/U53SUDBD4).
