| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Container Runtime Interface streaming explained" |
| 4 | +date: 2024-05-01 |
| 5 | +slug: cri-streaming-explained |
| 6 | +author: Sascha Grunert |
| 7 | +--- |
| 8 | + |
| 9 | +The Kubernetes [Container Runtime Interface (CRI)](/docs/concepts/architecture/cri) |
| 10 | +acts as the main connection between the [kubelet](/docs/reference/command-line-tools-reference/kubelet) |
| 11 | +and the [Container Runtime](/docs/setup/production-environment/container-runtimes). |
| 12 | +Those runtimes have to provide a [gRPC](https://grpc.io) server which has to |
| 13 | +fulfill a Kubernetes defined [Protocol Buffer](https://protobuf.dev) interface. |
| 14 | +[This API definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto) |
| 15 | +evolves over time, for example when contributors add new features or deprecate |
| 16 | +existing fields. |
| 17 | + |
| 18 | +In this blog post, I'd like to dive into the functionality and history of three |
| 19 | +extraordinary Remote Procedure Calls (RPCs), which are truly outstanding in |
| 20 | +terms of how they work: `Exec`, `Attach` and `PortForward`. |
| 21 | + |
| 22 | +**Exec** can be used to run dedicated commands within the container and stream |
| 23 | +the output to a client like [kubectl](/docs/reference/kubectl) or |
| 24 | +[crictl](/docs/tasks/debug/debug-cluster/crictl). It also allows interaction with |
| 25 | +that process using standard input (stdin), for example if users want to run a |
| 26 | +new shell instance within an existing workload. |
| 27 | + |
| 28 | +**Attach** streams the output of the currently running process via [standard I/O](https://en.wikipedia.org/wiki/Standard_streams) |
| 29 | +from the container to the client and also allows interaction with it. This is |
| 30 | +particularly useful if users want to see what is going on in the container and |
| 31 | +be able to interact with the process. |
| 32 | + |
| 33 | +**PortForward** can be utilized to forward a port from the host to the container |
| 34 | +to be able to interact with it using third-party network tools. This allows users |
| 35 | +to bypass [Kubernetes services](/docs/concepts/services-networking/service) |
| 36 | +for a certain workload and interact with its network interface. |
| 37 | + |
| 38 | +## What is so special about them? |
| 39 | + |
| 40 | +All RPCs of the CRI either use [gRPC unary calls](https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc) |
| 41 | +for communication or the [server-side streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc) |
| 42 | +feature (only `GetContainerEvents` right now). This means that almost all RPCs |
| 43 | +receive a single client request and have to return a single server response. |
| 44 | +The same applies to `Exec`, `Attach`, and `PortForward`, whose [protocol definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto#L94-L99) |
| 45 | +looks like this: |
| 46 | + |
| 47 | +```protobuf |
| 48 | +// Exec prepares a streaming endpoint to execute a command in the container. |
| 49 | +rpc Exec(ExecRequest) returns (ExecResponse) {} |
| 50 | +``` |
| 51 | + |
| 52 | +```protobuf |
| 53 | +// Attach prepares a streaming endpoint to attach to a running container. |
| 54 | +rpc Attach(AttachRequest) returns (AttachResponse) {} |
| 55 | +``` |
| 56 | + |
| 57 | +```protobuf |
| 58 | +// PortForward prepares a streaming endpoint to forward ports from a PodSandbox. |
| 59 | +rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {} |
| 60 | +``` |
| 61 | + |
| 62 | +The requests carry everything required to allow the server to do the work, |
| 63 | +for example, the `ContainerId` or command (`Cmd`) to be run in case of `Exec`. |
| 64 | +More interestingly, all of their responses only contain a `url`: |
| 65 | + |
| 66 | +```protobuf |
| 67 | +message ExecResponse { |
| 68 | + // Fully qualified URL of the exec streaming server. |
| 69 | + string url = 1; |
| 70 | +} |
| 71 | +``` |
| 72 | + |
| 73 | +```protobuf |
| 74 | +message AttachResponse { |
| 75 | + // Fully qualified URL of the attach streaming server. |
| 76 | + string url = 1; |
| 77 | +} |
| 78 | +``` |
| 79 | + |
| 80 | +```protobuf |
| 81 | +message PortForwardResponse { |
| 82 | + // Fully qualified URL of the port-forward streaming server. |
| 83 | + string url = 1; |
| 84 | +} |
| 85 | +``` |
| 86 | + |
| 87 | +Why is it implemented like that? Well, [the original design document](https://docs.google.com/document/d/1MreuHzNvkBW6q7o_zehm1CBOBof3shbtMTGtUpjpRmY) |
| 88 | +for those RPCs even predates [Kubernetes Enhancement Proposals (KEPs)](https://github.com/kubernetes/enhancements) |
| 89 | +and was originally outlined back in 2016. The kubelet had a native |
| 90 | +implementation for `Exec`, `Attach`, and `PortForward` before the |
| 91 | +initiative to bring the functionality to the CRI started. Before that, |
| 92 | +everything was bound to [Docker](https://www.docker.com) or the later abandoned |
| 93 | +container runtime [rkt](https://github.com/rkt/rkt). |
| 94 | + |
| 95 | +The CRI-related design document also elaborates on the option to use native RPC |
| 96 | +streaming for exec, attach, and port forward. The downsides of that approach |
| 97 | +outweighed its benefits: the kubelet would still create a network bottleneck and |
| 98 | +future runtimes would not be free in choosing the server implementation details. |
| 99 | +Another option, in which the kubelet implements a portable, runtime-agnostic |
| 100 | +solution, was also dropped in favor of the final design, because it would have meant |
| 101 | +another project to maintain which would nevertheless be runtime dependent. |
| 102 | + |
| 103 | +This means that the basic flow for `Exec`, `Attach` and `PortForward` |
| 104 | +was proposed to look like this: |
| 105 | + |
| 106 | +{{< mermaid >}} |
| 107 | +sequenceDiagram |
| 108 | + participant crictl |
| 109 | + participant kubectl |
| 110 | + participant API as API Server |
| 111 | + participant kubelet |
| 112 | + participant runtime as Container Runtime |
| 113 | + participant streaming as Streaming Server |
| 114 | + alt Client alternatives |
| 115 | + Note over kubelet,runtime: Container Runtime Interface (CRI) |
| 116 | + kubectl->>API: exec, attach, port-forward |
| 117 | + API->>kubelet: |
| 118 | + kubelet->>runtime: Exec, Attach, PortForward |
| 119 | + else |
| 120 | + Note over crictl,runtime: Container Runtime Interface (CRI) |
| 121 | + crictl->>runtime: Exec, Attach, PortForward |
| 122 | + end |
| 123 | + runtime->>streaming: New Session |
| 124 | + streaming->>runtime: HTTP endpoint (URL) |
| 125 | + alt Client alternatives |
| 126 | + runtime->>kubelet: Response URL |
| 127 | + kubelet->>API: |
| 128 | + API-->>streaming: Connection upgrade (SPDY or WebSocket) |
| 129 | + streaming-)API: Stream data |
| 130 | + API-)kubectl: Stream data |
| 131 | + else |
| 132 | + runtime->>crictl: Response URL |
| 133 | + crictl-->>streaming: Connection upgrade (SPDY or WebSocket) |
| 134 | + streaming-)crictl: Stream data |
| 135 | + end |
| 136 | +{{< /mermaid >}} |
| 137 | + |
| 138 | +Clients like crictl or the kubelet (via kubectl) request a new exec, attach or |
| 139 | +port forward session from the runtime using the gRPC interface. The runtime |
| 140 | +implements a streaming server that also manages the active sessions. This |
| 141 | +streaming server provides an HTTP endpoint for the client to connect to. The |
| 142 | +client upgrades the connection to use the [SPDY](https://en.wikipedia.org/wiki/SPDY) |
| 143 | +streaming protocol or (in the future) to a [WebSocket](https://en.wikipedia.org/wiki/WebSocket) |
| 144 | +connection and starts to stream the data back and forth. |
| 145 | + |
| 146 | +This implementation allows runtimes to have the flexibility to implement |
| 147 | +`Exec`, `Attach` and `PortForward` the way they want, and also allows a |
| 148 | +simple test path. Runtimes can change the underlying implementation to support |
| 149 | +any kind of feature without having a need to modify the CRI at all. |
| 150 | + |
| 151 | +Many smaller enhancements to this overall approach have been merged into |
| 152 | +Kubernetes in the past years, but the general pattern has always stayed the |
| 153 | +same. The kubelet source code was turned into [a reusable library](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming), |
| 154 | +which is nowadays usable from container runtimes to implement the basic |
| 155 | +streaming capability. |
| 156 | + |
| 157 | +## How does the streaming actually work? |
| 158 | + |
| 159 | +At first glance, it looks like all three RPCs work the same way, but that's |
| 160 | +not the case. It's possible to group the functionality of **Exec** and |
| 161 | +**Attach**, while **PortForward** follows a distinct internal protocol |
| 162 | +definition. |
| 163 | + |
| 164 | +### Exec and Attach |
| 165 | + |
| 166 | +Kubernetes defines **Exec** and **Attach** as _remote commands_, whose |
| 167 | +protocol definition exists in [five different versions](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L28-L52): |
| 168 | + |
| 169 | +| # | Version | Note | |
| 170 | +| --- | ------------------- | ---------------------------------------------------------------------------------------------------------------------- | |
| 171 | +| 1 | `channel.k8s.io` | Initial (unversioned) SPDY sub protocol ([#13394](https://issues.k8s.io/13394), [#13395](https://issues.k8s.io/13395)) | |
| 172 | +| 2 | `v2.channel.k8s.io` | Resolves the issues present in the first version ([#15961](https://github.com/kubernetes/kubernetes/pull/15961)) | |
| 173 | +| 3 | `v3.channel.k8s.io` | Adds support for resizing container terminals ([#25273](https://github.com/kubernetes/kubernetes/pull/25273)) | |
| 174 | +| 4 | `v4.channel.k8s.io` | Adds support for exit codes using JSON errors ([#26541](https://github.com/kubernetes/kubernetes/pull/26541)) | |
| 175 | +| 5 | `v5.channel.k8s.io` | Adds support for a CLOSE signal ([#119157](https://github.com/kubernetes/kubernetes/pull/119157)) | |
| 176 | + |
| 177 | +On top of that, there is an overall effort to replace the SPDY transport |
| 178 | +protocol with WebSockets as part of [KEP #4006](https://github.com/kubernetes/enhancements/issues/4006). |
| 179 | +Runtimes have to satisfy those protocols over their life cycle to stay up to |
| 180 | +date with the Kubernetes implementation. |
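
To illustrate how a server might settle on one of the versions from the table, here is a minimal Go sketch. It assumes that both sides advertise a list of sub-protocol names and that the server picks its most preferred match. The names `negotiate` and `serverProtocols` are hypothetical and not part of the Kubernetes code base, and the real handshake may order preferences differently.

```go
package main

import "fmt"

// serverProtocols lists the versions from the table above, most preferred
// first. The ordering is an assumption made for this sketch.
var serverProtocols = []string{
	"v5.channel.k8s.io",
	"v4.channel.k8s.io",
	"v3.channel.k8s.io",
	"v2.channel.k8s.io",
	"channel.k8s.io",
}

// negotiate returns the first server-preferred protocol that the client
// also offers. The boolean is false if there is no common protocol.
func negotiate(server, client []string) (string, bool) {
	offered := make(map[string]struct{}, len(client))
	for _, p := range client {
		offered[p] = struct{}{}
	}
	for _, p := range server {
		if _, ok := offered[p]; ok {
			return p, true
		}
	}
	return "", false
}

func main() {
	// A client that only knows the two most recent versions.
	client := []string{"v4.channel.k8s.io", "v5.channel.k8s.io"}
	protocol, ok := negotiate(serverProtocols, client)
	fmt.Println(protocol, ok)
}
```

A runtime that lags behind would simply advertise a shorter list, which is why older clients and servers can keep working together as new versions appear.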
| 181 | + |
| 182 | +Let's assume that a client uses the latest (`v5`) version of the protocol and |
| 183 | +communicates over WebSockets. In that case, the general flow would be: |
| 184 | + |
| 185 | +1. The client requests a URL endpoint for **Exec** or **Attach** using the CRI. |
| 186 | + |
| 187 | + - The server (runtime) validates the request, inserts it into a connection |
| 188 | + tracking cache, and provides the HTTP endpoint URL for that request. |
| 189 | + |
| 190 | +1. The client connects to that URL, upgrades the connection to establish |
| 191 | + a WebSocket, and starts to stream data. |
| 192 | + |
| 193 | + - In the case of **Attach**, the server has to stream the main container process |
| 194 | + data to the client. |
| 195 | + - In the case of **Exec**, the server has to create the subprocess command within |
| 196 | + the container and then stream the output to the client. |
| 197 | + |
| 198 | + If stdin is required, then the server needs to listen for that as well and |
| 199 | + redirect it to the corresponding process. |
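
The first step of the flow above can be sketched as a small connection tracking cache. This is an illustration under assumptions, not the implementation of the kubelet library: `execRequest`, `requestCache`, the token length, and the URL layout are all made up for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// execRequest is a hypothetical, trimmed-down stand-in for the CRI ExecRequest.
type execRequest struct {
	ContainerID string
	Cmd         []string
}

// requestCache remembers validated requests until the client connects.
type requestCache struct {
	mu      sync.Mutex
	pending map[string]execRequest
}

func newRequestCache() *requestCache {
	return &requestCache{pending: make(map[string]execRequest)}
}

// insert stores the request and returns a random token that identifies it.
func (c *requestCache) insert(req execRequest) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	token := hex.EncodeToString(buf)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending[token] = req
	return token, nil
}

// consume returns the request for a token and removes it, so that every
// token can be used for one connection only.
func (c *requestCache) consume(token string) (execRequest, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	req, ok := c.pending[token]
	delete(c.pending, token)
	return req, ok
}

func main() {
	cache := newRequestCache()
	token, err := cache.insert(execRequest{ContainerID: "abc123", Cmd: []string{"sh"}})
	if err != nil {
		panic(err)
	}
	// The address and path are assumptions; this URL is what the gRPC
	// response would carry back to the client.
	fmt.Println("http://127.0.0.1:10010/exec/" + token)

	// Later, when the client connects to the URL, the server looks up the request.
	req, ok := cache.consume(token)
	fmt.Println(req.ContainerID, req.Cmd, ok)
}
```

Keeping the request on the server side means the URL only has to carry an opaque token, and the actual command never travels through the streaming connection setup.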
| 200 | + |
| 201 | +Interpreting data for the defined protocol is fairly simple: The first |
| 202 | +byte of every input and output packet [defines](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L57-L64) |
| 203 | +the actual stream: |
| 204 | + |
| 205 | +| First Byte | Type | Description | |
| 206 | +| ---------- | --------------- | ---------------------------------------- | |
| 207 | +| `0` | standard input | Data streamed from stdin | |
| 208 | +| `1` | standard output | Data streamed to stdout | |
| 209 | +| `2` | standard error | Data streamed to stderr | |
| 210 | +| `3` | stream error | A streaming error occurred | |
| 211 | +| `4` | stream resize | A terminal resize event | |
| 212 | +| `255` | stream close | Stream should be closed (for WebSockets) | |
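
A minimal sketch of how a receiver could split such a packet into its stream identifier and payload, using the values from the table. The helper names `splitFrame` and `channelName` are hypothetical; runtimes that use the kubelet library do not have to write this themselves.

```go
package main

import (
	"errors"
	"fmt"
)

// Stream identifiers as listed in the table above.
const (
	streamStdIn  byte = 0
	streamStdOut byte = 1
	streamStdErr byte = 2
	streamErr    byte = 3
	streamResize byte = 4
	streamClose  byte = 255
)

// channelName maps the first byte of a packet to a readable name.
func channelName(id byte) string {
	switch id {
	case streamStdIn:
		return "stdin"
	case streamStdOut:
		return "stdout"
	case streamStdErr:
		return "stderr"
	case streamErr:
		return "error"
	case streamResize:
		return "resize"
	case streamClose:
		return "close"
	default:
		return "unknown"
	}
}

// splitFrame separates the leading stream byte from the payload of one packet.
func splitFrame(frame []byte) (channel byte, payload []byte, err error) {
	if len(frame) == 0 {
		return 0, nil, errors.New("empty frame")
	}
	return frame[0], frame[1:], nil
}

func main() {
	// A packet carrying "hello\n" on standard output.
	frame := append([]byte{streamStdOut}, []byte("hello\n")...)
	channel, payload, err := splitFrame(frame)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %q\n", channelName(channel), payload)
}
```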
| 213 | + |
| 214 | +How should runtimes now implement the streaming server methods for **Exec** and |
| 215 | +**Attach** by using the provided kubelet library? The key is that the streaming |
| 216 | +server implementation in the kubelet [outlines an interface](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming/server.go#L63-L68) |
| 217 | +called `Runtime` which has to be fulfilled by the actual container runtime if it |
| 218 | +wants to use that library: |
| 219 | + |
| 220 | +```go |
| 221 | +// Runtime is the interface to execute the commands and provide the streams. |
| 222 | +type Runtime interface { |
| 223 | + Exec(ctx context.Context, containerID string, cmd []string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error |
| 224 | + Attach(ctx context.Context, containerID string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error |
| 225 | + PortForward(ctx context.Context, podSandboxID string, port int32, stream io.ReadWriteCloser) error |
| 226 | +} |
| 227 | +``` |
| 228 | + |
| 229 | +Everything related to the protocol interpretation is |
| 230 | +already in place and runtimes only have to implement the actual `Exec` and |
| 231 | +`Attach` logic. For example, the container runtime [CRI-O](https://github.com/cri-o/cri-o) |
| 232 | +does it [roughly like this](https://github.com/cri-o/cri-o/blob/2a0867/server/container_exec.go#L27-L46) (pseudocode): |
| 233 | + |
| 234 | +```go |
| 235 | +func (s StreamService) Exec( |
| 236 | + ctx context.Context, |
| 237 | + containerID string, |
| 238 | + cmd []string, |
| 239 | + stdin io.Reader, stdout, stderr io.WriteCloser, |
| 240 | + tty bool, |
| 241 | + resizeChan <-chan remotecommand.TerminalSize, |
| 242 | +) error { |
| 243 | + // Retrieve the container by the provided containerID |
| 244 | + // … |
| 245 | + |
| 246 | + // Update the container status and verify that the workload is running |
| 247 | + // … |
| 248 | + |
| 249 | + // Execute the command and stream the data |
| 250 | + return s.runtimeServer.Runtime().ExecContainer( |
| 251 | + s.ctx, c, cmd, stdin, stdout, stderr, tty, resizeChan, |
| 252 | + ) |
| 253 | +} |
| 254 | +``` |
| 255 | + |
| 256 | +### PortForward |
| 257 | + |
| 258 | +Forwarding ports to a container works a bit differently compared to |
| 259 | +streaming IO data from a workload. The server still has to provide a URL |
| 260 | +endpoint for the client to connect to, but then the container runtime has to |
| 261 | +enter the network namespace of the container, allocate the port as well as |
| 262 | +stream the data back and forth. There is no simple protocol definition available |
| 263 | +like for **Exec** or **Attach**. This means that the client will stream the |
| 264 | +plain SPDY frames (with or without an additional WebSocket connection) which can |
| 265 | +be interpreted using libraries like [moby/spdystream](https://github.com/moby/spdystream). |
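
Once the runtime holds a connection to the target port, the remaining work is a bidirectional copy. The following sketch assumes that this connection already exists and leaves out entering the network namespace entirely. `forward` and `demo` are hypothetical helpers, and in-memory pipes stand in for the client stream and the port inside the pod.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// forward copies data in both directions between the client stream and the
// connection to the container port until one direction finishes.
func forward(stream, conn io.ReadWriteCloser) error {
	errCh := make(chan error, 2)
	go func() { _, err := io.Copy(conn, stream); errCh <- err }()
	go func() { _, err := io.Copy(stream, conn); errCh <- err }()

	// Wait for the first direction to end, then close both sides so that
	// the other copy loop is unblocked as well.
	err := <-errCh
	_ = stream.Close()
	_ = conn.Close()
	<-errCh
	return err
}

// demo wires forward between two in-memory pipes and an echo backend and
// returns what the client reads back.
func demo() (string, error) {
	client, stream := net.Pipe()
	conn, backend := net.Pipe()

	// Stand-in for a service listening inside the pod: echo everything back.
	go func() { _, _ = io.Copy(backend, backend) }()

	done := make(chan error, 1)
	go func() { done <- forward(stream, conn) }()

	if _, err := client.Write([]byte("ping")); err != nil {
		return "", err
	}
	reply := make([]byte, 4)
	if _, err := io.ReadFull(client, reply); err != nil {
		return "", err
	}
	// Closing the client side ends the session.
	_ = client.Close()
	return string(reply), <-done
}

func main() {
	reply, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
```

In a real runtime, `conn` would be a socket opened from within the network namespace of the pod sandbox, and `stream` would be the data stream handed over by the streaming server.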
| 266 | + |
| 267 | +Luckily, the kubelet library already provides the `PortForward` interface method |
| 268 | +which has to be implemented by the runtime. [CRI-O does that]() by (simplified): |
| 269 | + |
| 270 | +```go |
| 271 | +func (s StreamService) PortForward( |
| 272 | + ctx context.Context, |
| 273 | + podSandboxID string, |
| 274 | + port int32, |
| 275 | + stream io.ReadWriteCloser, |
| 276 | +) error { |
| 277 | + // Retrieve the pod sandbox by the provided podSandboxID |
| 278 | + sandboxID, err := s.runtimeServer.PodIDIndex().Get(podSandboxID) |
| 279 | + sb := s.runtimeServer.GetSandbox(sandboxID) |
| 280 | + // … |
| 281 | + |
| 282 | + // Get the network namespace path on disk for that sandbox |
| 283 | + netNsPath := sb.NetNsPath() |
| 284 | + // … |
| 285 | + |
| 286 | + // Enter the network namespace and stream the data |
| 287 | + return s.runtimeServer.Runtime().PortForwardContainer( |
| 288 | + ctx, sb.InfraContainer(), netNsPath, port, stream, |
| 289 | + ) |
| 290 | +} |
| 291 | +``` |
| 292 | + |
| 293 | +## Future work |
| 294 | + |
| 295 | +The flexibility Kubernetes provides for the RPCs `Exec`, `Attach` and |
| 296 | +`PortForward` is truly outstanding compared to other methods. Nevertheless, |
| 297 | +container runtimes have to keep up with the latest and greatest implementations |
| 298 | +to support those features in a meaningful way. The general effort to support |
| 299 | +WebSockets is not only a Kubernetes matter; it also has to be supported by |
| 300 | +container runtimes as well as clients like `crictl`. |
| 301 | + |
| 302 | +For example, `crictl` v1.30 features a new `--transport` flag for the |
| 303 | +subcommands `exec`, `attach` and `port-forward` |
| 304 | +([#1383](https://github.com/kubernetes-sigs/cri-tools/pull/1383), |
| 305 | +[#1385](https://github.com/kubernetes-sigs/cri-tools/pull/1385)) |
| 306 | +to allow choosing between `websocket` and `spdy`. |
| 307 | + |
| 308 | +CRI-O is taking an experimental path by moving the streaming server |
| 309 | +implementation into [conmon-rs](https://github.com/containers/conmon-rs) |
| 310 | +(a substitute for the container monitor [conmon](https://github.com/containers/conmon)). conmon-rs is |
| 311 | +a [Rust](https://www.rust-lang.org) implementation of the original container |
| 312 | +monitor and allows streaming WebSockets directly using supported libraries |
| 313 | +([#2070](https://github.com/containers/conmon-rs/pull/2070)). The major benefit |
| 314 | +of this approach is that CRI-O does not even have to be running while conmon-rs |
| 315 | +can keep active **Exec**, **Attach** and **PortForward** sessions open. The |
| 316 | +simplified flow when using crictl directly will then look like this: |
| 317 | + |
| 318 | +{{< mermaid >}} |
| 319 | +sequenceDiagram |
| 320 | + autonumber |
| 321 | + participant crictl |
| 322 | + participant runtime as Container Runtime |
| 323 | + participant conmon-rs |
| 324 | + Note over crictl,runtime: Container Runtime Interface (CRI) |
| 325 | + crictl->>runtime: Exec, Attach, PortForward |
| 326 | + Note over runtime,conmon-rs: Cap’n Proto |
| 327 | + runtime->>conmon-rs: Serve Exec, Attach, PortForward |
| 328 | + conmon-rs->>runtime: HTTP endpoint (URL) |
| 329 | + runtime->>crictl: Response URL |
| 330 | + crictl-->>conmon-rs: Connection upgrade to WebSocket |
| 331 | + conmon-rs-)crictl: Stream data |
| 332 | +{{< /mermaid >}} |
| 333 | + |
| 334 | +All of those enhancements require iterative design decisions, while the original |
| 335 | +well-conceived implementation acts as the foundation for those. I really hope |
| 336 | +you've enjoyed this compact journey through the history of CRI RPCs. Feel free |
| 337 | +to reach out to me anytime for suggestions or feedback using the |
| 338 | +[official Kubernetes Slack](https://kubernetes.slack.com/team/U53SUDBD4). |