| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Forensic container checkpointing in Kubernetes" |
| 4 | +date: 2022-12-05 |
| 5 | +slug: forensic-container-checkpointing-alpha |
| 6 | +--- |
| 7 | + |
| 8 | +**Authors:** Adrian Reber (Red Hat) |
| 9 | + |
| 10 | +Forensic container checkpointing is based on [Checkpoint/Restore In |
| 11 | +Userspace](https://criu.org/) (CRIU) and allows the creation of stateful copies |
| 12 | +of a running container without the container knowing that it is being |
| 13 | +checkpointed. The copy of the container can be analyzed and restored in a |
| 14 | +sandbox environment multiple times without the original container being aware |
| 15 | +of it. Forensic container checkpointing was introduced as an alpha feature in |
| 16 | +Kubernetes v1.25. |
| 17 | + |
| 18 | +## How does it work? |
| 19 | + |
| 20 | +With the help of CRIU it is possible to checkpoint and restore containers. |
| 21 | +CRIU is integrated in runc, crun, CRI-O and containerd, and forensic container |
| 22 | +checkpointing as implemented in Kubernetes uses these existing CRIU |
| 23 | +integrations. |
| 24 | + |
| 25 | +## Why is it important? |
| 26 | + |
| 27 | +With the help of CRIU and the corresponding integrations it is possible to get |
| 28 | +all information and state about a running container on disk for later forensic |
| 29 | +analysis. Forensic analysis might be important to inspect a suspicious |
| 30 | +container without stopping or influencing it. If the container is really under |
| 31 | +attack, the attacker might detect attempts to inspect the container. Taking a |
| 32 | +checkpoint and analyzing the container in a sandboxed environment makes it |
| 33 | +possible to inspect the container without the original container, and possibly |
| 34 | +the attacker, being aware of the inspection. |
| 35 | + |
| 36 | +In addition to the forensic container checkpointing use case, it is also |
| 37 | +possible to migrate a container from one node to another node without losing |
| 38 | +the internal state. Especially for stateful containers with long initialization |
| 39 | +times, restoring from a checkpoint might save time after a reboot or enable much |
| 40 | +faster startup times. |
| 41 | + |
| 42 | +## How do I use container checkpointing? |
| 43 | + |
| 44 | +The feature is behind a [feature gate][container-checkpoint-feature-gate], so |
| 45 | +make sure to enable the `ContainerCheckpoint` gate before you use the new |
| 46 | +feature. |
| 47 | + |
| 48 | +The runtime must also support container checkpointing: |
| 49 | + |
| 50 | +* containerd: support is currently under discussion. See containerd |
| 51 | + pull request [#6965][containerd-checkpoint-restore-pr] for more details. |
| 52 | + |
| 53 | +* CRI-O: v1.25 has support for forensic container checkpointing. |
| 54 | + |
| 55 | +[containerd-checkpoint-restore-pr]: https://github.com/containerd/containerd/pull/6965 |
| 56 | +[container-checkpoint-feature-gate]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/ |
| 57 | + |
| 58 | +### Usage example with CRI-O |
| 59 | + |
| 60 | +To use forensic container checkpointing in combination with CRI-O, the runtime |
| 61 | +needs to be started with the command-line option `--enable-criu-support=true`. |
| 62 | +For Kubernetes, you need to run your cluster with the `ContainerCheckpoint` |
| 63 | +feature gate enabled. As the checkpointing functionality is provided by CRIU it |
| 64 | +is also necessary to install CRIU. Usually runc or crun depend on CRIU and |
| 65 | +therefore it is installed automatically. |
| 66 | + |
| 67 | +Note that, at the time of writing, the checkpointing functionality is |
| 68 | +an alpha-level feature in CRI-O and Kubernetes, and the |
| 69 | +security implications are still under consideration. |
| 70 | + |
| 71 | +Once containers and pods are running it is possible to create a checkpoint. |
| 72 | +[Checkpointing](https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/) |
| 73 | +is currently only exposed on the **kubelet** level. To checkpoint a container, |
| 74 | +you can run `curl` on the node where that container is running, and trigger a |
| 75 | +checkpoint: |
| 76 | + |
| 77 | +```shell |
| 78 | +curl -X POST "https://localhost:10250/checkpoint/namespace/podId/container" |
| 79 | +``` |
| 80 | + |
| 81 | +For a container named *counter* in a pod named *counters* in a namespace named |
| 82 | +*default* the **kubelet** API endpoint is reachable at: |
| 83 | + |
| 84 | +```shell |
| 85 | +curl -X POST "https://localhost:10250/checkpoint/default/counters/counter" |
| 86 | +``` |
| 87 | + |
| 88 | +For completeness, the following `curl` command-line options are necessary to |
| 89 | +have `curl` accept the *kubelet*'s self-signed certificate and authorize the |
| 90 | +use of the *kubelet* `checkpoint` API: |
| 91 | + |
| 92 | +```shell |
| 93 | +--insecure --cert /var/run/kubernetes/client-admin.crt --key /var/run/kubernetes/client-admin.key |
| 94 | +``` |
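
Putting the endpoint and the TLS options together, a complete invocation might look like the following sketch. The helper names `checkpoint_url` and `checkpoint_container` are illustrative and not part of any Kubernetes tooling, and the certificate paths are the ones shown above, which may differ on your cluster.

```shell
# Build the kubelet checkpoint URL from namespace, pod, and container names.
# (Hypothetical helper; the URL layout follows the endpoint shown above.)
checkpoint_url() {
  local namespace="$1" pod="$2" container="$3"
  printf 'https://localhost:10250/checkpoint/%s/%s/%s\n' \
    "$namespace" "$pod" "$container"
}

# Trigger a checkpoint on the local kubelet.
# Assumes the client certificate and key live at the paths used in this post.
checkpoint_container() {
  curl -X POST --insecure \
    --cert /var/run/kubernetes/client-admin.crt \
    --key /var/run/kubernetes/client-admin.key \
    "$(checkpoint_url "$@")"
}

# Example (run on the node hosting the pod):
#   checkpoint_container default counters counter
```
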
| 95 | + |
| 96 | +Triggering this **kubelet** API will request the creation of a checkpoint from |
| 97 | +CRI-O. CRI-O requests a checkpoint from your low-level runtime (for example, |
| 98 | +`runc`). Seeing that request, `runc` invokes the `criu` tool |
| 99 | +to do the actual checkpointing. |
| 100 | + |
| 101 | +Once the checkpointing has finished, the checkpoint should be available at |
| 102 | +`/var/lib/kubelet/checkpoints/checkpoint-<pod-name>_<namespace-name>-<container-name>-<timestamp>.tar`. |
| 103 | + |
| 104 | +You could then use that tar archive to restore the container somewhere else. |
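
Because the archive name embeds a timestamp, repeated checkpoints of the same container can accumulate in that directory. As a minimal sketch, the hypothetical helper below picks the most recently modified archive that matches the naming pattern above. The name `latest_checkpoint` and its arguments are assumptions made for illustration.

```shell
# Print the most recently modified checkpoint archive for a container.
# Arguments: <pod-name> <namespace-name> <container-name> [checkpoint-dir]
# Returns non-zero if no matching archive exists.
latest_checkpoint() {
  local pod="$1" namespace="$2" container="$3"
  local dir="${4:-/var/lib/kubelet/checkpoints}"
  local newest="" f
  # Note: a container whose name extends this one with a hyphen also matches.
  for f in "$dir/checkpoint-${pod}_${namespace}-${container}-"*.tar; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, `tar -tf "$(latest_checkpoint counters default counter)"` would list the files in the newest archive for the *counter* container without extracting it.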
| 105 | + |
| 106 | +### Restore a checkpointed container outside of Kubernetes (with CRI-O) {#restore-checkpointed-container-standalone} |
| 107 | + |
| 108 | +With the checkpoint tar archive it is possible to restore the container outside |
| 109 | +of Kubernetes in a sandboxed instance of CRI-O. For better user experience |
| 110 | +during restore, I recommend that you use the latest version of CRI-O from the |
| 111 | +*main* CRI-O GitHub branch. If you're using CRI-O v1.25, you'll need to |
| 112 | +manually create certain directories Kubernetes would create before starting the |
| 113 | +container. |
| 114 | + |
| 115 | +The first step to restore a container outside of Kubernetes is to create a pod sandbox |
| 116 | +using *crictl*: |
| 117 | +```shell |
| 118 | +crictl runp pod-config.json |
| 119 | +``` |
| 120 | + |
| 121 | +Then you can restore the previously checkpointed container into the newly created pod sandbox: |
| 122 | +```shell |
| 123 | +crictl create <POD_ID> container-config.json pod-config.json |
| 124 | +``` |
| 125 | + |
| 126 | +Instead of specifying a container image in a registry in `container-config.json` |
| 127 | +you need to specify the path to the checkpoint archive that you created earlier: |
| 128 | +```json |
| 129 | +{ |
| 130 | + "metadata": { |
| 131 | + "name": "counter" |
| 132 | + }, |
| 133 | + "image":{ |
| 134 | + "image": "/var/lib/kubelet/checkpoints/<checkpoint-archive>.tar" |
| 135 | + } |
| 136 | +} |
| 137 | +``` |
| 138 | + |
| 139 | +Next, run `crictl start <CONTAINER_ID>` to start that container, and then a |
| 140 | +copy of the previously checkpointed container should be running. |
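
The three `crictl` steps above can be chained in a small script. This is a sketch that assumes `crictl runp` and `crictl create` print the ID of the object they created on standard output. The function name `restore_checkpoint` is hypothetical.

```shell
# Restore a checkpointed container into a fresh pod sandbox.
# Arguments: <pod-config.json> <container-config.json>
# Prints the ID of the restored container on success.
restore_checkpoint() {
  local pod_config="$1" container_config="$2"
  local pod_id container_id

  # Step 1: create the pod sandbox and capture its ID.
  pod_id=$(crictl runp "$pod_config") || return 1

  # Step 2: create the container from the checkpoint archive that is
  # referenced in the container config.
  container_id=$(crictl create "$pod_id" "$container_config" "$pod_config") || return 1

  # Step 3: start the restored container.
  crictl start "$container_id" >/dev/null || return 1

  printf '%s\n' "$container_id"
}

# Example:
#   restore_checkpoint pod-config.json container-config.json
```
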
| 141 | + |
| 142 | +### Restore a checkpointed container within Kubernetes {#restore-checkpointed-container-k8s} |
| 143 | + |
| 144 | +To restore the previously checkpointed container directly in Kubernetes it is |
| 145 | +necessary to convert the checkpoint archive into an image that can be pushed to |
| 146 | +a registry. |
| 147 | + |
| 148 | +One possible way to convert the local checkpoint archive consists of the |
| 149 | +following steps with the help of [buildah](https://buildah.io/): |
| 150 | +```shell |
| 151 | +newcontainer=$(buildah from scratch) |
| 152 | +buildah add $newcontainer /var/lib/kubelet/checkpoints/checkpoint-<pod-name>_<namespace-name>-<container-name>-<timestamp>.tar / |
| 153 | +buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=<container-name> $newcontainer |
| 154 | +buildah commit $newcontainer checkpoint-image:latest |
| 155 | +buildah rm $newcontainer |
| 156 | +``` |
| 157 | + |
| 158 | +The resulting image is not standardized and only works in combination with |
| 159 | +CRI-O. Please consider this image format as pre-alpha. There are ongoing |
| 160 | +[discussions][image-spec-discussion] to standardize the format of checkpoint |
| 161 | +images like this. Keep in mind that this not-yet-standardized image |
| 162 | +format only works if CRI-O has been started with `--enable-criu-support=true`. |
| 163 | +The security implications of starting CRI-O with CRIU support are not yet clear |
| 164 | +and therefore the functionality as well as the image format should be used with |
| 165 | +care. |
| 166 | + |
| 167 | +Now, you'll need to push that image to a container image registry. For example: |
| 168 | +```shell |
| 169 | +buildah push localhost/checkpoint-image:latest container-image-registry.example/user/checkpoint-image:latest |
| 170 | +``` |
| 171 | + |
| 172 | +To restore this checkpoint image (`container-image-registry.example/user/checkpoint-image:latest`), the |
| 173 | +image needs to be listed in the specification for a Pod. Here's an example |
| 174 | +manifest: |
| 175 | +```yaml |
| 176 | +apiVersion: v1 |
| 177 | +kind: Pod |
| 178 | +metadata: |
| 179 | +  generateName: example- |
| 180 | +spec: |
| 181 | +  containers: |
| 182 | +  - name: <container-name> |
| 183 | +    image: container-image-registry.example/user/checkpoint-image:latest |
| 184 | +  nodeName: <destination-node> |
| 185 | +``` |
| 186 | + |
| 187 | +Kubernetes schedules the new Pod onto a node. The kubelet on that node |
| 188 | +instructs the container runtime (CRI-O in this example) to create and start a |
| 189 | +container based on an image specified as `container-image-registry.example/user/checkpoint-image:latest`. |
| 190 | +CRI-O detects that `container-image-registry.example/user/checkpoint-image:latest` |
| 191 | +is a reference to checkpoint data rather than a container image. Then, |
| 192 | +instead of the usual steps to create and start a container, |
| 193 | +CRI-O fetches the checkpoint data and restores the container from that |
| 194 | +specified checkpoint. |
| 195 | + |
| 196 | +The application in that Pod would continue running as if the checkpoint had not been taken; |
| 197 | +within the container, the application looks and behaves like any other container that had been |
| 198 | +started normally and not restored from a checkpoint. |
| 199 | + |
| 200 | +With these steps, it is possible to replace a Pod running on one node |
| 201 | +with a new equivalent Pod that is running on a different node, |
| 202 | +and without losing the state of the containers in that Pod. |
| 203 | + |
| 204 | +[image-spec-discussion]: https://github.com/opencontainers/image-spec/issues/962 |
| 205 | + |
| 206 | +## How do I get involved? |
| 207 | +You can reach SIG Node by several means: |
| 208 | +- Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node) |
| 209 | +- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node) |