| 1 | +--- |
| 2 | +content_type: "reference" |
| 3 | +title: Kubelet Checkpoint API |
| 4 | +weight: 10 |
| 5 | +--- |
| 6 | + |
| 7 | + |
| 8 | +{{< feature-state for_k8s_version="v1.25" state="alpha" >}} |
| 9 | + |
| 10 | +Checkpointing a container creates a stateful copy of a running container. |
| 11 | +Once you have a stateful copy of a container, you can move it to a different |
| 12 | +computer for debugging or similar purposes. |
| 13 | + |
| 14 | +If you move the checkpointed container data to a computer that's able to restore |
| 15 | +it, the restored container continues to run from exactly the |
| 16 | +point at which it was checkpointed. You can also inspect the saved data, provided that you |
| 17 | +have suitable tools for doing so. |
| 18 | + |
| 19 | +Creating a checkpoint of a container might have security implications. Typically |
| 20 | +a checkpoint contains all memory pages of all processes in the checkpointed |
| 21 | +container. This means that everything that used to be in memory is now available |
| 22 | +on the local disk. This includes all private data and possibly keys used for |
| 23 | +encryption. The underlying CRI implementation (the container runtime on that node) |
| 24 | +should create the checkpoint archive so that it is accessible only by the `root` user. |
| 25 | +Keep in mind that if the checkpoint archive is transferred to another system, all |
| 26 | +memory pages will be readable by the owner of the checkpoint archive. |
| 27 | + |
| 28 | +## Operations {#operations} |
| 29 | + |
| 30 | +### `post` checkpoint the specified container {#post-checkpoint} |
| 31 | + |
| 32 | +Tell the kubelet to checkpoint a specific container from the specified Pod. |
| 33 | + |
| 34 | +Consult the [Kubelet authentication/authorization reference](/docs/reference/command-line-tools-reference/kubelet-authentication-authorization) |
| 35 | +for more information about how access to the kubelet checkpoint interface is |
| 36 | +controlled. |
| 37 | + |
| 38 | +The kubelet will request a checkpoint from the underlying |
| 39 | +{{<glossary_tooltip term_id="cri" text="CRI">}} implementation. In the checkpoint |
| 40 | +request the kubelet will specify the name of the checkpoint archive as |
| 41 | +`checkpoint-<podFullName>-<containerName>-<timestamp>.tar` and also request to |
| 42 | +store the checkpoint archive in the `checkpoints` directory below its root |
| 43 | +directory (as defined by `--root-dir`). This defaults to |
| 44 | +`/var/lib/kubelet/checkpoints`. |
| 45 | + |
| 46 | +The checkpoint archive is in _tar_ format, and can be listed using an implementation of |
| 47 | +[`tar`](https://pubs.opengroup.org/onlinepubs/7908799/xcu/tar.html). The contents of the |
| 48 | +archive depend on the underlying CRI implementation (the container runtime on that node). |
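
As an illustration of listing a checkpoint archive without a `tar` binary, here is a minimal sketch that uses Python's standard `tarfile` module. The helper names `find_checkpoint_archives` and `list_checkpoint_archive` are hypothetical, and the directory constant assumes the default `--root-dir`; reading the archive will typically require `root` privileges.

```python
import tarfile
from pathlib import Path

# Default location described above; it differs if the kubelet
# runs with a non-default --root-dir (assumption: default is in use).
DEFAULT_CHECKPOINT_DIR = Path("/var/lib/kubelet/checkpoints")


def find_checkpoint_archives(checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Return the checkpoint archives in a directory, sorted by file name."""
    return sorted(Path(checkpoint_dir).glob("checkpoint-*.tar"))


def list_checkpoint_archive(archive_path):
    """Return the member names stored in one checkpoint archive.

    The members depend on the container runtime, so this only lists
    them and makes no assumption about what they contain.
    """
    with tarfile.open(archive_path, mode="r") as archive:
        return archive.getnames()


if __name__ == "__main__":
    for archive_path in find_checkpoint_archives():
        print(archive_path)
        for name in list_checkpoint_archive(archive_path):
            print("  " + name)
```

The sketch only reads member names; it never extracts the archive, which avoids writing memory pages to a second location.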
| 49 | + |
| 50 | +#### HTTP Request {#post-checkpoint-request} |
| 51 | + |
| 52 | +POST /checkpoint/{namespace}/{pod}/{container} |
| 53 | + |
| 54 | +#### Parameters {#post-checkpoint-params} |
| 55 | + |
| 56 | +- **namespace** (*in path*): string, required |
| 57 | + |
| 58 | + {{< glossary_tooltip term_id="namespace" >}} |
| 59 | + |
| 60 | +- **pod** (*in path*): string, required |
| 61 | + |
| 62 | + {{< glossary_tooltip term_id="pod" >}} |
| 63 | + |
| 64 | +- **container** (*in path*): string, required |
| 65 | + |
| 66 | + {{< glossary_tooltip term_id="container" >}} |
| 67 | + |
| 68 | +- **timeout** (*in query*): integer |
| 69 | + |
| 70 | + Timeout in seconds to wait until the checkpoint creation is finished. |
| 71 | + If zero or no timeout is specified, the default {{<glossary_tooltip |
| 72 | + term_id="cri" text="CRI">}} timeout value will be used. Checkpoint |
| 73 | + creation time depends directly on the memory used by the container. |
| 74 | + The more memory a container uses, the more time is required to create |
| 75 | + the corresponding checkpoint. |
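
The request and its parameters can be sketched with Python's standard library as follows. This is an illustrative sketch, not a reference client: the function names are hypothetical, the base URL (for example `https://localhost:10250`) depends on how your kubelet is exposed, and bearer-token authentication is an assumption; use whatever credentials your kubelet authentication setup requires.

```python
import ssl
import urllib.request
from urllib.parse import quote, urlencode


def checkpoint_url(base_url, namespace, pod, container, timeout=None):
    """Build the URL for POST /checkpoint/{namespace}/{pod}/{container}."""
    segments = [quote(part, safe="") for part in (namespace, pod, container)]
    url = base_url.rstrip("/") + "/checkpoint/" + "/".join(segments)
    # Omitting the timeout (or passing zero) leaves the CRI default in effect.
    if timeout:
        url += "?" + urlencode({"timeout": int(timeout)})
    return url


def request_checkpoint(base_url, namespace, pod, container, token,
                       timeout=None, ca_file=None):
    """Send the checkpoint request and return (status, body).

    Assumptions: bearer-token authentication, and a CA bundle in
    ca_file that can verify the kubelet's serving certificate.
    urllib raises urllib.error.HTTPError for 401, 404 and 500 responses.
    """
    request = urllib.request.Request(
        checkpoint_url(base_url, namespace, pod, container, timeout),
        method="POST",
        headers={"Authorization": "Bearer " + token},
    )
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status, response.read()
```

Because checkpoint creation time grows with container memory, a caller would likely pass a larger `timeout` for memory-heavy containers rather than rely on the CRI default.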
| 76 | + |
| 77 | +#### Response {#post-checkpoint-response} |
| 78 | + |
| 79 | +200: OK |
| 80 | + |
| 81 | +401: Unauthorized |
| 82 | + |
| 83 | +404: Not Found (if the `CheckpointContainer` feature gate is disabled) |
| 84 | + |
| 85 | +404: Not Found (if the specified `namespace`, `pod` or `container` cannot be found) |
| 86 | + |
| 87 | +500: Internal Server Error (if the CRI implementation encounters an error during checkpointing; see the error message for further details) |
| 88 | + |
| 89 | +500: Internal Server Error (if the CRI implementation does not implement the checkpoint CRI API; see the error message for further details) |
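
Because 404 and 500 each have two possible causes, a client cannot tell them apart from the status code alone. A minimal sketch of how a caller might surface the candidates listed above (the names `CHECKPOINT_RESPONSES` and `explain_checkpoint_status` are hypothetical):

```python
# Possible meanings of each status code, taken from the list above.
CHECKPOINT_RESPONSES = {
    200: ["OK"],
    401: ["Unauthorized"],
    404: [
        "the CheckpointContainer feature gate is disabled",
        "the specified namespace, pod or container cannot be found",
    ],
    500: [
        "the CRI implementation encountered an error during checkpointing",
        "the CRI implementation does not implement the checkpoint CRI API",
    ],
}


def explain_checkpoint_status(status):
    """Return every documented meaning of a checkpoint response status."""
    return CHECKPOINT_RESPONSES.get(
        status, ["status code not listed in this reference"]
    )
```

For a 500 response, the error message in the response body is the only way to distinguish the two causes.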
| 90 | + |
| 91 | +{{< comment >}} |
| 92 | +TODO: Add more information about return codes once CRI implementations have checkpoint/restore. |
| 93 | + This TODO cannot be fixed before the release, because the CRI implementations need |
| 94 | + the Kubernetes changes to be merged to implement the new CheckpointContainer CRI API |
| 95 | + call. We need to wait until after the 1.25 release to fix this. |
| 96 | +{{< /comment >}} |