Commit dfd0880

Merge pull request #34940 from adrianreber/2022-07-11-forensic-container-checkpointing-documentation
Add documentation for container checkpointing feature (KEP 2008)
2 parents 735d499 + df55ed5 commit dfd0880

2 files changed

+99
-0
lines changed

content/en/docs/reference/command-line-tools-reference/feature-gates.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,7 @@ different Kubernetes components.
6464
| `AnyVolumeDataSource` | `false` | Alpha | 1.18 | 1.23 |
6565
| `AnyVolumeDataSource` | `true` | Beta | 1.24 | |
6666
| `AppArmor` | `true` | Beta | 1.4 | |
67+
| `CheckpointContainer` | `false` | Alpha | 1.25 | |
6768
| `CPUManager` | `false` | Alpha | 1.8 | 1.9 |
6869
| `CPUManager` | `true` | Beta | 1.10 | |
6970
| `CPUManagerPolicyAlphaOptions` | `false` | Alpha | 1.23 | |
@@ -663,6 +664,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
663664
flag `--service-account-extend-token-expiration=false`.
664665
Check [Bound Service Account Tokens](https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md)
665666
for more details.
667+
- `CheckpointContainer`: Enables the kubelet `checkpoint` API.
668+
See [Kubelet Checkpoint API](/docs/reference/node/kubelet-checkpoint-api/) for more details.
666669
- `ControllerManagerLeaderMigration`: Enables Leader Migration for
667670
[kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and
668671
[cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager)
Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
---
2+
content_type: "reference"
3+
title: Kubelet Checkpoint API
4+
weight: 10
5+
---
6+
7+
8+
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
9+
10+
Checkpointing a container is the functionality to create a stateful copy of a
11+
running container. Once you have a stateful copy of a container, you could
12+
move it to a different computer for debugging or similar purposes.
13+
14+
If you move the checkpointed container data to a computer that's able to restore
15+
it, that restored container continues to run from exactly the
16+
point at which it was checkpointed. You can also inspect the saved data, provided that you
17+
have suitable tools for doing so.
18+
19+
Creating a checkpoint of a container might have security implications. Typically
20+
a checkpoint contains all memory pages of all processes in the checkpointed
21+
container. This means that everything that used to be in memory is now available
22+
on the local disk. This includes all private data and possibly keys used for
23+
encryption. The underlying CRI implementations (the container runtime on that node)
24+
should create the checkpoint archive to be only accessible by the `root` user. It
25+
is still important to remember that if the checkpoint archive is transferred to another
26+
system, all memory pages will be readable by the owner of the checkpoint archive.
27+
28+
## Operations {#operations}
29+
30+
### `post` checkpoint the specified container {#post-checkpoint}
31+
32+
Tell the kubelet to checkpoint a specific container from the specified Pod.
33+
34+
Consult the [Kubelet authentication/authorization reference](/docs/reference/command-line-tools-reference/kubelet-authentication-authorization)
35+
for more information about how access to the kubelet checkpoint interface is
36+
controlled.
37+
38+
The kubelet will request a checkpoint from the underlying
39+
{{<glossary_tooltip term_id="cri" text="CRI">}} implementation. In the checkpoint
40+
request the kubelet will specify the name of the checkpoint archive as
41+
`checkpoint-<podFullName>-<containerName>-<timestamp>.tar` and also request to
42+
store the checkpoint archive in the `checkpoints` directory below its root
43+
directory (as defined by `--root-dir`). This defaults to
44+
`/var/lib/kubelet/checkpoints`.
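
As a rough illustration of the naming scheme described above, the following Python sketch assembles the path at which the archive would be expected. The helper name `checkpoint_archive_path` is hypothetical, and because the exact formats of `<podFullName>` and `<timestamp>` are not spelled out here, the caller supplies them as opaque strings.

```python
from pathlib import PurePosixPath


def checkpoint_archive_path(pod_full_name, container_name, timestamp,
                            root_dir="/var/lib/kubelet"):
    """Return the expected location of a checkpoint archive.

    Hypothetical helper: it only mirrors the documented pattern
    checkpoint-<podFullName>-<containerName>-<timestamp>.tar inside the
    `checkpoints` directory below the kubelet root directory (--root-dir).
    The formats of pod_full_name and timestamp are assumptions left to
    the caller.
    """
    file_name = f"checkpoint-{pod_full_name}-{container_name}-{timestamp}.tar"
    return str(PurePosixPath(root_dir) / "checkpoints" / file_name)
```

Passing a different `root_dir` models a kubelet started with a non-default `--root-dir`.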
45+
46+
The checkpoint archive is in _tar_ format, and can be listed using an implementation of
47+
[`tar`](https://pubs.opengroup.org/onlinepubs/7908799/xcu/tar.html). The contents of the
48+
archive depend on the underlying CRI implementation (the container runtime on that node).
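
Besides a command-line `tar`, the archive could also be listed programmatically. The sketch below uses Python's standard `tarfile` module; the function name `list_checkpoint_archive` is made up for this example, and nothing is assumed about which members a given container runtime writes.

```python
import tarfile


def list_checkpoint_archive(archive_path):
    """Return the member names stored in a checkpoint tar archive.

    Illustrative only: the members present depend on the CRI
    implementation that created the archive. Reading the archive
    typically requires root privileges on the node.
    """
    # Mode "r" lets tarfile detect transparently whether the archive
    # is compressed or not.
    with tarfile.open(archive_path, mode="r") as archive:
        return archive.getnames()
```

Only the names are read here; the memory pages stored in the archive are never extracted to disk.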
49+
50+
#### HTTP Request {#post-checkpoint-request}
51+
52+
POST /checkpoint/{namespace}/{pod}/{container}
53+
54+
#### Parameters {#post-checkpoint-params}
55+
56+
- **namespace** (*in path*): string, required
57+
58+
{{< glossary_tooltip term_id="namespace" >}}
59+
60+
- **pod** (*in path*): string, required
61+
62+
{{< glossary_tooltip term_id="pod" >}}
63+
64+
- **container** (*in path*): string, required
65+
66+
{{< glossary_tooltip term_id="container" >}}
67+
68+
- **timeout** (*in query*): integer
69+
70+
Timeout in seconds to wait until the checkpoint creation is finished.
71+
If zero or no timeout is specified, the default {{<glossary_tooltip
72+
term_id="cri" text="CRI">}} timeout value will be used. Checkpoint
73+
creation time depends directly on the amount of memory the container uses.
74+
The more memory a container uses, the more time is required to create
75+
the corresponding checkpoint.
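
To show how the path and query parameters fit together, here is a minimal Python sketch that builds the request without sending it. The function name `build_checkpoint_request`, the `node` argument, the HTTPS scheme and the default port `10250` are assumptions of this example, not part of the API description above. Actually sending the request would also need credentials that the kubelet accepts.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_checkpoint_request(node, namespace, pod, container,
                             timeout=None, port=10250):
    """Build (but do not send) a POST request for the checkpoint endpoint.

    Hypothetical helper. `node` and `port` describe where the kubelet is
    assumed to listen; authentication is intentionally left out.
    """
    path = "/checkpoint/{}/{}/{}".format(
        quote(namespace, safe=""),
        quote(pod, safe=""),
        quote(container, safe=""),
    )
    url = f"https://{node}:{port}{path}"
    # Zero or a missing timeout means "use the CRI default", so the
    # query parameter is only added for a non-zero value.
    if timeout:
        url += "?" + urlencode({"timeout": int(timeout)})
    return Request(url, method="POST")
```

Leaving out `timeout`, or passing `0`, produces a URL without a query string, which matches the documented fallback to the default CRI timeout.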
76+
77+
#### Response {#post-checkpoint-response}
78+
79+
200: OK
80+
81+
401: Unauthorized
82+
83+
404: Not Found (if the `CheckpointContainer` feature gate is disabled)
84+
85+
404: Not Found (if the specified `namespace`, `pod` or `container` cannot be found)
86+
87+
500: Internal Server Error (if the CRI implementation encounters an error during checkpointing; see the error message for further details)
88+
89+
500: Internal Server Error (if the CRI implementation does not implement the checkpoint CRI API; see the error message for further details)
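
A client could translate these status codes into hints for the operator. The following Python sketch is one possible way to do that; the name `describe_checkpoint_status` and the wording of the hints are illustrative and only restate the list above.

```python
# Hints for the status codes listed in this reference. Codes 404 and 500
# each have two possible causes, which the status code alone cannot
# tell apart.
CHECKPOINT_STATUS_HINTS = {
    200: "OK",
    401: "Unauthorized",
    404: ("Not Found: the CheckpointContainer feature gate is disabled, "
          "or the namespace, pod or container cannot be found"),
    500: ("Internal Server Error: the CRI implementation failed during "
          "checkpointing or does not implement the checkpoint CRI API; "
          "see the error message for details"),
}


def describe_checkpoint_status(status_code):
    """Return a human-readable hint for a checkpoint response code."""
    return CHECKPOINT_STATUS_HINTS.get(
        status_code, "Status code not listed in this reference")
```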
90+
91+
{{< comment >}}
92+
TODO: Add more information about return codes once CRI implementations have checkpoint/restore.
93+
This TODO cannot be fixed before the release, because the CRI implementations need
94+
the Kubernetes changes to be merged to implement the new CheckpointContainer CRI API
95+
call. We need to wait until after the 1.25 release to fix this.
96+
{{< /comment >}}
