Commit 891ba31

Add Forensic Container Checkpointing KEP
Signed-off-by: Adrian Reber <[email protected]>
1 parent d9ae32f commit 891ba31

3 files changed

+353
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 2008
2+
alpha:
3+
  approver: "@ehashman"
Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
1+
# KEP-2008: Forensic Container Checkpointing
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [Implementation](#implementation)
11+
- [User Stories](#user-stories)
12+
- [Risks and Mitigations](#risks-and-mitigations)
13+
- [Design Details](#design-details)
14+
- [Future Enhancements](#future-enhancements)
15+
- [Test Plan](#test-plan)
16+
- [Graduation Criteria](#graduation-criteria)
17+
- [Alpha](#alpha)
18+
- [Alpha to Beta Graduation](#alpha-to-beta-graduation)
19+
- [Beta to GA Graduation](#beta-to-ga-graduation)
20+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
21+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
22+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
23+
- [Dependencies](#dependencies)
24+
- [Scalability](#scalability)
25+
- [Implementation History](#implementation-history)
26+
- [Drawbacks](#drawbacks)
27+
- [Alternatives](#alternatives)
28+
<!-- /toc -->
29+
30+
## Release Signoff Checklist
31+
32+
Items marked with (R) are required *prior to targeting to a milestone / release*.
33+
34+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
35+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
36+
- [ ] (R) Design details are appropriately documented
37+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
38+
- [ ] (R) Graduation criteria is in place
39+
- [ ] (R) Production readiness review completed
40+
- [ ] Production readiness review approved
41+
- [ ] "Implementation History" section is up-to-date for milestone
42+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
43+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
44+
45+
[kubernetes.io]: https://kubernetes.io/
46+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
47+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
48+
[kubernetes/website]: https://git.k8s.io/website
49+
50+
## Summary
51+
52+
Provide an interface to trigger a container checkpoint for forensic analysis.
53+
54+
## Motivation
55+
56+
Container checkpointing provides the functionality to take a snapshot of a
57+
running container. The checkpointed container can be transferred to another
58+
node and the original container will never know that it was checkpointed.
59+
60+
Restoring the container in a sandboxed environment provides a means to
61+
forensically analyse a copy of the container to understand if it might
62+
have been a possible threat. As the analysis is happening on a copy of
63+
the original container, a possible attacker of the original container
64+
will not be aware of any sandboxed analysis.
65+
66+
### Goals
67+
68+
The goal of this KEP is to introduce *checkpoint* and *restore* to the CRI API.
69+
This includes extending the *kubelet* API to support checkpointing single
70+
containers with the forensic use case in mind.
71+
72+
### Non-Goals
73+
74+
Although *checkpoint* and *restore* can be used to implement container
75+
migration, this KEP is only about enabling the forensic use case. Checkpointing
76+
a pod is not part of this proposal and is left for future enhancements.
77+
78+
## Proposal
79+
80+
### Implementation
81+
82+
For the forensic use case we want to offer the functionality to checkpoint a
83+
container out of a running Pod without stopping the checkpointed container or
84+
letting the container know that it was checkpointed.
85+
86+
The corresponding code changes for the forensic use case can be found in the
87+
following pull request:
88+
89+
* https://github.com/kubernetes/kubernetes/pull/104907
90+
91+
The goal is to introduce *checkpoint* and *restore* in a bottom-up approach.
92+
In a first step we only want to extend the CRI API to trigger a checkpoint
93+
by the container engine and to have the low level primitives in the *kubelet*
94+
to trigger a checkpoint. It is necessary to enable the feature gate
95+
`ContainerCheckpoint` to be able to checkpoint containers.
96+
97+
In the corresponding pull request a checkpoint is triggered using the *kubelet*
98+
API:
99+
100+
```
101+
curl -skv -X POST "https://localhost:10250/checkpoint/default/counters/wildfly"
102+
```
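
As a minimal sketch, the request URL above can be assembled from its parts. This assumes the path layout is `/checkpoint/<namespace>/<pod>/<container>`, reading the example as namespace `default`, pod `counters` and container `wildfly`. The helper name `checkpoint_url` and the variables `KUBELET_HOST` and `KUBELET_PORT` are hypothetical and exist only for this illustration.

```shell
# Sketch: build the kubelet checkpoint URL for a single container.
# Assumption: path layout is /checkpoint/<namespace>/<pod>/<container>,
# inferred from the example request in this KEP.
# KUBELET_HOST and KUBELET_PORT are hypothetical variables for illustration;
# they default to the host and port used in the example.
checkpoint_url() {
  printf 'https://%s:%s/checkpoint/%s/%s/%s\n' \
    "${KUBELET_HOST:-localhost}" "${KUBELET_PORT:-10250}" "$1" "$2" "$3"
}
```

The result could then be passed to the same request, for example `curl -skv -X POST "$(checkpoint_url default counters wildfly)"`.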
103+
104+
For the first implementation we do not want to support restore in the
105+
*kubelet*. With the focus on the forensic use case the restore should happen
106+
outside of Kubernetes. The restore is a container engine only operation
107+
in this first step.
108+
109+
The forensic use case is targeted to be part of the next (1.24) release.
110+
111+
Although this KEP only adds checkpointing support to the kubelet, the CRI API in
112+
the corresponding code pull request is extended to support *checkpoint* and
113+
*restore*. The reason to add *restore* to the CRI API without
114+
implementing it in the kubelet is to make development and especially testing
115+
easier on the container engine level.
116+
117+
### User Stories
118+
119+
To analyze unusual activities in a container, the container should
120+
be checkpointed without stopping the container or without the container
121+
knowing it was checkpointed. Using checkpointing it is possible to take
122+
a copy of a running container for forensic analysis. The container will
123+
continue to run without knowing a copy was created. This copy can then
124+
be restored in another (sandboxed) environment in the context of another
125+
container engine for detailed analysis of a possible attack.
126+
127+
### Risks and Mitigations
128+
129+
In its first implementation the risks are low as it tries to be a CRI API
130+
change with minimal changes to the kubelet and it is gated by the feature
131+
gate `ContainerCheckpoint`.
132+
133+
## Design Details
134+
135+
The feature gate `ContainerCheckpoint` will ensure that the API
136+
graduation can be done in the standard Kubernetes way.
137+
138+
A kubelet API to trigger the checkpointing of a container will be
139+
introduced as described in [Implementation](#implementation).
140+
141+
Also see https://github.com/kubernetes/kubernetes/pull/104907 for details.
142+
143+
### Future Enhancements
144+
145+
The initial implementation is only about checkpointing specific containers
146+
out of a pod. In future versions we probably want to support checkpointing
147+
complete pods. To checkpoint a complete pod the expectation on the container
148+
engine would be to do a pod level cgroup freeze before checkpointing the
149+
containers in the pod to ensure that all containers are checkpointed at the
150+
same point in time and that the containers do not keep running while other
151+
containers in the pod are checkpointed.
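
The ordering described above could look roughly like the following sketch. All three helper functions are hypothetical stand-ins that only print what a container engine would do; the final thaw step is an assumption, added because the forensic use case expects the containers to keep running after the checkpoint.

```shell
# Hypothetical stand-ins: a real container engine would freeze the pod
# cgroup and call CRIU here. These only print the step being taken.
freeze_pod()           { echo "freeze $1"; }
checkpoint_container() { echo "checkpoint $1/$2"; }
thaw_pod()             { echo "thaw $1"; }

# Checkpoint all containers of a pod while the whole pod is frozen, so
# that every container is captured at the same point in time.
# Usage: checkpoint_pod <pod> <container>...
checkpoint_pod() {
  pod="$1"
  shift
  freeze_pod "$pod"
  for container in "$@"; do
    checkpoint_container "$pod" "$container"
  done
  # Assumption: the pod is thawed afterwards so it continues to run.
  thaw_pod "$pod"
}
```

Freezing once at the pod level, instead of per container, is what prevents one container from changing state while another is still being written out.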
152+
153+
One possible result of being able to checkpoint and restore containers and pods
154+
might be the possibility to migrate containers and pods in the future as
155+
discussed in [#3949](https://github.com/kubernetes/kubernetes/issues/3949).
156+
157+
### Test Plan
158+
159+
For alpha:
160+
- Unit tests available
161+
162+
For beta:
163+
- CRI API changes need to be implemented by at least one
164+
container engine
165+
- Enable e2e testing
166+
167+
### Graduation Criteria
168+
169+
#### Alpha
170+
171+
- [ ] Implement the new feature gate and kubelet implementation
172+
- [ ] Ensure proper tests are in place
173+
- [ ] Update documentation to make the feature visible
174+
175+
#### Alpha to Beta Graduation
176+
177+
At least one container engine has to have implemented the
178+
corresponding CRI APIs to introduce e2e tests for checkpointing.
179+
180+
- [ ] Enable the feature by default
181+
- [ ] No major bugs reported in the previous cycle
182+
183+
#### Beta to GA Graduation
184+
185+
TBD
186+
187+
### Upgrade / Downgrade Strategy
188+
189+
No changes are required on upgrade if the container engine supports
190+
the corresponding CRI API changes.
191+
192+
## Production Readiness Review Questionnaire
193+
194+
### Feature Enablement and Rollback
195+
196+
###### How can this feature be enabled / disabled in a live cluster?
197+
198+
- [x] Feature gate
199+
- Feature gate name: `ContainerCheckpoint`
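
A small sketch of how the gate could be passed to the kubelet on its command line, assuming the standard `--feature-gates` flag with `Name=bool` values. `feature_gate_flag` is a hypothetical helper for illustration and not part of Kubernetes.

```shell
# Sketch: print the kubelet command-line flag that sets one feature gate.
# Assumption: the kubelet is configured through --feature-gates=Name=bool.
# Usage: feature_gate_flag <gate-name> <true|false>
feature_gate_flag() {
  printf '%s=%s=%s\n' '--feature-gates' "$1" "$2"
}
```

Calling `feature_gate_flag ContainerCheckpoint true` prints the flag that enables the feature, and passing `false` prints the flag that disables it again.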
200+
201+
###### Does enabling the feature change any default behavior?
202+
203+
No.
204+
205+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
206+
207+
Yes, by disabling the feature gate `ContainerCheckpoint` again.
208+
209+
###### What happens if we reenable the feature if it was previously rolled back?
210+
211+
Checkpointing containers will be possible again.
212+
213+
###### Are there any tests for feature enablement/disablement?
214+
215+
Currently no.
216+
217+
### Dependencies
218+
219+
CRIU needs to be installed on the node, but on most distributions it is already
220+
a dependency of runc/crun. It does not require any specific services on the
221+
cluster.
222+
223+
### Scalability
224+
225+
###### Will enabling / using this feature result in any new API calls?
226+
227+
The newly introduced CRI API call to checkpoint a container/pod will be
228+
used by this feature. The kubelet will make the CRI API calls and it
229+
will only be done when a checkpoint is triggered. No periodic API calls
230+
will happen.
231+
232+
###### Will enabling / using this feature result in introducing new API types?
233+
234+
No.
235+
236+
###### Will enabling / using this feature result in any new calls to the cloud provider?
237+
238+
No.
239+
240+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
241+
242+
No.
243+
244+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
245+
246+
No. It will only affect checkpoint CRI API calls.
247+
248+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
249+
250+
During checkpointing each memory page will be written to disk. Disk usage will increase by
251+
the size of all memory pages in the checkpointed container. Each file in the container that
252+
has been changed compared to the original version will also be part of the checkpoint.
253+
Disk usage will overall increase by the used memory of the container and the changed files.
254+
The checkpoint archive written to disk can optionally be compressed. The current implementation
255+
does not compress the checkpoint archive on disk.
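
The estimate above can be written as simple arithmetic. The following is a rough sketch that ignores any archive metadata overhead; `estimate_checkpoint_bytes` is a hypothetical helper and both inputs are assumed to be given in bytes.

```shell
# Rough estimate of the extra disk space needed by one uncompressed
# checkpoint: used memory plus files changed relative to the image.
# Hypothetical helper; ignores archive metadata overhead.
# Usage: estimate_checkpoint_bytes <used-memory-bytes> <changed-file-bytes>
estimate_checkpoint_bytes() {
  echo $(( $1 + $2 ))
}
```

For example, a container using 512 MiB of memory with 64 MiB of changed files would need roughly 576 MiB: `estimate_checkpoint_bytes 536870912 67108864` prints `603979776`.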
256+
257+
## Implementation History
258+
259+
* 2020-09-16: Initial version of this KEP
260+
* 2020-12-10: Opened pull request showing an end-to-end implementation of a possible use case
261+
* 2021-02-12: Changed KEP to mention the *experimental* API as suggested in the SIG Node meeting 2021-02-09
262+
* 2021-04-08: Added section about Pod Lifecycle, Checkpoint Storage, Alternatives and Hooks
263+
* 2021-07-08: Reworked structure and added missing details
264+
* 2021-08-03: Added the forensic user story and highlighted the goal to implement it in small steps
265+
* 2021-08-10: Added future work with information about pod level cgroup freezing
266+
* 2021-09-15: Removed references to first proof of concept implementation
267+
* 2021-09-21: Mention feature gate `ContainerCheckpointRestore`
268+
* 2021-09-22: Removed everything which is not directly related to the forensic use case
269+
* 2022-01-06: Reworked based on review
270+
* 2022-01-20: Reworked based on review and renamed feature gate to `ContainerCheckpoint`
271+
272+
## Drawbacks
273+
274+
During checkpointing each memory page of the checkpointed container is written to disk
275+
which can result in slightly lower performance because each memory page is copied
276+
to disk. It can also result in increased disk IO operations during checkpoint
277+
creation.
278+
279+
In the current CRI-O implementation the checkpoint archive is created so that only
280+
the `root` user can access it. As the checkpoint archive contains all memory pages
281+
a checkpoint archive can potentially contain secrets which are expected to be
282+
in memory only.
283+
284+
The current CRI-O implementation handles SELinux labels as well as seccomp and restores
285+
these settings as they were before. A possibly restored container is as secure as
286+
before, but it is important to be careful where the checkpoint archive is stored.
287+
288+
During checkpointing CRIU injects parasite code into the process to be checkpointed.
289+
On an SELinux-enabled system the access to the parasite code is limited to the
290+
label of the corresponding container. On a non-SELinux system it is limited to the
291+
`root` user (which can access the process in any way).
292+
293+
## Alternatives
294+
295+
Another possibility to use checkpoint restore would be, for example, to trigger
296+
the checkpoint by a privileged sidecar container (`CAP_SYS_ADMIN`) and do the
297+
restore through an Init container.
298+
299+
The reason to integrate checkpoint restore directly into Kubernetes and not
300+
with helpers like sidecar and init containers is that checkpointing is already,
301+
for many years, deeply integrated into multiple container runtimes and engines
302+
and this integration has been reliable and well tested. Going another way in
303+
Kubernetes would make the whole process much more complicated and fragile. Not
304+
using checkpoint and restore in Kubernetes through the existing paths of
305+
runtimes and engines is not well known and maybe not even possible as
306+
checkpointing and restoring is tightly integrated as it requires much
307+
information only available by working closely with runtimes and engines.
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
title: Forensic Container Checkpointing
2+
kep-number: 2008
3+
authors:
4+
- "@adrianreber"
5+
owning-sig: sig-node
6+
participating-sigs:
7+
- TBD
8+
status: implementable
9+
creation-date: 2020-09-16
10+
last-updated: 2022-01-20
11+
reviewers:
12+
- "@mrunalp"
13+
- "@elfinhe"
14+
approvers:
15+
- "@dchen1107"
16+
prr-approvers:
17+
- "@ehashman"
18+
19+
# The target maturity stage in the current dev cycle for this KEP.
20+
stage: alpha
21+
22+
# The most recent milestone for which work toward delivery of this KEP has been
23+
# done. This can be the current (upcoming) milestone, if it is being actively
24+
# worked on.
25+
latest-milestone: "v1.24"
26+
27+
# The milestone at which this feature was, or is targeted to be, at each stage.
28+
milestone:
29+
  alpha: "v1.24"
30+
  beta: "v1.25"
31+
  stable: "v1.27"
32+
33+
# The following PRR answers are required at alpha release
34+
# List the feature gate name and the components for which it must be enabled
35+
feature-gates:
36+
  - name: ContainerCheckpoint
37+
    components:
38+
      - kubelet
39+
disable-supported: true
40+
41+
# The following PRR answers are required at beta release
42+
metrics:
43+
- "N/A"
