Commit dc66028

Merge pull request kubernetes#2414 from saschagrunert/seccomp-default
Add KEP for enabling seccomp by default
2 parents 926a123 + 34fa3dd commit dc66028

3 files changed

+441
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 2413
2+
alpha:
3+
  approver: "@deads2k"
Lines changed: 395 additions & 0 deletions
@@ -0,0 +1,395 @@
1+
# KEP-2413: Enable seccomp by default
2+
3+
<!-- toc -->
4+
5+
- [Release Signoff Checklist](#release-signoff-checklist)
6+
- [Summary](#summary)
7+
- [Motivation](#motivation)
8+
- [Goals](#goals)
9+
- [Non-Goals](#non-goals)
10+
- [Proposal](#proposal)
11+
- [User Stories](#user-stories)
12+
- [Risks and Mitigations](#risks-and-mitigations)
13+
- [Design Details](#design-details)
14+
- [Test Plan](#test-plan)
15+
- [Graduation Criteria](#graduation-criteria)
16+
- [Alpha](#alpha)
17+
- [Alpha to Beta Graduation](#alpha-to-beta-graduation)
18+
- [Beta to GA Graduation](#beta-to-ga-graduation)
19+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
20+
- [Version Skew Strategy](#version-skew-strategy)
21+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
22+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
23+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
24+
- [Monitoring Requirements](#monitoring-requirements)
25+
- [Dependencies](#dependencies)
26+
- [Scalability](#scalability)
27+
- [Troubleshooting](#troubleshooting)
28+
- [Implementation History](#implementation-history)
29+
- [Alternatives](#alternatives)
30+
- [Alternative 1: Define a new <code>KubernetesDefault</code> profile](#alternative-1-define-a-new--profile)
31+
- [Alternative 2: Allow admins to pick one of <code>KubernetesDefault</code>, <code>RuntimeDefault</code> or a custom profile](#alternative-2-allow-admins-to-pick-one-of---or-a-custom-profile)
32+
<!-- /toc -->
33+
34+
## Release Signoff Checklist
35+
36+
Items marked with (R) are required _prior to targeting to a milestone / release_.
37+
38+
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
39+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
40+
- [x] (R) Design details are appropriately documented
41+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
42+
- [x] (R) Graduation criteria is in place
43+
- [ ] (R) Production readiness review completed
44+
- [ ] (R) Production readiness review approved
45+
- [ ] "Implementation History" section is up-to-date for milestone
46+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
47+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
48+
49+
[kubernetes.io]: https://kubernetes.io/
50+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
51+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
52+
[kubernetes/website]: https://git.k8s.io/website
53+
54+
## Summary
55+
56+
Enable seccomp by default for all workloads running on Kubernetes to improve the
57+
default security of the overall system.
58+
59+
## Motivation
60+
61+
Kubernetes provides a native way to specify seccomp profiles for workloads,
62+
which is disabled by default today. Seccomp adds a layer of security that could
63+
help prevent CVEs or 0-days if enabled by default. If we enable seccomp by
64+
default, we implicitly make Kubernetes more secure.
65+
66+
### Goals
67+
68+
Provide a way to enable seccomp by default for Kubernetes.
69+
70+
### Non-Goals
71+
72+
Everything else related to the feature.
73+
74+
## Proposal
75+
76+
We introduce a feature gate that enables seccomp for all workloads by default.
77+
There are a few options for what should be the default seccomp profile.
78+
79+
The most preferred solution is to promote the `RuntimeDefault` profile
80+
(previously the `runtime/default` annotation) to the new default one.
81+
82+
Container runtimes already have their own defined default profiles, which get
83+
referenced via the `RuntimeDefault` one. This means we now promote this profile
84+
to the new default. Every workload created will then get the `RuntimeDefault`
85+
(`SeccompProfileTypeRuntimeDefault`) as `SeccompProfile.type` value for the
86+
`PodSecurityContext` as well as the `SecurityContext` for every container.
87+
88+
The advantage of using the `RuntimeDefault` profile is that there is no need
89+
for shipping an additional seccomp profile. The overall version skew handling in
90+
conjunction with runtime versions and the kubelet is easier, because container
91+
runtimes have supported `RuntimeDefault` since the introduction of the seccomp
92+
feature.
93+
94+
For alternative proposals please refer to the [Alternatives
95+
section](#alternatives).
96+
97+
### User Stories
98+
99+
As a Kubernetes admin, I want to ensure that my cluster is secure by default
100+
without relying on workloads opting to use seccomp.
101+
102+
### Risks and Mitigations
103+
104+
Some workloads that may be running without seccomp may break with seccomp
105+
enabled by default. All workloads would have to be tested in a staging/test
106+
environment to ensure there are no breakages. Seccomp could either be turned off
107+
or custom profiles could be created for such workloads.
108+
109+
The configuration possibilities of container runtimes differ in conjunction with
110+
seccomp. For example:
111+
112+
- **containerd**
113+
- can only use the internal default profile for `RuntimeDefault`
114+
- can use a different profile for empty (unconfined) workloads via the
115+
`unset_seccomp_profile` option
116+
- **CRI-O**
117+
- can specify a different `RuntimeDefault` profile via the `seccomp_profile`
118+
option
119+
- can use `RuntimeDefault` for empty (unconfined) workloads via
120+
`seccomp_use_default_when_empty`
121+
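The difference between the runtimes can be sketched as a small model of what a workload without any seccomp profile ends up with. This is an illustration only: `runtimeSettings` and `profileForEmptyRequest` are hypothetical names, the two fields merely stand in for the options listed above, and the precedence between them is an assumption since each option belongs to a different runtime.

```go
package main

import "fmt"

// runtimeSettings is a hypothetical model of the runtime options listed above.
type runtimeSettings struct {
	// UseDefaultWhenEmpty stands in for CRI-O's seccomp_use_default_when_empty.
	UseDefaultWhenEmpty bool
	// UnsetProfile stands in for containerd's unset_seccomp_profile.
	// An empty string means the option is not configured.
	UnsetProfile string
}

// profileForEmptyRequest returns the profile a workload receives when it
// does not request any seccomp profile at all.
func profileForEmptyRequest(s runtimeSettings) string {
	switch {
	case s.UseDefaultWhenEmpty:
		return "RuntimeDefault"
	case s.UnsetProfile != "":
		return s.UnsetProfile
	default:
		return "Unconfined"
	}
}

func main() {
	fmt.Println(profileForEmptyRequest(runtimeSettings{}))
	fmt.Println(profileForEmptyRequest(runtimeSettings{UseDefaultWhenEmpty: true}))
	fmt.Println(profileForEmptyRequest(runtimeSettings{UnsetProfile: "custom.json"}))
}
```

The same pod spec can therefore already be confined on one node and unconfined on another, which is why the effect of turning on the feature depends on the existing runtime configuration.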
122+
This can result in a behavioral change when doing cluster upgrades, and runtime
123+
administrators may have to take action if they enable the feature.
124+
125+
## Design Details
126+
127+
The feature gate `SeccompDefault` will ensure that the API graduation can be
128+
done in the standard Kubernetes way. The implementation will be mainly a switch
129+
from `Unconfined` to `RuntimeDefault`.
130+
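The switch described above can be sketched as follows. This is not kubelet code: `effectiveProfile` is a hypothetical helper, an empty value stands for "no profile specified", and the sketch relies on the existing rule that a container-level `SecurityContext` takes precedence over the `PodSecurityContext`.

```go
package main

import "fmt"

// ProfileType mirrors the seccomp profile type names used in this KEP.
type ProfileType string

const (
	Unconfined     ProfileType = "Unconfined"
	RuntimeDefault ProfileType = "RuntimeDefault"
)

// effectiveProfile is a hypothetical helper that resolves the profile for
// one container. An empty ProfileType means nothing was specified.
func effectiveProfile(container, pod ProfileType, seccompDefault bool) ProfileType {
	if container != "" {
		return container // the container-level SecurityContext wins
	}
	if pod != "" {
		return pod // otherwise use the PodSecurityContext
	}
	if seccompDefault {
		return RuntimeDefault // new default with the SeccompDefault gate on
	}
	return Unconfined // previous default
}

func main() {
	fmt.Println(effectiveProfile("", "", false))
	fmt.Println(effectiveProfile("", "", true))
	fmt.Println(effectiveProfile(Unconfined, "", true))
}
```

Only the last branch changes with the feature gate, so workloads that set a profile explicitly, including `Unconfined`, keep their current behavior.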
131+
Documentation around the feature will be added to the [k/website seccomp
132+
section](https://kubernetes.io/docs/tutorials/clusters/seccomp).
133+
134+
### Test Plan
135+
136+
There will be unit tests for the feature, and the existing seccomp tests can
137+
be extended to cover the new behavior if enabled.
138+
139+
### Graduation Criteria
140+
141+
#### Alpha
142+
143+
- [ ] Implement the new feature gate
144+
- [ ] Ensure proper tests are in place
145+
- [ ] Update documentation to make the feature visible
146+
147+
#### Alpha to Beta Graduation
148+
149+
- [ ] Enable the feature by default
150+
- [ ] No major bugs reported in the previous cycle
151+
152+
#### Beta to GA Graduation
153+
154+
- [ ] Allow time for feedback (3 releases)
155+
- [ ] Risks have been addressed by every common container runtime
156+
157+
### Upgrade / Downgrade Strategy
158+
159+
It's recommended to test the existing workloads with the `RuntimeDefault`
160+
profile before turning the feature on. Besides that, the feature can be enabled
161+
on a per-node basis to reduce the risk of failing production workloads.
162+
163+
### Version Skew Strategy
164+
165+
There is no explicit version skew strategy required because the feature acts as
166+
a toggle switch.
167+
168+
## Production Readiness Review Questionnaire
169+
170+
### Feature Enablement and Rollback
171+
172+
_This section must be completed when targeting alpha to a release._
173+
174+
- **How can this feature be enabled / disabled in a live cluster?**
175+
176+
- [x] Feature gate (also fill in values in `kep.yaml`)
177+
- Feature gate name: `SeccompDefault`
178+
- Components depending on the feature gate: `kubelet`
179+
180+
- **Does enabling the feature change any default behavior?**
181+
182+
Yes, it will change the default seccomp profile from `Unconfined` to `RuntimeDefault` if
183+
no profile is specified.
184+
185+
- **Can the feature be disabled once it has been enabled (i.e. can we roll back
186+
the enablement)?**
187+
188+
Yes, the feature can be disabled but workloads have to be restarted to apply
189+
the previous behavior.
190+
191+
- **What happens if we reenable the feature if it was previously rolled back?**
192+
193+
It will enable the feature again but only apply the new profile to new/restarted
194+
workloads.
195+
196+
- **Are there any tests for feature enablement/disablement?**
197+
198+
Yes, the behavior can be tested via unit tests.
199+
200+
### Rollout, Upgrade and Rollback Planning
201+
202+
_This section must be completed when targeting beta graduation to a release._
203+
204+
- **How can a rollout fail? Can it impact already running workloads?**
205+
206+
Workloads on a node may start to fail when (re)scheduled on the node where
207+
the feature is enabled. Required specific syscalls may be blocked by the
208+
default seccomp profile, which will cause the application to get terminated.
209+
210+
- **What specific metrics should inform a rollback?**
211+
212+
If a workload is starting to fail because of blocked syscalls (audit logs),
213+
then a temporary rollback would be appropriate in production.
214+
215+
- **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
216+
217+
If we assume that enabling the feature will cause workloads to fail, then
218+
there are three possible mitigations available:
219+
220+
1. Disable the feature on the node (downgrade):
221+
permanent mitigation
222+
2. Run the workload as `Unconfined` (the previous default):
223+
re-enabling possible
224+
3. Create a custom seccomp profile for the application (recommended):
225+
re-enabling possible
226+
227+
- **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
228+
fields of API types, flags, etc.?**
229+
230+
No
231+
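The audit log signal mentioned above as a rollback indicator could be evaluated with a small parser like the one below. It is a sketch under assumptions: it expects auditd-style `type=SECCOMP` records containing `comm=` and `syscall=` fields, and it assumes such records are emitted at all, which depends on the seccomp profile and the kernel audit configuration. `seccompEvent`, `parseSeccompRecord` and `countBlocked` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// seccompEvent holds the fields of interest from one audit record.
type seccompEvent struct {
	Comm    string // command name of the affected process
	Syscall string // syscall number as reported in the record
}

// parseSeccompRecord extracts comm and syscall from an auditd-style line.
// It returns false for lines that are not SECCOMP records or that carry
// no syscall field. Values containing spaces are not handled.
func parseSeccompRecord(line string) (seccompEvent, bool) {
	if !strings.Contains(line, "type=SECCOMP") {
		return seccompEvent{}, false
	}
	var ev seccompEvent
	for _, field := range strings.Fields(line) {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			continue
		}
		switch key {
		case "comm":
			ev.Comm = strings.Trim(value, `"`)
		case "syscall":
			ev.Syscall = value
		}
	}
	return ev, ev.Syscall != ""
}

// countBlocked tallies SECCOMP records per (comm, syscall) pair.
func countBlocked(lines []string) map[seccompEvent]int {
	counts := make(map[seccompEvent]int)
	for _, l := range lines {
		if ev, ok := parseSeccompRecord(l); ok {
			counts[ev]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		`type=SECCOMP msg=audit(1620000000.000:1): pid=42 comm="app" syscall=165 code=0x0`,
		`type=SYSCALL msg=audit(1620000000.000:2): pid=42 comm="app" syscall=1`,
	}
	for ev, n := range countBlocked(lines) {
		fmt.Printf("%s syscall=%s count=%d\n", ev.Comm, ev.Syscall, n)
	}
}
```

A rising count for a workload right after the feature is enabled on its node would point to the default profile as the cause, and the syscall numbers give a starting point for a custom profile.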
232+
### Monitoring Requirements
233+
234+
_This section must be completed when targeting beta graduation to a release._
235+
236+
- **How can an operator determine if the feature is in use by workloads?**
237+
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
238+
checking if there are objects with field X set) may be a last resort. Avoid
239+
logs or events for this purpose.
240+
241+
- **What are the SLIs (Service Level Indicators) an operator can use to determine
242+
the health of the service?**
243+
244+
- [ ] Metrics
245+
- Metric name:
246+
- [Optional] Aggregation method:
247+
- Components exposing the metric:
248+
- [ ] Other (treat as last resort)
249+
- Details:
250+
251+
- **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
252+
At a high level, this usually will be in the form of "high percentile of SLI
253+
per day <= X". It's impossible to provide comprehensive guidance, but at the very
254+
high level (needs more precise definitions) those may be things like:
255+
256+
- per-day percentage of API calls finishing with 5XX errors <= 1%
257+
- 99% percentile over day of absolute value from (job creation time minus expected
258+
job creation time) for cron job <= 10%
259+
- 99.9% of /health requests per day finish with 200 code
260+
261+
- **Are there any missing metrics that would be useful to have to improve observability
262+
of this feature?**
263+
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
264+
implementation difficulties, etc.).
265+
266+
### Dependencies
267+
268+
_This section must be completed when targeting beta graduation to a release._
269+
270+
- **Does this feature depend on any specific services running in the cluster?**
271+
Think about both cluster-level services (e.g. metrics-server) as well
272+
as node-level agents (e.g. specific version of CRI). Focus on external or
273+
optional services that are needed. For example, if this feature depends on
274+
a cloud provider API, or upon an external software-defined storage or network
275+
control plane.
276+
277+
For each of these, fill in the following—thinking about running existing user workloads
278+
and creating new ones, as well as about cluster-level services (e.g. DNS):
279+
280+
- [Dependency name]
281+
- Usage description:
282+
- Impact of its outage on the feature:
283+
- Impact of its degraded performance or high-error rates on the feature:
284+
285+
### Scalability
286+
287+
_For alpha, this section is encouraged: reviewers should consider these questions
288+
and attempt to answer them._
289+
290+
_For beta, this section is required: reviewers must answer these questions._
291+
292+
_For GA, this section is required: approvers should be able to confirm the
293+
previous answers based on experience in the field._
294+
295+
- **Will enabling / using this feature result in any new API calls?**
296+
297+
No
298+
299+
- **Will enabling / using this feature result in introducing new API types?**
300+
301+
No
302+
303+
- **Will enabling / using this feature result in any new calls to the cloud
304+
provider?**
305+
306+
No
307+
308+
- **Will enabling / using this feature result in increasing size or count of
309+
the existing API objects?**
310+
311+
No
312+
313+
- **Will enabling / using this feature result in increasing time taken by any
314+
operations covered by [existing SLIs/SLOs]?**
315+
316+
No
317+
318+
- **Will enabling / using this feature result in non-negligible increase of
319+
resource usage (CPU, RAM, disk, IO, ...) in any components?**
320+
321+
Enabling a seccomp profile for a workload will take more time compared to not
322+
applying a profile at all. There is also a very low overhead for checking the
323+
syscalls within the Linux kernel.
324+
325+
### Troubleshooting
326+
327+
The Troubleshooting section currently serves the `Playbook` role. We may consider
328+
splitting it into a dedicated `Playbook` document (potentially with some monitoring
329+
details). For now, we leave it here.
330+
331+
_This section must be completed when targeting beta graduation to a release._
332+
333+
- **How does this feature react if the API server and/or etcd is unavailable?**
334+
335+
- **What are other known failure modes?**
336+
For each of them, fill in the following information by copying the below template:
337+
338+
- [Failure mode brief description]
339+
- Detection: How can it be detected via metrics? Stated another way:
340+
how can an operator troubleshoot without logging into a master or worker node?
341+
- Mitigations: What can be done to stop the bleeding, especially for already
342+
running user workloads?
343+
- Diagnostics: What are the useful log messages and their required logging
344+
levels that could help debug the issue?
345+
Not required until feature graduated to beta.
346+
- Testing: Are there any tests for failure mode? If not, describe why.
347+
348+
- **What steps should be taken if SLOs are not being met to determine the problem?**
349+
350+
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
351+
[existing slis/slos]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
352+
353+
## Implementation History
354+
355+
- 2021-05-05: KEP promoted to implementable
356+
357+
## Alternatives
358+
359+
There are multiple alternatives to the proposed approach.
360+
361+
### Alternative 1: Define a new `KubernetesDefault` profile
362+
363+
Kubernetes ships a default seccomp profile `KubernetesDefault`
364+
(`SeccompProfileTypeKubernetesDefault`), which is the new default
365+
`SeccompProfile.type` value for the `PodSecurityContext` as well as the
366+
`SecurityContext` for every container.
367+
368+
On startup, the kubelet will place the default seccomp profile JSON in a
369+
pre-defined path on the host machine. The container runtime has to verify the
370+
existence of this profile and apply it to the container.
371+
372+
We could pass the information where the default profile resides on disk via the
373+
CRI. This way we can change the path from a kubelet perspective. If the field
374+
is empty, then we assume that the kubelet does not support or has disabled the
375+
feature at all. This means we fall back to the currently implemented "unconfined
376+
when not set" behavior.
377+
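The fallback rule for this alternative can be sketched from the runtime's point of view. The type and field names below are hypothetical and do not correspond to any existing CRI message. The only behavior taken from the text is that an empty path means the kubelet does not support the feature or has disabled it.

```go
package main

import "fmt"

// containerSeccompConfig is a hypothetical stand-in for what the kubelet
// would pass to the runtime over the CRI in this alternative.
type containerSeccompConfig struct {
	// DefaultProfilePath is where the kubelet placed the KubernetesDefault
	// profile on disk. Empty means unsupported or disabled.
	DefaultProfilePath string
	// RequestedProfile is the profile set explicitly on the workload, if any.
	RequestedProfile string
}

// resolveProfile returns the profile the runtime would apply.
func resolveProfile(cfg containerSeccompConfig) string {
	switch {
	case cfg.RequestedProfile != "":
		return cfg.RequestedProfile // explicit choice is kept
	case cfg.DefaultProfilePath != "":
		return cfg.DefaultProfilePath // KubernetesDefault from disk
	default:
		return "unconfined" // current "unconfined when not set" behavior
	}
}

func main() {
	fmt.Println(resolveProfile(containerSeccompConfig{}))
	fmt.Println(resolveProfile(containerSeccompConfig{
		DefaultProfilePath: "/path/to/kubernetes-default.json", // placeholder path
	}))
}
```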
378+
A possible starting point for defining this profile is to look at the Docker,
379+
containerd, and CRI-O default profiles.
380+
381+
The advantages of defining a `KubernetesDefault` profile are:
382+
383+
- The Kubernetes community / SIG Node owns the profile and is able to
384+
improve/change it without depending on multiple container runtimes.
385+
- Increased transparency and a more uniform documentation around the feature.
386+
- Users still can use the `SeccompProfileTypeRuntimeDefault` and will not
387+
encounter any changes to their workloads, even if they turn on the feature.
388+
389+
### Alternative 2: Allow admins to pick one of `KubernetesDefault`, `RuntimeDefault` or a custom profile
390+
391+
This is a combination of the main proposal and alternative 1, which allows the highest amount
392+
of flexibility. Users have the chance to either use the `KubernetesDefault`,
393+
`RuntimeDefault` or configure a custom seccomp profile path directly at the
394+
kubelet level. This also implies that the kubelet has to additionally pre-check
395+
if the profile exists and is valid during its startup.
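
A startup pre-check of this kind might look like the sketch below. It is an assumption about what "valid" could mean, not a specification: the profile must be readable, parse as JSON, declare a `defaultAction`, and give every `syscalls` entry at least one name and an action. `seccompProfile`, `validateProfile` and `checkProfileAtStartup` are hypothetical names, and the struct covers only the fields being checked.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// seccompProfile covers only the fields this sketch inspects.
type seccompProfile struct {
	DefaultAction string `json:"defaultAction"`
	Syscalls      []struct {
		Names  []string `json:"names"`
		Action string   `json:"action"`
	} `json:"syscalls"`
}

// validateProfile performs a shallow structural check of profile JSON.
func validateProfile(data []byte) error {
	var p seccompProfile
	if err := json.Unmarshal(data, &p); err != nil {
		return fmt.Errorf("profile is not valid JSON: %w", err)
	}
	if p.DefaultAction == "" {
		return errors.New("profile has no defaultAction")
	}
	for i, s := range p.Syscalls {
		if len(s.Names) == 0 || s.Action == "" {
			return fmt.Errorf("syscalls[%d] needs names and an action", i)
		}
	}
	return nil
}

// checkProfileAtStartup reads the profile from disk and validates it.
func checkProfileAtStartup(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("reading seccomp profile %q: %w", path, err)
	}
	return validateProfile(data)
}

func main() {
	// Placeholder path; a missing file is reported as an error.
	if err := checkProfileAtStartup("/nonexistent/seccomp-profile.json"); err != nil {
		fmt.Println("pre-check failed:", err)
	}
}
```

Failing early at kubelet startup keeps a broken custom profile from surfacing later as container creation errors.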
