Commit 157f1d7

Merge pull request #28951 from saschagrunert/seccomp-default-blog
Add seccomp default feature blog post
2 parents 6e7b621 + 84e472e commit 157f1d7

1 file changed

+267
-0
lines changed

@@ -0,0 +1,267 @@
1+
---
2+
layout: blog
3+
title: "Enable seccomp for all workloads with a new v1.22 alpha feature"
4+
date: 2021-08-25
5+
slug: seccomp-default
6+
---
7+
8+
**Author:** Sascha Grunert, Red Hat
9+
10+
This blog post is about a new Kubernetes feature introduced in v1.22, which adds
11+
an additional security layer on top of the existing seccomp support. Seccomp is
12+
a security mechanism for Linux processes to filter system calls (syscalls) based
13+
on a set of defined rules. Applying seccomp profiles to containerized workloads
14+
is one of the key tasks when it comes to enhancing the security of the
15+
application deployment. Developers, site reliability engineers and
16+
infrastructure administrators have to work hand in hand to create, distribute
17+
and maintain the profiles over the application's life cycle.
18+
19+
The [`securityContext`][seccontext] field of Pods and their
20+
containers can be used to adjust security-related configurations of the
21+
workload. Kubernetes introduced dedicated [seccomp-related API
22+
fields][seccontext] in this `SecurityContext` with the [graduation of seccomp to
23+
General Availability (GA)][ga] in v1.19.0. This enhancement made it easier to
24+
specify whether the whole pod or a specific container should run as:
25+
26+
[seccontext]: /docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1
27+
[ga]: https://kubernetes.io/blog/2020/08/26/kubernetes-release-1.19-accentuate-the-paw-sitive/#graduated-to-stable
28+
29+
- `Unconfined`: seccomp will not be enabled
30+
- `RuntimeDefault`: the container runtime's default profile will be used
31+
- `Localhost`: a node-local profile will be applied, which is referenced
32+
  by a path relative to the seccomp profile root (`<kubelet-root-dir>/seccomp`)
33+
  of the kubelet
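
As a minimal sketch of how these types appear in a manifest, the following writes a pod that requests `RuntimeDefault` explicitly at the pod level. The file name, pod name and container name are placeholders chosen for illustration, not names used elsewhere in this post.

```shell
# Write an example manifest; file name and pod name are illustrative only.
cat > seccomp-runtime-default.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-explicit
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # applies to every container in the pod
  containers:
    - name: app
      image: nginx:1.21
EOF

# Apply it to a cluster you have access to (left commented out here):
# kubectl apply -f seccomp-runtime-default.yaml
```

A `seccompProfile` set in a container's own `securityContext` takes precedence over the pod-level value for that container.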
34+
35+
With the graduation of seccomp, nothing has changed from an overall security
36+
perspective, because `Unconfined` is still the default. This is totally fine if
37+
you consider this from the upgrade path and backwards compatibility perspective of
38+
Kubernetes releases. But it also means that it is more likely that a workload
39+
runs without seccomp at all, which should be fixed in the long term.
40+
41+
## `SeccompDefault` to the rescue
42+
43+
Kubernetes v1.22.0 introduces a new kubelet [feature gate][gate]
44+
`SeccompDefault`, which has been added in `alpha` state like every other new
45+
feature. This means that it is disabled by default and can be enabled manually
46+
for every single Kubernetes node.
47+
48+
[gate]: /docs/reference/command-line-tools-reference/feature-gates
49+
50+
What does the feature do? Well, it just changes the default seccomp profile from
51+
`Unconfined` to `RuntimeDefault`. If not specified differently in the pod
52+
manifest, then the feature will apply a stricter set of security constraints by
53+
using the default profile of the container runtime. These profiles may differ
54+
between runtimes like [CRI-O][crio] or [containerd][ctrd]. They also differ
55+
between hardware architectures. But generally speaking, those default profiles
56+
allow a common set of syscalls while blocking the more dangerous ones, which
57+
are unlikely to be needed or unsafe to use in a containerized application.
58+
59+
[crio]: https://github.com/cri-o/cri-o/blob/fe30d62/vendor/github.com/containers/common/pkg/seccomp/default_linux.go#L45
60+
[ctrd]: https://github.com/containerd/containerd/blob/e1445df/contrib/seccomp/seccomp_default.go#L51
61+
62+
### Enabling the feature
63+
64+
Two kubelet configuration changes have to be made to enable the feature:
65+
66+
1. **Enable the feature gate** by setting `SeccompDefault=true` via the command
67+
   line (`--feature-gates`) or the [kubelet configuration][kubelet] file.
68+
2. **Turn on the feature** by adding the
69+
   `--seccomp-default` command line flag or via the [kubelet
70+
   configuration][kubelet] file (`seccompDefault: true`).
71+
72+
[kubelet]: /docs/tasks/administer-cluster/kubelet-config-file
73+
74+
The kubelet will error on startup if the feature is turned on (step 2) while the feature gate (step 1) is not enabled.
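
For the configuration file route, a fragment along these lines should cover both steps. This is only a sketch: the scratch file name is made up, and the two settings need to be merged into the kubelet configuration file your nodes actually use rather than replacing it.

```shell
# Write the two relevant settings to a scratch file (hypothetical file name).
cat > kubelet-seccomp-fragment.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SeccompDefault: true   # step 1: enable the feature gate
seccompDefault: true     # step 2: turn on the feature
EOF

# Command line equivalent, using the two flags named in the steps above:
#   kubelet --feature-gates=SeccompDefault=true --seccomp-default
```

The kubelet has to be restarted on the node for either variant to take effect.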
75+
76+
### Trying it out
77+
78+
If the feature is enabled on a node, then you can create a new workload like
79+
this:
80+
81+
```yaml
82+
apiVersion: v1
83+
kind: Pod
84+
metadata:
85+
  name: test-pod
86+
spec:
87+
  containers:
88+
    - name: test-container
89+
      image: nginx:1.21
90+
```
91+
92+
Now it is possible to inspect the used seccomp profile by using
93+
[`crictl`][crictl] while investigating the container's [runtime
94+
specification][rspec]:
95+
96+
[crictl]: https://github.com/kubernetes-sigs/cri-tools
97+
[rspec]: https://github.com/opencontainers/runtime-spec/blob/0c021c1/config-linux.md#seccomp
98+
99+
```bash
100+
CONTAINER_ID=$(sudo crictl ps -q --name=test-container)
101+
sudo crictl inspect "$CONTAINER_ID" | jq .info.runtimeSpec.linux.seccomp
102+
```
103+
104+
```yaml
105+
{
106+
"defaultAction": "SCMP_ACT_ERRNO",
107+
"architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
108+
"syscalls": [
109+
{
110+
"names": ["_llseek", "_newselect", "accept", …, "write", "writev"],
111+
"action": "SCMP_ACT_ALLOW"
112+
},
113+
114+
]
115+
}
116+
```
117+
118+
You can see that the lower-level container runtime ([CRI-O][crio-home] and
119+
[runc][runc] in our case) successfully applied the default seccomp profile.
120+
This profile denies all syscalls by default, while allowing commonly used ones
121+
like [`accept`][accept] or [`write`][write].
122+
123+
[crio-home]: https://github.com/cri-o/cri-o
124+
[runc]: https://github.com/opencontainers/runc
125+
[accept]: https://man7.org/linux/man-pages/man2/accept.2.html
126+
[write]: https://man7.org/linux/man-pages/man2/write.2.html
127+
128+
Please note that the feature will not influence any Kubernetes API for now.
129+
Therefore, it is not possible to retrieve the used seccomp profile via `kubectl`
130+
`get` or `describe` if the [`SeccompProfile`][api] field is unset within the
131+
`SecurityContext`.
132+
133+
[api]: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1
134+
135+
The feature also works when using multiple containers within a pod, for example
136+
if you create a pod like this:
137+
138+
```yaml
139+
apiVersion: v1
140+
kind: Pod
141+
metadata:
142+
  name: test-pod
143+
spec:
144+
  containers:
145+
    - name: test-container-nginx
146+
      image: nginx:1.21
147+
      securityContext:
148+
        seccompProfile:
149+
          type: Unconfined
150+
    - name: test-container-redis
151+
      image: redis:6.2
152+
```
153+
154+
then you should see that the `test-container-nginx` runs without a seccomp profile:
155+
156+
```bash
157+
sudo crictl inspect $(sudo crictl ps -q --name=test-container-nginx) |
158+
jq '.info.runtimeSpec.linux.seccomp == null'
159+
true
160+
```
161+
162+
Whereas the container `test-container-redis` runs with `RuntimeDefault`:
163+
164+
```bash
165+
sudo crictl inspect $(sudo crictl ps -q --name=test-container-redis) |
166+
jq '.info.runtimeSpec.linux.seccomp != null'
167+
true
168+
```
169+
170+
The same applies to the pod itself, which also runs with the default profile:
171+
172+
```bash
173+
sudo crictl inspectp $(sudo crictl pods -q --name test-pod) |
174+
jq '.info.runtimeSpec.linux.seccomp != null'
175+
true
176+
```
177+
178+
### Upgrade strategy
179+
180+
It is recommended to enable the feature in multiple steps, each of which comes
181+
with its own risks and mitigations.
182+
183+
#### Feature gate enabling
184+
185+
Enabling the feature gate at the kubelet level will not turn on the feature, but
186+
will make it possible to do so via the `seccompDefault` kubelet configuration or the
187+
`--seccomp-default` CLI flag. This can be done by an administrator for the whole
188+
cluster or only a set of nodes.
189+
190+
#### Testing the application
191+
192+
If you're trying this within a dedicated test environment, you have to ensure
193+
that the application code does not trigger syscalls blocked by the
194+
`RuntimeDefault` profile before enabling the feature on a node. This can be done
195+
by:
196+
197+
- _Recommended_: Analyzing the code (manually or by running the application with
198+
  [strace][strace]) for any executed syscalls which may be blocked by the
199+
  default profiles. If that's the case, then you can override the default by
200+
  explicitly setting the pod or container to run as `Unconfined`. Alternatively,
201+
  you can create a custom seccomp profile based on the default by adding the
202+
  additional syscalls to the `"action": "SCMP_ACT_ALLOW"` section (see the
203+
  optional step below).
204+
205+
- _Recommended_: Manually set the profile on the target workload and use a
206+
  rolling upgrade to deploy into production. Roll back the deployment if the
207+
application does not work as intended.
208+
209+
- _Optional_: Run the application against an end-to-end test suite to trigger
210+
all relevant code paths with `RuntimeDefault` enabled. If a test fails, use
211+
the same mitigation as mentioned above.
212+
213+
- _Optional_: Create a custom seccomp profile based on the default and change
214+
its default action from `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG`. This means that
215+
the seccomp filter for unknown syscalls will have no effect on the application
216+
at all, but the system logs will now indicate which syscalls may be blocked.
217+
  This requires at least Linux kernel version 4.14 as well as a recent [runc][runc]
218+
  release. Monitor the application host's audit logs (defaults to
219+
  `/var/log/audit/audit.log`) or syslog entries (defaults to `/var/log/syslog`)
220+
  for syscalls via `type=SECCOMP` (for audit) or `type=1326` (for syslog).
221+
  Compare the syscall IDs with those [listed in the Linux Kernel
222+
  sources][syscalls] and add the matching syscalls to the custom profile. Be aware that custom
223+
  audit policies may lead to missing syscalls, depending on the configuration
224+
of auditd.
225+
226+
- _Optional_: Use cluster additions like the [Security Profiles Operator][spo]
227+
for profiling the application via its [log enrichment][logs] capabilities or
228+
recording a profile by using its [recording feature][rec]. This makes the
229+
  above-mentioned manual log investigation obsolete.
230+
231+
[syscalls]: https://github.com/torvalds/linux/blob/7bb7f2a/arch/x86/entry/syscalls/syscall_64.tbl
232+
[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
233+
[logs]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/c90ef3a/installation-usage.md#using-the-log-enricher
234+
[rec]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/c90ef3a/installation-usage.md#record-profiles-from-workloads-with-profilerecordings
235+
[strace]: https://man7.org/linux/man-pages/man1/strace.1.html
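
The manual log investigation from the `SCMP_ACT_LOG` step could be sketched with a small shell helper like the one below. The function name is made up for this example, and it assumes that each seccomp record carries a `syscall=<number>` field, so treat it as a starting point rather than a complete tool.

```shell
# Hypothetical helper: list the distinct syscall numbers that seccomp logged.
# $1 is a log file containing audit (type=SECCOMP) or syslog (type=1326) records.
list_logged_syscalls() {
  grep -E 'type=(SECCOMP|1326)' "$1" \
    | sed -n 's/.* syscall=\([0-9][0-9]*\).*/\1/p' \
    | sort -n -u
}

# Example call against the default audit log location (may require root):
# list_logged_syscalls /var/log/audit/audit.log
```

The printed numbers can then be looked up in the syscall table linked above and the corresponding names added to the custom profile.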
236+
237+
#### Deploying the modified application
238+
239+
Based on the outcome of the application tests, it may be required to change the
240+
application deployment by either specifying `Unconfined` or a custom seccomp
241+
profile. This is not the case if the application works as intended with
242+
`RuntimeDefault`.
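
If a custom profile turns out to be necessary, the workload change might look like the sketch below. The profile path `profiles/my-app.json`, the file name and the pod name are hypothetical; the path is resolved relative to the kubelet's seccomp profile root and the profile has to exist on every node that can run the pod.

```shell
# Write an example manifest that references a node-local profile.
cat > custom-seccomp-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: custom-seccomp
spec:
  containers:
    - name: app
      image: nginx:1.21
      securityContext:
        seccompProfile:
          type: Localhost
          # hypothetical path, relative to <kubelet-root-dir>/seccomp
          localhostProfile: profiles/my-app.json
EOF

# Apply it once the profile has been distributed to the nodes:
# kubectl apply -f custom-seccomp-pod.yaml
```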
243+
244+
#### Enable the kubelet configuration
245+
246+
If everything went well, then the feature is ready to be enabled by the kubelet
247+
configuration or its corresponding CLI flag. This should be done on a per-node
248+
basis to reduce the overall risk of missing a syscall during the investigations
249+
when running the application tests. If it's possible to monitor audit logs
250+
within the cluster, then it's recommended to do so to catch any missed
251+
seccomp events. If the application works as intended then the feature can be
252+
enabled for further nodes within the cluster.
253+
254+
## Conclusion
255+
256+
Thank you for reading this blog post! I hope you enjoyed seeing how the usage of
257+
seccomp profiles has evolved in Kubernetes over the past releases as much
258+
as I did. On your own cluster, change the default seccomp profile to
259+
`RuntimeDefault` (using this new feature) and see the security benefits, and, of
260+
course, feel free to reach out any time for feedback or questions.
261+
262+
---
263+
264+
_Editor's note: If you have any questions or feedback about this blog post, feel
265+
free to reach out via the [Kubernetes Slack in #sig-node][slack]._
266+
267+
[slack]: https://kubernetes.slack.com/messages/sig-node
