Commit d68b5e9

Merge pull request #40486 from saschagrunert/spo-seccomp-blog
Add blog post about recording seccomp profiles in edge scenarios
2 parents 26f5f6b + af4cec6 commit d68b5e9

1 file changed

+282
-0
lines changed

@@ -0,0 +1,282 @@
1+
---
2+
layout: blog
3+
title: "Having fun with seccomp profiles on the edge"
4+
date: 2023-05-18
5+
slug: seccomp-profiles-edge
6+
---
7+
8+
**Author**: Sascha Grunert
9+
10+
The [Security Profiles Operator (SPO)][spo] is a feature-rich
11+
[operator][operator] for Kubernetes to make managing seccomp, SELinux and
12+
AppArmor profiles easier than ever. Recording those profiles from scratch is one
13+
of the key features of this operator, which usually involves the integration
14+
into large CI/CD systems. Being able to test the recording capabilities of the
15+
operator in edge cases is one of the recent development efforts of the SPO and
16+
makes it excitingly easy to play around with seccomp profiles.
17+
18+
[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
19+
[operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator
20+
21+
## Recording seccomp profiles with `spoc record`
22+
23+
The [v0.8.0][spo-latest] release of the Security Profiles Operator shipped a new
24+
command line interface called `spoc`, a little helper tool for recording and
25+
replaying seccomp profiles among various other things that are out of scope of
26+
this blog post.
27+
28+
[spo-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/v0.8.0
29+
30+
Recording a seccomp profile requires a binary to be executed, which can be a
31+
simple Go application that just calls [`uname(2)`][uname]:
32+
33+
```go
package main

import (
	"syscall"
)

func main() {
	utsname := syscall.Utsname{}
	if err := syscall.Uname(&utsname); err != nil {
		panic(err)
	}
}
```
47+
48+
[uname]: https://man7.org/linux/man-pages/man2/uname.2.html
49+
50+
Building a binary from that code can be done by:
51+
52+
```console
> go build -o main main.go
> ldd ./main
        not a dynamic executable
```
57+
58+
Now it's possible to download the latest binary of [`spoc` from
59+
GitHub][spoc-latest] and run the application on Linux with it:
60+
61+
[spoc-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/download/v0.8.0/spoc.amd64
62+
63+
```console
> sudo ./spoc record ./main
10:08:25.591945 Loading bpf module
10:08:25.591958 Using system btf file
libbpf: loading object 'recorder.bpf.o' from buffer
…
libbpf: prog 'sys_enter': relo #3: patched insn #22 (ALU/ALU64) imm 16 -> 16
10:08:25.610767 Getting bpf program sys_enter
10:08:25.610778 Attaching bpf tracepoint
10:08:25.611574 Getting syscalls map
10:08:25.611582 Getting pid_mntns map
10:08:25.613097 Module successfully loaded
10:08:25.613311 Processing events
10:08:25.613693 Running command with PID: 336007
10:08:25.613835 Received event: pid: 336007, mntns: 4026531841
10:08:25.613951 No container ID found for PID (pid=336007, mntns=4026531841, err=unable to find container ID in cgroup path)
10:08:25.614856 Processing recorded data
10:08:25.614975 Found process mntns 4026531841 in bpf map
10:08:25.615110 Got syscalls: read, close, mmap, rt_sigaction, rt_sigprocmask, madvise, nanosleep, clone, uname, sigaltstack, arch_prctl, gettid, futex, sched_getaffinity, exit_group, openat
10:08:25.615195 Adding base syscalls: access, brk, capget, capset, chdir, chmod, chown, close_range, dup2, dup3, epoll_create1, epoll_ctl, epoll_pwait, execve, faccessat2, fchdir, fchmodat, fchown, fchownat, fcntl, fstat, fstatfs, getdents64, getegid, geteuid, getgid, getpid, getppid, getuid, ioctl, keyctl, lseek, mkdirat, mknodat, mount, mprotect, munmap, newfstatat, openat2, pipe2, pivot_root, prctl, pread64, pselect6, readlink, readlinkat, rt_sigreturn, sched_yield, seccomp, set_robust_list, set_tid_address, setgid, setgroups, sethostname, setns, setresgid, setresuid, setsid, setuid, statfs, statx, symlinkat, tgkill, umask, umount2, unlinkat, unshare, write
10:08:25.616293 Wrote seccomp profile to: /tmp/profile.yaml
10:08:25.616298 Unloading bpf module
```
86+
87+
I have to execute `spoc` as root because it will internally run an [eBPF][ebpf]
88+
program by reusing the same code parts from the Security Profiles Operator
89+
itself. I can see that the bpf module got loaded successfully and `spoc`
90+
attached the required tracepoint to it. Then it will track the main application
91+
by using its [mount namespace][mntns] and process the recorded syscall data. The
92+
nature of eBPF programs is that they see the whole context of the kernel, which
93+
means that `spoc` tracks all syscalls of the system, but does not interfere with
94+
their execution.
95+
96+
[ebpf]: https://ebpf.io
97+
[mntns]: https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
98+
99+
The logs indicate that `spoc` found the syscalls `read`, `close`,
100+
`mmap` and so on, including `uname`. All syscalls other than `uname` come
101+
from the Go runtime and its garbage collection, which already adds overhead
102+
to a basic application like in our demo. I can also see from the log line
103+
`Adding base syscalls: …` that `spoc` adds a bunch of base syscalls to the
104+
resulting profile. Those are used by the OCI runtime (like [runc][runc] or
105+
[crun][crun]) in order to be able to run a container. This means that `spoc`
106+
can be used to record seccomp profiles which then can be containerized directly.
107+
This behavior can be disabled in `spoc` by using the `--no-base-syscalls`/`-n`
108+
or customized via the `--base-syscalls`/`-b` command line flags. This can be
109+
helpful in cases where different OCI runtimes other than crun and runc are used,
110+
or if I just want to record the seccomp profile for the application and stack
111+
it with another [base profile][base].
112+
113+
[runc]: https://github.com/opencontainers/runc
114+
[crun]: https://github.com/containers/crun
115+
[base]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#base-syscalls-for-a-container-runtime
116+
117+
The resulting profile is now available in `/tmp/profile.yaml`, but the default
118+
location can be changed using the `--output-file value`/`-o` flag:
119+
120+
```console
> cat /tmp/profile.yaml
```
123+
124+
```yaml
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  creationTimestamp: null
  name: main
spec:
  architectures:
    - SCMP_ARCH_X86_64
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    - action: SCMP_ACT_ALLOW
      names:
        - access
        - arch_prctl
        - brk
        - …
        - uname
        - …
status: {}
```
145+
146+
The seccomp profile Custom Resource Definition (CRD) can be directly used
147+
together with the Security Profiles Operator for managing it within Kubernetes.
148+
`spoc` is also capable of producing raw seccomp profiles (as JSON), by using the
149+
`--type`/`-t` `raw-seccomp` flag:
150+
151+
```console
> sudo ./spoc record --type raw-seccomp ./main
…
52.628827 Wrote seccomp profile to: /tmp/profile.json
```
156+
157+
```console
> jq . /tmp/profile.json
```
160+
161+
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["access", "…", "write"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```
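The raw profile is plain JSON, so it is easy to inspect programmatically. The sketch below models only the three fields visible in the output above and looks up which action would apply to a given syscall. The type and function names are assumptions for illustration, and the lookup is a simplification: real seccomp filters can also match on syscall arguments and architecture.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Profile mirrors only the fields visible in the JSON output above.
type Profile struct {
	DefaultAction string        `json:"defaultAction"`
	Architectures []string      `json:"architectures"`
	Syscalls      []SyscallRule `json:"syscalls"`
}

// SyscallRule is one entry of the "syscalls" list.
type SyscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// parseProfile decodes a raw seccomp profile from JSON.
func parseProfile(data []byte) (*Profile, error) {
	var p Profile
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, fmt.Errorf("parse profile: %w", err)
	}
	return &p, nil
}

// actionFor returns the action of the first rule naming the syscall,
// or the default action if no rule names it.
func (p *Profile) actionFor(syscall string) string {
	for _, rule := range p.Syscalls {
		for _, name := range rule.Names {
			if name == syscall {
				return rule.Action
			}
		}
	}
	return p.DefaultAction
}

func main() {
	// Abbreviated stand-in for the contents of /tmp/profile.json.
	data := []byte(`{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [{"names": ["access", "uname", "write"], "action": "SCMP_ACT_ALLOW"}]
}`)
	p, err := parseProfile(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("uname:", p.actionFor("uname"))
	fmt.Println("ptrace:", p.actionFor("ptrace"))
}
```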
173+
174+
The utility `spoc record` allows us to record complex seccomp profiles directly
175+
from binary invocations on any Linux system that is capable of running the eBPF
176+
code within the kernel. But it can do more: How about modifying the seccomp
177+
profile and then testing it by using `spoc run`?
178+
179+
## Running seccomp profiles with `spoc run`
180+
181+
`spoc` is also able to run binaries with applied seccomp profiles, making it
182+
easy to test any modification to them. To do that, just run:
183+
184+
```console
> sudo ./spoc run ./main
10:29:58.153263 Reading file /tmp/profile.yaml
10:29:58.153311 Assuming YAML profile
10:29:58.154138 Setting up seccomp
10:29:58.154178 Load seccomp profile
10:29:58.154189 Starting audit log enricher
10:29:58.154224 Enricher reading from file /var/log/audit/audit.log
10:29:58.155356 Running command with PID: 437880
>
```
195+
196+
It looks like the application exited successfully, which is expected
197+
because I did not modify the previously recorded profile yet. I can also
198+
specify a custom location for the profile by using the `--profile`/`-p` flag,
199+
but this was not necessary because I did not modify the default output location
200+
from the recording. `spoc` will automatically determine if it's a raw (JSON) or CRD
201+
(YAML) based seccomp profile and then apply it to the process.
202+
203+
The Security Profiles Operator supports a [log enricher feature][enricher],
204+
which provides additional seccomp related information by parsing the audit logs.
205+
`spoc run` uses the enricher in the same way to provide more data to the end
206+
users when it comes to debugging seccomp profiles.
207+
208+
[enricher]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#using-the-log-enricher
209+
210+
Now I have to modify the profile to see anything valuable in the output. For
211+
example, I could remove the allowed `uname` syscall:
212+
213+
```console
> jq 'del(.syscalls[0].names[] | select(. == "uname"))' /tmp/profile.json > /tmp/no-uname-profile.json
```
216+
217+
And then try to run it again with the new profile `/tmp/no-uname-profile.json`:
218+
219+
```console
> sudo ./spoc run -p /tmp/no-uname-profile.json ./main
10:39:12.707798 Reading file /tmp/no-uname-profile.json
10:39:12.707892 Setting up seccomp
10:39:12.707920 Load seccomp profile
10:39:12.707982 Starting audit log enricher
10:39:12.707998 Enricher reading from file /var/log/audit/audit.log
10:39:12.709164 Running command with PID: 480512
panic: operation not permitted

goroutine 1 [running]:
main.main()
        /path/to/main.go:10 +0x85
10:39:12.713035 Unable to run: launch runner: wait for command: exit status 2
```
234+
235+
Alright, that was expected! The applied seccomp profile blocks the `uname`
236+
syscall, which results in an "operation not permitted" error. This error is
237+
pretty generic and does not provide any hint on what got blocked by seccomp.
238+
It is generally extremely difficult to predict how applications behave if single
239+
syscalls are forbidden by seccomp. It could be possible that the application
240+
terminates like in our simple demo, but it could also lead to strange
241+
misbehavior where the application does not stop at all.
242+
243+
If I now change the default seccomp action of the profile from `SCMP_ACT_ERRNO`
244+
to `SCMP_ACT_LOG` like this:
245+
246+
```console
> jq '.defaultAction = "SCMP_ACT_LOG"' /tmp/no-uname-profile.json > /tmp/no-uname-profile-log.json
```
249+
250+
Then the log enricher will give us a hint that the `uname` syscall would have been blocked
251+
when using `spoc run`:
252+
253+
```console
> sudo ./spoc run -p /tmp/no-uname-profile-log.json ./main
10:48:07.470126 Reading file /tmp/no-uname-profile-log.json
10:48:07.470234 Setting up seccomp
10:48:07.470245 Load seccomp profile
10:48:07.470302 Starting audit log enricher
10:48:07.470339 Enricher reading from file /var/log/audit/audit.log
10:48:07.470889 Running command with PID: 522268
10:48:07.472007 Seccomp: uname (63)
```
263+
264+
The application will not terminate any more, but seccomp will log the behavior
265+
to `/var/log/audit/audit.log` and `spoc` will parse the data to correlate it
266+
directly to our program. Generating the log messages to the audit subsystem
267+
comes with a large performance overhead and should be handled with care in
268+
production systems. It also comes with a security risk when running untrusted
269+
apps in audit mode in production environments.
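To give an idea of what correlating audit data involves, here is a rough sketch that extracts the PID and syscall number from a `type=SECCOMP` audit record. It is not the enricher's actual implementation. The sample line uses made-up values in the format the Linux audit subsystem emits, and the syscall table is a tiny illustrative excerpt for x86_64:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// seccompLine matches the pid and syscall fields of a type=SECCOMP record.
var seccompLine = regexp.MustCompile(`type=SECCOMP .*\bpid=(\d+) .*\bsyscall=(\d+)`)

// syscallNames is a tiny illustrative excerpt of the x86_64 syscall table.
var syscallNames = map[int]string{0: "read", 1: "write", 63: "uname"}

// parseSeccompLine returns the PID and syscall number of a seccomp audit
// record. The boolean is false if the line is not such a record.
func parseSeccompLine(line string) (int, int, bool) {
	m := seccompLine.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	pid, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, 0, false
	}
	id, err := strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return pid, id, true
}

func main() {
	// Made-up sample record; only pid and syscall mirror the run above.
	line := `type=SECCOMP msg=audit(1684406887.472:1234): auid=1000 uid=0 gid=0 ses=1 pid=522268 comm="main" exe="/path/to/main" sig=0 arch=c000003e syscall=63 compat=0 ip=0x46d8a1 code=0x7ffc0000`
	pid, id, ok := parseSeccompLine(line)
	if !ok {
		fmt.Println("not a seccomp record")
		return
	}
	name, known := syscallNames[id]
	if !known {
		name = "unknown"
	}
	fmt.Printf("Seccomp: %s (%d), pid: %d\n", name, id, pid)
}
```

Matching on the PID is what allows log lines to be attributed to the process under test rather than to unrelated workloads on the same host.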
270+
271+
This demo should give you an impression of how to debug seccomp profile issues with
272+
applications, possibly by using our shiny new helper tool powered by the
273+
features of the Security Profiles Operator. `spoc` is a flexible and portable
274+
binary suitable for edge cases where resources are limited and even Kubernetes
275+
itself may not be available with its full capabilities.
276+
277+
Thank you for reading this blog post! If you're interested in more, providing
278+
feedback or asking for help, then feel free to get in touch with us directly via
279+
[Slack (#security-profiles-operator)][slack] or the [mailing list][mail].
280+
281+
[slack]: https://kubernetes.slack.com/messages/security-profiles-operator
282+
[mail]: https://groups.google.com/forum/#!forum/kubernetes-dev
