| 1 | +--- |
| 2 | +layout: blog |
| 3 | +title: "Having fun with seccomp profiles on the edge" |
| 4 | +date: 2023-05-18 |
| 5 | +slug: seccomp-profiles-edge |
| 6 | +--- |
| 7 | + |
| 8 | +**Author**: Sascha Grunert |
| 9 | + |
| 10 | +The [Security Profiles Operator (SPO)][spo] is a feature-rich |
| 11 | +[operator][operator] for Kubernetes to make managing seccomp, SELinux and |
| 12 | +AppArmor profiles easier than ever. Recording those profiles from scratch is one |
| 13 | +of the key features of this operator, which usually involves the integration |
| 14 | +into large CI/CD systems. Being able to test the recording capabilities of the |
| 15 | +operator in edge cases is one of the recent development efforts of the SPO and |
| 16 | +makes it excitingly easy to play around with seccomp profiles. |
| 17 | + |
| 18 | +[spo]: https://github.com/kubernetes-sigs/security-profiles-operator |
| 19 | +[operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator |
| 20 | + |
| 21 | +## Recording seccomp profiles with `spoc record` |
| 22 | + |
| 23 | +The [v0.8.0][spo-latest] release of the Security Profiles Operator shipped a new |
| 24 | +command line interface called `spoc`, a little helper tool for recording and |
| 25 | +replaying seccomp profiles among various other things that are out of scope of |
| 26 | +this blog post. |
| 27 | + |
| 28 | +[spo-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/v0.8.0 |
| 29 | + |
| 30 | +Recording a seccomp profile requires a binary to be executed, which can be a |
| 31 | +simple Go application that just calls [`uname(2)`][uname]: |
| 32 | + |
| 33 | +```go |
| 34 | +package main |
| 35 | + |
| 36 | +import ( |
| 37 | +	"syscall" |
| 38 | +) |
| 39 | + |
| 40 | +func main() { |
| 41 | +	utsname := syscall.Utsname{} |
| 42 | +	if err := syscall.Uname(&utsname); err != nil { |
| 43 | +		panic(err) |
| 44 | +	} |
| 45 | +} |
| 46 | +``` |
| 47 | + |
| 48 | +[uname]: https://man7.org/linux/man-pages/man2/uname.2.html |
| 49 | + |
| 50 | +Building a binary from that code can be done by: |
| 51 | + |
| 52 | +```console |
| 53 | +> go build -o main main.go |
| 54 | +> ldd ./main |
| 55 | + not a dynamic executable |
| 56 | +``` |
| 57 | + |
| 58 | +Now it's possible to download the latest binary of [`spoc` from |
| 59 | +GitHub][spoc-latest] and run the application on Linux with it: |
| 60 | + |
| 61 | +[spoc-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/download/v0.8.0/spoc.amd64 |
| 62 | + |
| 63 | +```console |
| 64 | +> sudo ./spoc record ./main |
| 65 | +10:08:25.591945 Loading bpf module |
| 66 | +10:08:25.591958 Using system btf file |
| 67 | +libbpf: loading object 'recorder.bpf.o' from buffer |
| 68 | +… |
| 69 | +libbpf: prog 'sys_enter': relo #3: patched insn #22 (ALU/ALU64) imm 16 -> 16 |
| 70 | +10:08:25.610767 Getting bpf program sys_enter |
| 71 | +10:08:25.610778 Attaching bpf tracepoint |
| 72 | +10:08:25.611574 Getting syscalls map |
| 73 | +10:08:25.611582 Getting pid_mntns map |
| 74 | +10:08:25.613097 Module successfully loaded |
| 75 | +10:08:25.613311 Processing events |
| 76 | +10:08:25.613693 Running command with PID: 336007 |
| 77 | +10:08:25.613835 Received event: pid: 336007, mntns: 4026531841 |
| 78 | +10:08:25.613951 No container ID found for PID (pid=336007, mntns=4026531841, err=unable to find container ID in cgroup path) |
| 79 | +10:08:25.614856 Processing recorded data |
| 80 | +10:08:25.614975 Found process mntns 4026531841 in bpf map |
| 81 | +10:08:25.615110 Got syscalls: read, close, mmap, rt_sigaction, rt_sigprocmask, madvise, nanosleep, clone, uname, sigaltstack, arch_prctl, gettid, futex, sched_getaffinity, exit_group, openat |
| 82 | +10:08:25.615195 Adding base syscalls: access, brk, capget, capset, chdir, chmod, chown, close_range, dup2, dup3, epoll_create1, epoll_ctl, epoll_pwait, execve, faccessat2, fchdir, fchmodat, fchown, fchownat, fcntl, fstat, fstatfs, getdents64, getegid, geteuid, getgid, getpid, getppid, getuid, ioctl, keyctl, lseek, mkdirat, mknodat, mount, mprotect, munmap, newfstatat, openat2, pipe2, pivot_root, prctl, pread64, pselect6, readlink, readlinkat, rt_sigreturn, sched_yield, seccomp, set_robust_list, set_tid_address, setgid, setgroups, sethostname, setns, setresgid, setresuid, setsid, setuid, statfs, statx, symlinkat, tgkill, umask, umount2, unlinkat, unshare, write |
| 83 | +10:08:25.616293 Wrote seccomp profile to: /tmp/profile.yaml |
| 84 | +10:08:25.616298 Unloading bpf module |
| 85 | +``` |
| 86 | + |
| 87 | +I have to execute `spoc` as root because it internally runs an [eBPF][ebpf] |
| 88 | +program by reusing the same code parts from the Security Profiles Operator |
| 89 | +itself. I can see that the bpf module got loaded successfully and `spoc` |
| 90 | +attached the required tracepoint to it. Then it tracks the main application |
| 91 | +by using its [mount namespace][mntns] and processes the recorded syscall data. The |
| 92 | +nature of eBPF programs is that they see the whole context of the kernel, which |
| 93 | +means that `spoc` tracks all syscalls of the system, but does not interfere with |
| 94 | +their execution. |
| 95 | + |
| 96 | +[ebpf]: https://ebpf.io |
| 97 | +[mntns]: https://man7.org/linux/man-pages/man7/mount_namespaces.7.html |
| 98 | + |
| 99 | +The logs indicate that `spoc` found the syscalls `read`, `close`, |
| 100 | +`mmap` and so on, including `uname`. All syscalls other than `uname` come |
| 101 | +from the Go runtime and its garbage collection, which already adds overhead |
| 102 | +to a basic application like the one in our demo. I can also see from the log line |
| 103 | +`Adding base syscalls: …` that `spoc` adds a bunch of base syscalls to the |
| 104 | +resulting profile. Those are used by the OCI runtime (like [runc][runc] or |
| 105 | +[crun][crun]) in order to be able to run a container. This means that `spoc` |
| 106 | +can be used to record seccomp profiles which then can be containerized directly. |
| 107 | +This behavior can be disabled in `spoc` by using the `--no-base-syscalls`/`-n` flag |
| 108 | +or customized via the `--base-syscalls`/`-b` command line flag. This can be |
| 109 | +helpful in cases where OCI runtimes other than crun and runc are used, |
| 110 | +or if I just want to record the seccomp profile for the application and stack |
| 111 | +it with another [base profile][base]. |
| 112 | + |
| 113 | +[runc]: https://github.com/opencontainers/runc |
| 114 | +[crun]: https://github.com/containers/crun |
| 115 | +[base]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#base-syscalls-for-a-container-runtime |
| 116 | + |
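Conceptually, the resulting allow list is the union of the recorded syscalls and the base syscalls. The following is a minimal sketch of that idea, not the code `spoc` uses: the helper `mergeSyscalls` and the shortened sample lists are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeSyscalls returns the sorted, de-duplicated union of both lists.
// Hypothetical helper for illustration only; it is not taken from spoc.
func mergeSyscalls(recorded, base []string) []string {
	seen := make(map[string]struct{}, len(recorded)+len(base))
	for _, name := range recorded {
		seen[name] = struct{}{}
	}
	for _, name := range base {
		seen[name] = struct{}{}
	}

	merged := make([]string, 0, len(seen))
	for name := range seen {
		merged = append(merged, name)
	}
	sort.Strings(merged)

	return merged
}

func main() {
	// Made-up sample lists; "read" appears twice to show de-duplication.
	recorded := []string{"uname", "read", "close"}
	base := []string{"access", "brk", "read", "write"}

	fmt.Println(mergeSyscalls(recorded, base))
}
```

Passing an empty base list to such a helper would correspond to recording only what the application itself needs, which is the use case for stacking the result on top of a separate base profile.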
| 117 | +The resulting profile is now available in `/tmp/profile.yaml`, but the default |
| 118 | +location can be changed using the `--output-file`/`-o` flag: |
| 119 | + |
| 120 | +```console |
| 121 | +> cat /tmp/profile.yaml |
| 122 | +``` |
| 123 | + |
| 124 | +```yaml |
| 125 | +apiVersion: security-profiles-operator.x-k8s.io/v1beta1 |
| 126 | +kind: SeccompProfile |
| 127 | +metadata: |
| 128 | +  creationTimestamp: null |
| 129 | +  name: main |
| 130 | +spec: |
| 131 | +  architectures: |
| 132 | +    - SCMP_ARCH_X86_64 |
| 133 | +  defaultAction: SCMP_ACT_ERRNO |
| 134 | +  syscalls: |
| 135 | +    - action: SCMP_ACT_ALLOW |
| 136 | +      names: |
| 137 | +        - access |
| 138 | +        - arch_prctl |
| 139 | +        - brk |
| 140 | +        - … |
| 141 | +        - uname |
| 142 | +        - … |
| 143 | +status: {} |
| 144 | +``` |
| 145 | + |
| 146 | +The seccomp profile Custom Resource Definition (CRD) can be directly used |
| 147 | +together with the Security Profiles Operator for managing it within Kubernetes. |
| 148 | +`spoc` is also capable of producing raw seccomp profiles (as JSON), by using the |
| 149 | +`--type`/`-t` `raw-seccomp` flag: |
| 150 | + |
| 151 | +```console |
| 152 | +> sudo ./spoc record --type raw-seccomp ./main |
| 153 | +… |
| 154 | +52.628827 Wrote seccomp profile to: /tmp/profile.json |
| 155 | +``` |
| 156 | + |
| 157 | +```console |
| 158 | +> jq . /tmp/profile.json |
| 159 | +``` |
| 160 | + |
| 161 | +```json |
| 162 | +{ |
| 163 | +  "defaultAction": "SCMP_ACT_ERRNO", |
| 164 | +  "architectures": ["SCMP_ARCH_X86_64"], |
| 165 | +  "syscalls": [ |
| 166 | +    { |
| 167 | +      "names": ["access", "…", "write"], |
| 168 | +      "action": "SCMP_ACT_ALLOW" |
| 169 | +    } |
| 170 | +  ] |
| 171 | +} |
| 172 | +``` |
| 173 | + |
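The raw JSON layout is simple enough to inspect from a small program. Here is a minimal sketch that decodes only the three fields shown above and looks up the action for a syscall name. The type and function names (`rawProfile`, `syscallRule`, `parseProfile`, `actionFor`) are my own, and the lookup ignores argument filters and anything else a full seccomp profile may contain.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// syscallRule mirrors one entry of the "syscalls" array shown above.
type syscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// rawProfile covers only the fields visible in the example profile.
type rawProfile struct {
	DefaultAction string        `json:"defaultAction"`
	Architectures []string      `json:"architectures"`
	Syscalls      []syscallRule `json:"syscalls"`
}

// parseProfile decodes a raw (JSON) seccomp profile.
func parseProfile(data []byte) (*rawProfile, error) {
	profile := &rawProfile{}
	if err := json.Unmarshal(data, profile); err != nil {
		return nil, fmt.Errorf("decode profile: %w", err)
	}
	return profile, nil
}

// actionFor returns the action of the first rule naming the syscall and
// falls back to the default action otherwise (simplified lookup).
func (p *rawProfile) actionFor(name string) string {
	for _, rule := range p.Syscalls {
		for _, n := range rule.Names {
			if n == name {
				return rule.Action
			}
		}
	}
	return p.DefaultAction
}

func main() {
	// Inline sample with a shortened allow list instead of a file on disk.
	data := []byte(`{
	  "defaultAction": "SCMP_ACT_ERRNO",
	  "architectures": ["SCMP_ARCH_X86_64"],
	  "syscalls": [{"names": ["access", "write"], "action": "SCMP_ACT_ALLOW"}]
	}`)

	profile, err := parseProfile(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("write:", profile.actionFor("write"))
	fmt.Println("uname:", profile.actionFor("uname"))
}
```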
| 174 | +The utility `spoc record` allows us to record complex seccomp profiles directly |
| 175 | +from binary invocations on any Linux system which is capable of running the eBPF |
| 176 | +code within the kernel. But it can do more: how about modifying the seccomp |
| 177 | +profile and then testing it by using `spoc run`? |
| 178 | + |
| 179 | +## Running seccomp profiles with `spoc run` |
| 180 | + |
| 181 | +`spoc` is also able to run binaries with applied seccomp profiles, making it |
| 182 | +easy to test any modification to them. To do that, just run: |
| 183 | + |
| 184 | +```console |
| 185 | +> sudo ./spoc run ./main |
| 186 | +10:29:58.153263 Reading file /tmp/profile.yaml |
| 187 | +10:29:58.153311 Assuming YAML profile |
| 188 | +10:29:58.154138 Setting up seccomp |
| 189 | +10:29:58.154178 Load seccomp profile |
| 190 | +10:29:58.154189 Starting audit log enricher |
| 191 | +10:29:58.154224 Enricher reading from file /var/log/audit/audit.log |
| 192 | +10:29:58.155356 Running command with PID: 437880 |
| 193 | +> |
| 194 | +``` |
| 195 | + |
| 196 | +It looks like the application exited successfully, which is anticipated |
| 197 | +because I did not modify the previously recorded profile yet. I can also |
| 198 | +specify a custom location for the profile by using the `--profile`/`-p` flag, |
| 199 | +but this was not necessary because I did not modify the default output location |
| 200 | +of the recording. `spoc` will automatically determine if it's a raw (JSON) or CRD |
| 201 | +(YAML) based seccomp profile and then apply it to the process. |
| 202 | + |
| 203 | +The Security Profiles Operator supports a [log enricher feature][enricher], |
| 204 | +which provides additional seccomp related information by parsing the audit logs. |
| 205 | +`spoc run` uses the enricher in the same way to provide more data to the end |
| 206 | +users when it comes to debugging seccomp profiles. |
| 207 | + |
| 208 | +[enricher]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#using-the-log-enricher |
| 209 | + |
| 210 | +Now I have to modify the profile to see anything valuable in the output. For |
| 211 | +example, I could remove the allowed `uname` syscall: |
| 212 | + |
| 213 | +```console |
| 214 | +> jq 'del(.syscalls[0].names[] | select(. == "uname"))' /tmp/profile.json > /tmp/no-uname-profile.json |
| 215 | +``` |
| 216 | + |
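If `jq` is not at hand, the same edit can be scripted. The sketch below is a rough Go equivalent under my own assumptions: `removeSyscall` is a hypothetical helper, it works on generic maps so that unknown fields survive the round trip, and unlike the `jq` filter above it drops the name from every rule rather than only from the first one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// removeSyscall drops name from the "names" list of every rule in a raw
// (JSON) seccomp profile and returns the re-encoded profile.
// Hypothetical helper for illustration; not part of spoc.
func removeSyscall(data []byte, name string) ([]byte, error) {
	var profile map[string]interface{}
	if err := json.Unmarshal(data, &profile); err != nil {
		return nil, fmt.Errorf("decode profile: %w", err)
	}

	rules, _ := profile["syscalls"].([]interface{})
	for _, r := range rules {
		rule, ok := r.(map[string]interface{})
		if !ok {
			continue
		}
		names, _ := rule["names"].([]interface{})
		kept := make([]interface{}, 0, len(names))
		for _, n := range names {
			if n != name {
				kept = append(kept, n)
			}
		}
		rule["names"] = kept
	}

	return json.Marshal(profile)
}

func main() {
	// Inline sample with a shortened allow list instead of a file on disk.
	in := []byte(`{"defaultAction":"SCMP_ACT_ERRNO","syscalls":[{"names":["read","uname"],"action":"SCMP_ACT_ALLOW"}]}`)

	out, err := removeSyscall(in, "uname")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note that `json.Marshal` emits map keys in sorted order, so the field order of the output may differ from the input file.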
| 217 | +And then try to run it again with the new profile `/tmp/no-uname-profile.json`: |
| 218 | + |
| 219 | +```console |
| 220 | +> sudo ./spoc run -p /tmp/no-uname-profile.json ./main |
| 221 | +10:39:12.707798 Reading file /tmp/no-uname-profile.json |
| 222 | +10:39:12.707892 Setting up seccomp |
| 223 | +10:39:12.707920 Load seccomp profile |
| 224 | +10:39:12.707982 Starting audit log enricher |
| 225 | +10:39:12.707998 Enricher reading from file /var/log/audit/audit.log |
| 226 | +10:39:12.709164 Running command with PID: 480512 |
| 227 | +panic: operation not permitted |
| 228 | + |
| 229 | +goroutine 1 [running]: |
| 230 | +main.main() |
| 231 | + /path/to/main.go:10 +0x85 |
| 232 | +10:39:12.713035 Unable to run: launch runner: wait for command: exit status 2 |
| 233 | +``` |
| 234 | + |
| 235 | +Alright, that was expected! The applied seccomp profile blocks the `uname` |
| 236 | +syscall, which results in an "operation not permitted" error. This error is |
| 237 | +pretty generic and does not provide any hint on what got blocked by seccomp. |
| 238 | +It is generally extremely difficult to predict how applications behave if single |
| 239 | +syscalls are forbidden by seccomp. The application may |
| 240 | +terminate like in our simple demo, but it could also |
| 241 | +misbehave in strange ways and not stop at all. |
| 242 | + |
| 243 | +If I now change the default seccomp action of the profile from `SCMP_ACT_ERRNO` |
| 244 | +to `SCMP_ACT_LOG` like this: |
| 245 | + |
| 246 | +```console |
| 247 | +> jq '.defaultAction = "SCMP_ACT_LOG"' /tmp/no-uname-profile.json > /tmp/no-uname-profile-log.json |
| 248 | +``` |
| 249 | + |
| 250 | +Then the log enricher will give us a hint that the `uname` syscall is not |
| 251 | +allowed by the profile when using `spoc run`: |
| 252 | + |
| 253 | +```console |
| 254 | +> sudo ./spoc run -p /tmp/no-uname-profile-log.json ./main |
| 255 | +10:48:07.470126 Reading file /tmp/no-uname-profile-log.json |
| 256 | +10:48:07.470234 Setting up seccomp |
| 257 | +10:48:07.470245 Load seccomp profile |
| 258 | +10:48:07.470302 Starting audit log enricher |
| 259 | +10:48:07.470339 Enricher reading from file /var/log/audit/audit.log |
| 260 | +10:48:07.470889 Running command with PID: 522268 |
| 261 | +10:48:07.472007 Seccomp: uname (63) |
| 262 | +``` |
| 263 | + |
| 264 | +The application no longer panics, but seccomp logs the behavior |
| 265 | +to `/var/log/audit/audit.log` and `spoc` parses the data to correlate it |
| 266 | +directly to our program. Generating the log messages to the audit subsystem |
| 267 | +comes with a large performance overhead and should be handled with care in |
| 268 | +production systems. It also comes with a security risk when running untrusted |
| 269 | +apps in audit mode in production environments. |
| 270 | + |
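To get a feeling for what correlating audit data involves, here is a minimal sketch that extracts the PID and syscall number from a `type=SECCOMP` audit record. It is not the enricher's implementation: the regular expression, the sample line and the tiny syscall table are assumptions for illustration, modeled on the usual `key=value` layout of audit records.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// seccompLine matches the pid and syscall fields of a SECCOMP audit record.
// The \b keeps "ppid=" style fields from matching as "pid=".
var seccompLine = regexp.MustCompile(`type=SECCOMP .*\bpid=(\d+) .*\bsyscall=(\d+)`)

// syscallNames is a tiny illustrative excerpt of the x86_64 syscall table.
var syscallNames = map[int]string{0: "read", 1: "write", 63: "uname"}

// event holds the fields extracted from one audit record.
type event struct {
	PID     int
	Syscall int
}

// parseSeccompLine returns the extracted event and whether the line matched.
func parseSeccompLine(line string) (event, bool) {
	m := seccompLine.FindStringSubmatch(line)
	if m == nil {
		return event{}, false
	}
	pid, err := strconv.Atoi(m[1])
	if err != nil {
		return event{}, false
	}
	nr, err := strconv.Atoi(m[2])
	if err != nil {
		return event{}, false
	}
	return event{PID: pid, Syscall: nr}, true
}

func main() {
	// Made-up sample record; real records carry more fields.
	line := `type=SECCOMP msg=audit(1684406887.472:1234): auid=1000 uid=0 gid=0 ` +
		`ses=1 pid=522268 comm="main" exe="/path/to/main" sig=0 arch=c000003e ` +
		`syscall=63 compat=0 ip=0x0 code=0x7ffc0000`

	if ev, ok := parseSeccompLine(line); ok {
		fmt.Printf("Seccomp: %s (%d), pid %d\n", syscallNames[ev.Syscall], ev.Syscall, ev.PID)
	}
}
```

Filtering on the extracted PID is what allows the output to be limited to the process started by the tool instead of every seccomp event on the system.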
| 271 | +This demo should give you an impression of how to debug seccomp profile issues with |
| 272 | +applications, possibly by using our shiny new helper tool powered by the |
| 273 | +features of the Security Profiles Operator. `spoc` is a flexible and portable |
| 274 | +binary suitable for edge cases where resources are limited and even Kubernetes |
| 275 | +itself may not be available with its full capabilities. |
| 276 | + |
| 277 | +Thank you for reading this blog post! If you're interested in more, providing |
| 278 | +feedback or asking for help, then feel free to get in touch with us directly via |
| 279 | +[Slack (#security-profiles-operator)][slack] or the [mailing list][mail]. |
| 280 | + |
| 281 | +[slack]: https://kubernetes.slack.com/messages/security-profiles-operator |
| 282 | +[mail]: https://groups.google.com/forum/#!forum/kubernetes-dev |