@@ -6,7 +6,7 @@ The jailer is invoked in this manner:

  ``` bash
  jailer --id <id> --node <numa_node> --exec-file <exec_file> --uid <uid> --gid <gid> [--chroot-base-dir <chroot_base>]
- [--netns <netns>] [--daemonize]
+ [--netns <netns>] [--daemonize] [--seccomp-level <level>]
  ```

  - `id` is the unique VM identification string, which may contain alphanumeric
@@ -24,21 +24,16 @@ jailer --id <id> --node <numa_node> --exec-file <exec_file> --uid <uid> --gid <g
    will use this to join the associated network namespace.
  - When present, the `--daemonize` flag causes the jailer to cal **setsid()** and
    redirect all three standard I/O file descriptors to `/dev/null`.
+ - Possible values for `--seccomp-level` are: 0 (disabled - the default value),
+   1 (basic filtering), or 2 (advanced filtering). Basic filtering simply prohibits
+   syscalls not whitelisted by Firecracker, whereas advanced filtering adds further
+   checks on some of the parameters of the allowed syscalls. The filters are installed
+   after the jailer execs into Firecracker, but before running any customer code.

  ## Jailer Operation

  After starting, the Jailer goes through the following operations:

- - If the `--secomp-level` flag is set to `1`, sets up a list of seccomp
-   filters, white listing the minimum set of system calls that Firecracker
-   requires to function.
- - If the `--seccomp-level` flag is set to `2`, sets up advanced
-   seccomp filtering. The default action for a syscall is to send `SIGSYS`,
-   unless there is an added rule white listing respective syscall with the given
-   set of arguments. The added rules are the minimum set that Firecracker
-   requires to function.
- - Otherwise if `--seccomp-level` flag is not set or is set to `0`, does not use
-   seccomp filtering.
  - Validate **all provided paths** and the VM `id`.
  - Close all open file descriptors unrelated to standard input.
  - Open `/dev/kvm` as *RW*, and bind a Unix domain socket listener to
@@ -148,8 +143,9 @@ from the controlling terminal. Then, redirect standard file descriptors to `/dev
  because it is no longer necessary.

  Finally, the jailer switches the **uid** to ```123```, and **gid** to ```100```, and execs
- `./firecracker --jailed`. We can now use the socket at `/srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/api.socket`
- to interact with the VM.
+ `./firecracker --jailed --seccomp-level=<level>`. We can now use the socket at
+ `/srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/api.socket` to
+ interact with the VM.

  ### Observations

@@ -166,10 +162,6 @@ to interact with the VM.
    this involves registering handlers with the cgroup **notify_on_release**
    mechanism, while being wary about potential race conditions (the instance
    crashing before the subscription process is complete, for example).
- - Seccomp filtering is currently disabled by default and needs to be enabled by
-   setting the `USE_SECCOMP` environment variable due to a bug in the Linux
-   kernel. Enabling it might cause slowness as a result of an increased number of
-   page faults.
  - For extra resilience, the jailer expects to be spawned by the user in a new PID namespace, most likely via a
  combination of **clone()** with the **CLONE_NEWPID** flag and **exec()**. A process must be created in a new PID
  namespace in order to become a pseudo-init process, and the other option is to use a **clone()** in the jailer,
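The `--seccomp-level` values documented in this change can be sketched as a small shell helper. This is only an illustration, not part of the commit: the function names `describe_seccomp_level` and `build_jailer_cmd` are hypothetical, the `--node 0` and `--exec-file /usr/bin/firecracker` values are placeholders, and the `id`, `uid` and `gid` values are reused from the example in the diff. The script prints a command line and does not run the jailer.

```bash
#!/usr/bin/env bash
# Sketch only: validates a seccomp level and prints (does not run) a jailer
# command line. Function names and the --node / --exec-file values are
# assumptions made for illustration.

# Map a --seccomp-level value to the description given in the diff.
describe_seccomp_level() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "basic filtering" ;;
        2) echo "advanced filtering" ;;
        *) echo "invalid seccomp level: $1" >&2; return 1 ;;
    esac
}

# Print a jailer invocation for the given level, rejecting unknown levels.
build_jailer_cmd() {
    local level="${1:-0}"   # 0 (disabled) is the documented default
    describe_seccomp_level "$level" >/dev/null || return 1
    echo "jailer --id 551e7604-e35c-42b3-b825-416853441234 --node 0" \
         "--exec-file /usr/bin/firecracker --uid 123 --gid 100" \
         "--seccomp-level $level"
}

build_jailer_cmd 2
```

Calling `build_jailer_cmd` with no argument falls back to level `0`, which matches the default described in the added documentation; any value other than 0, 1 or 2 is rejected before a command line is produced.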