@@ -6,7 +6,7 @@ The jailer is invoked in this manner:
 
 ``` bash
 jailer --id <id> --node <numa_node> --exec-file <exec_file> --uid <uid> --gid <gid> [--chroot-base-dir <chroot_base>]
-       [--netns <netns>] [--daemonize]
+       [--netns <netns>] [--daemonize] [--seccomp-level <level>]
 ```
 
 - `id` is the unique VM identification string, which may contain alphanumeric
@@ -24,21 +24,16 @@ jailer --id <id> --node <numa_node> --exec-file <exec_file> --uid <uid> --gid <g
   will use this to join the associated network namespace.
 - When present, the `--daemonize` flag causes the jailer to call **setsid()** and
   redirect all three standard I/O file descriptors to `/dev/null`.
+- Possible values for `--seccomp-level` are: 0 (disabled - the default value),
+  1 (basic filtering), or 2 (advanced filtering). Basic filtering simply prohibits
+  syscalls not whitelisted by Firecracker, whereas advanced filtering adds further
+  checks on some of the parameters of the allowed syscalls. The filters are installed
+  after the jailer execs into Firecracker, but before running any customer code.
 
 ## Jailer Operation
 
 After starting, the Jailer goes through the following operations:
 
-- If the `--seccomp-level` flag is set to `1`, sets up a list of seccomp
-  filters, white listing the minimum set of system calls that Firecracker
-  requires to function.
-- If the `--seccomp-level` flag is set to `2`, sets up advanced
-  seccomp filtering. The default action for a syscall is to send `SIGSYS`,
-  unless there is an added rule white listing respective syscall with the given
-  set of arguments. The added rules are the minimum set that Firecracker
-  requires to function.
-- Otherwise if `--seccomp-level` flag is not set or is set to `0`, does not use
-  seccomp filtering.
 - Validate **all provided paths** and the VM `id`.
 - Close all open file descriptors unrelated to standard input.
 - Open `/dev/kvm` as *RW*, and bind a Unix domain socket listener to
@@ -148,8 +143,9 @@ from the controlling terminal. Then, redirect standard file descriptors to `/dev
 because it is no longer necessary.
 
 Finally, the jailer switches the **uid** to ```123```, and **gid** to ```100```, and execs
-`./firecracker --jailed`. We can now use the socket at `/srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/api.socket`
-to interact with the VM.
+`./firecracker --jailed --seccomp-level=<level>`. We can now use the socket at
+`/srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/api.socket` to
+interact with the VM.
 
 ### Observations
 
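
Reading aid for the hunks above (not part of the diff): a minimal shell sketch of how the new `--seccomp-level` flag might be validated and passed through. The function names are hypothetical and do not exist in Firecracker, and the exec path `/usr/bin/firecracker` is an assumption. The id, uid, gid, level meanings and the `./firecracker --jailed --seccomp-level=<level>` form are taken from the documentation text in this change. The helpers only print command lines; they do not run the jailer.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; none of these functions exist in Firecracker.

# Map a --seccomp-level value to the description given in the docs above.
describe_seccomp_level() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "basic filtering" ;;
        2) echo "advanced filtering" ;;
        *) echo "invalid seccomp level: $1" >&2; return 1 ;;
    esac
}

# Print (not run) a jailer command line; the level defaults to 0, as documented.
build_jailer_cmd() {
    local id="$1" node="$2" exec_file="$3" uid="$4" gid="$5" level="${6:-0}"
    describe_seccomp_level "$level" >/dev/null || return 1
    printf 'jailer --id %s --node %s --exec-file %s --uid %s --gid %s --seccomp-level %s\n' \
        "$id" "$node" "$exec_file" "$uid" "$gid" "$level"
}

# Print the command the jailer is documented to exec into for a given level.
build_firecracker_cmd() {
    describe_seccomp_level "$1" >/dev/null || return 1
    printf './firecracker --jailed --seccomp-level=%s\n' "$1"
}

# Example values: id, uid and gid come from the walkthrough; the exec path is assumed.
build_jailer_cmd 551e7604-e35c-42b3-b825-416853441234 0 /usr/bin/firecracker 123 100 2
build_firecracker_cmd 2
```

Rejecting unknown levels before building the command keeps a typo such as `--seccomp-level 3` from silently reaching the jailer; whether the jailer itself rejects such values is not stated in this change.
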
@@ -166,10 +162,6 @@ to interact with the VM.
   this involves registering handlers with the cgroup **notify_on_release**
   mechanism, while being wary about potential race conditions (the instance
   crashing before the subscription process is complete, for example).
-- Seccomp filtering is currently disabled by default and needs to be enabled by
-  setting the `USE_SECCOMP` environment variable due to a bug in the Linux
-  kernel. Enabling it might cause slowness as a result of an increased number of
-  page faults.
 - For extra resilience, the jailer expects to be spawned by the user in a new PID namespace, most likely via a
   combination of **clone()** with the **CLONE_NEWPID** flag and **exec()**. A process must be created in a new PID
   namespace in order to become a pseudo-init process, and the other option is to use a **clone()** in the jailer,
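
Reading aid for the last context lines (not part of the diff): the text expects the caller to start the jailer in a new PID namespace via **clone()** with **CLONE_NEWPID**. From a shell, a comparable effect can be sketched with `unshare(1)` from util-linux, which is a different route (it unshares and then forks rather than calling clone() directly). The helper name below is hypothetical, the exec path is an assumption, and the function only prints the command, since running it needs root or `CAP_SYS_ADMIN`.

```bash
#!/usr/bin/env bash
# Illustrative helper only; not part of Firecracker.

# Print (not run) a command that would start the given program as the first
# process of a new PID namespace. Assumes util-linux unshare(1) is installed.
# Arguments are joined with spaces and are not shell-quoted.
wrap_in_new_pid_ns() {
    [ "$#" -ge 1 ] || { echo "usage: wrap_in_new_pid_ns <cmd> [args...]" >&2; return 1; }
    # --pid creates the namespace; --fork runs the program as a child of
    # unshare, so that child is PID 1 inside the new namespace.
    printf 'unshare --pid --fork --'
    printf ' %s' "$@"
    printf '\n'
}

# Example values: id, uid and gid come from the walkthrough; the exec path is assumed.
wrap_in_new_pid_ns jailer --id 551e7604-e35c-42b3-b825-416853441234 --node 0 \
    --exec-file /usr/bin/firecracker --uid 123 --gid 100
```
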