From 42a5667194bbcad5690c74ab8c9f72960bfef995 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 12:25:47 +0000 Subject: [PATCH 01/10] docs(jailer): Fix jailer CLI usage formatting with missing backslashes The jailer CLI usage was missing trailing backslashes on several lines. Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 5fda82154e7..c691288863e 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -17,15 +17,15 @@ The jailer is invoked in this manner: jailer --id \ --exec-file \ --uid \ - --gid - [--parent-cgroup ] - [--cgroup-version ] - [--cgroup ] - [--chroot-base-dir ] - [--netns ] - [--resource-limit ] - [--daemonize] - [--new-pid-ns] + --gid \ + [--parent-cgroup ] \ + [--cgroup-version ] \ + [--cgroup ] \ + [--chroot-base-dir ] \ + [--netns ] \ + [--resource-limit ] \ + [--daemonize] \ + [--new-pid-ns] \ [--...extra arguments for Firecracker] ``` From f782f24de5ca052292c744b6a50ba98f17d0775e Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 12:30:38 +0000 Subject: [PATCH 02/10] docs(jailer): Use underscore for placeholder The jailer CLI usage uses underscore for placeholders of all the parameters other than --cgroup-version Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/jailer.md b/docs/jailer.md index c691288863e..3630eb3c3f5 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -19,7 +19,7 @@ jailer --id \ --uid \ --gid \ [--parent-cgroup ] \ - [--cgroup-version ] \ + [--cgroup-version ] \ [--cgroup ] \ [--chroot-base-dir ] \ [--netns ] \ From 7d7944545460a44f0853b5fa4391d5cd87e14655 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 12:34:21 +0000 Subject: [PATCH 03/10] docs(jailer): Fix unclosed parenthesis The parenthesis was not closed. Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 3630eb3c3f5..7b67dad14b1 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -40,8 +40,8 @@ jailer --id \ nested hierarchies. By specifying this parameter, the jailer will create a new cgroup named `id` for the microvm in the `/` subfolder. `cgroup_base` is the cgroup controller root for `cgroup v1` (e.g. - `/sys/fs/cgroup/cpu`) or the unified controller hierarchy for `cgroup v2` ( - e.g. `/sys/fs/cgroup/unified`. `` is a relative path within + `/sys/fs/cgroup/cpu`) or the unified controller hierarchy for `cgroup v2` + (e.g. `/sys/fs/cgroup/unified`). `` is a relative path within that hierarchy. For example, if `--parent-cgroup all_uvms/external_uvms` is specified, the jailer will write all cgroup parameters specified through `--cgroup` in `/sys/fs/cgroup//all_uvms/external_uvms/`. From 30ffa487eecef8b75644e4183bc03f28669b4d00 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 12:55:02 +0000 Subject: [PATCH 04/10] docs(jailer): Remove description for non-existing parameter Since we removed the --node parameter in v1.0, we should not have description for the parameter. Fixes: b4d51ac1a5c3 ("jailer: remove --node parameter") Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 7b67dad14b1..1bb42803170 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -123,8 +123,7 @@ After starting, the Jailer goes through the following operations: identified location (referred to as ``), the jailer creates the `//` subfolder, and writes the current pid to `///tasks`. Also, the value passed for each - `` is written to the file. If `--node` is used the corresponding - values are written to the appropriate `cpuset.mems` and `cpuset.cpus` files. + `` is written to the file. - Call `unshare()` into a new mount namespace, use `pivot_root()` to switch the old system root mount point with a new one base in `chroot_dir`, switch the current working directory to the new root, unmount the old root mount point, From 5de4ed39c099dd494f3cdb168a0230e16e53f778 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 13:02:17 +0000 Subject: [PATCH 05/10] docs(jailer): Distinguish parameter names and values - Added `--` prefix when referring to the CLI option itself. - Wrapped placeholders for parameter values with angle brackets. Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 103 +++++++++++++++++++++++++------------------------ 1 file changed, 52 insertions(+), 51 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 1bb42803170..6e7c631676b 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -18,7 +18,7 @@ jailer --id \ --exec-file \ --uid \ --gid \ - [--parent-cgroup ] \ + [--parent-cgroup ] \ [--cgroup-version ] \ [--cgroup ] \ [--chroot-base-dir ] \ @@ -29,31 +29,34 @@ jailer --id \ [--...extra arguments for Firecracker] ``` -- `id` is the unique VM identification string, which may contain alphanumeric - characters and hyphens. The maximum `id` length is currently 64 characters. -- `exec_file` is the path to the Firecracker binary that will be exec-ed by the - jailer. The filename must include the string `firecracker`. This is enforced - because the interaction with the jailer is Firecracker specific. -- `uid` and `gid` are the uid and gid the jailer switches to as it execs the - target binary. -- `parent-cgroup` is used to allow the placement of microvm cgroups in custom +- `--id` specifies the unique VM identification string, which may contain + alphanumeric characters and hyphens. The maximum length is currently 64 + characters. +- `--exec-file` specifies the path to the Firecracker binary that will be + exec-ed by the jailer. The filename must include the string `firecracker`. + This is enforced because the interaction with the jailer is Firecracker + specific. +- `--uid` and `--gid` specify the uid and gid the jailer switches to as it execs + the target binary. +- `--parent-cgroup` is used to allow the placement of microvm cgroups in custom nested hierarchies. By specifying this parameter, the jailer will create a new - cgroup named `id` for the microvm in the `/` - subfolder. `cgroup_base` is the cgroup controller root for `cgroup v1` (e.g. + cgroup named `` for the microvm in the `/` + subfolder. `` is the cgroup controller root for `cgroup v1` (e.g. `/sys/fs/cgroup/cpu`) or the unified controller hierarchy for `cgroup v2` (e.g. `/sys/fs/cgroup/unified`). `` is a relative path within that hierarchy. For example, if `--parent-cgroup all_uvms/external_uvms` is specified, the jailer will write all cgroup parameters specified through `--cgroup` in `/sys/fs/cgroup//all_uvms/external_uvms/`. - By default, the parent cgroup is `exec-file`. If there are no `--cgroup` + By default, the parent cgroup is the filename of ``, which will be + henceforth referred to as ``. If there are no `--cgroup` parameters specified and `--group-version=2` was passed, then the jailer will move the process to the specified cgroup. -- `cgroup-version` is used to select which type of cgroup hierarchy to use for +- `--cgroup-version` is used to select which type of cgroup hierarchy to use for the creation of cgroups. The default value is "1" which means that cgroups - specified with the `cgroup` argument will be created within a v1 hierarchy. - Supported options are "1" for cgroup-v1 and "2" for cgroup-v2. -- `cgroup` cgroups can be passed to the jailer to let it set the values when the - microVM process is spawned. The `--cgroup` argument must follow this format: + specified with `--cgroup` will be created within a v1 hierarchy. Supported + options are "1" for cgroup-v1 and "2" for cgroup-v2. +- `--cgroup` can be passed to the jailer to let it set the values when the + microVM process is spawned. The argument must follow this format: `=` (e.g `cpuset.cpus=0`). This argument can be used multiple times to set multiple cgroups. This is useful to avoid providing privileged permissions to another process for setting the cgroups before or @@ -61,15 +64,15 @@ jailer --id \ Firecracker process cgroups before the VM starts running, with no need to create the entire cgroup hierarchy manually (which requires privileged permissions). -- `chroot_base` represents the base folder where chroot jails are built. The - default is `/srv/jailer`. -- `netns` represents the path to a network namespace handle. If present, the +- `--chroot-base-dir` specifies the base folder where chroot jails are built. + The default is `/srv/jailer`. +- `--netns` specifies the path to a network namespace handle. If present, the jailer will use this to join the associated network namespace. -- For extra security and control over resource usage, `resource-limit` can be - used to set bounds to the process resources. The `--resource-limit` argument - must follow this format: `=` (e.g `no-file=1024`) and can be - used multiple times to set multiple bounds. Current available resources that - can be limited using this argument are: +- For extra security and control over resource usage, `--resource-limit` can be + used to set bounds to the process resources. The argument must follow this + format: `=` (e.g `no-file=1024`) and can be used multiple + times to set multiple bounds. Current available resources that can be limited + using this argument are: - `fsize`: The maximum size in bytes for files created by the process. - `no-file`: Specifies a value one greater than the maximum file descriptor number that can be opened by this process. @@ -80,13 +83,13 @@ Here is an example on how to set multiple resource limits using this argument: --resource-limit fsize=250000000 --resource-limit no-file=1024 ``` -- When present, the `--daemonize` flag causes the jailer to call `setsid()` and - redirect all three standard I/O file descriptors to `/dev/null`. -- When present, the `--new-pid-ns` flag causes the jailer to spawn the provided - binary into a new PID namespace. It makes use of the libc `clone()` function - with the `CLONE_NEWPID` flag. As a result, the jailer and the process running - the exec file have different PIDs. The PID of the child process is stored in - the jail root directory inside `.pid`. +- When present, `--daemonize` causes the jailer to call `setsid()` and redirect + all three standard I/O file descriptors to `/dev/null`. +- When present, `--new-pid-ns` causes the jailer to spawn the provided binary + into a new PID namespace. It makes use of the libc `clone()` function with the + `CLONE_NEWPID` flag. As a result, the jailer and the process running the exec + file have different PIDs. The PID of the child process is stored in the jail + root directory inside `.pid`. - The jailer adheres to the "end of command options" convention, meaning all parameters specified after `--` are forwarded to Firecracker. For example, this can be paired with the `--config-file` Firecracker argument to specify a @@ -99,23 +102,21 @@ Here is an example on how to set multiple resource limits using this argument: After starting, the Jailer goes through the following operations: -- Validate **all provided paths** and the VM `id`. +- Validate **all provided paths** and the VM ID. - Close all open file descriptors based on `/proc//fd` except input, output and error. - Cleanup all environment variables received from the parent process. - Create the `///root` folder, which will be - henceforth referred to as `chroot_dir`. `exec_file_name` is the last path - component of `exec_file` (for example, that would be `firecracker` for - `/usr/bin/firecracker`). Nothing is done if the path already exists (it should - not, since `id` is supposed to be unique). -- Copy `exec_file` to - `///root/`. This ensures the - new process will not share memory with any other Firecracker process. + henceforth referred to as ``. Nothing is done if the path already + exists (it should not, since `` is supposed to be unique). +- Copy the file specified with `--exec-file` to `/`. + This ensures the new process will not share memory with any other Firecracker + process. - Set resource bounds for current process and its children through `--resource-limit` argument, by calling `setrlimit()` system call with the specific resource argument. If no limits are provided, the jailer bounds `no-file` to a maximum default value of 2048. -- Create the `cgroup` sub-folders. The jailer can use either `cgroup v1` or +- Create the cgroup sub-folders. The jailer can use either `cgroup v1` or `cgroup v2`. On most systems, this is mounted by default in `/sys/fs/cgroup` (should be mounted by the user otherwise). The jailer will parse `/proc/mounts` to detect where each of the controllers required in `--cgroup` @@ -125,14 +126,14 @@ After starting, the Jailer goes through the following operations: `///tasks`. Also, the value passed for each `` is written to the file. - Call `unshare()` into a new mount namespace, use `pivot_root()` to switch the - old system root mount point with a new one base in `chroot_dir`, switch the + old system root mount point with a new one base in ``, switch the current working directory to the new root, unmount the old root mount point, and call `chroot` into the current directory. - Use `mknod` to create a `/dev/net/tun` equivalent inside the jail. - Use `mknod` to create a `/dev/kvm` equivalent inside the jail. -- Use `chown` to change ownership of the `chroot_dir` (root path `/` as seen by - the jailed firecracker), `/dev/net/tun`, `/dev/kvm`. The ownership is changed - to the provided `uid:gid`. +- Use `chown` to change ownership of the `` (root path `/` as seen + by the jailed firecracker), `/dev/net/tun`, `/dev/kvm`. The ownership is + changed to the provided `:`. - If `--netns ` is present, attempt to join the specified network namespace. - If `--daemonize` is specified, call `setsid()` and redirect `STDIN`, `STDOUT`, @@ -147,8 +148,8 @@ After starting, the Jailer goes through the following operations: ` --id= --start-time-us= --start-time-cpu-us=` (and also forward any extra arguments provided to the jailer after `--`, as mentioned in the **Jailer Usage** section), where: - - `id`: (`string`) - The `id` argument provided to jailer. - - `opaque`: (`number`) time calculated by the jailer that it spent doing its + - ``: (`string`) - The `` argument provided to jailer. + - ``: (`number`) time calculated by the jailer that it spent doing its work. ## Example Run and Notes @@ -191,7 +192,7 @@ It’s worth noting that, whenever a folder already exists, nothing will be done and we move on to the next directory that needs to be created. This should only happen for the common `firecracker` subfolder (but, as for creating the chroot path before, we do not issue an error if folders directly associated with the -supposedly unique `id` already exist). +supposedly unique `` already exist). The jailer then writes the current pid to `/sys/fs/cgroup/cpuset/firecracker/551e7604-e35c-42b3-b825-416853441234/tasks`, @@ -228,8 +229,8 @@ call `chown(“/dev/net/tun”, 123, 100)`, so Firecracker can use it after drop privileges. This is required to use multiple TAP interfaces when running jailed. Do the same for `/dev/kvm`. -Change ownership of `` to `uid:gid` so that Firecracker can create -its API socket there. +Change ownership of `` to `:` so that Firecracker can +create its API socket there. Since the `--daemonize` flag is present, call `setsid()` to join a new session, a new process group, and to detach from the controlling terminal. Then, redirect @@ -237,7 +238,7 @@ standard file descriptors to `/dev/null` by calling `dup2(dev_null_fd, STDIN)`, `dup2(dev_null_fd, STDOUT)`, and `dup2(dev_null_fd, STDERR)`. Close `dev_null_fd`, because it is no longer necessary. -Finally, the jailer switches the `uid` to `123`, and `gid` to `100`, and execs +Finally, the jailer switches the uid to `123`, and gid to `100`, and execs ```console ./firecracker \ From 11eac42f68ab390972bb8fb85cb9838c3d8bad94 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 13:05:58 +0000 Subject: [PATCH 06/10] docs(jailer): Reorder cgroup-related parameters Since the behavior of --parent-cgroup depends on whether --cgroup is provided or not and wether --cgroup-version is 1 or 2, explain the dependencies first and then --parent-cgroup. Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 6e7c631676b..48f79177e3a 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -18,9 +18,9 @@ jailer --id \ --exec-file \ --uid \ --gid \ - [--parent-cgroup ] \ [--cgroup-version ] \ [--cgroup ] \ + [--parent-cgroup ] \ [--chroot-base-dir ] \ [--netns ] \ [--resource-limit ] \ @@ -38,19 +38,6 @@ jailer --id \ specific. - `--uid` and `--gid` specify the uid and gid the jailer switches to as it execs the target binary. -- `--parent-cgroup` is used to allow the placement of microvm cgroups in custom - nested hierarchies. By specifying this parameter, the jailer will create a new - cgroup named `` for the microvm in the `/` - subfolder. `` is the cgroup controller root for `cgroup v1` (e.g. - `/sys/fs/cgroup/cpu`) or the unified controller hierarchy for `cgroup v2` - (e.g. `/sys/fs/cgroup/unified`). `` is a relative path within - that hierarchy. For example, if `--parent-cgroup all_uvms/external_uvms` is - specified, the jailer will write all cgroup parameters specified through - `--cgroup` in `/sys/fs/cgroup//all_uvms/external_uvms/`. - By default, the parent cgroup is the filename of ``, which will be - henceforth referred to as ``. If there are no `--cgroup` - parameters specified and `--group-version=2` was passed, then the jailer will - move the process to the specified cgroup. - `--cgroup-version` is used to select which type of cgroup hierarchy to use for the creation of cgroups. The default value is "1" which means that cgroups specified with `--cgroup` will be created within a v1 hierarchy. Supported @@ -64,6 +51,19 @@ jailer --id \ Firecracker process cgroups before the VM starts running, with no need to create the entire cgroup hierarchy manually (which requires privileged permissions). +- `--parent-cgroup` is used to allow the placement of microvm cgroups in custom + nested hierarchies. By specifying this parameter, the jailer will create a new + cgroup named `` for the microvm in the `/` + subfolder. `` is the cgroup controller root for `cgroup v1` (e.g. + `/sys/fs/cgroup/cpu`) or the unified controller hierarchy for `cgroup v2` + (e.g. `/sys/fs/cgroup/unified`). `` is a relative path within + that hierarchy. For example, if `--parent-cgroup all_uvms/external_uvms` is + specified, the jailer will write all cgroup parameters specified through + `--cgroup` in `/sys/fs/cgroup//all_uvms/external_uvms/`. + By default, the parent cgroup is the filename of ``, which will be + henceforth referred to as ``. If there are no `--cgroup` + parameters specified and `--group-version=2` was passed, then the jailer will + move the process to the specified cgroup. - `--chroot-base-dir` specifies the base folder where chroot jails are built. The default is `/srv/jailer`. - `--netns` specifies the path to a network namespace handle. If present, the From 8a7214c794513bd56c23a9cc5946f5b710908f00 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 14:25:25 +0000 Subject: [PATCH 07/10] docs(jailer): Clarify behavior of --parent-cgroup Reorganized the description of --parent-cgroup into bullet points to make clear that its behavior depends on the combination of parameters. Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 36 ++++++++++++++++++++++++------------ 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 48f79177e3a..3b1769f9eeb 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -52,18 +52,28 @@ jailer --id \ create the entire cgroup hierarchy manually (which requires privileged permissions). - `--parent-cgroup` is used to allow the placement of microvm cgroups in custom - nested hierarchies. By specifying this parameter, the jailer will create a new - cgroup named `` for the microvm in the `/` - subfolder. `` is the cgroup controller root for `cgroup v1` (e.g. - `/sys/fs/cgroup/cpu`) or the unified controller hierarchy for `cgroup v2` - (e.g. `/sys/fs/cgroup/unified`). `` is a relative path within - that hierarchy. For example, if `--parent-cgroup all_uvms/external_uvms` is - specified, the jailer will write all cgroup parameters specified through - `--cgroup` in `/sys/fs/cgroup//all_uvms/external_uvms/`. - By default, the parent cgroup is the filename of ``, which will be - henceforth referred to as ``. If there are no `--cgroup` - parameters specified and `--group-version=2` was passed, then the jailer will - move the process to the specified cgroup. + nested hierarchies. The default value is the filename of ``, which + will be henceforth referred to as ``. The behavior of this + parameter depends on the following condition: + - If either any `--cgroup` parameter is specifed or `--cgroup-version=1` is + passed, the jailer will create a new cgroup named `` for the microvm in + the `/` subfolder. `` is the cgroup + controller root for cgroup v1 (e.g. `/sys/fs/cgroup/cpu`) or the unified + controller hierarchy for cgroup v2 (e.g. `/sys/fs/cgroup/unified`). + `` is a relative path within that hierarchy. For example, if + `--parent-cgroup all_uvms/external_uvms` is specified, the jailer will write + all cgroup parameters specified through `--cgroup` in + `/sys/fs/cgroup//all_uvms/external_uvms/`. + - If no `--cgroup` parameters are specified and `--cgroup-version=2` is + passed, the jailer will not create a new cgroup. If the cgroup specified + with `--parent-cgroup` exists, the jailer will move the process to the + specified cgroup, contrary to its name. This behavior can be used when users + want to configure a cgroup beforehand by themselves and move the process to + the configured cgroup. Note that, if the specified cgroup has domain + controllers (e.g. memory) enabled in `cgroup.subtree_control`, the move + fails due to ["no internal process constraint"][1] and jailer exits with an + error. If the cgroup spcified with `--parent-cgroup` does not exist, the + jailer does not move the process to any cgroup and proceeds without error. - `--chroot-base-dir` specifies the base folder where chroot jails are built. The default is `/srv/jailer`. - `--netns` specifies the path to a network namespace handle. If present, the @@ -301,3 +311,5 @@ Note: default value for `` is `/run/firecracker.socket`. - If all the cgroup controllers are bunched up on a single mount point using the "all" option, our current program logic will complain it cannot detect individual controller mount points. + +[1]: https://docs.kernel.org/admin-guide/cgroup-v2.html#no-internal-process-constraint From fd71e387f9e147d22ae4defb58cbc700bb3f60c4 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Fri, 5 Sep 2025 14:42:14 +0000 Subject: [PATCH 08/10] jailer: Better error message for failure of moving to cgroup If no --cgroup parameters are specified and --cgroup-version=2 is passed, the jailer moves the process to the cgroup specified with --parent-cgroup rather than creating a cgroup under it, contrary to its name. This move fails if the destination cgroup has domain controllers (e.g. memory) enabled in cgroup.subtree_control, which is called "no internal process constraint [1]. [1]: https://docs.kernel.org/admin-guide/cgroup-v2.html#no-internal-process-constraint Signed-off-by: Takahiro Itazuri --- src/jailer/src/env.rs | 2 +- src/jailer/src/main.rs | 7 +++++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/src/jailer/src/env.rs b/src/jailer/src/env.rs index 57d28724a57..1f106e2894b 100644 --- a/src/jailer/src/env.rs +++ b/src/jailer/src/env.rs @@ -218,7 +218,7 @@ impl Env { let cg_parent_procs = cg_parent.join("cgroup.procs"); if cg_parent.exists() { fs::write(cg_parent_procs, std::process::id().to_string()) - .map_err(|_| JailerError::CgroupWrite(io::Error::last_os_error()))?; + .map_err(|_| JailerError::CgroupMove(cg_parent, io::Error::last_os_error()))?; } } diff --git a/src/jailer/src/main.rs b/src/jailer/src/main.rs index 9532fa1eba4..694a951d0a4 100644 --- a/src/jailer/src/main.rs +++ b/src/jailer/src/main.rs @@ -43,8 +43,11 @@ pub enum JailerError { CgroupInvalidVersion(String), #[error("Parent cgroup path is invalid. Path should not be absolute or contain '..' or '.'")] CgroupInvalidParentPath(), - #[error("Failed to write to cgroups file: {0}")] - CgroupWrite(io::Error), + #[error( + "Failed to move process to cgroup ({0}): {1}.\nHint: If you intended to create a child \ + cgroup under {0}, pass any --cgroup parameters." + )] + CgroupMove(PathBuf, io::Error), #[error("Failed to change owner for {0}: {1}")] ChangeFileOwner(PathBuf, io::Error), #[error("Failed to chdir into chroot directory: {0}")] From 5d39eb9423e9ec7438d5de4b19880d74715f77bc Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Mon, 8 Sep 2025 12:45:19 +0000 Subject: [PATCH 09/10] test(jailer): Test --parent-cgroup specified but no --cgroup provided The jailer's behavior when --parent-cgroup specified but no --cgroup provided varies depending on the existence and subtree_control configuration of the specified cgroup. Test all the cases for such a combination of parameters. Signed-off-by: Takahiro Itazuri --- tests/integration_tests/security/test_jail.py | 79 ++++++++++++++++--- 1 file changed, 68 insertions(+), 11 deletions(-) diff --git a/tests/integration_tests/security/test_jail.py b/tests/integration_tests/security/test_jail.py index e9cdfc2c644..aa2a206a6a9 100644 --- a/tests/integration_tests/security/test_jail.py +++ b/tests/integration_tests/security/test_jail.py @@ -194,6 +194,20 @@ def move_pid(self, cgname, pid): cg_pids = self.root.joinpath(f"{cgname}/cgroup.procs") cg_pids.write_text(f"{pid}\n", encoding="ascii") + def enable_controller_in_subtree(self, cgname, controller): + """Enable a controller in subtree_control of a cgroup and its ancestors""" + # Enable the controller in all ancestors if not already enabled. + parent_cg = self.root.joinpath(cgname).parent + parent_subtree_control = parent_cg.joinpath("cgroup.subtree_control") + if controller not in parent_subtree_control.read_text(encoding="ascii"): + self.enable_controller_in_subtree( + parent_cg.relative_to(self.root), controller + ) + + subtree_control = self.root.joinpath(f"{cgname}/cgroup.subtree_control") + subtree_control.write_text(f"+{controller}", encoding="ascii") + assert controller in subtree_control.read_text(encoding="ascii") + @pytest.fixture(scope="session", autouse=True) def cgroups_info(): @@ -230,11 +244,7 @@ def check_cgroups_v2(vm): cg_parent = cg.root / parent_cgroup cg_jail = cg_parent / vm.jailer.jailer_id - # if no cgroups were specified, then the jailer should move the FC process - # to the parent group - if len(vm.jailer.cgroups) == 0: - procs = cg_parent.joinpath("cgroup.procs").read_text().splitlines() - assert str(vm.firecracker_pid) in procs + assert len(vm.jailer.cgroups) > 0 for cgroup in vm.jailer.cgroups: controller = cgroup.split(".")[0] @@ -393,11 +403,23 @@ def test_v1_default_cgroups(uvm_plain, cgroups_info): check_cgroups_v1(test_microvm.jailer.cgroups, test_microvm.jailer.jailer_id) -def test_cgroups_custom_parent_move(uvm_plain, cgroups_info): +@pytest.mark.parametrize( + "parent_exists,domain_controller_in_subtree", + [(True, False), (True, True), (False, None)], +) +def test_cgroups_parent_cgroup_but_no_cgroup( + uvm_plain, cgroups_info, parent_exists, domain_controller_in_subtree +): """ - Test cgroups when a custom parent cgroup is used and no cgroups are specified + Test cgroups when `--parent-cgroup` is used but no `--cgroup` are specified. - In this case we just want to move under the parent cgroup + If the cgroup specified with `--parent-cgroup` exists, the jailer should + move to the specified cgroup instead of creating a new cgroup under it. + However, if the specified cgroup has domain controllers (e.g. `memory`) + enabled in `cgroup.subtree_control`, the move should fail. + + If the specified cgroup does not exist, the jailer does not move the process + to any cgroup and proceeds without error. """ if cgroups_info.version != 2: pytest.skip("cgroupsv2 only") @@ -407,9 +429,44 @@ def test_cgroups_custom_parent_move(uvm_plain, cgroups_info): parent_cgroup = f"custom_cgroup/{test_microvm.id[:8]}" test_microvm.jailer.parent_cgroup = parent_cgroup - cgroups_info.new_cgroup(parent_cgroup) - test_microvm.spawn() - check_cgroups_v2(test_microvm) + if parent_exists: + # Create the parent cgroup. + cgroups_info.new_cgroup(parent_cgroup) + if domain_controller_in_subtree: + # Enable "memory" controller in cgroup.subtree_control of the parent. + cgroups_info.enable_controller_in_subtree(parent_cgroup, "memory") + + # Check no --cgroups are specified just in case. + assert len(test_microvm.jailer.cgroups) == 0 + + cg_parent = cgroups_info.root / parent_cgroup + + if parent_exists: + if domain_controller_in_subtree: + # The jailer should have failed to move to the `parent_cgroup` + # since it has domain controllers enabled in + # `cgroup.subtree_control` due to the no internal process + # constraint. + # https://docs.kernel.org/admin-guide/cgroup-v2.html#no-internal-process-constraint + with pytest.raises( + ChildProcessError, + match=( + rf"Failed to move process to cgroup \({cg_parent}\): " + r"Resource busy \(os error 16\)" + ), + ): + test_microvm.spawn() + else: + # The jailer should have moved to the `parent_cgroup` instead of + # creating a new cgroup under it and move to the new cgroup. + test_microvm.spawn() + procs = cg_parent.joinpath("cgroup.procs").read_text().splitlines() + assert str(test_microvm.firecracker_pid) in procs + else: + # The jailer should not have moved to any cgroup and the parent + # still does not exist. + test_microvm.spawn() + assert not cg_parent.exists() def test_args_default_resource_limits(uvm_plain): From ae32fb158a33fa59a3976079f5557ba283d1d4a9 Mon Sep 17 00:00:00 2001 From: Takahiro Itazuri Date: Tue, 9 Sep 2025 06:02:12 +0000 Subject: [PATCH 10/10] docs(jailer): Remove requirement for exec file name We removed the requirement for jailer where the file name of `--exec-file` needs to contain `firecracker`. Fixes: aedee560bbea ("feat(jailer): remove requirement for an executable name") Signed-off-by: Takahiro Itazuri --- docs/jailer.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/jailer.md b/docs/jailer.md index 3b1769f9eeb..98db675d726 100644 --- a/docs/jailer.md +++ b/docs/jailer.md @@ -33,9 +33,7 @@ jailer --id \ alphanumeric characters and hyphens. The maximum length is currently 64 characters. - `--exec-file` specifies the path to the Firecracker binary that will be - exec-ed by the jailer. The filename must include the string `firecracker`. - This is enforced because the interaction with the jailer is Firecracker - specific. + exec-ed by the jailer. - `--uid` and `--gid` specify the uid and gid the jailer switches to as it execs the target binary. - `--cgroup-version` is used to select which type of cgroup hierarchy to use for