
Commit 8c363e8

Proposal: runtime should ignore capabilities that cannot be granted
Currently, the specification requires runtimes to produce a (fatal) error if a container configuration requests capabilities that cannot be granted. A capability may be "unknown" to the runtime, unsupported by the kernel version in use, or unavailable in the environment that the runtime operates in.

This causes problems when the runtime is running in a restricted environment (for example, docker-in-docker), or when the list of capabilities known by higher-level runtimes differs from the list known by the OCI runtime. Some examples:

- Kernel 5.8 introduced the CAP_PERFMON, CAP_BPF, and CAP_CHECKPOINT_RESTORE capabilities. Docker 20.10.0 (a "higher-level runtime") shipped with an updated list of capabilities. When creating a "privileged" container, it would determine which capabilities are known by the kernel in use and request all of them by including them in the container config. However, runc did not yet have an updated list of capabilities, and therefore rejected the container specification, producing an error because the new capabilities were "unknown".
- When running nested containers (for example, docker-in-docker), the "inner" container may be using a more recent version of docker than the "outer" container. In this situation, the "outer" container may be missing capabilities that the inner container expects to be supported (based on the kernel version). Starting the container then fails, because the OCI runtime cannot grant those capabilities (they are not available in the environment it is running in).

Workarounds, and motivation
-------------------------------------

In the current situation, the responsibility for detecting which capabilities are supported is left to the "higher-level" runtimes. As an example, containerd recently added code that dynamically adjusts the list of requested capabilities by attempting to detect which capabilities are available in the environment it is running in.

This is only a partial solution, as it does not address mismatches between the list of capabilities _known_ by the higher-level and the lower-level runtime (which cannot be detected). Besides being only a *partial* fix, this workaround also adds complexity to every higher-level runtime.

Proposal: WARN (but otherwise ignore) capabilities that cannot be granted
-------------------------------------

This patch changes the specification so that runtimes WARN about (but otherwise ignore) capabilities that are requested in the container config but cannot be granted.

Moving this responsibility to the lower-level (OCI) runtime makes more sense: the OCI runtime _already_ is responsible for interacting with the kernel (detecting which capabilities are supported, and performing conversion), and only the lower-level runtime knows which capabilities it supports. Making the lower-level runtime responsible for handling "unknown" or "unavailable" capabilities keeps this logic in one place.

Impact on security
-------------------------------------

Given that `capabilities` is an allow-list, ignoring unknown capabilities does not pose a security risk. In the worst case, a container does not get all requested capabilities granted and, as a result, some actions may fail.

Backward-compatibility
-------------------------------------

Changing this behavior should be backward compatible. Higher-level runtimes that already dynamically adjust the list of requested capabilities can continue to do so. Runtimes that do not adjust the list will see an improvement: containers can start even if some of the requested capabilities are not granted. Container processes MAY fail (as described in "Impact on security"), but users can debug this situation either by looking at the warnings produced by the OCI runtime, or by using tools such as `capsh` / `libcap` to get the list of capabilities actually granted in the container.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
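The debugging step mentioned under "Backward-compatibility" (checking which capabilities a container process actually received) can be approximated without extra tooling, similar in spirit to `capsh --decode`. The sketch below is illustrative only and makes some assumptions. It reads the `CapEff` and `CapBnd` hex masks from `/proc/self/status` on Linux and decodes them into names. The name table follows the `<linux/capability.h>` numbering up to `CAP_CHECKPOINT_RESTORE` (bit 40). `capField` and `decodeCapMask` are hypothetical helper names, not part of any runtime's API.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capNames maps bit numbers to capability names, following the numbering in
// <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (bit 40). Bits beyond
// this table (defined by newer kernels) are reported as "cap_<n>".
var capNames = []string{
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER", // 0-3
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", // 4-7
	"CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST", // 8-11
	"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER", // 12-15
	"CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE", // 16-19
	"CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE", // 20-23
	"CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", // 24-27
	"CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP", // 28-31
	"CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", // 32-35
	"CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF", // 36-39
	"CAP_CHECKPOINT_RESTORE", // 40
}

// capField returns the value of a field such as "CapEff" or "CapBnd" from
// the contents of a /proc/<pid>/status file (hypothetical helper).
func capField(status, field string) (string, bool) {
	prefix := field + ":"
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, prefix) {
			return strings.TrimSpace(line[len(prefix):]), true
		}
	}
	return "", false
}

// decodeCapMask converts a hexadecimal capability mask into capability
// names, lowest bit first (hypothetical helper).
func decodeCapMask(hexMask string) ([]string, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(hexMask), 16, 64)
	if err != nil {
		return nil, err
	}
	var names []string
	for bit := 0; bit < 64; bit++ {
		if mask&(uint64(1)<<uint(bit)) == 0 {
			continue // capability not present in this set
		}
		if bit < len(capNames) {
			names = append(names, capNames[bit])
		} else {
			names = append(names, fmt.Sprintf("cap_%d", bit))
		}
	}
	return names, nil
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	for _, field := range []string{"CapEff", "CapBnd"} {
		value, ok := capField(string(data), field)
		if !ok {
			continue
		}
		names, err := decodeCapMask(value)
		if err != nil {
			fmt.Printf("%s: cannot parse %q: %v\n", field, value, err)
			continue
		}
		fmt.Printf("%s: %s\n", field, strings.Join(names, ","))
	}
}
```

Run inside a container, comparing the `CapEff` output with the capabilities requested in the container config shows which ones were ignored.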
1 parent 7413a7f commit 8c363e8


1 file changed

+5 -1 lines changed


config.md

Lines changed: 5 additions & 1 deletion
```diff
@@ -190,7 +190,11 @@ For Linux-based systems, the `process` object supports the following process-spe
 For more information about AppArmor, see [AppArmor documentation][apparmor].
 * **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
 Valid values are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
-Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
+Any value which cannot be mapped to a relevant kernel interface, or cannot
+be granted otherwise MUST be [logged as a warning](runtime.md#warnings) by
+the runtime. Runtimes SHOULD NOT fail if the container configuration requests
+capabilities that cannot be granted, for example, if the runtime operates in
+a restricted environment with a limited set of capabilities.
 `capabilities` contains the following properties:
 
 * **`effective`** (array of strings, OPTIONAL) the `effective` field is an array of effective capabilities that are kept for the process.
```
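The new wording above asks runtimes to warn about, but not fail on, capabilities they cannot grant. The following is a minimal sketch of that behavior for a single capability set, under a few assumptions. The runtime is assumed to have already computed the capabilities it can grant, for example names it recognizes that are also present in its own bounding set. `filterCapabilities` and the `grantable` map are hypothetical names, not runc's API. A real runtime would apply the same step to each set (`bounding`, `effective`, `inheritable`, `permitted`, `ambient`) and route the warnings through its own logging.

```go
package main

import (
	"fmt"
	"log"
)

// filterCapabilities splits the requested capabilities into those that can be
// granted and those that cannot. Capabilities that cannot be granted are not
// treated as an error; the caller is expected to log them as warnings.
// (Hypothetical helper for illustration.)
func filterCapabilities(requested []string, grantable map[string]bool) (granted, ignored []string) {
	for _, c := range requested {
		if grantable[c] {
			granted = append(granted, c)
		} else {
			ignored = append(ignored, c)
		}
	}
	return granted, ignored
}

func main() {
	// Assumed environment: the runtime can only grant these capabilities,
	// e.g. because it does not know CAP_PERFMON/CAP_BPF, or they are missing
	// from its bounding set.
	grantable := map[string]bool{"CAP_CHOWN": true, "CAP_NET_BIND_SERVICE": true}
	requested := []string{"CAP_CHOWN", "CAP_NET_BIND_SERVICE", "CAP_PERFMON", "CAP_BPF"}

	granted, ignored := filterCapabilities(requested, grantable)
	for _, c := range ignored {
		// Warn, but do not fail container creation.
		log.Printf("WARN: ignoring capability %s: cannot be granted in this environment", c)
	}
	fmt.Println("granted:", granted)
}
```

Because `capabilities` is an allow-list, dropping an entry can only reduce what the process is permitted to do. That is why the sketch continues instead of returning an error.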
