Impact
This attack is very similar in concept and application to CVE-2025-31133,
except that it attacks a similar vulnerability in a different target (namely,
the bind-mount of /dev/pts/$n to /dev/console as configured for all
containers that allocate a console).
In runc version 1.0.0-rc3 and later, due to insufficient checks when
bind-mounting /dev/pts/$n to /dev/console inside the container, an attacker
can trick runc into bind-mounting paths which would normally be made read-only
or be masked onto a path that the attacker can write to. This happens after
pivot_root(2), so this cannot be used to write to host files directly --
however, as with CVE-2025-31133, this can lead to denial of service of the host
or a container breakout by providing the attacker with a writable copy of
/proc/sysrq-trigger or /proc/sys/kernel/core_pattern (respectively).
The attacker can gain write access to these files because the /dev/console
bind-mount happens before maskedPaths and readonlyPaths are applied.
This attack was analysed as having a CVSSv4 severity of 7.3 (High) using the
vector CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H.
Additional Findings
While investigating this issue, we discovered some other theoretical issues
that may or may not be exploitable, and took the opportunity to fix some
fairly well-known issues related to consoles.
Issue 1: Problematic Usage of os.Create
Go provides an os.Create function for creating files, which older code in
runc (dating back to the original libcontainer from the early 2010s) had a
tendency to use fairly liberally. os.Create implies O_CREAT|O_TRUNC but by
design it applies neither O_NOFOLLOW nor O_EXCL, meaning that if the target is
swapped with a malicious symlink, runc can be tricked into truncating host files
(which can lead to denial of service attacks, among other concerns).
We conducted an audit of all os.Create usages in runc and found some
suspicious usages related to device inodes, but based on our testing these were
not exploitable in practice. We now have custom code lints to block any
os.Create usage in runc, and plan to do a further audit of any other plain
os.* operation usage throughout runc after this advisory becomes public.
CVE-2024-45310 was a similar attack but without the O_TRUNC component (which
resulted in a "Low" severity) -- a similar attack being exploitable would've
been much more severe. As this attack did not appear to be exploitable, we did
not assign a CVSS vector for this theoretical attack.
Issue 2: Malicious /dev/pts/$n Inode Attacks (TIOCGPTPEER)
The (very) classic API for constructing consoles involves first opening
/dev/ptmx for reading and writing. This allocates a new pseudo-terminal and
the returned file descriptor is the "master" end (which is used by higher-level
runtimes to do I/O with the container).
Traditionally, in order to get the "slave" end, you do ioctl(ptm, TIOCGPTN)
to get the pseudo-terminal number and then open the file in /dev/pts/ whose
name is the base-10 representation of the number returned by TIOCGPTN.
The naive way of doing this is vulnerable to very basic race attacks where
/dev/pts/$n is replaced with a different pseudo-terminal or other malicious
file.
In order to provide a mechanism to mitigate this risk, Aleksa Sarai (@cyphar
from SUSE) implemented TIOCGPTPEER back in 2017 to provide a race-free way of
doing the last TIOCGPTN step by opening the peer end of the pseudo-terminal
directly. However, at the time it was believed to be too impractical to
implement this protection in runc due to its no-monitor-process architecture
(unlike runtimes like LXC which made use of TIOCGPTPEER almost immediately).
While working on this advisory, we found a way to make TIOCGPTN usage on
pre-4.13 kernels still safe against race attacks and so have implemented both
TIOCGPTPEER support as well as safe TIOCGPTN support as a fallback.
Another possible target of attack would be replacing /dev/ptmx or
/dev/pts/ptmx with a different inode and tricking runc into trying to operate
on it. This is very similar to the core issue in CVE-2025-31133 and had a
similar solution.
Our analysis was that while this attack appears to be potentially problematic
in theory, it seems unlikely to actually be exploitable due to how consoles are
treated (runc tries to do several pseudo-terminal-specific ioctls and will
error out if they fail -- which happens for most other file types). In
principle you could imagine a DoS attack using a disconnected NFS handle but it
seems impractical to exploit. However, we felt it prudent to include a solution
(and this also provides a safe mechanism to get the source mount for the
/dev/console bind-mount issue at the beginning of this advisory).
As this attack did not appear to be exploitable, we did not assign a CVSS
vector for this theoretical attack.
Patches
This advisory is being published as part of a set of three advisories:
The patches fixing this issue have accordingly been combined into a single
patchset. The following patches from that patchset resolve the issues in this
advisory:
- db19bbe ("internal/sys: add VerifyInode helper")
- ff94f99 ("*: switch to safer securejoin.Reopen")
- 531ef79 ("console: use TIOCGPTPEER when allocating peer PTY")
- 398955b ("console: add fallback for pre-TIOCGPTPEER kernels")
- 9be1dbf ("console: avoid trivial symlink attacks for /dev/console")
- de87203 ("console: verify /dev/pts/ptmx before use")
- 01de9d6 ("rootfs: avoid using os.Create for new device inodes")
- aee7d3f ("ci: add lint to forbid the usage of os.Create")
runc 1.2.8, 1.3.3, and 1.4.0-rc.3 have been released and all contain fixes for these
issues. As per our new release model, runc 1.1.x and earlier are
no longer supported and thus have not been patched.
Mitigations
- Use containers with user namespaces (with the host root user not mapped into
the container's user namespace). This will block most of the most serious
aspects of these attacks, as the procfs files used for the container
breakout use Unix DAC permissions and user namespaced users will not have
access to the relevant files.
An attacker would still be able to bind-mount host paths into the container
but if the host uids and gids mapped into the container do not overlap with
ordinary users on the host (which is the generally recommended
configuration) then the attacker would likely not be able to read or write
to most sensitive host files (depending on the Unix DAC permissions of the
host files). Note that this is still technically more privilege than an
unprivileged user on the host -- because the bind-mount is done by a
privileged process, the attacker would be able to get access to directories
whose parents may have denied search access (i.e., they may be able to
access paths inside a chmod 700 directory that would normally block them
from resolving subpaths).
We would also like to take this opportunity to re-iterate that we
strongly recommend all users use user namespaced containers. They have
proven to be one of the best security hardening mechanisms against container
breakouts, and the kernel applies additional restrictions to user namespaced
containers above and beyond the user remapping functionality provided. With
the advent of id-mapped mounts (Linux 5.12), there is very little reason to
not use user namespaces for most applications. Note that using user
namespaces to configure your container does not mean you have to enable
unprivileged user namespace creation inside the container -- most
container runtimes apply a seccomp-bpf profile which blocks
unshare(CLONE_NEWUSER) inside containers regardless of whether the
container itself uses user namespaces.
Rootless containers can provide even more protection if your configuration
can use them -- by having runc itself be an unprivileged process, in general
you would expect the impact scope of a runc bug to be less severe as it
would only have the privileges afforded to the host user which spawned runc.
- For non-user namespaced containers, configure all containers you spawn to
not permit processes to run with root privileges. In most cases this would
require configuring the container to use a non-root user and enabling
noNewPrivileges to disable any setuid or set-capability binaries. (Note
that this is our general recommendation for a secure container setup -- it
is very difficult, if not impossible, to run an untrusted program with root
privileges safely.) If you need to use ping in your containers, there is a
net.ipv4.ping_group_range sysctl that can be used to allow unprivileged
users to ping without requiring setuid or set-capability binaries.
- Do not run untrusted container images from unknown or unverified sources.
- The default containers-selinux SELinux policy mitigates this issue, as
(unlike CVE-2025-31133) the /dev/console bind-mount does not get
relabeled and so the container process cannot write to the bind-mounted
procfs file by default.
Please note that CVE-2025-52881 allows an attacker to bypass LSM labels,
and so this mitigation is not that helpful when considered in combination
with CVE-2025-52881.
- The default AppArmor policy used by Docker and Podman does not mitigate this
issue (as access to /dev/console is usually permitted). Users could create
a custom profile that blocks access to /dev/console, but such a profile
might break regular containers.
Please note that CVE-2025-52881 allows an attacker to bypass LSM labels,
and so the mitigation provided with a custom profile is not that helpful
when considered in combination with CVE-2025-52881.
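The first two mitigations can be checked mechanically against a container's OCI config.json. The Go sketch below is a hypothetical audit helper, not part of runc: the struct definitions mirror only the few runtime-spec fields it needs (process.user.uid, process.noNewPrivileges, linux.namespaces, linux.uidMappings), and the warning texts are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// idMapping mirrors one entry of linux.uidMappings in the OCI runtime spec.
type idMapping struct {
	ContainerID uint32 `json:"containerID"`
	HostID      uint32 `json:"hostID"`
	Size        uint32 `json:"size"`
}

// spec holds only the fields this sketch inspects.
type spec struct {
	Process struct {
		User struct {
			UID uint32 `json:"uid"`
		} `json:"user"`
		NoNewPrivileges bool `json:"noNewPrivileges"`
	} `json:"process"`
	Linux struct {
		Namespaces []struct {
			Type string `json:"type"`
		} `json:"namespaces"`
		UIDMappings []idMapping `json:"uidMappings"`
	} `json:"linux"`
}

// audit returns warnings for settings that the mitigations advise against.
func audit(configJSON []byte) ([]string, error) {
	var s spec
	if err := json.Unmarshal(configJSON, &s); err != nil {
		return nil, err
	}
	var warnings []string

	hasUserNS := false
	for _, ns := range s.Linux.Namespaces {
		if ns.Type == "user" {
			hasUserNS = true
		}
	}
	if !hasUserNS {
		warnings = append(warnings, "no user namespace configured")
		if s.Process.User.UID == 0 {
			warnings = append(warnings, "process runs as root without a user namespace")
		}
		if !s.Process.NoNewPrivileges {
			warnings = append(warnings, "noNewPrivileges is not enabled")
		}
	}
	for _, m := range s.Linux.UIDMappings {
		// Host uid 0 falls inside [HostID, HostID+Size) only if HostID is 0.
		if m.HostID == 0 && m.Size > 0 {
			warnings = append(warnings, "host root (uid 0) is mapped into the container")
		}
	}
	return warnings, nil
}

func main() {
	risky := []byte(`{"process":{"user":{"uid":0}},"linux":{"namespaces":[{"type":"pid"}]}}`)
	warnings, err := audit(risky)
	if err != nil {
		panic(err)
	}
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

A container whose config lists a user namespace and maps only non-root host IDs produces no warnings from this helper.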
References
Other Runtimes
As this vulnerability boils down to a fairly easy-to-make logic bug, we have
provided information to other OCI (crun, youki) and non-OCI (LXC) container
runtimes about this vulnerability.
Based on discussions with other runtimes, it seems that crun and youki may have
similar security issues and will publish a co-ordinated security release along
with runc. LXC appears to also be vulnerable in some aspects, but their
security stance is (understandably) that non-user-namespaced
containers are fundamentally insecure by design.
Credits
Thanks to Lei Wang (@ssst0n3 from Huawei) and Li Fubang (@lifubang from
acmcoder.com, CIIC) for discovering and reporting the main /dev/console
bind-mount vulnerability, as well as Aleksa Sarai (@cyphar from SUSE) for
discovering Issues 1 and 2 and the original research into these classes of
issues several years ago.