container escape and denial of service due to arbitrary write gadgets and procfs write redirects

High
cyphar published GHSA-cgrx-mc8f-2prm Nov 5, 2025

Package

gomod github.com/opencontainers/runc (Go)

Affected versions

<=1.2.7, <=1.3.2, <=1.4.0-rc.2

Patched versions

1.2.8, 1.3.3, 1.4.0-rc.3

Package

gomod github.com/opencontainers/selinux (Go)

Affected versions

<=1.12.0

Patched versions

1.13.0

Description

Impact

This attack is primarily a more sophisticated version of CVE-2019-19921, a
flaw that allowed an attacker to trick runc into writing the LSM process
labels for a container process into a dummy tmpfs file, so that the correct
LSM labels were never applied to the container process. The mitigation we
applied for CVE-2019-19921 was fairly limited: it effectively only made runc
verify, when writing LSM labels, that the files being written to are actual
procfs files.

Rather than using a fake tmpfs file for /proc/self/attr/<label>, an
attacker could instead (through various means) make /proc/self/attr/<label>
reference a real procfs file, but one that would still be a no-op (such as
/proc/self/sched). This would have the same effect but would pass the "is a
procfs file" check. We were aware that this kind of attack would be possible
(even going so far as to discuss this publicly as "future work" at
conferences), and we were working on a far more comprehensive mitigation of
this attack, but this security issue was disclosed before we could complete
this work.

In all known versions of runc, an attacker can trick runc into misdirecting
writes to /proc to other procfs files through the use of a racing container
with shared mounts. We have also verified that this attack can be exploited
using a standard Dockerfile with docker buildx build, as that also permits
triggering parallel execution of containers with custom shared mounts
configured. This redirect could be through symbolic links in a tmpfs or,
theoretically, other methods such as regular bind-mounts.

Note that while /proc/self/attr/<label> was the example used above (which is
LSM-specific), this issue affects all writes to /proc in runc and thus also
affects sysctls (written to /proc/sys/...) and some other APIs.
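
For readers unfamiliar with how sysctls map onto procfs, the following Go
sketch shows the conventional translation from a dotted sysctl key to a file
under /proc/sys. The function name sysctlPath is hypothetical and the mapping
is deliberately simplified; it is not presented as runc's implementation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sysctlPath (hypothetical helper) maps a dotted sysctl key to the procfs
// file that backs it, by replacing each "." with a path separator.
func sysctlPath(key string) string {
	return filepath.Join("/proc/sys", strings.ReplaceAll(key, ".", "/"))
}

func main() {
	for _, key := range []string{"net.ipv4.ip_forward", "kernel.core_pattern"} {
		fmt.Printf("%s -> %s\n", key, sysctlPath(key))
	}
}
```

Because applying a sysctl is ultimately a write to a path built this way, it
is exposed to the same redirection as the LSM label writes.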

Additional Impacts

While investigating this issue, we discovered that another risk with these
redirected writes is that they could be redirected to dangerous files such as
/proc/sysrq-trigger rather than just no-op files like /proc/self/sched.
For instance, the default AppArmor profile name in Docker is docker-default,
which when written to /proc/sysrq-trigger would cause the host system to
crash.

When this was discovered, we conducted an audit of other write operations
within runc and found several possible areas where runc could be used as a
semi-arbitrary write gadget when combined with the above race attacks. The most
concerning attack scenario was the configuration of sysctls. Because the
contents of the sysctl are free-form text, an attacker could use a misdirected
write to write to /proc/sys/kernel/core_pattern and break out of the
container (as described in CVE-2025-31133, kernel upcalls are not namespaced
and so coredump helpers will run with complete root privileges on the host).
Even if the attacker cannot configure custom sysctls, a valid sysctl string
(when redirected to /proc/sysrq-trigger) can easily cause the machine to
hang.

Note that the fact that this attack allows you to disable LSM labels makes it a
very useful attack to combine with CVE-2025-31133 (as one of the only
mitigations available to most users for that issue is AppArmor, and this attack
would let you bypass that). However, the misdirected write issue above means
that you could also achieve most of the same goals without needing to chain
together attacks.

Taking the above additional impacts into account, this attack was analysed as
having a CVSSv4 severity of 7.3 (High) using the vector
CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H.
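
For tooling that needs to consume this vector, a small Go sketch of splitting
it into its individual metrics follows. The parseVector function is a
hypothetical helper; it only splits the string and does not compute a score.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVector (hypothetical helper) splits a CVSS vector string into its
// version and a map of metric abbreviation to value.
func parseVector(vector string) (string, map[string]string, error) {
	parts := strings.Split(vector, "/")
	if len(parts) < 2 || !strings.HasPrefix(parts[0], "CVSS:") {
		return "", nil, fmt.Errorf("not a CVSS vector: %q", vector)
	}
	version := strings.TrimPrefix(parts[0], "CVSS:")

	metrics := make(map[string]string, len(parts)-1)
	for _, part := range parts[1:] {
		name, value, ok := strings.Cut(part, ":")
		if !ok || name == "" || value == "" {
			return "", nil, fmt.Errorf("malformed metric %q", part)
		}
		metrics[name] = value
	}
	return version, metrics, nil
}

func main() {
	const vector = "CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H"
	version, metrics, err := parseVector(vector)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("CVSS version:", version)
	// Print in the order used by the vector string.
	for _, name := range []string{"AV", "AC", "AT", "PR", "UI", "VC", "VI", "VA", "SC", "SI", "SA"} {
		fmt.Printf("%s=%s\n", name, metrics[name])
	}
}
```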

Patches

This advisory is being published as part of a set of three advisories:

The patches fixing this issue have accordingly been combined into a single
patchset. The following patches from that patchset resolve the issues in this
advisory:

  • db19bbe ("internal/sys: add VerifyInode helper")
  • 6fc1914 ("internal: move utils.MkdirAllInRoot to internal/pathrs")
  • ff94f99 ("*: switch to safer securejoin.Reopen")
  • 44a0fcf ("go.mod: update to github.com/cyphar/[email protected]")
  • 77889b5 ("internal: add wrappers for securejoin.Proc*")
  • fdcc9d3 ("apparmor: use safe procfs API for labels")
  • ff6fe13 ("utils: use safe procfs for /proc/self/fd loop code")
  • b3dd1bc ("utils: remove unneeded EnsureProcHandle")
  • 77d217c ("init: write sysctls using safe procfs API")
  • 435cc81 ("init: use securejoin for /proc/self/setgroups")
  • d61fd29 ("libct/system: use securejoin for /proc/$pid/stat")
  • 4b37cd9 ("libct: align param type for mountCgroupV1/V2 functions")
  • d40b343 ("rootfs: switch to fd-based handling of mountpoint targets")
  • ed6b169 ("selinux: use safe procfs API for labels")
    • Please note that this patch includes a private patch for
      github.com/opencontainers/selinux that could not be made public through
      a public pull request (as it would necessarily disclose this embargoed
      security issue).

      The patch includes a complete copy of the forked code and a replace
      directive (as well as go mod vendor applied), which should still work
      with downstream build systems. If you cannot apply this patch, you can
      safely drop it -- some of the other patches in this series should block
      these kinds of racing mount attacks entirely.

      See opencontainers/selinux#237 for the upstream patch.

  • 3f92552 ("rootfs: re-allow dangling symlinks in mount targets")
  • a41366e ("openat2: improve resilience on busy systems")

runc 1.2.8, 1.3.3, and 1.4.0-rc.3 have been released and all contain fixes for these
issues. As per our new release model, runc 1.1.x and earlier are
no longer supported and thus have not been patched.
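
As a rough aid for checking a deployed version against the ranges above, the
following Go sketch encodes them in a hypothetical isPatched helper. It makes
two assumptions not stated in this advisory: that releases newer than the
listed patched versions also carry the fixes, and that pre-release suffixes
only need to be interpreted for 1.4.0. It does not account for distribution
backports.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isPatched (hypothetical helper) reports whether a runc version string falls
// at or above the patched versions listed in this advisory.
func isPatched(version string) (bool, error) {
	v := strings.TrimPrefix(version, "v")
	core, pre, hasPre := strings.Cut(v, "-")
	fields := strings.Split(core, ".")
	if len(fields) != 3 {
		return false, fmt.Errorf("unrecognised version %q", version)
	}
	var n [3]int
	for i, f := range fields {
		x, err := strconv.Atoi(f)
		if err != nil || x < 0 {
			return false, fmt.Errorf("unrecognised version %q", version)
		}
		n[i] = x
	}
	major, minor, patch := n[0], n[1], n[2]

	switch {
	case major < 1 || (major == 1 && minor < 2):
		// 1.1.x and earlier are unsupported and have not been patched.
		return false, nil
	case major == 1 && minor == 2:
		return patch >= 8, nil
	case major == 1 && minor == 3:
		return patch >= 3, nil
	case major == 1 && minor == 4 && patch == 0 && hasPre:
		if !strings.HasPrefix(pre, "rc.") {
			return false, fmt.Errorf("unrecognised pre-release %q", pre)
		}
		rc, err := strconv.Atoi(strings.TrimPrefix(pre, "rc."))
		if err != nil {
			return false, fmt.Errorf("unrecognised pre-release %q", pre)
		}
		return rc >= 3, nil
	default:
		// Assumption: anything newer than the listed fixes is patched.
		return true, nil
	}
}

func main() {
	for _, v := range []string{"1.1.15", "1.2.7", "1.2.8", "1.3.3", "1.4.0-rc.2", "1.4.0-rc.3"} {
		ok, err := isPatched(v)
		fmt.Printf("%-11s patched=%v err=%v\n", v, ok, err)
	}
}
```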

Mitigations

  • Do not run untrusted container images from unknown or unverified sources.

  • The basic no-op attack allows a container process to run with the same
    LSM labels as runc. For most AppArmor deployments this means
    it will be unconfined, and for SELinux it will likely be
    container_runtime_t. We haven't conducted in-depth testing of the impact
    on SELinux -- it is possible that it provides some reasonable protection but
    it seems likely that an attacker could cause harm to systems even with such
    an SELinux setup.

  • For the more involved redirect and write gadget attacks, unfortunately most
    LSM profiles (including the standard container-selinux profiles) provide the
    container runtime access to sysctl files (including /proc/sysrq-trigger)
    and so LSMs likely do not provide much protection against these attacks.

  • Using rootless containers provides some protection against these kinds of
    bugs (privileged writes in runc being redirected) -- by having runc itself
    be an unprivileged process, in general you would expect the impact scope of
    a runc bug to be less severe as it would only have the privileges afforded
    to the host user which spawned runc. For this particular bug, the privilege
    escalation caused by the inadvertent write issue is entirely mitigated with
    rootless containers because the unprivileged user that the runc process is
    executing as cannot write to the aforementioned procfs files (even
    intentionally).

References

Other Runtimes

As this vulnerability boils down to a fairly easy-to-make logic bug, we have
provided information to other OCI (crun, youki) and non-OCI (LXC) container
runtimes about this vulnerability.

Based on discussions with other runtimes, it seems that crun and youki may have
similar security issues and will release a co-ordinated security release along
with runc. LXC appears to use the host's /proc for all procfs operations, and
so is likely not vulnerable to this issue (this is a trade-off -- runc uses the
container's procfs to avoid CVE-2016-9962-style attacks).

Credits

Thanks to Li Fubang (@lifubang from acmcoder.com, CIIC) and Tõnis Tiigi
(@tonistiigi from Docker) for both independently discovering this
vulnerability, as well as Aleksa Sarai (@cyphar from SUSE) for the original
research into this class of security issues and solutions.

Additional thanks go to Tõnis Tiigi for finding some very useful exploit
templates for these kinds of race attacks using docker buildx build.

Severity

High

CVSS overall score

7.3 / 10

CVSS v4 base metrics

Exploitability Metrics
  Attack Vector: Local
  Attack Complexity: Low
  Attack Requirements: Present
  Privileges Required: Low
  User interaction: Active

Vulnerable System Impact Metrics
  Confidentiality: High
  Integrity: High
  Availability: High

Subsequent System Impact Metrics
  Confidentiality: High
  Integrity: High
  Availability: High

CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H

CVE ID

CVE-2025-52881

Weaknesses

UNIX Symbolic Link (Symlink) Following

The product, when opening a file or directory, does not sufficiently account for when the file is a symbolic link that resolves to a target outside of the intended control sphere. This could allow an attacker to cause the product to operate on unauthorized files.

Race Condition Enabling Link Following

The product checks the status of a file or directory before accessing it, which produces a race condition in which the file can be replaced with a link before the access is performed, causing the product to access the wrong file.
