Commit 5796d39

peaktocreek authored and akpm00 committed
mseal sysmap: kernel config and header change
Patch series "mseal system mappings", v9.

As discussed during the mseal() upstream process [1], mseal() protects the VMAs of a given virtual memory range against modifications, such as changes to the read/write (RW) and no-execute (NX) bits. For complete descriptions of memory sealing, please see mseal.rst [2].

mseal() is useful for mitigating memory corruption issues where a corrupted pointer is passed to a memory management system. For example, such an attacker primitive can break control-flow integrity guarantees, since read-only memory that is supposed to be trusted can become writable or .text pages can get remapped.

The system mappings are read-only, and memory sealing can protect them from ever being changed to writable or being unmapped/remapped with different attributes.

System mappings such as vdso, vvar, vvar_vclock, vectors (arm compat-mode), and sigpage (arm compat-mode) are created by the kernel during program initialization, and can be sealed after creation.

Unlike the aforementioned mappings, the uprobe mapping is not established during program startup. However, its lifetime is the same as the process's lifetime [3]. It can be sealed from creation.

The vsyscall on x86-64 uses a special address (0xffffffffff600000), which is outside the mm managed range. This means mprotect, munmap, and mremap won't work on the vsyscall. Since sealing doesn't enhance the vsyscall's security, it is skipped in this patch. If we ever seal the vsyscall, it is probably only for decorative purposes, i.e. showing the 'sl' flag in /proc/pid/smaps. For this patch, it is ignored.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may alter the system mappings during restore operations. UML (User Mode Linux), gVisor, and rr are also known to change the vdso/vvar mappings. Consequently, this feature cannot be universally enabled across all systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.

To support mseal of system mappings, architectures must define CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special mappings calls to pass the mseal flag. Additionally, architectures must confirm they do not unmap/remap system mappings during the process lifetime. The existence of this flag for an architecture implies that it does not require the remapping of these system mappings during process lifetime, so sealing these mappings is safe from a kernel perspective.

This version covers the x86-64 and arm64 architectures as a minimum viable feature.

While no specific CPU hardware features are required to enable this feature on an architecture, memory sealing requires a 64-bit kernel. Other architectures can choose whether or not to adopt this feature. Currently, I'm not aware of any instances in the kernel code that actively munmap/mremap a system mapping without a request from userspace. The PPC does call munmap when _install_special_mapping fails for vdso; however, it's uncertain if this will ever fail for PPC - this needs to be investigated by PPC in the future [4]. The UML kernel can add this support when KUnit tests require it [5].

In this version, we've improved the handling of system mapping sealing compared with previous versions. Instead of modifying the _install_special_mapping function itself, which would affect all architectures, we now call _install_special_mapping with a sealing flag only within the specific architecture that requires it. This targeted approach offers two key advantages: 1) It limits the code change's impact to the necessary architectures, and 2) It aligns with the software architecture by keeping the core memory management within the mm layer, while delegating the decision of sealing system mappings to the individual architecture, which is particularly relevant since 32-bit architectures never require sealing.

Prior to this patch series, we explored sealing special mappings from userspace using glibc's dynamic linker.
This approach revealed several issues:

- The PT_LOAD header may report an incorrect length for vdso (smaller than its actual size). The dynamic linker, which relies on PT_LOAD information to determine mapping size, would then split and partially seal the vdso mapping. Since each architecture has its own vdso/vvar code, fixing this in the kernel would require going through each architecture. Our initial goal was to enable sealing read-only mappings, e.g. .text, across all architectures; sealing vdso from the kernel at creation appears to be simpler than sealing vdso in glibc.
- The [vvar] mapping header only contains address information, not length information. Similar issues might exist for other special mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there is no effective solution for them.

This feature's security enhancements will benefit ChromeOS, Android, and other high security systems.

Testing:

This feature was tested on ChromeOS and Android for both x86-64 and ARM64.

- Enable sealing and verify vdso/vvar, sigpage, and vector are sealed properly, i.e. "sl" is shown in the smaps for those mappings, and mremap is blocked.
- Pass various automation tests (e.g. pre-checkin) on ChromeOS and Android to ensure the sealing doesn't affect the functionality of Chromebooks and Android phones.

I also tested the feature on Ubuntu on x86-64:

- With the config disabled, vdso/vvar is not sealed.
- With the config enabled, vdso/vvar is sealed, booting up Ubuntu is OK, and normal operations such as browsing the web and opening/editing documents are OK.

Link: https://lore.kernel.org/all/[email protected]/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]

This patch (of 7):

Provide infrastructure to mseal system mappings. Establish two kernel configs (CONFIG_MSEAL_SYSTEM_MAPPINGS, ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future patches.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jeff Xu <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Cc: Adhemerval Zanella <[email protected]>
Cc: Alexander Mikhalitsyn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Anna-Maria Behnsen <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Benjamin Berg <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Elliot Hughes <[email protected]>
Cc: Florian Faineli <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Jorge Lucangeli Obes <[email protected]>
Cc: Linus Waleij <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcow (Oracle) <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stephen Röttger <[email protected]>
Cc: Thomas Weißschuh <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
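
The testing notes in the commit message check for the "sl" flag in smaps. A minimal userspace sketch of that check could look like the following. The helper names (`vmflags_has`, `mapping_is_sealed`) are hypothetical and not part of the patch; the sketch only assumes the usual smaps layout, where each mapping starts with a `start-end perms ...` header line and carries a `VmFlags:` line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if a "VmFlags:" line lists the given flag, else 0. */
static int vmflags_has(const char *line, const char *flag)
{
	char buf[512];
	char *tok;

	if (strncmp(line, "VmFlags:", 8) != 0)
		return 0;
	strncpy(buf, line + 8, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	for (tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n"))
		if (strcmp(tok, flag) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: scan an smaps-formatted file for the first mapping
 * whose header line contains `name` (for example "[vdso]").
 * Returns 1 if its VmFlags include "sl", 0 if they do not,
 * and -1 if the file cannot be read or the mapping is not found.
 */
static int mapping_is_sealed(const char *smaps_path, const char *name)
{
	char line[512];
	int in_target = 0;
	int result = -1;
	FILE *f;

	assert(smaps_path != NULL && name != NULL);
	f = fopen(smaps_path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end;

		/* A mapping header begins with "start-end" in hex. */
		if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
			in_target = strstr(line, name) != NULL;
			continue;
		}
		if (in_target && strncmp(line, "VmFlags:", 8) == 0) {
			result = vmflags_has(line, "sl");
			break;
		}
	}
	fclose(f);
	return result;
}
```

Calling `mapping_is_sealed("/proc/self/smaps", "[vdso]")` from a small `main()` would be expected to return 1 on a kernel built with the config enabled and 0 otherwise, though that depends on the running kernel.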
1 parent 02d9e1a commit 5796d39

3 files changed, 53 additions and 0 deletions


include/linux/mm.h

Lines changed: 10 additions & 0 deletions
@@ -4236,4 +4236,14 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st
 int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 
+
+/*
+ * mseal of userspace process's system mappings.
+ */
+#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
+#define VM_SEALED_SYSMAP	VM_SEALED
+#else
+#define VM_SEALED_SYSMAP	VM_NONE
+#endif
+
 #endif /* _LINUX_MM_H */
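
As a rough illustration of how this macro is meant to be consumed, the userspace mock below ORs VM_SEALED_SYSMAP into a flag set, which is how the commit message describes architectures passing the sealing flag to their special mapping calls. Everything outside the `#ifdef` block is an assumption for illustration: the flag values are stand-ins rather than authoritative kernel definitions, and `vdso_vm_flags()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Stand-in flag values for this mock; the real ones live in the kernel's mm.h. */
#define VM_NONE		0x0ULL
#define VM_READ		0x1ULL
#define VM_EXEC		0x4ULL
#define VM_MAYREAD	0x10ULL
#define VM_MAYEXEC	0x40ULL
#define VM_SEALED	(1ULL << 63)	/* assumed high bit, 64-bit only */

/* Same selection logic as the hunk above. */
#ifdef CONFIG_MSEAL_SYSTEM_MAPPINGS
#define VM_SEALED_SYSMAP	VM_SEALED
#else
#define VM_SEALED_SYSMAP	VM_NONE
#endif

static_assert(VM_SEALED != 0, "mock needs a 64-bit flag word");

/*
 * Hypothetical helper: the flags an architecture might pass when it
 * installs a read-only, executable system mapping such as the vdso.
 * With the config off, VM_SEALED_SYSMAP is VM_NONE and adds nothing.
 */
static unsigned long long vdso_vm_flags(void)
{
	return VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC | VM_SEALED_SYSMAP;
}
```

Because the macro collapses to VM_NONE when the config is disabled, call sites can include it unconditionally without their own `#ifdef`. Building the mock with `-DCONFIG_MSEAL_SYSTEM_MAPPINGS` sets the sealed bit in the result.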

init/Kconfig

Lines changed: 22 additions & 0 deletions
@@ -1888,6 +1888,28 @@ config ARCH_HAS_MEMBARRIER_CALLBACKS
 config ARCH_HAS_MEMBARRIER_SYNC_CORE
 	bool
 
+config ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	bool
+	help
+	  Control MSEAL_SYSTEM_MAPPINGS access based on architecture.
+
+	  A 64-bit kernel is required for the memory sealing feature.
+	  No specific hardware features from the CPU are needed.
+
+	  To enable this feature, the architecture needs to update their
+	  special mappings calls to include the sealing flag and confirm
+	  that it doesn't unmap/remap system mappings during the life
+	  time of the process. The existence of this flag for an architecture
+	  implies that it does not require the remapping of the system
+	  mappings during process lifetime, so sealing these mappings is safe
+	  from a kernel perspective.
+
+	  After the architecture enables this, a distribution can set
+	  CONFIG_MSEAL_SYSTEM_MAPPING to manage access to the feature.
+
+	  For complete descriptions of memory sealing, please see
+	  Documentation/userspace-api/mseal.rst
+
 config HAVE_PERF_EVENTS
 	bool
 	help

security/Kconfig

Lines changed: 21 additions & 0 deletions
@@ -51,6 +51,27 @@ config PROC_MEM_NO_FORCE
 
 endchoice
 
+config MSEAL_SYSTEM_MAPPINGS
+	bool "mseal system mappings"
+	depends on 64BIT
+	depends on ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
+	depends on !CHECKPOINT_RESTORE
+	help
+	  Apply mseal on system mappings.
+	  The system mappings includes vdso, vvar, vvar_vclock,
+	  vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
+
+	  A 64-bit kernel is required for the memory sealing feature.
+	  No specific hardware features from the CPU are needed.
+
+	  WARNING: This feature breaks programs which rely on relocating
+	  or unmapping system mappings. Known broken software at the time
+	  of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
+	  this config can't be enabled universally.
+
+	  For complete descriptions of memory sealing, please see
+	  Documentation/userspace-api/mseal.rst
+
 config SECURITY
 	bool "Enable different security models"
 	depends on SYSFS
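
The WARNING in the help text concerns programs that modify system mappings. One way such a program could detect the situation at runtime is sketched below, assuming a Linux/glibc userspace. It looks up the vdso base through `getauxval(AT_SYSINFO_EHDR)` and re-applies the mapping's usual read/execute protection with `mprotect()`, which is expected to be refused with EPERM when the mapping is sealed. The helper name `vdso_modification_blocked` is hypothetical, and sandboxes or seccomp filters may also return EPERM, so the result is a hint rather than proof.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if changing the vdso's protection is
 * refused with EPERM (as expected for a sealed mapping), 0 if the call
 * succeeds, and -1 if the vdso cannot be located or another error occurs.
 */
static int vdso_modification_blocked(void)
{
	unsigned long base = getauxval(AT_SYSINFO_EHDR);
	long page = sysconf(_SC_PAGESIZE);

	assert(page > 0);
	if (base == 0 || base % (unsigned long)page != 0)
		return -1;

	/* Request the protection the vdso normally has, so success is harmless. */
	if (mprotect((void *)base, (size_t)page, PROT_READ | PROT_EXEC) == 0)
		return 0;
	return errno == EPERM ? 1 : -1;
}
```

Software that relocates or unmaps the vdso could use a probe along these lines to fail early with a clear message instead of misbehaving later.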
