Commit a9e7b8d

davidhildenbrand authored and torvalds committed
kernel/resource: disallow access to exclusive system RAM regions
virtio-mem dynamically exposes memory inside a device memory region as system RAM to Linux, coordinating with the hypervisor which parts are actually "plugged" and consequently usable/accessible.

On the one hand, the virtio-mem driver adds/removes whole memory blocks, creating/removing busy IORESOURCE_SYSTEM_RAM resources; on the other hand, it logically (un)plugs memory inside added memory blocks, dynamically either exposing them to the buddy or hiding them from the buddy and marking them PG_offline.

In contrast to physical devices, like a DIMM, the virtio-mem driver is required to actually make use of any of the device-provided memory, because it performs the handshake with the hypervisor. virtio-mem memory cannot simply be accessed via /dev/mem without a driver.

There is no safe way to:
a) Access plugged memory blocks via /dev/mem, as they might contain unplugged holes or might get silently unplugged by the virtio-mem driver and consequently turned inaccessible.
b) Access unplugged memory blocks via /dev/mem because the virtio-mem driver is required to make them actually accessible first.

The virtio-spec states that unplugged memory blocks MUST NOT be written, and only selected unplugged memory blocks MAY be read. We want to make sure this is the case in sane environments -- where the virtio-mem driver was loaded.

We want to make sure that in a sane environment, nobody "accidentally" accesses unplugged memory inside the device managed region. For example, a user might spot a memory region in /proc/iomem and try accessing it via /dev/mem via gdb or dumping it via something else. By the time the mmap() happens, the memory might already have been removed by the virtio-mem driver silently: the mmap() would succeed and user space might accidentally access unplugged memory.

So once the driver was loaded and detected the device along the device-managed region, we just want to disallow any access via /dev/mem to it.

In an ideal world, we would mark the whole region as busy ("owned by a driver") and exclude it; however, that would be wrong, as we don't really have actual system RAM at these ranges added to Linux ("busy system RAM"). Instead, we want to mark such ranges as "not actual busy system RAM but still soft-reserved and prepared by a driver for future use."

Let's teach iomem_is_exclusive() to reject access to any range with "IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even if "iomem=relaxed" is set. Introduce EXCLUSIVE_SYSTEM_RAM to make it easier for applicable drivers to depend on this setting in their Kconfig.

For now, there are no applicable ranges and we'll modify virtio-mem next to properly set IORESOURCE_EXCLUSIVE on the parent resource container it creates to contain all actual busy system RAM added via add_memory_driver_managed().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hanjun Guo <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
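
The scenario described in the message (a user picking an address out of /proc/iomem and mapping it through /dev/mem) can be reproduced from user space with a few lines of C. The following is only a sketch: `probe_devmem()` is a hypothetical helper name, the physical address is supplied by the caller, and the exact errno the kernel reports for a rejected range is not stated in this commit, so the helper simply returns whatever it gets.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the commit): try to map `len` bytes of
 * physical memory starting at `phys_addr`, read-only, through the character
 * device at `path` (normally "/dev/mem").
 *
 * Returns 0 if the mapping could be created, otherwise the errno value
 * reported by open() or mmap(). The mapping is never dereferenced: a
 * successful mmap() does not prove that the memory is safe to touch.
 */
static int probe_devmem(const char *path, uint64_t phys_addr, size_t len)
{
	void *map;
	int fd;
	int err = 0;

	assert(len > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	/* phys_addr should be page aligned; mmap() rejects unaligned offsets. */
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, (off_t)phys_addr);
	if (map == MAP_FAILED)
		err = errno;
	else
		munmap(map, len);

	close(fd);
	return err;
}
```

Calling `probe_devmem("/dev/mem", addr, 4096)` as root, with `addr` taken from a device-managed range in /proc/iomem, is expected to return a non-zero errno once a driver marks that range with IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE on a kernel that has this change and EXCLUSIVE_SYSTEM_RAM enabled.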
1 parent b78dfa0 commit a9e7b8d

2 files changed, 26 insertions(+), 10 deletions(-)

kernel/resource.c

Lines changed: 19 additions & 10 deletions
@@ -1719,26 +1719,23 @@ static int strict_iomem_checks;
 #endif

 /*
- * check if an address is reserved in the iomem resource tree
- * returns true if reserved, false if not reserved.
+ * Check if an address is exclusive to the kernel and must not be mapped to
+ * user space, for example, via /dev/mem.
+ *
+ * Returns true if exclusive to the kernel, otherwise returns false.
  */
 bool iomem_is_exclusive(u64 addr)
 {
+	const unsigned int exclusive_system_ram = IORESOURCE_SYSTEM_RAM |
+						  IORESOURCE_EXCLUSIVE;
 	bool skip_children = false, err = false;
 	int size = PAGE_SIZE;
 	struct resource *p;

-	if (!strict_iomem_checks)
-		return false;
-
 	addr = addr & PAGE_MASK;

 	read_lock(&resource_lock);
 	for_each_resource(&iomem_resource, p, skip_children) {
-		/*
-		 * We can probably skip the resources without
-		 * IORESOURCE_IO attribute?
-		 */
 		if (p->start >= addr + size)
 			break;
 		if (p->end < addr) {
@@ -1747,12 +1744,24 @@ bool iomem_is_exclusive(u64 addr)
 		}
 		skip_children = false;

+		/*
+		 * IORESOURCE_SYSTEM_RAM resources are exclusive if
+		 * IORESOURCE_EXCLUSIVE is set, even if they
+		 * are not busy and even if "iomem=relaxed" is set. The
+		 * responsible driver dynamically adds/removes system RAM within
+		 * such an area and uncontrolled access is dangerous.
+		 */
+		if ((p->flags & exclusive_system_ram) == exclusive_system_ram) {
+			err = true;
+			break;
+		}
+
 		/*
 		 * A resource is exclusive if IORESOURCE_EXCLUSIVE is set
 		 * or CONFIG_IO_STRICT_DEVMEM is enabled and the
 		 * resource is busy.
 		 */
-		if ((p->flags & IORESOURCE_BUSY) == 0)
+		if (!strict_iomem_checks || !(p->flags & IORESOURCE_BUSY))
 			continue;
 		if (IS_ENABLED(CONFIG_IO_STRICT_DEVMEM)
 				|| p->flags & IORESOURCE_EXCLUSIVE) {
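
To make the new ordering of checks easier to follow, here is a small user-space model of the decision taken for a single resource that overlaps the queried page. It is a sketch under stated assumptions, not kernel code: `resource_is_exclusive()` and `self_check()` are hypothetical names, the flag values are copied from include/linux/ioport.h as an assumption and should be checked against your tree, and because the hunk is cut off after the final `if`, the model assumes that branch marks the address exclusive, as the comment above it says. The real function walks the resource tree; the model looks at one resource only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match include/linux/ioport.h; verify against your kernel tree. */
#define IORESOURCE_MEM        0x00000200u
#define IORESOURCE_SYSRAM     0x01000000u
#define IORESOURCE_EXCLUSIVE  0x08000000u
#define IORESOURCE_BUSY       0x80000000u
#define IORESOURCE_SYSTEM_RAM (IORESOURCE_MEM | IORESOURCE_SYSRAM)

/*
 * Hypothetical model of the per-resource decision in iomem_is_exclusive().
 *
 * flags:               flags of one resource overlapping the page
 * strict_iomem_checks: runtime setting (false models "iomem=relaxed")
 * io_strict_devmem:    stands in for IS_ENABLED(CONFIG_IO_STRICT_DEVMEM)
 */
static bool resource_is_exclusive(unsigned int flags, bool strict_iomem_checks,
				  bool io_strict_devmem)
{
	const unsigned int exclusive_system_ram = IORESOURCE_SYSTEM_RAM |
						  IORESOURCE_EXCLUSIVE;

	/* New rule: checked first, regardless of busy state or iomem= setting. */
	if ((flags & exclusive_system_ram) == exclusive_system_ram)
		return true;

	/* Old rule: only applies with strict checks and to busy resources. */
	if (!strict_iomem_checks || !(flags & IORESOURCE_BUSY))
		return false;

	return io_strict_devmem || (flags & IORESOURCE_EXCLUSIVE);
}

/* A few cases that illustrate the behavior described in the commit message. */
static void self_check(void)
{
	/* Not busy, relaxed checks: still exclusive because both flags are set. */
	assert(resource_is_exclusive(IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE,
				     false, false));
	/* Ordinary busy system RAM under relaxed checks is not exclusive. */
	assert(!resource_is_exclusive(IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
				      false, false));
	/* Busy and exclusive MMIO is exclusive only when strict checks are on. */
	assert(resource_is_exclusive(IORESOURCE_MEM | IORESOURCE_BUSY |
				     IORESOURCE_EXCLUSIVE, true, false));
	assert(!resource_is_exclusive(IORESOURCE_MEM | IORESOURCE_BUSY |
				      IORESOURCE_EXCLUSIVE, false, false));
}
```

The point of the model is the ordering: the early `return false` on `!strict_iomem_checks` was removed from the top of the function and folded into the per-resource condition, so the exclusive-system-RAM test runs before any relaxed-mode shortcut.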

mm/Kconfig

Lines changed: 7 additions & 0 deletions
@@ -109,6 +109,13 @@ config NUMA_KEEP_MEMINFO
 config MEMORY_ISOLATION
 	bool

+# IORESOURCE_SYSTEM_RAM regions in the kernel resource tree that are marked
+# IORESOURCE_EXCLUSIVE cannot be mapped to user space, for example, via
+# /dev/mem.
+config EXCLUSIVE_SYSTEM_RAM
+	def_bool y
+	depends on !DEVMEM || STRICT_DEVMEM
+
 #
 # Only be set on architectures that have completely implemented memory hotplug
 # feature. If you are not sure, don't touch it.
