diff --git a/docs/memory-hotplug.md b/docs/memory-hotplug.md
new file mode 100644
index 00000000000..b9a24ded0be
--- /dev/null
+++ b/docs/memory-hotplug.md
@@ -0,0 +1,314 @@
+# Memory Hotplugging with virtio-mem
+
+## What is virtio-mem
+
+`virtio-mem` is a para-virtualized memory device that enables dynamic memory
+resizing for virtual machines. Unlike traditional memory hot-plug mechanisms,
+`virtio-mem` provides a flexible and efficient solution that works across
+different architectures and avoids many limitations of older approaches.
+
+The `virtio-mem` device manages a contiguous memory region that is divided into
+fixed-size blocks. The host can request the guest to plug (make available) or
+unplug (release) memory by changing the device's target size, and the guest
+driver responds by allocating or freeing memory blocks accordingly. This
+approach provides fine-grained control over guest memory with minimal overhead.
+
+Firecracker further adds the concept of slots, which are a set of contiguous
+blocks (usually 128MiB) that can be fully protected from guest accesses to
+prevent malicious guests from accessing the hotpluggable memory range when not
+allowed by the host.
+
+## Prerequisites
+
+To support memory hotplugging via `virtio-mem`, you must use a guest kernel with
+the appropriate version and configuration options enabled as follows:
+
+#### Kernel Version
+
+For x86_64, Firecracker requires a kernel version >=5.16 as it requires the
+guest driver to negotiate `VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE`. For aarch64, a
+kernel >=5.18 is required. Refer to the
+[kernel policy documentation](kernel-policy.md) for a list of officially
+supported guest kernels.
+
+#### Kernel Config
+
+The following kernel configuration options are required:
+
+- `CONFIG_VIRTIO_MEM=y`: Enables the `virtio-mem` driver
+- `CONFIG_MEMORY_HOTPLUG=y`: Enables memory hotplug support
+- `CONFIG_MEMORY_HOTREMOVE=y`: Enables memory hot-remove support (recommended)
+- `CONFIG_STRICT_DEVMEM`: (aarch64 only) required to enable the `virtio-mem`
+  driver.
+
+## Adding hotpluggable memory
+
+The `virtio-mem` device must be configured during VM setup with the total amount
+of memory that can be hotplugged, before starting the virtual machine. This can
+be done through a `PUT` request on `/hotplug/memory` or by including the
+configuration in the JSON configuration file.
+
+### Configuration Parameters
+
+- `total_size_mib` (required): The maximum size of hotpluggable memory in MiB.
+  This defines the upper bound of memory that can be added to the VM. Must be a
+  multiple of `slot_size_mib`.
+
+- `block_size_mib` (optional, default: 2): The size of individual memory blocks
+  in MiB. Must be at least 2 MiB and a power of 2. Larger block sizes provide
+  better performance but less granularity (harder for the guest to unplug).
+
+- `slot_size_mib` (optional, default: 128): The size of KVM memory slots in MiB.
+  Must be at least `block_size_mib` and a power of 2. Larger slot sizes improve
+  performance for large memory operations but reduce unplugging protection
+  efficiency.
+
+We recommend leaving these values at their defaults unless strict memory
+protection is required, in which case `block_size_mib` should be equal to
+`slot_size_mib`. Note that this will make it harder for the guest kernel to find
+contiguous memory to hot-unplug. Refer to the
+[Memory Protection](#memory-protection) section below for more details.
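+
+As an illustration of the strict-protection case, a configuration along the
+following lines sets `block_size_mib` equal to `slot_size_mib`. This is a sketch
+rather than a recommended setup: it uses the `memory-hotplug` key of the JSON
+configuration file, and the sizes are example values chosen only to satisfy the
+constraints listed above (1024 is a multiple of 128, and 128 is a power of 2
+that is at least 2).
+
+```json
+{
+  "memory-hotplug": {
+    "total_size_mib": 1024,
+    "block_size_mib": 128,
+    "slot_size_mib": 128
+  }
+}
+```
+
+With these example values the device has 8 slots of one block each, so every
+unplugged block empties its slot and can be protected on its own. The trade-off
+is that the guest has to free 128 MiB of contiguous memory for each block it
+unplugs.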
+
+### API Configuration
+
+Here is an example of how to configure the `virtio-mem` device via the API:
+
+```console
+socket_location=/run/firecracker.socket
+
+curl --unix-socket $socket_location -i \
+    -X PUT 'http://localhost/hotplug/memory' \
+    -H 'Accept: application/json' \
+    -H 'Content-Type: application/json' \
+    -d "{
+        \"total_size_mib\": 1024,
+        \"block_size_mib\": 2,
+        \"slot_size_mib\": 128
+    }"
+```
+
+Note that this is only allowed before the `InstanceStart` action and not on
+snapshot-restored VMs (which will use the configuration saved in the snapshot).
+
+### JSON Configuration
+
+To configure via JSON, add the following to your VM configuration file:
+
+```json
+{
+  "memory-hotplug": {
+    "total_size_mib": 1024,
+    "block_size_mib": 2,
+    "slot_size_mib": 128
+  }
+}
+```
+
+### Checking Device Status
+
+After configuration, you can query the device status at any time:
+
+```console
+socket_location=/run/firecracker.socket
+
+curl --unix-socket $socket_location -i \
+    -X GET 'http://localhost/hotplug/memory' \
+    -H 'Accept: application/json'
+```
+
+This returns information about the current device state, including:
+
+- `total_size_mib`: Maximum hotpluggable memory size
+- `block_size_mib`: Block size used by the device
+- `slot_size_mib`: KVM slot size
+- `plugged_size_mib`: Memory currently plugged by the guest (available to it)
+- `requested_size_mib`: Target memory size set by the host
+
+## Operating the virtio-mem device
+
+Once configured and the VM is running, you can dynamically adjust the amount of
+memory available to the guest by updating the requested size.
+
+### Hot-plugging Memory
+
+To add memory to a running VM, request a greater size from the `virtio-mem`
+device:
+
+```console
+socket_location=/run/firecracker.socket
+
+curl --unix-socket $socket_location -i \
+    -X PATCH 'http://localhost/hotplug/memory' \
+    -H 'Accept: application/json' \
+    -H 'Content-Type: application/json' \
+    -d "{
+        \"requested_size_mib\": 512
+    }"
+```
+
+This updates the target memory size. The guest driver will detect this change
+and allocate memory blocks to reach the requested size. The process is
+asynchronous: the guest will incrementally plug memory until it reaches the
+target. Use the GET API to monitor how far the driver has progressed.
+
+### Hot-removing Memory
+
+To remove memory from a running VM, request a lower size:
+
+```console
+socket_location=/run/firecracker.socket
+
+curl --unix-socket $socket_location -i \
+    -X PATCH 'http://localhost/hotplug/memory' \
+    -H 'Accept: application/json' \
+    -H 'Content-Type: application/json' \
+    -d "{
+        \"requested_size_mib\": 256
+    }"
+```
+
+Setting a lower `requested_size_mib` value causes the guest driver to free
+memory blocks. Once the guest reports a block as unplugged, the unplugged
+memory is immediately freed from the host process. If all blocks in a memory
+slot are unplugged, then Firecracker will also protect the memory slot, removing
+access from the guest.
+
+To remove all hotplugged memory, set `requested_size_mib` to 0:
+
+```console
+curl --unix-socket $socket_location -i \
+    -X PATCH 'http://localhost/hotplug/memory' \
+    -H 'Accept: application/json' \
+    -H 'Content-Type: application/json' \
+    -d '{"requested_size_mib": 0}'
+```
+
+Note that this requires the guest to actually be able to find and report memory
+blocks that can be moved or freed.
+
+## Configuring the guest driver
+
+The guest kernel must be configured with specific boot or runtime module
+parameters to ensure optimal behavior of the `virtio-mem` driver and memory
+hotplug module.
+
+In short:
+
+- pass `memhp_default_state=online_movable` if hot-removal is required and the
+  hotpluggable memory area is not much bigger than the normal memory.
+- pass `memory_hotplug.memmap_on_memory=1 memhp_default_state=online` if
+  hot-removal is not required and the hotpluggable memory area can be much
+  bigger than the normal memory.
+
+#### `memhp_default_state`
+
+This parameter controls how newly hotplugged memory is onlined by the kernel
+and is required for automatically onlining new memory pages. It is recommended
+to set it to `online_movable` as below for reliable memory hot-removal.
+
+```
+memhp_default_state=online_movable
+```
+
+The `online_movable` setting ensures that:
+
+- Hotplugged memory is placed in the MOVABLE zone
+- The kernel can migrate pages when unplugging is requested
+- Memory can be successfully freed back to the host
+
+Other possible values (not recommended for hot-removal):
+
+- `online`: Lets the kernel automatically place memory in either the NORMAL or
+  the MOVABLE zone (may prevent hot-remove)
+- `online_kernel`: Places memory in NORMAL zone (may prevent hot-remove)
+- `offline` (default): Memory requires manual onlining
+
+#### `memory_hotplug.memmap_on_memory` (optional)
+
+This parameter controls whether the kernel allocates the memory map
+(`struct page` entries) for hotplugged memory from the hotplugged memory itself,
+rather than from boot memory. Without this parameter, the kernel needs 64B of
+boot memory for every 4KiB page of hotplugged memory. For example, hotplugging
+16 GiB of memory (4,194,304 pages of 4 KiB) would need 256 MiB of free "boot"
+memory. This parameter only works if the memory is not entirely hotplugged as
+MOVABLE.
+
+```
+memory_hotplug.memmap_on_memory=1 memhp_default_state=online
+```
+
+This configuration is recommended when hot-removal is not a priority and the
+hotpluggable memory area is very large.
+
+#### Additional Resources
+
+For more detailed and up-to-date information about memory hotplug in the Linux
+kernel, refer to the official kernel documentation:
+[Memory Hotplug](https://docs.kernel.org/admin-guide/mm/memory-hotplug.html)
+
+## Security Considerations
+
+**The `virtio-mem` device is a paravirtualized device requiring cooperation from
+a driver in the guest.**
+
+### Memory Protection
+
+Firecracker provides the following security guarantees:
+
+- **Memory that is never plugged is protected**: Memory that has never been
+  plugged before is protected from the guest by not making it available to the
+  guest via a KVM slot and by using `mprotect` to prevent access from device
+  emulation. Any attempt by the guest to access unplugged memory will result in
+  a fault and may crash the Firecracker process.
+- **Unplugged memory slots are protected**: Memory slots that have been
+  unplugged are removed from KVM and `mprotect`-ed. This requires the guest to
+  report contiguous blocks to be freed for the memory slot to be actually
+  protected.
+- **Unplugged memory blocks are freed**: When a memory block is unplugged, the
+  backing pages are freed, for example using `madvise(MADV_DONTNEED)` for
+  anonymous memory, returning memory to the host at block granularity.
+- **Memory isolation**: Unplugged memory never leaks between Firecracker
+  processes. Memory mappings use `MAP_PRIVATE` and `MAP_ANONYMOUS` flags,
+  ensuring that even if physical pages are reused, they are zeroed on access.
+
+### Trust Model
+
+While Firecracker enforces memory isolation at the host level, a compromised
+guest driver could:
+
+- Fail to plug or unplug memory as requested by the device
+- Attempt to access unplugged memory (will result in a fault and crash of
+  Firecracker)
+
+Users should:
+
+- Be prepared to handle cases where the guest doesn't cooperate with memory
+  operations by monitoring the GET API.
+- Implement host-level memory limits and monitoring, e.g. through cgroups.
+
+## Compatibility with Other Features
+
+`virtio-mem` is compatible with all Firecracker features. Below are some
+specific changes in the other features when using memory hotplugging.
+
+### Snapshots
+
+Full and diff snapshots will include the unplugged areas as sparse "holes" in
+the memory snapshot file. Sparse file support is recommended to efficiently
+handle the memory snapshot files.
+
+### Uffd
+
+The uffd handler will need to handle the entire hotpluggable memory range even
+if unplugged. The uffd handler may decide to unregister unplugged memory ranges
+(holes in the memory file). The uffd handler will also need to handle
+`UFFD_EVENT_REMOVE` events for hot-removed blocks, either unregistering the
+range or storing the information and returning an empty page on the next access.
+
+### Vhost-user
+
+`vhost-user` is fully supported, but Firecracker cannot guarantee protection of
+unplugged memory from a `vhost-user` backend. A malicious guest driver may be
+able to trick the backend into accessing unplugged memory. This is not possible
+in Firecracker itself as unplugged memory slots are `mprotect`-ed.