# Memory Hotplugging with virtio-mem

## What is virtio-mem

`virtio-mem` is a para-virtualized memory device that enables dynamic memory
resizing for virtual machines. Unlike traditional memory hot-plug mechanisms,
`virtio-mem` provides a flexible and efficient solution that works across
different architectures and avoids many of the limitations of older approaches.

The `virtio-mem` device manages a contiguous memory region that is divided into
fixed-size blocks. The host asks the guest to plug (make available) or unplug
(release) memory by changing the device's requested size, and the guest driver
responds by plugging or unplugging blocks until the plugged size matches the
request. This gives fine-grained control over guest memory with minimal
overhead.

Firecracker additionally groups contiguous blocks into slots (128 MiB by
default). A slot can be fully protected from guest accesses, which prevents a
malicious guest from accessing the hotpluggable memory range when the host has
not made it available.

## Prerequisites

To support memory hotplugging via `virtio-mem`, you must use a guest kernel that
meets the following version and configuration requirements.

### Kernel Version

On x86_64, Firecracker requires a guest kernel version >= 5.16, because the
guest driver must negotiate `VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE`. On aarch64, a
kernel >= 5.18 is required. Refer to the
[kernel policy documentation](kernel-policy.md) for a list of officially
supported guest kernels.

### Kernel Config

The following kernel configuration options are required:

- `CONFIG_VIRTIO_MEM=y`: Enables the `virtio-mem` driver
- `CONFIG_MEMORY_HOTPLUG=y`: Enables memory hotplug support
- `CONFIG_MEMORY_HOTREMOVE=y`: Enables memory hot-remove support (recommended)
- `CONFIG_STRICT_DEVMEM=y`: (aarch64 only) required to enable the `virtio-mem`
  driver.

## Adding hotpluggable memory

The `virtio-mem` device must be configured during VM setup, before the virtual
machine is started, with the total amount of memory that can be hotplugged. This
can be done through a `PUT` request on `/hotplug/memory` or by including the
configuration in the JSON configuration file.

### Configuration Parameters

- `total_size_mib` (required): The maximum size of hotpluggable memory in MiB.
  This defines the upper bound of memory that can be added to the VM. Must be a
  multiple of `slot_size_mib`.

- `block_size_mib` (optional, default: 2): The size of individual memory blocks
  in MiB. Must be a power of 2 and at least 2 MiB. Larger block sizes provide
  better performance but coarser granularity, which makes it harder for the
  guest to find blocks it can unplug.

- `slot_size_mib` (optional, default: 128): The size of KVM memory slots in MiB.
  Must be a power of 2 and at least `block_size_mib`. Larger slot sizes improve
  performance for large memory operations, but a slot is only protected once
  all of its blocks are unplugged, so larger slots are less likely to end up
  protected.

We recommend leaving these values at their defaults unless strict memory
protection is required, in which case `block_size_mib` should be equal to
`slot_size_mib`. Note that this makes it harder for the guest kernel to find
contiguous memory to hot-unplug. Refer to the
[Memory Protection](#memory-protection) section below for more details.
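
The size constraints above can be checked on the host before sending a request.
The following is a minimal sketch, not part of Firecracker: `is_power_of_two`
and `validate_hotplug_config` are hypothetical helper names, and the fallback
values mirror the documented defaults (2 MiB blocks, 128 MiB slots).

```bash
# Hypothetical helpers: validate virtio-mem sizes before configuring the device.

# Succeeds if $1 is a positive power of two.
is_power_of_two() {
  local n=$1
  (( n > 0 && (n & (n - 1)) == 0 ))
}

# Usage: validate_hotplug_config TOTAL_MIB [BLOCK_MIB] [SLOT_MIB]
# Returns 0 if the combination satisfies the documented constraints.
validate_hotplug_config() {
  local total=$1 block=${2:-2} slot=${3:-128}

  if (( block < 2 )) || ! is_power_of_two "$block"; then
    echo "block_size_mib must be a power of 2 and at least 2" >&2
    return 1
  fi
  if (( slot < block )) || ! is_power_of_two "$slot"; then
    echo "slot_size_mib must be a power of 2 and at least block_size_mib" >&2
    return 1
  fi
  if (( total % slot != 0 )); then
    echo "total_size_mib must be a multiple of slot_size_mib" >&2
    return 1
  fi
  return 0
}
```

For example, `validate_hotplug_config 1000 2 128` fails because 1000 is not a
multiple of 128, while `validate_hotplug_config 512 128 128` (one block per
slot, for strict protection) passes.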

### API Configuration

Here is an example of how to configure the `virtio-mem` device via the API:

```console
socket_location=/run/firecracker.socket

curl --unix-socket $socket_location -i \
  -X PUT 'http://localhost/hotplug/memory' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
    \"total_size_mib\": 1024,
    \"block_size_mib\": 2,
    \"slot_size_mib\": 128
  }"
```

Note that this is only allowed before the `InstanceStart` action, and not on
snapshot-restored VMs (which use the configuration saved in the snapshot).

### JSON Configuration

To configure via JSON, add the following to your VM configuration file:

```json
{
  "memory-hotplug": {
    "total_size_mib": 1024,
    "block_size_mib": 2,
    "slot_size_mib": 128
  }
}
```

### Checking Device Status

After configuration, you can query the device status at any time:

```console
socket_location=/run/firecracker.socket

curl --unix-socket $socket_location -i \
  -X GET 'http://localhost/hotplug/memory' \
  -H 'Accept: application/json'
```

This returns information about the current device state, including:

- `total_size_mib`: Maximum hotpluggable memory size
- `block_size_mib`: Block size used by the device
- `slot_size_mib`: KVM slot size
- `plugged_size_mib`: Memory currently plugged (made available) by the guest
- `requested_size_mib`: Target memory size set by the host

## Operating the virtio-mem device

Once the device is configured and the VM is running, you can dynamically adjust
the amount of memory available to the guest by updating the requested size.

### Hot-plugging Memory

To add memory to a running VM, request a larger size from the `virtio-mem`
device:

```console
socket_location=/run/firecracker.socket

curl --unix-socket $socket_location -i \
  -X PATCH 'http://localhost/hotplug/memory' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
    \"requested_size_mib\": 512
  }"
```

This updates the target memory size. The guest driver detects the change and
plugs memory blocks until it reaches the requested size. The process is
asynchronous: the guest plugs memory incrementally until it reaches the target.
We recommend using the `GET` API to monitor the progress of hot-plugging by the
driver.
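
A minimal polling sketch is shown below. It assumes bash, `curl` and `sed` on
the host, and assumes that `GET /hotplug/memory` returns a flat JSON object
containing the `plugged_size_mib` field listed above. The `SOCKET` variable and
the function names are hypothetical; if `jq` is available, the `sed` extraction
can be replaced with `jq .plugged_size_mib`.

```bash
# Hypothetical sketch: wait until the guest has plugged a target amount of memory.
SOCKET=${SOCKET:-/run/firecracker.socket}

# Fetch the current virtio-mem device status (response body only).
get_hotplug_status() {
  curl --silent --fail --unix-socket "$SOCKET" \
    -H 'Accept: application/json' \
    'http://localhost/hotplug/memory'
}

# Extract an integer field from a flat JSON object without requiring jq.
json_int_field() {
  local field=$1 json=$2
  printf '%s\n' "$json" |
    sed -n "s/.*\"$field\"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p"
}

# Usage: wait_for_plugged TARGET_MIB TIMEOUT_SECONDS
# Returns 0 once plugged_size_mib equals TARGET_MIB, 1 on error or timeout.
wait_for_plugged() {
  local target=$1 timeout=$2 status plugged
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS <= deadline )); do
    status=$(get_hotplug_status) || return 1
    plugged=$(json_int_field plugged_size_mib "$status")
    echo "plugged=${plugged:-?} MiB, target=${target} MiB" >&2
    if [[ "$plugged" == "$target" ]]; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, after the `PATCH` request above, `wait_for_plugged 512 60` waits up
to a minute for the guest to report 512 MiB as plugged. A timeout does not by
itself mean the request failed; the guest may still be plugging memory, or may
not be cooperating.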

### Hot-removing Memory

To remove memory from a running VM, request a smaller size:

```console
socket_location=/run/firecracker.socket

curl --unix-socket $socket_location -i \
  -X PATCH 'http://localhost/hotplug/memory' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
    \"requested_size_mib\": 256
  }"
```

Setting a lower `requested_size_mib` value causes the guest driver to unplug
memory blocks. As soon as the guest reports a block as unplugged, Firecracker
frees the memory backing it in the host process. If all blocks in a memory slot
are unplugged, Firecracker also protects the memory slot, removing guest access
to it.
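
The slot-protection rule above can be illustrated with a simplified model. This
is a hypothetical sketch, not Firecracker's implementation: `protectable_slots`
takes the configured sizes plus the indices of the blocks that are still plugged
(counted from the start of the hotpluggable region), and prints the slots with
no plugged block left, i.e. the ones that could be removed from KVM and
`mprotect`-ed.

```bash
# Hypothetical model: which slots have no plugged blocks and can be protected?
# Usage: protectable_slots TOTAL_MIB BLOCK_MIB SLOT_MIB [PLUGGED_BLOCK_INDEX...]
protectable_slots() {
  local total=$1 block=$2 slot=$3
  shift 3
  local blocks_per_slot=$(( slot / block ))
  local num_slots=$(( total / slot ))
  local -a plugged_in_slot=()
  local i idx

  # Start with zero plugged blocks in every slot.
  for (( i = 0; i < num_slots; i++ )); do
    plugged_in_slot[i]=0
  done
  # Count the plugged blocks that fall into each slot.
  for idx in "$@"; do
    (( plugged_in_slot[idx / blocks_per_slot] += 1 ))
  done
  # A slot is protectable only if none of its blocks is plugged.
  for (( i = 0; i < num_slots; i++ )); do
    if (( plugged_in_slot[i] == 0 )); then
      echo "$i"
    fi
  done
}
```

For example, `protectable_slots 512 2 128 0 1 64` prints `2` and `3`: slots 0
and 1 each still contain a plugged 2 MiB block, so their full 128 MiB remains
accessible to the guest even though most of it is unplugged.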

To remove all hotplugged memory, set `requested_size_mib` to 0:

```console
curl --unix-socket $socket_location -i \
  -X PATCH 'http://localhost/hotplug/memory' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"requested_size_mib": 0}'
```

Note that hot-removal only succeeds if the guest can find and report memory
blocks that can be moved or freed.

## Configuring the guest driver

The guest kernel must be configured with specific boot or runtime module
parameters to ensure optimal behavior of the `virtio-mem` driver and the memory
hotplug subsystem.

In short:

- Pass `memhp_default_state=online_movable` if hot-removal is required and the
  hotpluggable memory area is not much larger than the boot memory.
- Pass `memory_hotplug.memmap_on_memory=1 memhp_default_state=online` if
  hot-removal is not required and the hotpluggable memory area can be much
  larger than the boot memory.

### `memhp_default_state`

This parameter controls how the kernel onlines newly hotplugged memory, and is
required for new memory to be onlined automatically. We recommend setting it to
`online_movable`, as below, for reliable memory hot-removal:

```
memhp_default_state=online_movable
```

The `online_movable` setting ensures that:

- Hotplugged memory is placed in the MOVABLE zone
- The kernel can migrate pages when unplugging is requested
- Memory can be successfully freed back to the host

Other possible values (not recommended for hot-removal):

- `online`: Lets the kernel choose between the NORMAL and MOVABLE zones (may
  prevent hot-remove)
- `online_kernel`: Places memory in the NORMAL zone (may prevent hot-remove)
- `offline` (default): Memory must be onlined manually

### `memory_hotplug.memmap_on_memory` (optional)

This parameter controls whether the kernel allocates the memory map
(`struct page` array) for hotplugged memory from the hotplugged memory itself,
rather than from boot memory. Without this parameter, the kernel needs 64 bytes
of boot memory for every 4 KiB page of hotplugged memory. For example, it needs
256 MiB of free boot memory to hotplug 16 GiB of memory. This parameter only
takes effect if the hotplugged memory is not onlined entirely as MOVABLE.

```
memory_hotplug.memmap_on_memory=1 memhp_default_state=online
```

This configuration is recommended when hot-removal is not a priority and the
hotpluggable memory area is very large.
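
The memory map overhead described above can be estimated with a quick
calculation. This sketch assumes 4 KiB pages and 64 bytes of `struct page` per
page, as stated above; `memmap_overhead_mib` is a hypothetical helper name, and
the result is rounded down to whole MiB.

```bash
# Hypothetical helper: boot memory (MiB) needed for the memory map of
# hotplugged memory when memmap_on_memory is not used.
# Assumes 4 KiB pages and 64-byte struct page entries.
memmap_overhead_mib() {
  local hotplug_mib=$1
  local page_size=4096 struct_page_size=64
  local pages=$(( hotplug_mib * 1024 * 1024 / page_size ))
  echo $(( pages * struct_page_size / (1024 * 1024) ))
}
```

For example, `memmap_overhead_mib 16384` prints `256`: hotplugging 16 GiB needs
256 MiB of boot memory for its memory map. In general the overhead is 1/64 of
the hotplugged size (64 bytes per 4096-byte page).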

### Additional Resources

For more detailed and up-to-date information about memory hotplug in the Linux
kernel, refer to the official kernel documentation:
[Memory Hotplug](https://docs.kernel.org/admin-guide/mm/memory-hotplug.html)

## Security Considerations

**The `virtio-mem` device is a paravirtualized device requiring cooperation from
a driver in the guest.**

### Memory Protection

Firecracker provides the following security guarantees:

- **Memory that is never plugged is protected**: Memory that has never been
  plugged is not exposed to the guest through a KVM slot, and is `mprotect`-ed
  to prevent access from device emulation. Any attempt by the guest to access
  unplugged memory results in a fault and may crash the Firecracker process.
- **Unplugged memory slots are protected**: Memory slots that have been
  unplugged are removed from KVM and `mprotect`-ed. A slot is only protected
  once the guest has reported every block in it as unplugged, so protection
  depends on the guest freeing contiguous blocks.
- **Unplugged memory blocks are freed**: When a memory block is unplugged, its
  backing pages are freed (for example using `madvise(MADV_DONTNEED)` for
  anonymous memory), returning memory to the host at block granularity.
- **Memory isolation**: Unplugged memory never leaks between Firecracker
  processes. Memory mappings use the `MAP_PRIVATE` and `MAP_ANONYMOUS` flags,
  ensuring that even if physical pages are reused, they are zeroed on access.

### Trust Model

While Firecracker enforces memory isolation at the host level, a compromised
guest driver could:

- Fail to plug or unplug memory as requested by the device
- Attempt to access unplugged memory (resulting in a fault and a crash of
  Firecracker)

Users should:

- Be prepared to handle cases where the guest does not cooperate with memory
  operations, by monitoring the `GET` API.
- Implement host-level memory limits and monitoring, e.g. through cgroups.
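
As a minimal sketch of such a host-level limit, assuming a cgroup v2 hierarchy
mounted at `/sys/fs/cgroup` with the memory controller enabled for the parent
group: `limit_vm_memory` is a hypothetical helper that creates a cgroup, writes
`memory.max`, and moves a process into it. The limit should cover at least the
boot memory, `total_size_mib` of hotpluggable memory, and the VMM's own
overhead.

```bash
# Hypothetical helper: cap a Firecracker process's memory with cgroup v2.
# Usage: limit_vm_memory CGROUP_DIR LIMIT_MIB PID
limit_vm_memory() {
  local cgroup_dir=$1 limit_mib=$2 pid=$3
  mkdir -p "$cgroup_dir" || return 1
  # memory.max is written in bytes ("max" would mean no limit).
  echo $(( limit_mib * 1024 * 1024 )) > "$cgroup_dir/memory.max" || return 1
  # Writing the PID to cgroup.procs moves the process into the cgroup.
  echo "$pid" > "$cgroup_dir/cgroup.procs"
}
```

For example, as root, `limit_vm_memory /sys/fs/cgroup/fc-vm0 2048 "$FC_PID"`,
where the cgroup name, the 2048 MiB limit and `FC_PID` are placeholders.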

## Compatibility with Other Features

`virtio-mem` is compatible with all Firecracker features. The sections below
describe how other features behave when memory hotplugging is enabled.

### Snapshots

Full and diff snapshots include the unplugged areas as sparse "holes" in the
memory snapshot file. Sparse file support is recommended to handle memory
snapshot files efficiently.
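
One way to inspect and preserve these holes with GNU coreutils is sketched
below. The helper names are hypothetical and the snapshot path is a
placeholder: `du --apparent-size` reports the logical file size, plain `du`
reports the space actually allocated, and `cp --sparse=always` recreates holes
in the copy.

```bash
# Hypothetical helpers for sparse memory snapshot files (GNU coreutils).

# Print the logical (apparent) size and the allocated size of a file, in KiB.
snapshot_disk_usage() {
  local mem_file=$1
  echo "apparent: $(du --apparent-size -k "$mem_file" | cut -f1) KiB"
  echo "allocated: $(du -k "$mem_file" | cut -f1) KiB"
}

# Copy a snapshot file, turning runs of zero bytes into holes in the copy.
copy_snapshot_sparse() {
  cp --sparse=always "$1" "$2"
}
```

For a snapshot with many unplugged regions, the allocated size should be
noticeably smaller than the apparent size. If it is not, the filesystem or the
tool that produced the file may not preserve holes.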

### Uffd

The uffd handler needs to handle the entire hotpluggable memory range, even if
it is unplugged. The handler may choose to unregister unplugged memory ranges
(holes in the memory file). It also needs to handle `UFFD_EVENT_REMOVE` events
for hot-removed blocks, either by unregistering the range or by recording the
removal and serving an empty (zero-filled) page on the next access.

### Vhost-user

`vhost-user` is fully supported, but Firecracker cannot guarantee protection of
unplugged memory from a `vhost-user` backend. A malicious guest driver may be
able to trick the backend into accessing unplugged memory. This is not possible
in Firecracker itself, because unplugged memory slots are `mprotect`-ed.