Commit c3aab3e

doc(virtio-mem): add memory hotplug documentation
This adds documentation for the new memory hotplug feature.

Signed-off-by: Riccardo Mancini <[email protected]>
1 parent 0c156c4 commit c3aab3e

1 file changed (+314 -0 lines changed)

docs/memory-hotplug.md

Lines changed: 314 additions & 0 deletions
# Memory Hotplugging with virtio-mem

## What is virtio-mem

`virtio-mem` is a paravirtualized memory device that enables dynamic memory
resizing for virtual machines. Unlike traditional memory hotplug mechanisms,
`virtio-mem` provides a flexible and efficient solution that works across
different architectures and avoids many limitations of older approaches.

The `virtio-mem` device manages a contiguous memory region that is divided into
fixed-size blocks. The host requests that the guest plug (make available) or
unplug (release) memory by changing the device's target size, and the guest
driver responds by allocating or freeing memory blocks accordingly. This
approach gives fine-grained control over guest memory with minimal overhead.

Firecracker additionally groups blocks into slots: sets of contiguous blocks
(128 MiB by default) that can be fully protected from guest access. This
prevents a malicious guest from accessing parts of the hotpluggable memory range
that the host has not made available.
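
To make the block/slot relationship concrete, the following sketch computes how
a hotpluggable region is divided. `hotplug_layout` is a hypothetical helper,
not part of Firecracker. It assumes that `total_size_mib` is a multiple of
`slot_size_mib` and that `slot_size_mib` is a multiple of `block_size_mib`. With
2 MiB blocks and 128 MiB slots, a 1024 MiB region has 8 slots of 64 blocks each.

```sh
# Hypothetical helper: describe how a hotpluggable region of TOTAL_MIB is
# split into slots of SLOT_MIB, each made of blocks of BLOCK_MIB.
# Assumes TOTAL_MIB is a multiple of SLOT_MIB and SLOT_MIB of BLOCK_MIB.
hotplug_layout() {
    total_mib=$1 block_mib=$2 slot_mib=$3
    slots=$((total_mib / slot_mib))
    blocks_per_slot=$((slot_mib / block_mib))
    echo "slots=$slots blocks_per_slot=$blocks_per_slot blocks=$((slots * blocks_per_slot))"
}
```

For example, `hotplug_layout 1024 2 128` prints
`slots=8 blocks_per_slot=64 blocks=512`.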

## Prerequisites

To support memory hotplugging via `virtio-mem`, you must use a guest kernel of a
suitable version, built with the configuration options listed below.

#### Kernel Version

On x86_64, Firecracker requires guest kernel version 5.16 or later, because the
guest driver must negotiate `VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE`. On aarch64,
kernel 5.18 or later is required. Refer to the
[kernel policy documentation](kernel_policy.md) for a list of officially
supported guest kernels.
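
Inside the guest, one quick way to check the running kernel against these
minimum versions is to compare the `MAJOR.MINOR` part of `uname -r`. The helpers
below are a hypothetical sketch, and the function names are not part of
Firecracker. They hard-code the 5.16 and 5.18 minimums stated above and ignore
patch levels and vendor suffixes. Distribution kernels with backported drivers
may therefore still need a manual check.

```sh
# Hypothetical helper: print the minimum guest kernel for ARCH (uname -m).
virtio_mem_min_kernel() {
    case "$1" in
        x86_64) echo 5.16 ;;
        aarch64) echo 5.18 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if kernel VERSION (e.g. "6.1.102-foo") is at
# least MIN ("MAJOR.MINOR"). Only MAJOR and MINOR are compared.
kernel_at_least() {
    v_major=${1%%.*}
    v_rest=${1#*.}
    v_minor=${v_rest%%[!0-9]*}   # drop ".patch" and any "-suffix"
    m_major=${2%%.*}
    m_minor=${2#*.}
    [ "$v_major" -gt "$m_major" ] ||
        { [ "$v_major" -eq "$m_major" ] && [ "$v_minor" -ge "$m_minor" ]; }
}
```

Usage inside the guest:
`kernel_at_least "$(uname -r)" "$(virtio_mem_min_kernel "$(uname -m)")" && echo "kernel OK"`.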
33+
34+
#### Kernel Config
35+
36+
The following kernel configuration options are required:
37+
38+
- `CONFIG_VIRTIO_MEM=y`: Enables the `virtio-mem` driver
39+
- `CONFIG_MEMORY_HOTPLUG=y`: Enables memory hotplug support
40+
- `CONFIG_MEMORY_HOTREMOVE=y`: Enables memory hot-remove support (recommended)
41+
- `CONFIG_STRICT_DEVMEM`: (aarch64 only) required to enable the `virtio-mem`
42+
driver.
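
One way to verify these options for a guest kernel build is to grep its
`.config`. The sketch below is a hypothetical helper. It assumes a standard
Kconfig file, such as `/boot/config-$(uname -r)`, or `/proc/config.gz`
decompressed with `zcat` on kernels that expose it. Because the list above marks
`CONFIG_MEMORY_HOTREMOVE` as recommended rather than required, the helper only
warns when it is missing.

```sh
# Hypothetical helper: check a kernel .config for the virtio-mem options
# listed above. Usage: check_kconfig CONFIG_FILE ARCH
check_kconfig() {
    cfg=$1 arch=$2 missing=0
    required="CONFIG_VIRTIO_MEM CONFIG_MEMORY_HOTPLUG"
    if [ "$arch" = "aarch64" ]; then
        required="$required CONFIG_STRICT_DEVMEM"
    fi
    for opt in $required; do
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "missing: ${opt}=y"
            missing=1
        fi
    done
    # Only recommended: without it, memory can be hotplugged but not removed.
    if ! grep -q '^CONFIG_MEMORY_HOTREMOVE=y$' "$cfg"; then
        echo "warning: CONFIG_MEMORY_HOTREMOVE=y not set"
    fi
    return "$missing"
}
```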

## Adding hotpluggable memory

The `virtio-mem` device must be configured with the total amount of memory that
can be hotplugged during VM setup, before the virtual machine is started. This
can be done through a `PUT` request on `/hotplug/memory` or by including the
configuration in the JSON configuration file.

### Configuration Parameters

- `total_size_mib` (required): The maximum size of hotpluggable memory in MiB.
  This defines the upper bound of memory that can be added to the VM. Must be a
  multiple of `slot_size_mib`.

- `block_size_mib` (optional, default: 2): The size of individual memory blocks
  in MiB. Must be at least 2 MiB and a power of 2. Larger block sizes provide
  better performance but coarser granularity, which makes it harder for the
  guest to unplug memory.

- `slot_size_mib` (optional, default: 128): The size of KVM memory slots in MiB.
  Must be at least `block_size_mib` and a power of 2. Larger slot sizes improve
  performance for large memory operations but make unplug protection less
  effective, since a slot is only protected once all of its blocks are
  unplugged.

We recommend leaving these values at their defaults unless strict memory
protection is required, in which case `block_size_mib` should equal
`slot_size_mib`. Note that this makes it harder for the guest kernel to find
contiguous memory to hot-unplug. Refer to the
[Memory Protection](#memory-protection) section below for more details.
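
The constraints above can be checked on the host before the configuration is
sent. This is a hypothetical pre-flight sketch that mirrors only the rules
listed in this section. Firecracker performs its own validation and may enforce
additional constraints.

```sh
# Hypothetical helper: succeed if $1 is a positive power of 2.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Hypothetical helper: check TOTAL_MIB BLOCK_MIB SLOT_MIB against the rules
# above; prints one line per violated rule and returns 1 if any failed.
validate_hotplug_config() {
    total=$1 block=$2 slot=$3 rc=0
    if [ "$block" -lt 2 ] || ! is_pow2 "$block"; then
        echo "block_size_mib must be a power of 2 and at least 2"
        rc=1
    fi
    if [ "$slot" -lt "$block" ] || ! is_pow2 "$slot"; then
        echo "slot_size_mib must be a power of 2 and at least block_size_mib"
        rc=1
    fi
    # Guard against division by zero when slot_size_mib is invalid.
    if [ "$slot" -gt 0 ] && [ $((total % slot)) -ne 0 ]; then
        echo "total_size_mib must be a multiple of slot_size_mib"
        rc=1
    fi
    return "$rc"
}
```

For example, `validate_hotplug_config 1024 2 128 && echo "config OK"` accepts
the values used in the examples below.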
71+
72+
### API Configuration
73+
74+
Here is an example of how to configure the `virtio-mem` device via the API:
75+
76+
```console
77+
socket_location=/run/firecracker.socket
78+
79+
curl --unix-socket $socket_location -i \
80+
-X PUT 'http://localhost/hotplug/memory' \
81+
-H 'Accept: application/json' \
82+
-H 'Content-Type: application/json' \
83+
-d "{
84+
\"total_size_mib\": 1024,
85+
\"block_size_mib\": 2,
86+
\"slot_size_mib\": 128
87+
}"
88+
```
89+
90+
Note that this is only allowed before the `InstanceStart` action and not on
91+
snapshot-restored VMs (which will use the configuration saved in the snapshot).

### JSON Configuration

To configure via JSON, add the following to your VM configuration file:

```json
{
  "memory-hotplug": {
    "total_size_mib": 1024,
    "block_size_mib": 2,
    "slot_size_mib": 128
  }
}
```

### Checking Device Status

After configuration, you can query the device status at any time:

```console
socket_location=/run/firecracker.socket

curl --unix-socket "$socket_location" -i \
    -X GET 'http://localhost/hotplug/memory' \
    -H 'Accept: application/json'
```

This returns information about the current device state, including:

- `total_size_mib`: Maximum hotpluggable memory size
- `block_size_mib`: Block size used by the device
- `slot_size_mib`: KVM slot size
- `plugged_size_mib`: Memory currently plugged (made available) by the guest
- `requested_size_mib`: Target memory size set by the host

## Operating the virtio-mem device

Once the device is configured and the VM is running, you can dynamically adjust
the amount of memory available to the guest by updating the requested size.

### Hot-plugging Memory

To add memory to a running VM, request a larger size from the `virtio-mem`
device:

```console
socket_location=/run/firecracker.socket

curl --unix-socket "$socket_location" -i \
    -X PATCH 'http://localhost/hotplug/memory' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "requested_size_mib": 512
    }'
```

This updates the target memory size. The guest driver detects the change and
allocates memory blocks until the requested size is reached. The process is
asynchronous: the guest plugs memory incrementally until it reaches the target,
so use the `GET` API to monitor the driver's progress.
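
Because plugging is asynchronous, a host-side script can send the `PATCH`
request and then poll the `GET` endpoint until `plugged_size_mib` matches
`requested_size_mib`. The following is a minimal sketch that relies on several
assumptions. It uses the socket path from the examples above and expects the
response to be a flat JSON object with the fields listed in
[Checking Device Status](#checking-device-status). It extracts fields with
`sed` rather than `jq` only to avoid an extra dependency. The function names are
hypothetical.

```sh
# Hypothetical helpers for monitoring virtio-mem progress from the host.
# Assumes the socket path used in the examples above.
SOCKET="${SOCKET:-/run/firecracker.socket}"

# Print the body of the GET /hotplug/memory response.
hotplug_status() {
    curl --silent --fail --unix-socket "$SOCKET" \
        -X GET 'http://localhost/hotplug/memory' \
        -H 'Accept: application/json'
}

# Read a flat JSON object on stdin and print the integer value of field $1.
# (A sed-based simplification; jq is more robust if it is available.)
json_field() {
    tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Poll up to $1 times, $2 seconds apart. Returns 0 once plugged == requested,
# 1 on timeout and 2 if the status request fails.
wait_for_target() {
    polls=0
    while [ "$polls" -lt "$1" ]; do
        status=$(hotplug_status) || return 2
        plugged=$(printf '%s' "$status" | json_field plugged_size_mib)
        requested=$(printf '%s' "$status" | json_field requested_size_mib)
        echo "plugged=${plugged} MiB requested=${requested} MiB"
        if [ -n "$plugged" ] && [ "$plugged" = "$requested" ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$2"
    done
    return 1
}
```

For example, after the `PATCH` request above, `wait_for_target 60 1` waits for
up to about a minute. A non-zero result means the guest has not reached the
target, which is the cue to check whether the guest is cooperating. The same
helper works after a hot-remove request.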
154+
155+
### Hot-removing Memory
156+
157+
To remove memory from a running VM, request a lower size:
158+
159+
```console
160+
socket_location=/run/firecracker.socket
161+
162+
curl --unix-socket $socket_location -i \
163+
-X PATCH 'http://localhost/hotplug/memory' \
164+
-H 'Accept: application/json' \
165+
-H 'Content-Type: application/json' \
166+
-d "{
167+
\"requested_size_mib\": 256
168+
}"
169+
```
170+
171+
Setting a lower `requested_size_mib` value causes the guest driver to free
172+
memory blocks. Once the guest reports a block to be unplugged, the unplugged
173+
memory is immediately freed from the host process. If all blocks in a memory
174+
slot are unplugged, then Firecracker will also protect the memory slot, removing
175+
access from the guest.
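
The following hypothetical sketch illustrates why only fully unplugged slots
are protected. It models the device as a list of plugged block indices and
prints the slots that contain no plugged block. These are the slots that would
be removed from KVM and `mprotect`-ed. It is only a model for reasoning about
block and slot sizes: the `GET` API above reports aggregate sizes, not
per-block state.

```sh
# Hypothetical model: unplugged_slots TOTAL_MIB BLOCK_MIB SLOT_MIB [PLUGGED_BLOCK...]
# Block indices are 0-based within the hotpluggable region.
# Prints the indices of slots with no plugged block (space-separated).
unplugged_slots() {
    total=$1 block=$2 slot=$3
    shift 3
    blocks_per_slot=$((slot / block))
    n_slots=$((total / slot))
    s=0
    result=""
    while [ "$s" -lt "$n_slots" ]; do
        busy=0
        for b in "$@"; do
            if [ $((b / blocks_per_slot)) -eq "$s" ]; then
                busy=1
                break
            fi
        done
        if [ "$busy" -eq 0 ]; then
            result="$result $s"
        fi
        s=$((s + 1))
    done
    echo "${result# }"
}
```

With 2 MiB blocks and 128 MiB slots, `unplugged_slots 512 2 128 0 1 64` prints
`2 3`. Blocks 0 and 1 keep slot 0 exposed, and block 64 keeps slot 1 exposed.
A single plugged block is therefore enough to prevent protection of a whole
slot, which is why the configuration section suggests setting `block_size_mib`
equal to `slot_size_mib` when strict protection matters.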

To remove all hotplugged memory, set `requested_size_mib` to 0:

```console
curl --unix-socket "$socket_location" -i \
    -X PATCH 'http://localhost/hotplug/memory' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"requested_size_mib": 0}'
```

Note that this only succeeds if the guest can actually find and report memory
blocks that can be moved or freed.

## Configuring the guest driver

The guest kernel must be configured with specific boot-time or runtime module
parameters to ensure optimal behavior of the `virtio-mem` driver and the memory
hotplug subsystem.

In short:

- pass `memhp_default_state=online_movable` if hot-removal is required and the
  hotpluggable memory area is not much larger than the normal (boot) memory.
- pass `memory_hotplug.memmap_on_memory=1 memhp_default_state=online` if
  hot-removal is not required and the hotpluggable memory area can be much
  larger than the normal memory.
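
The two recommendations above can be expressed as a small helper that prints
the matching guest kernel arguments. This is a hypothetical sketch, and the mode
names `removable` and `large` are invented for illustration. The resulting
string still has to be appended to the boot arguments the guest already uses,
for example the `boot_args` field of Firecracker's `/boot-source`
configuration.

```sh
# Hypothetical helper: print the guest kernel arguments recommended above.
#   removable: hot-removal required, hotplug area not much larger than boot memory
#   large:     hot-removal not required, hotplug area much larger than boot memory
hotplug_boot_args() {
    case "$1" in
        removable) echo "memhp_default_state=online_movable" ;;
        large) echo "memory_hotplug.memmap_on_memory=1 memhp_default_state=online" ;;
        *)
            echo "usage: hotplug_boot_args removable|large" >&2
            return 1
            ;;
    esac
}
```

For example,
`boot_args="console=ttyS0 reboot=k panic=1 $(hotplug_boot_args removable)"`,
where the leading arguments are only placeholders for your existing ones.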

#### `memhp_default_state`

This parameter controls how the kernel onlines newly hotplugged memory, and it
must be set for new memory to be onlined automatically. For reliable memory
hot-removal, set it to `online_movable`:

```
memhp_default_state=online_movable
```

The `online_movable` setting ensures that:

- Hotplugged memory is placed in the MOVABLE zone
- The kernel can migrate pages away when unplugging is requested
- Memory can be successfully freed back to the host

Other possible values (not recommended for hot-removal):

- `online`: Lets the kernel choose between the NORMAL and MOVABLE zones (may
  prevent hot-removal)
- `online_kernel`: Places memory in the NORMAL zone (may prevent hot-removal)
- `offline` (default): Memory must be onlined manually
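
Inside a running guest, the policy selected by `memhp_default_state` is exposed
in `/sys/devices/system/memory/auto_online_blocks`. Each online memory block
also reports its zone in its `valid_zones` file. The sketch below uses two
hypothetical helpers to read these files. They take the sysfs root as an
optional argument only so that they can be exercised against a fake directory.

```sh
# Hypothetical helper: report the auto-online policy; succeeds only if it is
# online_movable. SYSFS defaults to /sys.
check_auto_online() {
    sysfs=${1:-/sys}
    policy=$(cat "$sysfs/devices/system/memory/auto_online_blocks") || return 2
    echo "auto_online_blocks=$policy"
    [ "$policy" = "online_movable" ]
}

# Hypothetical helper: count online memory blocks (boot and hotplugged) per zone.
online_blocks_by_zone() {
    sysfs=${1:-/sys}
    for blk in "$sysfs"/devices/system/memory/memory[0-9]*; do
        [ -r "$blk/state" ] || continue
        [ "$(cat "$blk/state")" = "online" ] || continue
        cat "$blk/valid_zones"
    done | sort | uniq -c
}
```

As root, you can also change the policy without rebooting by writing one of the
values listed above to the same file, for example
`echo online_movable > /sys/devices/system/memory/auto_online_blocks`. The new
policy applies to memory hotplugged afterwards.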

#### `memory_hotplug.memmap_on_memory` (optional)

This parameter controls whether the kernel allocates the memory map
(`struct page` entries) for hotplugged memory from the hotplugged memory itself,
rather than from boot memory. Without this parameter, the memory map consumes
about 64 B of boot memory for every 4 KiB page that is hotplugged. For example,
hotplugging 16 GiB of memory (4 Mi pages of 4 KiB) needs about 256 MiB of free
boot memory. This parameter only takes effect if the memory is not hotplugged
entirely as MOVABLE.

```
memory_hotplug.memmap_on_memory=1 memhp_default_state=online
```

This configuration is recommended when hot-removal is not a priority and the
hotpluggable memory area is very large.
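
The memory-map overhead described above can be estimated with simple
arithmetic. This sketch hard-codes the figures from this section, which are
4 KiB pages and about 64 B of `struct page` per page. The exact per-page size
depends on the kernel configuration, so treat the result as an estimate.

```sh
# Hypothetical helper: estimate the boot memory (MiB) consumed by the memory
# map when hotplugging HOTPLUG_MIB without memmap_on_memory.
# Assumes 4 KiB pages and 64 B of struct page per page, as stated above.
memmap_overhead_mib() {
    pages=$(( $1 * 1024 / 4 ))   # 256 pages of 4 KiB per MiB
    bytes=$(( pages * 64 ))      # 64 B of memory map per page
    echo $(( bytes / 1024 / 1024 ))
}
```

`memmap_overhead_mib 16384` prints `256`, matching the 16 GiB example. In other
words, the overhead is roughly 1/64 of the hotplugged size.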

#### Additional Resources

For more detailed and up-to-date information about memory hotplug in the Linux
kernel, refer to the official kernel documentation:
[Memory Hotplug](https://docs.kernel.org/admin-guide/mm/memory-hotplug.html)

## Security Considerations

**The `virtio-mem` device is a paravirtualized device requiring cooperation from
a driver in the guest.**

### Memory Protection

Firecracker provides the following security guarantees:

- **Memory that is never plugged is protected**: Memory that has never been
  plugged is not made available to the guest through a KVM memory slot, and it
  is `mprotect`-ed to prevent access from device emulation. Any attempt by the
  guest to access unplugged memory results in a fault and may crash the
  Firecracker process.
- **Unplugged memory slots are protected**: Once the guest has reported every
  block in a memory slot as unplugged, the slot is removed from KVM and
  `mprotect`-ed. A slot with even one plugged block remains accessible.
- **Unplugged memory blocks are freed**: When a memory block is unplugged, its
  backing pages are freed (for example using `madvise(MADV_DONTNEED)` for
  anonymous memory), returning memory to the host at block granularity.
- **Memory isolation**: Unplugged memory never leaks between Firecracker
  processes. Memory mappings use the `MAP_PRIVATE` and `MAP_ANONYMOUS` flags,
  ensuring that even if physical pages are reused, they are zeroed on access.

### Trust Model

While Firecracker enforces memory isolation at the host level, a compromised
guest driver could:

- Fail to plug or unplug memory as requested by the device
- Attempt to access unplugged memory (resulting in a fault and a crash of
  Firecracker)

Users should:

- Be prepared to handle cases where the guest does not cooperate with memory
  operations, by monitoring the `GET` API.
- Implement host-level memory limits and monitoring, for example through
  cgroups.
290+
## Compatibility with Other Features
291+
292+
`virtio-mem` is compatible with all Firecracker features. Below are some
293+
specific changes in the other features when using memory hotplugging.
294+
295+
### Snapshots
296+
297+
Full and diff snapshots will include the unplugged areas as sparse "holes" in
298+
the memory snapshot file. Sparse file support is recommended to efficiently
299+
handle the memory snapshot files.
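
To confirm that a memory snapshot file is actually stored sparsely, compare its
apparent size with the space allocated on disk. The helper below is a
hypothetical sketch that uses GNU coreutils `stat` format sequences: `%s` is
the apparent size, `%b` the number of allocated blocks and `%B` the size of each
block. When moving snapshot files, use sparse-aware tools such as
`cp --sparse=always` so that the holes are not filled in.

```sh
# Hypothetical helper: print apparent vs allocated size of FILE in bytes.
# Requires GNU stat (%s = size, %b = allocated blocks, %B = block unit).
sparse_report() {
    # Intentionally unquoted: split "size blocks unit" into $1 $2 $3.
    set -- $(stat -c '%s %b %B' "$1")
    echo "apparent=$1 allocated=$(( $2 * $3 ))"
}
```

If most of the hotpluggable range was unplugged when the snapshot was taken,
`allocated` should be well below `apparent`.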

### Uffd

The uffd handler must handle the entire hotpluggable memory range, including
the parts that are unplugged. The handler may choose to unregister unplugged
memory ranges (holes in the memory file). It also needs to handle
`UFFD_EVENT_REMOVE` events for hot-removed blocks, either by unregistering the
range or by recording the removal and serving an empty page on the next access.

### Vhost-user

`vhost-user` is fully supported, but Firecracker cannot guarantee protection of
unplugged memory from a `vhost-user` backend. A malicious guest driver may be
able to trick the backend into accessing unplugged memory. This is not possible
in Firecracker itself, because unplugged memory slots are `mprotect`-ed.
