Commit 090c73f

doc(virtio-mem): add memory hotplug documentation
This adds documentation for the new memory hotplug feature.
Signed-off-by: Riccardo Mancini <[email protected]>
1 parent 0c156c4 commit 090c73f

1 file changed

+325
-0
lines changed

docs/memory-hotplug.md

Lines changed: 325 additions & 0 deletions
@@ -0,0 +1,325 @@
1+
# Memory Hotplugging with virtio-mem
2+
3+
## What is virtio-mem
4+
5+
`virtio-mem` is a para-virtualized memory device that enables dynamic memory
6+
resizing for virtual machines. Unlike traditional memory hot-plug mechanisms,
7+
`virtio-mem` provides a flexible and efficient solution that works across
8+
different architectures and avoids many limitations of older approaches.
9+
10+
The `virtio-mem` device manages a contiguous memory region that is divided into
11+
fixed-size blocks. The host can request the guest to plug (make available) or
12+
unplug (release) memory by changing the device's target size, and the guest
13+
driver responds by allocating or freeing memory blocks accordingly. This
14+
approach provides fine-grained control over guest memory with minimal overhead.
15+
16+
Firecracker further adds the concept of slots, each a set of contiguous
17+
blocks (128 MiB by default) that can be fully protected from guest accesses to
18+
prevent malicious guests from accessing the hotpluggable memory range when not
19+
allowed by the host.
20+
21+
## Prerequisites
22+
23+
To support memory hotplugging via `virtio-mem`, you must use a guest kernel with
24+
the appropriate version and configuration options enabled as follows:
25+
26+
#### Kernel Version Requirements
27+
28+
- `x86_64`: minimum kernel version is 5.16
29+
- Firecracker requires support for `VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE`
30+
- `aarch64`: minimum kernel version is 5.18
31+
32+
For more information about officially supported guest kernels, refer to the
33+
[kernel policy documentation](kernel-policy.md).
34+
35+
#### Kernel Config
36+
37+
The following kernel configuration options are required:
38+
39+
```
40+
# Enable the virtio-mem driver
41+
CONFIG_VIRTIO_MEM=y
42+
43+
# Enable memory hotplug support
44+
CONFIG_MEMORY_HOTPLUG=y
45+
46+
# Enable memory hot-remove support
47+
CONFIG_MEMORY_HOTREMOVE=y
48+
49+
# (aarch64 only) required to enable the virtio-mem driver.
50+
CONFIG_STRICT_DEVMEM=y
51+
```
52+
53+
## Adding hotpluggable memory
54+
55+
The `virtio-mem` device must be configured during VM setup with the total amount
56+
of memory that can be hotplugged, before starting the virtual machine. This can
57+
be done through a `PUT` request on `/hotplug/memory` or by including the
58+
configuration in the JSON configuration file.
59+
60+
### Configuration Parameters
61+
62+
- `total_size_mib` (required): The maximum size of hotpluggable memory in MiB.
63+
This defines the upper bound of memory that can be added to the VM. Must be a
64+
multiple of `slot_size_mib`.
65+
66+
- `block_size_mib` (optional, default: 2): The size of individual memory blocks
67+
in MiB. Must be at least 2 MiB and a power of 2. Larger block sizes provide
68+
better performance but less granularity (harder for the guest to unplug).
69+
70+
- `slot_size_mib` (optional, default: 128): The size of KVM memory slots in MiB.
71+
Must be at least `block_size_mib` and a power of 2. Larger slot sizes improve
72+
performance for large memory operations but reduce unplugging protection
73+
efficiency.
74+
75+
It is recommended to leave these values at their defaults unless strict memory
76+
protection is required, in which case `block_size_mib` should be equal to
77+
`slot_size_mib`. Note that this will make it harder for the guest kernel to find
78+
contiguous memory to hot-unplug. Refer to the
79+
[Memory Protection](#memory-protection) section below for more details.
80+
81+
### API Configuration
82+
83+
Here is an example of how to configure the `virtio-mem` device via the API:
84+
85+
```console
86+
socket_location=/run/firecracker.socket
87+
88+
curl --unix-socket $socket_location -i \
89+
-X PUT 'http://localhost/hotplug/memory' \
90+
-H 'Accept: application/json' \
91+
-H 'Content-Type: application/json' \
92+
-d "{
93+
\"total_size_mib\": 1024,
94+
\"block_size_mib\": 2,
95+
\"slot_size_mib\": 128
96+
}"
97+
```
98+
99+
> [!Note] This is only allowed before the `InstanceStart` action and not on
100+
> snapshot-restored VMs (which will use the configuration saved in the
101+
> snapshot).
102+
103+
### JSON Configuration
104+
105+
To configure via JSON, add the following to your VM configuration file:
106+
107+
```json
108+
{
109+
"memory-hotplug": {
110+
"total_size_mib": 1024,
111+
"block_size_mib": 2,
112+
"slot_size_mib": 128
113+
}
114+
}
115+
```
116+
117+
### Checking Device Status
118+
119+
After configuration, you can query the device status at any time:
120+
121+
```console
122+
socket_location=/run/firecracker.socket
123+
124+
curl --unix-socket $socket_location -i \
125+
-X GET 'http://localhost/hotplug/memory' \
126+
-H 'Accept: application/json'
127+
```
128+
129+
This returns information about the current device state, including:
130+
131+
- `total_size_mib`: Maximum hotpluggable memory size
132+
- `block_size_mib`: Block size used by the device
133+
- `slot_size_mib`: Slot size used by Firecracker (granularity of memory
134+
protection)
135+
- `plugged_size_mib`: Memory currently plugged (made available) by the guest
136+
- `requested_size_mib`: Target memory size set by the host
137+
138+
## Operating the virtio-mem device
139+
140+
Once configured and the VM is running, you can dynamically adjust the amount of
141+
memory available to the guest by updating the requested size.
142+
143+
### Hot-plugging Memory
144+
145+
To add memory to a running VM, request a greater size from the `virtio-mem`
146+
device:
147+
148+
```console
149+
socket_location=/run/firecracker.socket
150+
151+
curl --unix-socket $socket_location -i \
152+
-X PATCH 'http://localhost/hotplug/memory' \
153+
-H 'Accept: application/json' \
154+
-H 'Content-Type: application/json' \
155+
-d "{
156+
\"requested_size_mib\": 512
157+
}"
158+
```
159+
160+
This updates the target memory size. The guest driver will detect this change
161+
and allocate memory blocks to reach the requested size. The process is
162+
asynchronous: the guest will incrementally plug memory until it reaches the
163+
target. It is recommended to use the `GET` API to monitor how much memory the
163+
driver has plugged so far.
165+
166+
### Hot-removing Memory
167+
168+
To remove memory from a running VM, request a lower size:
169+
170+
```console
171+
socket_location=/run/firecracker.socket
172+
173+
curl --unix-socket $socket_location -i \
174+
-X PATCH 'http://localhost/hotplug/memory' \
175+
-H 'Accept: application/json' \
176+
-H 'Content-Type: application/json' \
177+
-d "{
178+
\"requested_size_mib\": 256
179+
}"
180+
```
181+
182+
Setting a lower `requested_size_mib` value causes the guest driver to free
183+
memory blocks. Once the guest reports a block as unplugged, its
183+
memory is immediately freed by the host process. If all blocks in a memory
185+
slot are unplugged, then Firecracker will also protect the memory slot, removing
186+
access from the guest.
187+
188+
To remove all hotplugged memory, set `requested_size_mib` to 0:
189+
190+
```console
191+
curl --unix-socket $socket_location -i \
192+
-X PATCH 'http://localhost/hotplug/memory' \
193+
-H 'Accept: application/json' \
194+
-H 'Content-Type: application/json' \
195+
-d '{"requested_size_mib": 0}'
196+
```
197+
198+
Note that this requires the guest to actually be able to find and report memory
199+
blocks that can be moved or freed.
200+
201+
## Configuring the guest driver
202+
203+
The guest kernel must be configured with specific boot or runtime module
204+
parameters to ensure optimal behavior of the `virtio-mem` driver and memory
205+
hotplug module.
206+
207+
In short:
208+
209+
- pass `memhp_default_state=online_movable` if hot-removal is required and the
210+
hotpluggable memory area is not much bigger than the normal memory.
211+
- pass `memory_hotplug.memmap_on_memory=1 memhp_default_state=online` if
212+
hot-removal is not required and the hotpluggable memory area can be much
213+
bigger than the normal memory.
214+
215+
#### `memhp_default_state`
216+
217+
This parameter controls how newly hotplugged memory is onlined by the kernel.
218+
It must be set for new memory pages to be onlined automatically. It is
219+
recommended to set it to `online_movable` as below for reliable memory
220+
hot-removal.
221+
222+
```
223+
memhp_default_state=online_movable
224+
```
225+
226+
The `online_movable` setting ensures that:
227+
228+
- Hotplugged memory is placed in the MOVABLE zone
229+
- The kernel can migrate pages when unplugging is requested
230+
- Memory can be successfully freed back to the host
231+
232+
Other possible values (not recommended for hot-removal):
233+
234+
- `online`: Lets the kernel choose between the NORMAL and MOVABLE zones (may
235+
prevent hot-remove)
236+
- `online_kernel`: Places memory in NORMAL zone (may prevent hot-remove)
237+
- `offline` (default): Memory requires manual onlining
238+
239+
#### `memory_hotplug.memmap_on_memory` (optional)
240+
241+
This parameter controls whether the kernel allocates the memory map (`struct page`)
241+
for hotplugged memory from the hotplugged memory itself, rather than from boot
242+
memory. Without this parameter, the kernel needs 64 B of boot memory for every
243+
hotplugged 4 KiB page. For example, it would need 256 MiB of free "boot" memory to hotplug
245+
16 GiB of memory. This parameter only works if the memory is not entirely
246+
hotplugged as MOVABLE.
247+
248+
```
249+
memory_hotplug.memmap_on_memory=1 memhp_default_state=online
250+
```
251+
252+
This configuration is recommended in case hot-removal is not a priority, and the
253+
hotpluggable memory area is very large.
254+
255+
#### Additional Resources
256+
257+
For more detailed and up-to-date information about memory hotplug in the Linux
258+
kernel, refer to the official kernel documentation:
259+
[Memory Hotplug](https://docs.kernel.org/admin-guide/mm/memory-hotplug.html)
260+
261+
## Security Considerations
262+
263+
**The `virtio-mem` device is a paravirtualized device requiring cooperation from
264+
a driver in the guest.**
265+
266+
### Memory Protection
267+
268+
Firecracker provides the following security guarantees:
269+
270+
- **Memory that is never plugged is protected**: Memory that has never been
271+
plugged before is protected from the guest by not making it available to the
272+
guest via a KVM slot and by using `mprotect` to prevent access from device
273+
emulation. Any attempt by the guest to access unplugged memory will result in
274+
a fault and may crash the Firecracker process.
275+
- **Unplugged memory slots are protected**: Memory slots that have been
276+
unplugged are removed from KVM and `mprotect`-ed. This requires the guest to
277+
report contiguous blocks to be freed for the memory slot to be actually
278+
protected.
279+
- **Unplugged memory blocks are freed**: When a memory block is unplugged, the
280+
backing pages are freed, for example using `madvise(MADV_DONTNEED)` for anonymous
281+
memory, returning memory to the host at block granularity.
282+
- **Memory isolation**: Unplugged memory never leaks between Firecracker
283+
processes. Memory mappings use `MAP_PRIVATE` and `MAP_ANONYMOUS` flags,
284+
ensuring that even if physical pages are reused, they are zeroed on access.
285+
286+
### Trust Model
287+
288+
While Firecracker enforces memory isolation at the host level, a compromised
289+
guest driver could:
290+
291+
- Fail to plug or unplug memory as requested by the device
292+
- Attempt to access unplugged memory (will result in a fault and crash of
293+
Firecracker)
294+
295+
Users should:
296+
297+
- Be prepared to handle cases where the guest doesn't cooperate with memory
298+
operations by monitoring the `GET` API.
299+
- Implement host-level memory limits and monitoring, e.g. through cgroups.
300+
301+
## Compatibility with Other Features
302+
303+
`virtio-mem` is compatible with all Firecracker features. Below are some
304+
notes on how other features behave when memory hotplugging is in use.
305+
306+
### Snapshots
307+
308+
Full and diff snapshots will include the unplugged areas as sparse "holes" in
309+
the memory snapshot file. Sparse file support is recommended to efficiently
310+
handle the memory snapshot files.
311+
312+
### Uffd
313+
314+
The uffd handler will need to handle the entire hotpluggable memory range even
315+
if unplugged. The uffd handler may decide to unregister unplugged memory ranges
316+
(holes in the memory file). The uffd handler will also need to handle
317+
`UFFD_EVENT_REMOVE` events for hot-removed blocks, either unregistering the
318+
range or storing the information and returning an empty page on the next access.
319+
320+
### Vhost-user
321+
322+
`vhost-user` is fully supported, but Firecracker cannot guarantee protection of
323+
unplugged memory from a `vhost-user` backend. A malicious guest driver may be
324+
able to trick the backend into accessing unplugged memory. This is not possible in
325+
Firecracker itself as unplugged memory slots are `mprotect`-ed.
