Commit 9d6e93f

doc(virtio-pmem): add documentation
Add new document about virtio-pmem configuration and usage. Signed-off-by: Egor Lazarchuk <[email protected]>
1 parent 642d5c2 commit 9d6e93f

1 file changed, +205 -0 lines changed

docs/pmem.md

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
# Using the Firecracker `virtio-pmem` device

## What is a persistent memory device

Persistent memory is a type of non-volatile, CPU-accessible (with the usual
load/store instructions) memory that does not lose its content on power loss. In
other words, all writes to the memory persist across power cycles. In hardware,
this is known as NVDIMM (Non-Volatile Dual In-line Memory Module).

## What is a `virtio-pmem` device

[`virtio-pmem`](https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-68900019)
is a device that emulates a persistent memory device without requiring a
physical NVDIMM device to be present on the host system. `virtio-pmem` is backed
by a memory-mapped file on the host side and is exposed to the guest kernel as a
region in the guest physical memory. This allows the guest to directly access
the host memory pages without the need to use a guest driver or interact with
the VMM. From the guest user-space perspective, `virtio-pmem` devices are
presented as normal block devices such as `/dev/pmem0`. This allows a
`virtio-pmem` device to be used as the rootfs device, so that the VM boots from
it.

> [!NOTE]
>
> Since `virtio-pmem` is located fully in memory, when used as a block device
> there is no need to use the guest page cache for its operations. This
> behaviour can be configured by using the `DAX` feature of the kernel.
>
> - To mount a device with `DAX`, add `-o dax` to the `mount` command.
> - To configure a root device with `DAX`, append `rootflags=dax` to the kernel
>   arguments.
>
> `DAX` support is not uniform across file systems. Check the kernel
> [documentation](https://github.com/torvalds/linux/blob/master/Documentation/filesystems/dax.rst)
> for more information.

## Prerequisites

To use a `virtio-pmem` device, the guest kernel needs to be built with support
for it. The full list of configuration options needed for `virtio-pmem` and
`DAX`:

```
# Needed for DAX on aarch64. Will be ignored on x86_64
CONFIG_ARM64_PMEM=y

CONFIG_DEVICE_MIGRATION=y
CONFIG_ZONE_DEVICE=y
CONFIG_VIRTIO_PMEM=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=y
CONFIG_BTT=y
CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_OF_PMEM=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
```
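
One way to check a guest kernel config against this list is sketched below. The
helper name `missing_pmem_opts` is hypothetical, and `REQUIRED_OPTS` holds only a
subset of the options above for brevity; extend it as needed. The location of
the kernel config file is an assumption and varies between distributions.

```bash
# Subset of the options listed above; extend with the full list as needed.
REQUIRED_OPTS="CONFIG_VIRTIO_PMEM CONFIG_LIBNVDIMM CONFIG_BLK_DEV_PMEM CONFIG_DAX CONFIG_FS_DAX"

# Usage: missing_pmem_opts <path-to-kernel-config>
# Prints every option from REQUIRED_OPTS that is not set to "y" in the file,
# one per line. Prints nothing if all of them are enabled.
missing_pmem_opts() {
    for opt in $REQUIRED_OPTS; do
        # -x matches whole lines only, -q suppresses grep's own output.
        grep -qx "${opt}=y" "$1" || echo "$opt"
    done
}
```

For example, `missing_pmem_opts .config` could be run in the kernel build
directory before building the guest kernel.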

## Configuration

The Firecracker implementation exposes these config options for the
`virtio-pmem` device:

- `id` - id of the device for internal use
- `path_on_host` - path to the backing file
- `root_device` - toggle to use this device as the root device. The device will
  be marked as `rw` in the kernel arguments
- `read_only` - tells Firecracker to `mmap` the backing file in read-only mode.
  If this device is also configured as `root_device`, it will be marked as `ro`
  in the kernel arguments

> [!NOTE]
>
> Devices will be exposed to the guest in the order in which they are configured,
> with sequential names of the form `/dev/pmem{N}`, such as `/dev/pmem0`,
> `/dev/pmem1`, ...

> [!WARNING]
>
> Setting a `virtio-pmem` device to `read-only` mode can lead to the VM shutting
> down on any attempt to write to the device. This is because, from the guest
> kernel's perspective, `virtio-pmem` is always `read-write` capable. Use
> `read-only` mode only if you want to ensure the underlying file is never
> written to.
>
> To mount the `pmem` device read-only, add `-o ro` to the `mount` command.
>
> The exact behaviour differs per platform:
>
> - x86_64 - if KVM is able to decode the write instruction used by the guest,
>   it will return an MMIO_WRITE to Firecracker, where the write will be
>   discarded and a warning will be logged.
> - aarch64 - the instruction emulation is much stricter. Writes will result in
>   an internal KVM error, which will be returned to Firecracker in the form of
>   an `ENOSYS` error. This will make Firecracker stop the VM with an
>   appropriate log message.

> [!WARNING]
>
> `virtio-pmem` requires the memory region exposed to the guest to be 2MB
> aligned. This requirement is transitively carried over to the backing file of
> the `virtio-pmem` device. Firecracker allows users to configure `virtio-pmem`
> with a backing file of any size and fills the memory gap between the end of
> the file and the next 2MB boundary with empty `PRIVATE | ANONYMOUS` memory
> pages. Users must be careful not to write to this memory gap, since it will
> not be synchronized with the backing file. This is not an issue if
> `virtio-pmem` is configured in `read-only` mode.
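
One way to avoid the unsynchronized gap altogether is to size the backing file
to a multiple of the alignment up front. The sketch below assumes that "2MB"
means 2 MiB (2097152 bytes) and that the backing file can safely be extended
with trailing zero bytes, which depends on what the file contains. The helper
names `align_up_2m` and `pad_backing_file` are hypothetical.

```bash
ALIGN=$((2 * 1024 * 1024))  # assumed alignment: 2 MiB

# Round a byte count up to the next multiple of ALIGN.
align_up_2m() {
    echo $(( ($1 + ALIGN - 1) / ALIGN * ALIGN ))
}

# Extend a file in place to the next multiple of ALIGN.
# `truncate -s` grows the file with zero bytes and leaves existing data intact.
pad_backing_file() {
    size=$(wc -c < "$1")
    truncate -s "$(align_up_2m "$size")" "$1"
}
```

For example, `pad_backing_file ./some_file` run on the host before starting
Firecracker leaves no gap for Firecracker to fill.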

### Config file

Configuring the `virtio-pmem` device from a config file follows a pattern
similar to the `virtio-block` section. Here is an example configuration for a
single `virtio-pmem` device:

```json
"pmem": [
  {
    "id": "pmem0",
    "path_on_host": "./some_file",
    "root_device": true,
    "read_only": false
  }
]
```

### API

Similar to other devices, `virtio-pmem` can be configured with API calls. An
example configuration request:

```console
curl --unix-socket $socket_location -i \
    -X PUT 'http://localhost/pmem/pmem0' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{
         \"id\": \"pmem0\",
         \"path_on_host\": \"./some_file\",
         \"root_device\": true,
         \"read_only\": false
    }"
```
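
When several devices are configured from a script, the escaped quotes in the
request body become easy to get wrong. A minimal sketch of a wrapper is shown
below. The helper names `pmem_body` and `put_pmem` are hypothetical, and the
body builder does no JSON escaping, so it assumes the id and path contain no
quotes or backslashes.

```bash
# Print the JSON body for a PUT /pmem/{id} request.
# Arguments: id, path_on_host, root_device (true|false), read_only (true|false)
pmem_body() {
    printf '{"id": "%s", "path_on_host": "%s", "root_device": %s, "read_only": %s}\n' \
        "$1" "$2" "$3" "$4"
}

# Send the request to the API socket.
# Arguments: socket path, then the same four arguments as pmem_body.
put_pmem() {
    socket="$1"
    id="$2"
    curl --unix-socket "$socket" -i \
        -X PUT "http://localhost/pmem/${id}" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d "$(pmem_body "$id" "$3" "$4" "$5")"
}
```

For example, `put_pmem "$socket_location" pmem0 ./some_file true false` sends
the same request as the `curl` command above.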

## Security

`virtio-pmem` can be used to share an underlying backing file between multiple
VMs by providing the same backing file to the `virtio-pmem` devices of the
corresponding VMs. This scenario poses a security risk of side-channel attacks
between VMs. Users are encouraged to evaluate the risks before using
`virtio-pmem` in such scenarios.

## Snapshot support

`virtio-pmem` works with the snapshot functionality of Firecracker. A snapshot
will contain the configuration options provided by the user. During the
restoration process, Firecracker will attempt to restore each `virtio-pmem`
device by opening the same backing file it was originally configured with. This
means all `virtio-pmem` backing files must be present in the same locations
during restore as they were during the initial `virtio-pmem` configuration.
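
A pre-restore check along these lines can catch a missing backing file before
the restore is attempted. This is only a sketch: the helper name
`check_backing_files` is hypothetical, and the caller is assumed to pass the
same paths that were used in the original configuration.

```bash
# Return 0 if every given backing file exists and is readable, 1 otherwise.
# Each missing path is reported on stderr.
check_backing_files() {
    status=0
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "missing backing file: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_backing_files ./some_file || exit 1` could be placed in a
restore script before the snapshot is loaded.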

## Performance

Even though `virtio-pmem` allows direct access to host pages from the guest,
the first access to each page will suffer from an internal KVM page fault, which
has to set up the guest physical address to host virtual address translation.
Subsequent accesses will not need to go through this process again.

Since the number of page faults correlates with the size of the pages used to
back `virtio-pmem` memory, it is possible to use huge pages to reduce the number
of required page faults. This can be done by using
[`tmpfs`](https://www.kernel.org/doc/html/latest/filesystems/tmpfs.html) with
transparent huge pages enabled or by using
[`hugetlbfs`](https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html)
if `virtio-pmem` is used for memory sharing.

## Memory usage

Since `virtio-pmem` resides in host memory, it increases the maximum possible
memory usage of a VM: the VM can now use all of its RAM and also access all of
the `virtio-pmem` memory. To minimize the overhead, it is highly recommended to
use `DAX` mode to avoid unnecessary duplication of data in the guest page cache.

As an example, a single VM with 128MB of memory booted from a `virtio-pmem`
device without `DAX` has an `RSS` value of ~120MB, while with `DAX` it is ~96MB.
The ~96MB is similar to the memory usage of a VM booted using `virtio-block` as
the root device.

In the case where multiple VMs have `virtio-pmem` devices that point to the same
underlying file, the memory overhead can be amortized, since the total maximum
memory usage will only include a single instance of the `virtio-pmem` memory.

As an example, 2 VMs configured with 128MB of RAM and no `virtio-pmem` devices
can consume a maximum of 128 + 128 = 256MB of host memory. If each of the VMs
has a 100MB `virtio-pmem` device attached with a shared backing file, the
maximum memory consumption will be 128 + 128 + 100 = 356MB, because the 100MB of
`virtio-pmem` memory will be shared between the VMs.
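
The arithmetic above generalizes to any number of VMs. The sketch below computes
the same upper bound for a shared backing file and, for comparison, the bound
when each VM has its own backing file. The second formula is an inference from
the reasoning above rather than a measured figure, both ignore any overhead of
the VMM itself, and the function names are hypothetical.

```bash
# Upper bound on host memory, in MB, for vm_count VMs with ram_mb of RAM each
# and one pmem_mb virtio-pmem device per VM, all backed by the same file.
# Arguments: vm_count ram_mb pmem_mb
max_mem_shared() {
    echo $(( $1 * $2 + $3 ))
}

# Same bound when every VM uses its own backing file, so nothing is shared.
max_mem_private() {
    echo $(( $1 * ($2 + $3) ))
}
```

With the numbers from the example, `max_mem_shared 2 128 100` prints `356`,
while `max_mem_private 2 128 100` prints `456`.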
