Commit 1d2aeb2

doc(virtio-pmem): add documentation
Add new document about virtio-pmem configuration and usage.

Signed-off-by: Egor Lazarchuk <[email protected]>
1 parent b229e3e commit 1d2aeb2

1 file changed: +146 -0 lines changed

docs/pmem.md

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# Using the Firecracker `virtio-pmem` device

## What is a persistent memory device

Persistent memory is a type of non-volatile, CPU-accessible memory (reachable
with the usual load/store instructions) that does not lose its content on power
loss. In other words, all writes to the memory persist across power cycles. In
hardware this is known as NVDIMM (Non-Volatile Dual In-line Memory Module).

## What is a `virtio-pmem` device

[`virtio-pmem`](https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-68900019)
is a device which emulates a persistent memory device without requiring a
physical NVDIMM device to be present on the host system. `virtio-pmem` is backed
by a memory-mapped file on the host side and is exposed to the guest kernel as a
region in the guest physical memory. This allows the guest to directly access
the host memory pages without needing to go through a guest driver or interact
with the VMM. From the guest user-space perspective, `virtio-pmem` devices are
presented as normal block devices such as `/dev/pmem0`. This allows
`virtio-pmem` to be used as the rootfs device, so a VM can boot from it.

> [!NOTE]
>
> Since `virtio-pmem` is located fully in memory, when used as a block device
> there is no need to use the guest page cache for its operations. This
> behaviour can be configured by using the `DAX` feature of the kernel.
>
> - To mount a device with `DAX`, add `-o dax` to the `mount` command.
> - To configure a root device with `DAX`, append `rootflags=dax` to the kernel
>   arguments.
>
> `DAX` support is not uniform across file systems. Check the documentation for
> the file system you want to use before enabling `DAX`.

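
The two `DAX` settings above can be sketched as a pair of small shell helpers.
This is only an illustration: the function names are hypothetical, the mount
helper relies on the standard `-o dax` mount option, and whether `DAX` actually
takes effect depends on the file system and the guest kernel configuration.

```shell
# Hypothetical helper: append rootflags=dax to an existing kernel command line.
# Use the result as the boot arguments of a VM whose root device is virtio-pmem.
with_dax_rootflags() {
    echo "$1 rootflags=dax"
}

# Hypothetical helper, to be run as root inside the guest: mount a non-root
# pmem device (for example /dev/pmem1) with DAX enabled.
mount_with_dax() {
    mount -o dax "$1" "$2"
}
```

For example, `with_dax_rootflags "console=ttyS0"` prints
`console=ttyS0 rootflags=dax`, and `mount_with_dax /dev/pmem1 /mnt` mounts the
second device with `DAX`.
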
## Prerequisites

To use a `virtio-pmem` device, the guest kernel needs to be built with support
for it. The full list of configuration options needed for `virtio-pmem` and
`DAX` is:

```
# Needed for DAX on aarch64. Will be ignored on x86_64
CONFIG_ARM64_PMEM=y

CONFIG_DEVICE_MIGRATION=y
CONFIG_ZONE_DEVICE=y
CONFIG_VIRTIO_PMEM=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=y
CONFIG_BTT=y
CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_OF_PMEM=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
```

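
A quick way to sanity-check a guest kernel build against the list above is to
grep its `.config` file. The sketch below is a minimal example, not an official
tool: the function name is hypothetical, and it checks only a representative
subset of the options, so extend the list as needed.

```shell
# Hypothetical helper: report options from a representative subset of the list
# above that are not set to "y" in the given kernel config file.
# Returns non-zero if at least one option is missing.
check_pmem_config() {
    config_file=$1
    missing=0
    for opt in CONFIG_VIRTIO_PMEM CONFIG_LIBNVDIMM CONFIG_BLK_DEV_PMEM \
               CONFIG_ZONE_DEVICE CONFIG_DAX CONFIG_FS_DAX; do
        # Match only lines of the exact form OPTION=y.
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "missing: ${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as `check_pmem_config path/to/.config`, pointing it at the configuration
file used to build the guest kernel.
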
## Configuration

Firecracker exposes these configuration options for the `virtio-pmem` device:

- `id` - ID of the device for internal use
- `path_on_host` - path to the backing file
- `root_device` - toggle to use this device as the root device. The device will
  be marked as `rw` in the kernel arguments
- `read_only` - tells Firecracker to `mmap` the backing file in read-only mode.
  If this device is also configured as `root_device`, it will be marked as `ro`
  in the kernel arguments

> [!NOTE]
>
> Devices are exposed to the guest in the order in which they are configured,
> with sequential names of the form `/dev/pmem{N}`, for example `/dev/pmem0`,
> `/dev/pmem1`, and so on.

> [!WARNING]
>
> Setting a `virtio-pmem` device to `read-only` mode can lead to the VM shutting
> down on any attempt to write to the device. This is because, from the guest
> kernel's perspective, `virtio-pmem` is always `read-write` capable. Use
> `read-only` mode only if you want to ensure the underlying file is never
> written to.
>
> The exact behaviour differs per platform:
>
> - x86_64 - if KVM is able to decode the write instruction used by the guest,
>   it returns an MMIO_WRITE to Firecracker, where the write is discarded and a
>   warning is logged.
> - aarch64 - instruction emulation is much stricter, so writes result in an
>   internal KVM error, which is returned to Firecracker in the form of an
>   ENOSYS return value from `KVM_RUN`. This makes Firecracker stop the VM with
>   an appropriate log message.

> [!WARNING]
>
> `virtio-pmem` requires the guest-exposed memory region to be 2MB aligned.
> This requirement is transitively carried over to the backing file of the
> `virtio-pmem` device. Firecracker allows users to configure `virtio-pmem` with
> a backing file of any size and fills the memory gap between the end of the
> file and the 2MB boundary with empty `PRIVATE | ANONYMOUS` memory pages. Users
> must be careful not to write to this memory gap, since it will not be
> synchronized with the backing file. This is not an issue if `virtio-pmem` is
> configured in `read-only` mode.

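
One way to avoid the unsynchronized gap altogether is to make the backing file
a multiple of 2MB before attaching it. The sketch below shows the rounding
arithmetic and a padding helper. It assumes a Linux host with GNU `truncate`,
and the function names are hypothetical. Padding only appends zero bytes to the
file and does not resize any file system stored inside it.

```shell
# Round a size in bytes up to the next multiple of 2 MiB (2097152 bytes).
align_up_2m() {
    two_mib=$((2 * 1024 * 1024))
    echo $(( ($1 + two_mib - 1) / two_mib * two_mib ))
}

# Hypothetical helper: grow a backing file in place so that its size is a
# multiple of 2 MiB. Files that are already aligned are left unchanged.
pad_backing_file() {
    size=$(wc -c < "$1")
    truncate -s "$(align_up_2m "$size")" "$1"
}
```

For example, `align_up_2m 2097153` prints `4194304`, and
`pad_backing_file ./some_file` extends `./some_file` to the next 2MB boundary.
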
### Config file

Configuration of the `virtio-pmem` device from a config file follows a pattern
similar to the `virtio-block` section. Here is an example configuration for a
single `virtio-pmem` device:

```json
"pmem": [
    {
        "id": "pmem0",
        "path_on_host": "./some_file",
        "root_device": true,
        "read_only": false
    }
]
```

### API

Similar to other devices, `virtio-pmem` can be configured with API calls. An
example configuration request:

```console
curl --unix-socket "$socket_location" -i \
    -X PUT 'http://localhost/pmem/pmem0' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{
        \"id\": \"pmem0\",
        \"path_on_host\": \"./some_file\",
        \"root_device\": true,
        \"read_only\": false
    }"
```
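
When several devices are configured, the request above can be wrapped in small
helpers so that each device needs only one call. This is a sketch under stated
assumptions: the helper names are hypothetical, the endpoint and fields are
taken from the example above, and the JSON body is built with `printf`, so it
does not escape quotes or backslashes in its arguments.

```shell
# Hypothetical helper: print the JSON body for one virtio-pmem device.
# Arguments: id, path_on_host, root_device (true|false), read_only (true|false)
pmem_body() {
    printf '{"id": "%s", "path_on_host": "%s", "root_device": %s, "read_only": %s}\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical helper: send the PUT request for one device.
# Arguments: API socket path, then the same four arguments as pmem_body.
put_pmem() {
    socket=$1
    shift
    curl --unix-socket "$socket" -i \
        -X PUT "http://localhost/pmem/$1" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d "$(pmem_body "$@")"
}

# Example usage (not executed here); ./data_file is a placeholder path:
#   put_pmem "$socket_location" pmem0 ./some_file true false
#   put_pmem "$socket_location" pmem1 ./data_file false false
```

Because devices are named in configuration order, the first call in the usage
example would correspond to `/dev/pmem0` and the second to `/dev/pmem1`.
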
