Commit 5101fcd

doc(virtio-pmem): add documentation
Add new document about virtio-pmem configuration and usage. Signed-off-by: Egor Lazarchuk <[email protected]>
1 parent 085543b commit 5101fcd

1 file changed

+171
-0
lines changed

docs/pmem.md

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
1+
# Using the Firecracker `virtio-pmem` device
2+
3+
## What is a persistent memory device
4+
5+
Persistent memory is a type of non-volatile, CPU-accessible (with the usual
6+
load/store instructions) memory that does not lose its content on power loss. In
7+
other words, all writes to the memory persist across power cycles. In hardware,
8+
this is known as NVDIMM (Non-Volatile Dual In-line Memory Module) memory.
9+
10+
## What is a `virtio-pmem` device
11+
12+
[`virtio-pmem`](https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-68900019)
13+
is a device which emulates a persistent memory device without requiring a
14+
physical NVDIMM device to be present on the host system. `virtio-pmem` is backed by
15+
a memory-mapped file on the host side and is exposed to the guest kernel as a
16+
region in the guest physical memory. This allows the guest to directly access
17+
the host memory pages without needing to go through a guest driver or interact with the VMM.
18+
From the guest user-space perspective, `virtio-pmem` devices are presented as normal
19+
block devices such as `/dev/pmem0`. This allows `virtio-pmem` to be used as a rootfs
20+
device, so that the VM can boot from it.
21+
22+
> [!NOTE]
23+
>
24+
> Since `virtio-pmem` is located fully in memory, when used as a block device
25+
> there is no need to use the guest page cache for its operations. This behaviour
26+
> can be configured by using the `DAX` feature of the kernel.
27+
>
28+
> - To mount a device with `DAX`, add `-o dax` to the `mount` command.
29+
> - To configure a root device with `DAX`, append `rootflags=dax` to the kernel
30+
> arguments.
31+
>
32+
> `DAX` support is not uniform for all file systems. Check the documentation for
33+
> the file system you want to use before enabling `DAX`.
34+
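The two `DAX` options above can be sketched in Python as helpers that build the guest `mount` command and extend a kernel command line. This is a minimal sketch: the function names, the mount point, and the sample boot arguments are illustrative assumptions, not part of Firecracker.

```python
def dax_mount_command(device, mount_point, read_only=False):
    """Return the argv for mounting `device` with DAX inside the guest."""
    options = ["dax"]
    if read_only:
        options.append("ro")
    return ["mount", "-o", ",".join(options), device, mount_point]


def with_root_dax(boot_args):
    """Return `boot_args` with `dax` present in `rootflags=`."""
    args = boot_args.split()
    for index, arg in enumerate(args):
        if arg.startswith("rootflags="):
            # Merge into the existing rootflags list instead of adding a second one.
            flags = [f for f in arg[len("rootflags="):].split(",") if f]
            if "dax" not in flags:
                flags.append("dax")
            args[index] = "rootflags=" + ",".join(flags)
            return " ".join(args)
    args.append("rootflags=dax")
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(dax_mount_command("/dev/pmem0", "/mnt")))
    print(with_root_dax("console=ttyS0 reboot=k panic=1"))
```

Whether the mount succeeds still depends on the file system supporting `DAX`, as noted above.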
35+
## Prerequisites
36+
37+
To use a `virtio-pmem` device, the guest kernel needs to be built with support
38+
for it. The full list of configuration options needed for `virtio-pmem` and
39+
`DAX`:
40+
41+
```
# Needed for DAX on aarch64. Will be ignored on x86_64
CONFIG_ARM64_PMEM=y

CONFIG_DEVICE_MIGRATION=y
CONFIG_ZONE_DEVICE=y
CONFIG_VIRTIO_PMEM=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=y
CONFIG_BTT=y
CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_OF_PMEM=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
```
65+
66+
## Configuration
67+
68+
The Firecracker implementation exposes these config options for the `virtio-pmem`
69+
device:
70+
71+
- `id` - id of the device for internal use
72+
- `path_on_host` - path to the backing file
73+
- `root_device` - toggle to use this device as the root device. The device will be
74+
marked as `rw` in the kernel arguments
75+
- `read_only` - tells Firecracker to `mmap` the backing file in read-only mode.
76+
If this device is also configured as `root_device`, it will be marked as `ro`
77+
in the kernel arguments
78+
79+
> [!NOTE]
80+
>
81+
> Devices will be exposed to the guest in the order in which they are configured
82+
> with sequential names in the form `/dev/pmem{N}`, for example `/dev/pmem0`,
83+
> `/dev/pmem1` ...
84+
85+
> [!WARNING]
86+
>
87+
> Setting a `virtio-pmem` device to `read-only` mode can lead to the VM shutting down
88+
> on any attempt to write to the device. This is because from the guest kernel's
89+
> perspective `virtio-pmem` is always `read-write` capable. Use `read-only` mode
90+
> only if you want to ensure the underlying file is never written to.
91+
>
92+
> To mount the `pmem` device read-only in the guest, add `-o ro` to the `mount` command.
93+
>
94+
> The exact behaviour differs per platform:
95+
>
96+
> - x86_64 - if KVM is able to decode the write instruction used by the guest,
97+
> it will return an MMIO_WRITE to Firecracker, where it will be discarded
98+
> and a warning will be logged.
99+
> - aarch64 - the instruction emulation is much stricter. Writes will result
100+
> in an internal KVM error which will be returned to Firecracker in the form of an `ENOSYS` error.
101+
> This will make Firecracker stop the VM with an appropriate log message.
102+
103+
> [!WARNING]
104+
>
105+
> `virtio-pmem` requires the guest-exposed memory region to be 2MB aligned.
106+
> This requirement is transitively carried to the backing file of the
107+
> `virtio-pmem`. Firecracker allows users to configure `virtio-pmem` with
108+
> a backing file of any size and fills the memory gap between the end of the file
109+
> and the 2MB boundary with empty `PRIVATE | ANONYMOUS` memory pages. Users must
110+
> be careful to not write to this memory gap since it will not be synchronized
111+
> with the backing file. This is not an issue if `virtio-pmem` is configured in
112+
> `read-only` mode.
113+
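The alignment arithmetic described in the warning above can be sketched as follows. This is a minimal sketch assuming the 2MB boundary means 2 MiB (2 * 1024 * 1024 bytes); the helper names are illustrative and not part of Firecracker. `pad_backing_file` grows a file with zero bytes so that no anonymous gap remains; whether padding is appropriate for a particular image is for the user to decide.

```python
import os

ALIGNMENT = 2 * 1024 * 1024  # assumed meaning of the 2MB requirement


def aligned_size(file_size, alignment=ALIGNMENT):
    """Round `file_size` up to the next multiple of `alignment`."""
    return (file_size + alignment - 1) // alignment * alignment


def gap_size(file_size, alignment=ALIGNMENT):
    """Bytes between the end of the file and the next aligned boundary.

    Guest writes into this range are not synchronized with the backing file.
    """
    return aligned_size(file_size, alignment) - file_size


def pad_backing_file(path, alignment=ALIGNMENT):
    """Extend the file at `path` with zeros up to an aligned size."""
    size = os.path.getsize(path)
    new_size = aligned_size(size, alignment)
    if new_size != size:
        os.truncate(path, new_size)
    return new_size
```

For example, a 3 MiB backing file is exposed as a 4 MiB region, of which the last 1 MiB is not backed by the file unless the file is padded first.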
114+
### Config file
115+
116+
Configuration of the `virtio-pmem` device from a config file follows a similar
117+
pattern to the `virtio-block` section. Here is an example configuration for a single
118+
`virtio-pmem` device:
119+
120+
```json
"pmem": [
  {
    "id": "pmem0",
    "path_on_host": "./some_file",
    "root_device": true,
    "read_only": false
  }
]
```
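
For configurations with several devices, the `pmem` section can be generated programmatically. The sketch below builds the section from a list of tuples and derives the guest device names from the configuration order, following the `/dev/pmem{N}` naming rule described earlier. The function names, device ids, and file paths are illustrative assumptions.

```python
import json


def pmem_config(devices):
    """Build the "pmem" section from (id, path_on_host, root_device, read_only) tuples."""
    return {
        "pmem": [
            {
                "id": device_id,
                "path_on_host": path_on_host,
                "root_device": root_device,
                "read_only": read_only,
            }
            for device_id, path_on_host, root_device, read_only in devices
        ]
    }


def guest_device_names(config):
    """Map each device id to the guest name implied by its position in the list."""
    return {
        device["id"]: f"/dev/pmem{index}"
        for index, device in enumerate(config["pmem"])
    }


if __name__ == "__main__":
    # Hypothetical devices, for illustration only.
    config = pmem_config([
        ("rootfs", "./rootfs.ext4", True, True),
        ("data", "./data.img", False, False),
    ])
    print(json.dumps(config, indent=2))
    print(guest_device_names(config))
```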
130+
131+
### API
132+
133+
Similar to other devices, `virtio-pmem` can be configured with API calls. An
134+
example of a configuration request:
135+
136+
```console
curl --unix-socket "$socket_location" -i \
    -X PUT 'http://localhost/pmem/pmem0' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{
        \"id\": \"pmem0\",
        \"path_on_host\": \"./some_file\",
        \"root_device\": true,
        \"read_only\": false
    }"
```
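
The same request can be issued from a script using only the Python standard library. This is a sketch under assumptions: `UnixHTTPConnection`, `pmem_request`, and `put_pmem` are illustrative names, and the socket path passed by the caller is whatever path Firecracker was started with. The endpoint and body fields mirror the `curl` example above.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def pmem_request(device_id, path_on_host, root_device=False, read_only=False):
    """Return the (url, body) pair for configuring one pmem device."""
    body = json.dumps({
        "id": device_id,
        "path_on_host": path_on_host,
        "root_device": root_device,
        "read_only": read_only,
    })
    return f"/pmem/{device_id}", body


def put_pmem(socket_path, device_id, path_on_host, **options):
    """Send the PUT request and return the HTTP status code."""
    url, body = pmem_request(device_id, path_on_host, **options)
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    connection = UnixHTTPConnection(socket_path)
    try:
        connection.request("PUT", url, body=body, headers=headers)
        return connection.getresponse().status
    finally:
        connection.close()
```

A call such as `put_pmem(socket_location, "pmem0", "./some_file", root_device=True)` would send the same payload as the `curl` command.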
148+
149+
## Security
150+
151+
`virtio-pmem` can be used to share the underlying backing file between multiple VMs by
152+
providing the same backing file to the `virtio-pmem` devices of the corresponding VMs. This scenario
153+
poses a security risk of side-channel attacks between VMs. Users are encouraged to evaluate
154+
the risks before using `virtio-pmem` in such scenarios.
155+
156+
## Snapshot support
157+
158+
`virtio-pmem` works with the snapshot functionality of Firecracker. A snapshot contains the
159+
configuration options provided by the user. During the restoration process, Firecracker will
160+
attempt to restore each `virtio-pmem` device by opening the same backing file as was configured
161+
in the first place. This means all `virtio-pmem` backing files should be present in the same
162+
locations during restore as they were during initial `virtio-pmem` configuration.
163+
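Because a restore depends on the backing files being present at their original locations, it can help to check for them before attempting the restore. The sketch below is one way to do that; the function name and the idea of keeping a list of configured paths alongside the snapshot are assumptions, not a Firecracker feature.

```python
import os


def missing_backing_files(paths):
    """Return the configured backing-file paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    # Hypothetical list of the `path_on_host` values used when the VM was configured.
    configured = ["./some_file"]
    missing = missing_backing_files(configured)
    if missing:
        print("Missing backing files:", ", ".join(missing))
    else:
        print("All backing files are present.")
```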
164+
165+
## Performance
166+
167+
Even though `virtio-pmem` allows direct access to host pages from the guest,
168+
the performance of the first access of each page will suffer from the internal KVM
169+
page fault, which has to set up the guest physical address to host virtual address
170+
translation. Subsequent accesses to the same page will not need to go through this process
171+
again.
