Commit f2064b2

doc(virtio-pmem): add documentation
Add new document about virtio-pmem configuration and usage.

Signed-off-by: Egor Lazarchuk
1 parent 52a37f0 commit f2064b2

1 file changed (+174, -0 lines)


docs/pmem.md

# Using the Firecracker `virtio-pmem` device

## What is a persistent memory device

Persistent memory is a type of non-volatile, CPU-accessible (with the usual
load/store instructions) memory that does not lose its contents on power loss.
In other words, all writes to the memory persist across power cycles. In
hardware, this is known as NVDIMM memory (Non-Volatile Dual In-line Memory
Module).

## What is a `virtio-pmem` device

[`virtio-pmem`](https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-68900019)
is a device that emulates a persistent memory device without requiring a
physical NVDIMM to be present on the host system. `virtio-pmem` is backed by a
memory-mapped file on the host side and is exposed to the guest kernel as a
region of guest physical memory. This allows the guest to access the host
memory pages directly, without routing each read or write through a guest
driver or the VMM.

From the guest user-space perspective, `virtio-pmem` devices are presented as
normal block devices such as `/dev/pmem0`. This allows a `virtio-pmem` device
to be used as the root filesystem device, so that the VM boots from it.

> [!NOTE]
>
> Since `virtio-pmem` resides entirely in memory, there is no need for the
> guest page cache when it is used as a block device. Bypassing the page cache
> is enabled with the kernel's `DAX` (Direct Access) feature.
>
> - To mount a device with `DAX`, add `-o dax` to the `mount` command.
> - To configure a root device with `DAX`, append `rootflags=dax` to the kernel
>   arguments.
>
> `DAX` support is not uniform across file systems. Check the documentation for
> the file system you want to use before enabling `DAX`.

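One way to confirm inside the guest that `DAX` actually took effect is to look at the mount options the kernel reports for the mount point. The sketch below is a hypothetical helper (`has_dax`), not part of Firecracker. It assumes the `/proc/mounts` line format (`device mountpoint fstype options dump pass`) and that a DAX mount lists an option named `dax` or starting with `dax=` (for example `dax=always`); the exact spelling may vary by file system and kernel version.

```sh
# Hypothetical helper: succeed if MOUNTPOINT is mounted with a DAX option.
# Assumes the /proc/mounts format: device mountpoint fstype options dump pass
# Usage: has_dax MOUNTPOINT [MOUNTS_FILE]
has_dax() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        $2 == mp {
            # Split the comma-separated option list and look for "dax" or "dax=..."
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "dax" || opts[i] ~ /^dax=/) found = 1
        }
        # Exit 0 when a DAX option was found, 1 otherwise
        END { exit !found }
    ' "$mounts_file"
}
```

For example, after `mount -o dax /dev/pmem0 /mnt` in the guest, `has_dax /mnt && echo "DAX enabled"` reports whether the option was accepted.
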
## Prerequisites

To use a `virtio-pmem` device, the guest kernel needs to be built with support
for it. The full list of kernel configuration options needed for `virtio-pmem`
and `DAX` is:

```
# Needed for DAX on aarch64. Ignored on x86_64
CONFIG_ARM64_PMEM=y

CONFIG_DEVICE_MIGRATION=y
CONFIG_ZONE_DEVICE=y
CONFIG_VIRTIO_PMEM=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=y
CONFIG_ND_CLAIM=y
CONFIG_ND_BTT=y
CONFIG_BTT=y
CONFIG_ND_PFN=y
CONFIG_NVDIMM_PFN=y
CONFIG_NVDIMM_DAX=y
CONFIG_OF_PMEM=y
CONFIG_NVDIMM_KEYS=y
CONFIG_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y
CONFIG_DEV_DAX_KMEM=y
CONFIG_FS_DAX=y
CONFIG_FS_DAX_PMD=y
```

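Before booting, it can help to check a guest kernel `.config` against the list above. A minimal sketch, assuming a standard kernel `.config` file; `check_pmem_kconfig` is a hypothetical helper. It only requires `CONFIG_ARM64_PMEM` when the config is for an aarch64 kernel (`CONFIG_ARM64=y`). An option reported as missing may simply be unavailable for the target architecture or its dependencies, so treat the output as a checklist rather than a hard failure.

```sh
# Hypothetical helper: print every option from the list above that is not set
# to "y" in a kernel .config file; return non-zero if any is missing.
# Usage: check_pmem_kconfig PATH_TO_DOT_CONFIG
check_pmem_kconfig() {
    config="$1"
    required="DEVICE_MIGRATION ZONE_DEVICE VIRTIO_PMEM LIBNVDIMM BLK_DEV_PMEM
ND_CLAIM ND_BTT BTT ND_PFN NVDIMM_PFN NVDIMM_DAX OF_PMEM NVDIMM_KEYS DAX
DEV_DAX DEV_DAX_PMEM DEV_DAX_KMEM FS_DAX FS_DAX_PMD"
    # CONFIG_ARM64_PMEM only matters for aarch64 kernels.
    if grep -q '^CONFIG_ARM64=y$' "$config"; then
        required="ARM64_PMEM $required"
    fi
    missing=0
    for opt in $required; do
        # Anchor the match so that e.g. BTT does not match ND_BTT.
        if ! grep -q "^CONFIG_${opt}=y\$" "$config"; then
            echo "missing: CONFIG_${opt}=y"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_pmem_kconfig linux/.config` prints one `missing: ...` line per absent option.
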
## Configuration

The Firecracker implementation exposes the following configuration options for
the `virtio-pmem` device:

- `id` - ID of the device, for internal use.
- `path_on_host` - path to the backing file on the host.
- `root_device` - use this device as the root device. The device will be
  marked as `rw` in the kernel arguments.
- `read_only` - tells Firecracker to `mmap` the backing file in read-only mode.
  If this device is also configured as `root_device`, it will be marked as `ro`
  in the kernel arguments.

> [!NOTE]
>
> Devices are exposed to the guest in the order in which they are configured,
> with sequential names of the form `/dev/pmem{N}`: `/dev/pmem0`,
> `/dev/pmem1`, and so on.

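Because guest names follow configuration order rather than the `id`, a script that needs the guest device name for a given `id` has to track that order itself. A minimal sketch, assuming the ids are passed in the same order in which the devices were configured; `pmem_guest_device` is a hypothetical helper, not a Firecracker tool.

```sh
# Hypothetical helper: print the guest device name for ID, given all
# configured pmem ids in configuration order.
# Usage: pmem_guest_device ID ID_IN_CONFIG_ORDER...
pmem_guest_device() {
    wanted="$1"
    shift
    index=0
    for id in "$@"; do
        if [ "$id" = "$wanted" ]; then
            echo "/dev/pmem${index}"
            return 0
        fi
        index=$((index + 1))
    done
    echo "unknown pmem id: $wanted" >&2
    return 1
}
```

With devices configured as `rootfs`, `data`, `scratch`, the call `pmem_guest_device data rootfs data scratch` prints `/dev/pmem1`.
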
> [!WARNING]
>
> Setting a `virtio-pmem` device to `read-only` mode can cause the VM to shut
> down on any attempt to write to the device. This is because, from the guest
> kernel's perspective, `virtio-pmem` is always `read-write` capable. Use
> `read-only` mode only if you need to guarantee that the underlying file is
> never written to.
>
> To mount the `pmem` device read-only inside the guest, add `-o ro` to the
> `mount` command.
>
> The exact behaviour differs per platform:
>
> - x86_64 - if KVM is able to decode the write instruction used by the guest,
>   it returns an `MMIO_WRITE` exit to Firecracker, which discards the write
>   and logs a warning.
> - aarch64 - the instruction emulation is much stricter. Writes result in an
>   internal KVM error, which is returned to Firecracker as an `ENOSYS` error.
>   Firecracker then stops the VM and logs an appropriate message.

> [!WARNING]
>
> `virtio-pmem` requires the memory region exposed to the guest to be 2MB
> aligned. This requirement carries over to the backing file of the
> `virtio-pmem` device. Firecracker allows users to configure `virtio-pmem`
> with a backing file of any size and fills the gap between the end of the file
> and the next 2MB boundary with empty `PRIVATE | ANONYMOUS` memory pages. Users
> must be careful not to write to this gap, since it is not synchronized with
> the backing file. This is not an issue if `virtio-pmem` is configured in
> `read-only` mode.

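The size of that gap follows from rounding the file size up to the next 2MB boundary. The sketch below assumes "2MB" means 2 MiB (2097152 bytes) and uses hypothetical helper names. One way to avoid the gap entirely, assuming you control the backing file, is to grow the file to the aligned size with `truncate` before configuring the device; the appended bytes read back as zeros and existing contents are untouched.

```sh
# Assumption: the 2MB alignment is 2 MiB.
PMEM_ALIGN=$((2 * 1024 * 1024))

# Round a byte count up to the next multiple of PMEM_ALIGN.
pmem_aligned_size() {
    echo $(( ($1 + PMEM_ALIGN - 1) / PMEM_ALIGN * PMEM_ALIGN ))
}

# Bytes of anonymous, non-persisted memory that would follow a file of SIZE bytes.
pmem_gap_bytes() {
    echo $(( $(pmem_aligned_size "$1") - $1 ))
}

# Hypothetical helper: extend FILE to the next 2 MiB boundary so no gap is needed.
pmem_pad_file() {
    file="$1"
    size=$(wc -c < "$file" | tr -d ' ')
    aligned=$(pmem_aligned_size "$size")
    if [ "$aligned" -ne "$size" ]; then
        truncate -s "$aligned" "$file"
    fi
}
```

For example, a 5 MiB (5242880-byte) backing file is exposed as a 6 MiB region, so `pmem_gap_bytes 5242880` prints `1048576`: 1 MiB of memory that is never written back to the file.
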
### Config file

Configuring a `virtio-pmem` device from a config file follows a pattern similar
to the `virtio-block` (`drives`) section. Here is an example of the `pmem`
section of a config file, configuring a single `virtio-pmem` device:

```json
{
  "pmem": [
    {
      "id": "pmem0",
      "path_on_host": "./some_file",
      "root_device": true,
      "read_only": false
    }
  ]
}
```

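For context, the `pmem` section sits next to the other top-level sections of the config file. A minimal sketch of a complete file, assuming a kernel at `./vmlinux` and a root filesystem image at `./rootfs.ext4` (both placeholder paths), and assuming, as the option list above suggests, that Firecracker adds the root device to the kernel arguments itself, so `boot_args` carries no `root=` parameter. `write_pmem_config` is a hypothetical helper.

```sh
# Hypothetical helper: write a minimal Firecracker config file that boots
# from a virtio-pmem root device. Paths must not contain double quotes.
# Usage: write_pmem_config OUTPUT KERNEL ROOTFS
write_pmem_config() {
    cat > "$1" <<EOF
{
  "boot-source": {
    "kernel_image_path": "$2",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 256
  },
  "pmem": [
    {
      "id": "pmem0",
      "path_on_host": "$3",
      "root_device": true,
      "read_only": false
    }
  ]
}
EOF
}

# Example (not executed here):
# write_pmem_config vm_config.json ./vmlinux ./rootfs.ext4
# firecracker --no-api --config-file vm_config.json
```

If the root file system supports `DAX`, `rootflags=dax` can be appended to `boot_args`.
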
### API

Like other devices, `virtio-pmem` can be configured through API calls. An
example configuration request:

```console
curl --unix-socket "$socket_location" -i \
    -X PUT 'http://localhost/pmem/pmem0' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "id": "pmem0",
        "path_on_host": "./some_file",
        "root_device": true,
        "read_only": false
    }'
```

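When devices are created from a script, the request above can be wrapped in a small function. A sketch assuming the same API socket and `PUT /pmem/{id}` endpoint as the example above; `pmem_body` and `put_pmem` are hypothetical helper names. Building the body separately makes it easy to inspect what will be sent.

```sh
# Hypothetical helper: print the JSON body for a PUT /pmem/{id} request.
# Usage: pmem_body ID PATH_ON_HOST ROOT_DEVICE READ_ONLY   (booleans: true|false)
pmem_body() {
    printf '{"id": "%s", "path_on_host": "%s", "root_device": %s, "read_only": %s}\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical helper: send the request over the Firecracker API socket.
# --fail makes curl return non-zero on an HTTP error status.
# Usage: put_pmem SOCKET ID PATH_ON_HOST ROOT_DEVICE READ_ONLY
put_pmem() {
    socket="$1"
    shift
    curl --unix-socket "$socket" --fail -sS \
        -X PUT "http://localhost/pmem/$1" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d "$(pmem_body "$@")"
}
```

The request above then becomes `put_pmem "$socket_location" pmem0 ./some_file true false`.
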
## Security

`virtio-pmem` can be used to share a backing file between multiple VMs by
providing the same backing file to the `virtio-pmem` devices of those VMs. This
scenario poses a risk of side-channel attacks between the VMs. Users are
encouraged to evaluate the risks before using `virtio-pmem` in such scenarios.

## Snapshot support

`virtio-pmem` works with Firecracker's snapshot functionality. The snapshot
contains the configuration options provided by the user. During restoration,
Firecracker attempts to restore each `virtio-pmem` device by opening the same
backing file that was originally configured. This means all `virtio-pmem`
backing files need to be present at the same locations during restore as they
were during the initial `virtio-pmem` configuration.

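Since restore depends on the backing files still being at their original paths, a pre-restore check can fail early with a clear message. A minimal sketch; `check_pmem_backing_files` is a hypothetical helper, and the list of paths is assumed to be known to the caller (for example, recorded when the devices were first configured), since reading it back from the snapshot is not covered here.

```sh
# Hypothetical helper: verify that every pmem backing file is still present
# and readable at the path it had when the devices were configured.
# Usage: check_pmem_backing_files PATH...
check_pmem_backing_files() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "pmem backing file missing or unreadable: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_pmem_backing_files ./some_file || exit 1` can run just before the snapshot load request.
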
## Performance

Even though `virtio-pmem` allows the guest to access host pages directly, the
first access to each page incurs an internal KVM page fault, which sets up the
guest physical address to host virtual address translation. Subsequent
accesses to the same page do not go through this process again.

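To get a rough feel for this one-time cost, the number of first-access faults can be estimated by dividing the region size by the page size. A back-of-the-envelope sketch, assuming faults are taken at the host's base page granularity (`getconf PAGESIZE`, commonly 4096 bytes); larger host mappings would reduce the count. `pmem_first_touch_faults` is a hypothetical helper.

```sh
# Hypothetical helper: estimate of first-access KVM page faults when every
# page of a SIZE-byte pmem region is touched, one fault per PAGE_SIZE page.
# Usage: pmem_first_touch_faults SIZE [PAGE_SIZE]
pmem_first_touch_faults() {
    size="$1"
    page="${2:-$(getconf PAGESIZE)}"
    # Round up: a partially used last page still needs its own fault.
    echo $(( (size + page - 1) / page ))
}
```

For example, touching every page of a 1 GiB region with 4 KiB pages means `pmem_first_touch_faults 1073741824 4096`, which prints `262144`.
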
