|
| 1 | +# Using the Firecracker `virtio-pmem` device |
| 2 | + |
| 3 | +## What is a persistent memory device |
| 4 | + |
| 5 | +Persistent memory is non-volatile memory that the CPU accesses with ordinary |
| 6 | +load/store instructions and that keeps its content across power loss. In |
| 7 | +other words, all writes to the memory persist across a power cycle. In hardware |
| 8 | +this is known as NVDIMM (Non-Volatile Dual In-line Memory Module). |
| 9 | + |
| 10 | +## What is a `virtio-pmem` device |
| 11 | + |
| 12 | +[`virtio-pmem`](https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-68900019) |
| 13 | +is a device that emulates persistent memory without requiring a physical |
| 14 | +NVDIMM device to be present on the host system. `virtio-pmem` is backed by |
| 15 | +a memory-mapped file on the host side and is exposed to the guest kernel as a |
| 16 | +region in the guest physical memory. This allows the guest to directly access |
| 17 | +the host memory pages without going through a guest driver or the VMM. |
| 18 | +From the guest user-space perspective, `virtio-pmem` devices appear as normal |
| 19 | +block devices such as `/dev/pmem0`. This allows a `virtio-pmem` device to be |
| 20 | +used as the rootfs device that the VM boots from. |
| 21 | + |
| 22 | +> [!NOTE] |
| 23 | +> |
| 24 | +> Since `virtio-pmem` is located fully in memory, when used as a block device |
| 25 | +> there is no need to use the guest page cache for its operations. This behaviour |
| 26 | +> can be configured by using the `DAX` feature of the kernel. |
| 27 | +> |
| 28 | +> - To mount a device with `DAX`, add `-o dax` to the `mount` command. |
| 29 | +> - To configure a root device with `DAX` append `rootflags=dax` to the kernel |
| 30 | +> arguments. |
| 31 | +> |
| 32 | +> `DAX` support is not uniform for all file systems. Check the documentation for |
| 33 | +> the file system you want to use before enabling `DAX`. |
| 34 | + |
| 35 | +## Prerequisites |
| 36 | + |
| 37 | +To use a `virtio-pmem` device, the guest kernel needs to be built with support |
| 38 | +for it. The full list of configuration options needed for `virtio-pmem` and |
| 39 | +`DAX`: |
| 40 | + |
| 41 | +``` |
| 42 | +# Needed for DAX on aarch64. Will be ignored on x86_64 |
| 43 | +CONFIG_ARM64_PMEM=y |
| 44 | + |
| 45 | +CONFIG_DEVICE_MIGRATION=y |
| 46 | +CONFIG_ZONE_DEVICE=y |
| 47 | +CONFIG_VIRTIO_PMEM=y |
| 48 | +CONFIG_LIBNVDIMM=y |
| 49 | +CONFIG_BLK_DEV_PMEM=y |
| 50 | +CONFIG_ND_CLAIM=y |
| 51 | +CONFIG_ND_BTT=y |
| 52 | +CONFIG_BTT=y |
| 53 | +CONFIG_ND_PFN=y |
| 54 | +CONFIG_NVDIMM_PFN=y |
| 55 | +CONFIG_NVDIMM_DAX=y |
| 56 | +CONFIG_OF_PMEM=y |
| 57 | +CONFIG_NVDIMM_KEYS=y |
| 58 | +CONFIG_DAX=y |
| 59 | +CONFIG_DEV_DAX=y |
| 60 | +CONFIG_DEV_DAX_PMEM=y |
| 61 | +CONFIG_DEV_DAX_KMEM=y |
| 62 | +CONFIG_FS_DAX=y |
| 63 | +CONFIG_FS_DAX_PMD=y |
| 64 | +``` |
| 65 | + |
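As a quick sanity check before booting, the option list above can be compared against a guest kernel `.config`. The following is a minimal sketch, not part of Firecracker: the helper name `missing_kernel_options` is hypothetical, and it assumes every option should appear exactly as `OPTION=y`, as in the list above.

```shell
# Print each option that is not set to "y" in the given kernel config file.
# Usage: missing_kernel_options CONFIG_FILE OPTION...
missing_kernel_options() {
    local config_file="$1"
    shift
    local opt
    for opt in "$@"; do
        # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
        if ! grep -qxF "${opt}=y" "$config_file"; then
            echo "$opt"
        fi
    done
}

# Example (the path is an assumption; point it at your guest kernel's .config):
# missing_kernel_options ./.config CONFIG_VIRTIO_PMEM CONFIG_LIBNVDIMM CONFIG_FS_DAX
```

An empty output means all listed options were found. Options built as modules (`=m`) are reported as missing by this sketch.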
| 66 | +## Configuration |
| 67 | + |
| 68 | +Firecracker exposes these configuration options for the `virtio-pmem` |
| 69 | +device: |
| 70 | + |
| 71 | +- `id` - id of the device for internal use |
| 72 | +- `path_on_host` - path to the backing file |
| 73 | +- `root_device` - toggle to use this device as the root device. The device will |
| 74 | + be marked as `rw` in the kernel arguments |
| 75 | +- `read_only` - tells Firecracker to `mmap` the backing file in read-only mode. |
| 76 | + If this device is also configured as `root_device`, it will be marked as `ro` |
| 77 | + in the kernel arguments |
| 78 | + |
| 79 | +> [!NOTE] |
| 80 | +> |
| 81 | +> Devices are exposed to the guest in the order in which they are configured, |
| 82 | +> with sequential names of the form `/dev/pmem{N}`, for example `/dev/pmem0`, |
| 83 | +> `/dev/pmem1` ... |
| 84 | + |
| 85 | +> [!WARNING] |
| 86 | +> |
| 87 | +> Setting a `virtio-pmem` device to `read-only` mode can lead to the VM shutting |
| 88 | +> down on any attempt to write to the device, because from the guest kernel's |
| 89 | +> perspective `virtio-pmem` is always `read-write` capable. Use `read-only` mode |
| 90 | +> only if you want to ensure the underlying file is never written to. |
| 91 | +> |
| 92 | +> To mount the `pmem` device read-only in the guest, add `-o ro` to the `mount` |
| 93 | +> command. |
| 94 | +> |
| 95 | +> The exact behaviour differs per platform: |
| 96 | +> |
| 97 | +> - x86_64 - if KVM is able to decode the write instruction used by the guest, |
| 98 | +> it will return an MMIO_WRITE to Firecracker, where the write will be discarded |
| 99 | +> and a warning will be logged. |
| 100 | +> - aarch64 - the instruction emulation is much stricter. Writes will result in |
| 101 | +> an internal KVM error, which is returned to Firecracker in the form of an |
| 102 | +> `ENOSYS` error. This makes Firecracker stop the VM with an appropriate log |
| 103 | +> message. |
| 104 | + |
| 105 | +> [!WARNING] |
| 106 | +> |
| 107 | +> `virtio-pmem` requires the guest-exposed memory region to be 2MB aligned. |
| 108 | +> This requirement is transitively carried over to the backing file of the |
| 109 | +> `virtio-pmem` device. Firecracker allows users to configure `virtio-pmem` with |
| 110 | +> a backing file of any size and fills the memory gap between the end of the file |
| 111 | +> and the 2MB boundary with empty `PRIVATE | ANONYMOUS` memory pages. Users must |
| 112 | +> be careful not to write to this memory gap, since it will not be synchronized |
| 113 | +> with the backing file. This is not an issue if `virtio-pmem` is configured in |
| 114 | +> `read-only` mode. |
| 115 | + |
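One way to avoid the unsynchronized gap described in the warning above is to make the backing file itself a multiple of 2 MiB before handing it to Firecracker, so that no padding pages should be needed. The sketch below is an illustration under assumptions, not a Firecracker tool: the helper names `round_up_2mib` and `pad_to_2mib` are hypothetical, `truncate` from GNU coreutils is assumed to be available, and appending zero bytes is assumed to be acceptable for the file's content.

```shell
# Round a byte count up to the next multiple of 2 MiB.
round_up_2mib() {
    local align=$((2 * 1024 * 1024))
    echo $(( ($1 + align - 1) / align * align ))
}

# Extend FILE with zero bytes so that its size is a multiple of 2 MiB.
# A file whose size is already aligned keeps its size.
pad_to_2mib() {
    local file="$1"
    local size
    size="$(wc -c < "$file")"
    truncate -s "$(round_up_2mib "$size")" "$file"
}

# Example (the file name is an assumption):
# pad_to_2mib ./some_file
```

Note that padding only grows the file. A file system image inside it keeps its original size unless it is resized separately.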
| 116 | +### Config file |
| 117 | + |
| 118 | +Configuring a `virtio-pmem` device from the config file follows a pattern similar |
| 119 | +to the `virtio-block` section. Here is an example configuration for a single |
| 120 | +`virtio-pmem` device: |
| 121 | + |
| 122 | +```json |
| 123 | +"pmem": [ |
| 124 | + { |
| 125 | + "id": "pmem0", |
| 126 | + "path_on_host": "./some_file", |
| 127 | + "root_device": true, |
| 128 | + "read_only": false |
| 129 | + } |
| 130 | +] |
| 131 | +``` |
| 132 | + |
| 133 | +### API |
| 134 | + |
| 135 | +Like other devices, `virtio-pmem` can be configured with API calls. An |
| 136 | +example configuration request: |
| 137 | + |
| 138 | +```console |
| 139 | +curl --unix-socket $socket_location -i \ |
| 140 | + -X PUT 'http://localhost/pmem/pmem0' \ |
| 141 | + -H 'Accept: application/json' \ |
| 142 | + -H 'Content-Type: application/json' \ |
| 143 | + -d "{ |
| 144 | + \"id\": \"pmem0\", |
| 145 | + \"path_on_host\": \"./some_file\", |
| 146 | + \"root_device\": true, |
| 147 | + \"read_only\": false |
| 148 | + }" |
| 149 | +``` |
| 150 | + |
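When several devices are needed, the same request can be wrapped in small helpers and sent once per device, in the order the devices should appear in the guest. This is a sketch only: the helper names `pmem_body` and `put_pmem` and the example file names are hypothetical, the endpoint and fields are the ones from the request above, and no JSON escaping is done, so paths containing quotes or backslashes are not handled.

```shell
# Build the JSON body for one pmem device.
# Usage: pmem_body ID PATH_ON_HOST ROOT_DEVICE READ_ONLY
pmem_body() {
    printf '{"id": "%s", "path_on_host": "%s", "root_device": %s, "read_only": %s}' \
        "$1" "$2" "$3" "$4"
}

# Send the PUT request for one device.
# $socket_location is the API socket path, as in the request above.
put_pmem() {
    curl --unix-socket "$socket_location" -i \
        -X PUT "http://localhost/pmem/$1" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d "$(pmem_body "$@")"
}

# Example with hypothetical file names:
# put_pmem pmem0 ./rootfs.img true true     # expected to appear as /dev/pmem0
# put_pmem pmem1 ./data.img false false     # expected to appear as /dev/pmem1
```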
| 151 | +## Security |
| 152 | + |
| 153 | +`virtio-pmem` can be used to share an underlying backing file between multiple |
| 154 | +VMs by providing the same backing file to the `virtio-pmem` devices of the |
| 155 | +corresponding VMs. This scenario poses a security risk of side-channel attacks |
| 156 | +between VMs. Users are encouraged to evaluate the risks before using |
| 157 | +`virtio-pmem` in such scenarios. |
| 158 | + |
| 159 | +## Snapshot support |
| 160 | + |
| 161 | +`virtio-pmem` works with the snapshot functionality of Firecracker. A snapshot |
| 162 | +contains the configuration options provided by the user. During restoration, |
| 163 | +Firecracker will attempt to restore each `virtio-pmem` device by opening |
| 164 | +the same backing file that was configured originally. This means all |
| 165 | +`virtio-pmem` backing files should be present in the same locations during |
| 166 | +restore as they were during the initial `virtio-pmem` configuration. |
| 167 | + |
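Because restore depends on the backing files being at their original locations, it can help to check for them before loading a snapshot. The helper below is a hypothetical sketch, not part of Firecracker: it assumes you keep your own list of the `path_on_host` values used at configuration time and that the paths are written as the Firecracker process will resolve them.

```shell
# Report every given path that is not a readable regular file.
# Returns non-zero if at least one path fails the check.
check_backing_files() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "missing or unreadable: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (the file name is an assumption):
# check_backing_files ./some_file || echo "fix backing files before restoring"
```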
| 168 | +## Performance |
| 169 | + |
| 170 | +Even though `virtio-pmem` allows direct access to host pages from the |
| 171 | +guest, the first access to each page is slower because it triggers an |
| 172 | +internal KVM page fault that has to set up the guest physical address to host |
| 173 | +virtual address translation. Subsequent accesses do not need to go through |
| 174 | +this process again. |