
The shape of VM to come

pclouzet edited this page Dec 8, 2023 · 17 revisions

Install a Virtual Machine using qemu

Install qemu

Get qemu
git clone https://gitlab.com/qemu-project/qemu.git

Install qemu

cd qemu
mkdir build
cd build
../configure --enable-slirp
make -j"$(nproc)"
sudo make install

Install a VM using qemu

Get the Debian 12.2.0 netinst image that the virtual machine will boot from:
wget https://www.debian.org/distrib/netinst/debian-12.2.0-amd64-netinst.iso

Create a disk image (qcow2 format) where the VM will store its data:
qemu-img create -f qcow2 mydisk.img 20G

Install Debian on the VM with qemu:

qemu-system-x86_64 -boot d -cdrom debian-12.2.0-amd64-netinst.iso -m 4G \
-device e1000,netdev=net0,mac=52:54:00:12:34:56 -netdev user,id=net0,hostfwd=tcp::10022-:22 \
-hda mydisk.img -accel kvm

Follow the installer's instructions and you're done. The -accel kvm option speeds up the installation considerably (from 1h30 to 20 min in my case).
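
KVM is only available when the host exposes /dev/kvm to the current user. The sketch below is one possible way to fall back to TCG (QEMU's software emulation) when it is not; the function name pick_accel is illustrative and not part of QEMU.

```shell
#!/bin/sh
# Pick a QEMU accelerator: kvm when /dev/kvm is readable and writable
# by the current user, otherwise tcg (software emulation, much slower).
# pick_accel is a hypothetical helper name, not a QEMU tool.
pick_accel() {
    if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
        echo kvm
    else
        echo tcg
    fi
}

accel=$(pick_accel)
echo "using -accel $accel"
```

The result can then be passed to qemu as -accel "$accel" in place of the hard-coded -accel kvm.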

Launch our new VM

Let's say we want to run Debian with 8 GB of RAM:
qemu-system-x86_64 -hda mydisk.img -m 8G -accel kvm

A VM can use a lot of resources, which slows it down. We can lighten the load by disabling the graphical interface: open a terminal within the VM and run

sudo systemctl set-default multi-user.target
sudo reboot

Just in case, you can re-enable it with:

sudo systemctl set-default graphical.target
sudo reboot
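
Once the guest boots to a text console, it can be convenient to start qemu without a window and log in over SSH through the forwarded port used during installation. The sketch below only prints the two commands. It assumes an SSH server was installed in the guest; HOST_PORT and GUEST_USER are illustrative names.

```shell
#!/bin/sh
# Print (do not run) a headless launch command and the matching ssh command.
# Assumptions: an SSH server runs in the guest, and GUEST_USER is a
# placeholder for the account created during installation.
HOST_PORT=10022
GUEST_USER=user

headless_cmd() {
    # -display none disables the graphical window; the network options
    # are the same as in the installation command.
    printf '%s\n' "qemu-system-x86_64 -hda mydisk.img -m 8G -accel kvm -display none -device e1000,netdev=net0,mac=52:54:00:12:34:56 -netdev user,id=net0,hostfwd=tcp::${HOST_PORT}-:22"
}

ssh_cmd() {
    # Host port HOST_PORT is forwarded to port 22 of the guest.
    printf '%s\n' "ssh -p ${HOST_PORT} ${GUEST_USER}@localhost"
}

headless_cmd
ssh_cmd
```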

Set hardware features

A simple example

Using qemu, let's give our VM 4 NUMA nodes, each with 4 CPUs, and with 4, 2, 1 and 1 GB of memory respectively:

qemu-system-x86_64 -hda mydisk.img -m 8G \
        -accel kvm \
        -smp cpus=16 \
        -object memory-backend-ram,size=4G,id=ram0 \
        -object memory-backend-ram,size=2G,id=ram1 \
        -object memory-backend-ram,size=1G,id=ram2 \
        -object memory-backend-ram,size=1G,id=ram3 \
        -numa node,nodeid=0,memdev=ram0,cpus=0-3 \
        -numa node,nodeid=1,memdev=ram1,cpus=4-7 \
        -numa node,nodeid=2,memdev=ram2,cpus=8-11 \
        -numa node,nodeid=3,memdev=ram3,cpus=12-15
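
The -object/-numa pairs above follow a regular pattern, so they can be generated instead of typed by hand. A minimal sketch, assuming every node gets the same number of CPUs; numa_args is a hypothetical helper, not a qemu tool:

```shell
#!/bin/sh
# numa_args CPUS_PER_NODE SIZE...
# Prints one memory-backend-ram object and one NUMA node per SIZE,
# assigning CPUS_PER_NODE consecutive CPU ids to each node.
numa_args() {
    cpus_per_node=$1
    shift
    node=0
    for size in "$@"; do
        first=$((node * cpus_per_node))
        last=$((first + cpus_per_node - 1))
        echo "-object memory-backend-ram,size=${size},id=ram${node}"
        echo "-numa node,nodeid=${node},memdev=ram${node},cpus=${first}-${last}"
        node=$((node + 1))
    done
}

# Same layout as the example: 4 nodes, 4 CPUs each, 4G/2G/1G/1G.
numa_args 4 4G 2G 1G 1G
```

The output can be spliced into the command line with an unquoted $(numa_args 4 4G 2G 1G 1G). Inside the guest, numactl --hardware should then report the four nodes, provided numactl is installed.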

Add an nvdimm node

qemu-system-x86_64 -hda img/mydisk.img -accel kvm \
        -device e1000,netdev=net0,mac=52:54:00:12:34:56 -netdev user,id=net0,hostfwd=tcp::10022-:22 \
        -machine pc,nvdimm=on \
        -m 8G,slots=1,maxmem=9G \
        -smp cpus=16 \
        -object memory-backend-ram,size=4G,id=ram0 \
        -object memory-backend-ram,size=2G,id=ram1 \
        -object memory-backend-ram,size=1G,id=ram2 \
        -object memory-backend-ram,size=1G,id=ram3 \
        -device nvdimm,id=nvdimm1,memdev=nvdimm1,unarmed=off,node=4 \
        -object memory-backend-file,id=nvdimm1,share=on,mem-path=img/nvdimm.img,size=1G \
        -numa node,nodeid=0,memdev=ram0,cpus=0-3 \
        -numa node,nodeid=1,memdev=ram1,cpus=4-7 \
        -numa node,nodeid=2,memdev=ram2,cpus=8-11 \
        -numa node,nodeid=3,memdev=ram3,cpus=12-15 \
        -numa node,nodeid=4
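
The -m line has to stay consistent with the rest of the command: maxmem must be at least the boot RAM plus the size of the nvdimm devices, and slots must cover the number of such devices. A small sketch of that arithmetic, with illustrative variable names:

```shell
#!/bin/sh
# Recompute the -m option from the node sizes used above (in GiB).
# Variable names are illustrative only.
ram_nodes_gib="4 2 1 1"   # memory-backend-ram sizes for nodes 0-3
nvdimm_gib=1              # size of the nvdimm backing file

ram_total=0
for g in $ram_nodes_gib; do
    ram_total=$((ram_total + g))
done
maxmem=$((ram_total + nvdimm_gib))

# One slot is enough because a single nvdimm device is plugged.
echo "-m ${ram_total}G,slots=1,maxmem=${maxmem}G"
```

With the sizes above this prints -m 8G,slots=1,maxmem=9G, which matches the command.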

Running ndctl list -NRD lists the active and enabled nvdimm devices:

{
  "dimms":[
    {
      "dev":"nmem0",
      "id":"8680-56341200",
      "handle":1,
      "phys_id":0
    }
  ],
  "regions":[
    {
      "dev":"region0",
      "size":1073741824,
      "align":16777216,
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "mappings":[
        {
          "dimm":"nmem0",
          "offset":0,
          "length":1073741824,
          "position":0
        }
      ],
      "persistence_domain":"unknown",
      "namespaces":[
        {
          "dev":"namespace0.0",
          "mode":"raw",
          "size":1073741824,
          "sector_size":512,
          "blockdev":"pmem0"
        }
      ]
    }
  ]
}

By default, namespaceX.Y (here namespace0.0) is in raw mode, which means the nvdimm device acts as a memory disk that does not support DAX. We need to disable the namespace, recreate it in devdax mode, and finally reconfigure the resulting DAX device as system RAM with the following commands:

sudo ndctl disable-namespace namespace0.0
sudo ndctl create-namespace -m devdax
sudo daxctl reconfigure-device -m system-ram all --force
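
Before and after running these commands, it can help to check which mode the namespace is in. The sketch below pulls the first "mode" field out of the JSON printed by ndctl list -N. It assumes the compact output format shown on this page (a tool such as jq would be more robust), and ns_mode is a hypothetical helper name.

```shell
#!/bin/sh
# ns_mode: read ndctl JSON on stdin and print the first "mode" value.
# Assumes each "mode":"..." pair sits on a single line, as in the
# output shown on this page.
ns_mode() {
    grep -o '"mode": *"[^"]*"' | head -n1 | cut -d'"' -f4
}

# Only query the real device when ndctl is installed.
if command -v ndctl >/dev/null 2>&1; then
    echo "namespace mode: $(ndctl list -N | ns_mode)"
fi
```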

Node 4 is now configured as dax:

{
  "dimms":[
    {
      "dev":"nmem0",
      "id":"8680-56341200",
      "handle":1,
      "phys_id":0
    }
  ],
  "regions":[
    {
      "dev":"region0",
      "size":1073741824,
      "align":16777216,
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "mappings":[
        {
          "dimm":"nmem0",
          "offset":0,
          "length":1073741824,
          "position":0
        }
      ],
      "persistence_domain":"unknown",
      "namespaces":[
        {
          "dev":"namespace0.0",
          "mode":"devdax",
          "map":"dev",
          "size":1054867456,
          "uuid":"ed8bb2a9-41fb-48e0-a0b2-7dbf0d9ca9ba",
          "chardev":"dax0.0",
          "align":2097152
        }
      ]
    }
  ]
}
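
Note that the namespace size dropped from 1073741824 to 1054867456 bytes. With "map":"dev", part of the device is reserved for metadata. The sketch below computes the difference from the two listings. The estimate of 64 bytes of page metadata per 4 KiB page is an assumption about the kernel, not something reported by ndctl.

```shell
#!/bin/sh
# Space reserved on the nvdimm after switching to devdax with map=dev.
raw_size=1073741824   # "size" of namespace0.0 in raw mode
dax_size=1054867456   # "size" of namespace0.0 in devdax mode

overhead=$((raw_size - dax_size))
echo "overhead: ${overhead} bytes ($((overhead / 1024 / 1024)) MiB)"

# Assumption: 64 bytes of page metadata per 4 KiB page.
pages=$((raw_size / 4096))
echo "page map estimate: $((pages * 64 / 1024 / 1024)) MiB"
```

This prints an overhead of 18 MiB and a page map estimate of 16 MiB. The remaining 2 MiB is consistent with the "align":2097152 field of the namespace.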

CXL

First we need a CXL host bridge (PCI eXpander Bridge, i.e. pxb-cxl, "cxl.1" here), then we attach a root port (cxl-rp, "root_port13" here), then a Type 3 device.
In this case it is a pmem device, so it needs two "memory-backend-file" objects: one for the memory ("pmem0" here) and one for its label storage area (LSA, "cxl-lsa0" here). Finally we need a Fixed Memory Window (FMW, i.e. cxl-fmw) to map that memory in the host:

qemu-system-x86_64 -hda img/mydisk.img -accel kvm \
        -machine q35,nvdimm=on,cxl=on \
        -device e1000,netdev=net0,mac=52:54:00:12:34:56 \
        -netdev user,id=net0,hostfwd=tcp::10022-:22 \
        -m 4G,slots=8,maxmem=8G \
        -smp 4 \
        -object memory-backend-ram,size=4G,id=mem0 \
        -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
        -object memory-backend-file,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
        -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
        -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
        -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
        -device cxl-type3,bus=root_port13,persistent-memdev=pmem0,lsa=cxl-lsa0,id=cxl-pmem0 \
        -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
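
The QEMU CXL documentation states that the size of a fixed memory window should be a multiple of 256 MiB. The sketch below checks the sizes used in this command; to_mib and is_256m_multiple are hypothetical helper names, and only the G and M suffixes are handled.

```shell
#!/bin/sh
# to_mib SIZE: convert a size such as 4G or 256M to MiB.
to_mib() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# is_256m_multiple SIZE: succeed when SIZE is a multiple of 256 MiB.
is_256m_multiple() {
    mib=$(to_mib "$1") || return 1
    [ $((mib % 256)) -eq 0 ]
}

# 4G is the cxl-fmw.0.size, 256M the size of the pmem0 backend.
for s in 4G 256M; do
    if is_256m_multiple "$s"; then
        echo "$s ok"
    else
        echo "$s is not a multiple of 256 MiB"
    fi
done
```

Once the guest is up, cxl list -M from the ndctl project should show the memory device, assuming the cxl tool is installed in the guest.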