The shape of VM to come
Get qemu
git clone https://gitlab.com/qemu-project/qemu.git
Install qemu
cd qemu
mkdir build
cd build
../configure --enable-slirp
make -j$(nproc)
sudo make install
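Once the install step has finished, it can help to confirm that the binaries used in the rest of this guide are on the PATH. The helper below is only a sketch; the function name check_tools is hypothetical and not part of qemu.

```shell
# Report which of the given tools are on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run after "sudo make install"):
#   check_tools qemu-system-x86_64 qemu-img
```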
Get the Debian 12.2.0 installer image that the virtual machine will boot from:
wget https://www.debian.org/distrib/netinst/debian-12.2.0-amd64-netinst.iso
Create a disk image (qcow2 format) where the VM will store its data:
qemu-img create -f qcow2 mydisk.img 20G
Install Debian in the VM with qemu:
qemu-system-x86_64 -boot d -cdrom debian-12.2.0-amd64-netinst.iso -m 4G \
-device e1000,netdev=net0,mac=52:54:00:12:34:56 -netdev user,id=net0,hostfwd=tcp::10022-:22 \
-hda mydisk.img -accel kvm
Follow the instructions in the installer and you're done. -accel kvm speeds up the installation considerably (from 1h30 to 20 min in my case).
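-accel kvm only works when the host exposes KVM to the current user. As a rough sketch, the helper below falls back to tcg (qemu's software emulation, which is much slower) when /dev/kvm is not accessible. The function name pick_accel and the /dev/kvm permission check are assumptions made for illustration.

```shell
# Print "kvm" when /dev/kvm is readable and writable by the current user,
# otherwise print "tcg" (software emulation).
pick_accel() {
    if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
        echo kvm
    else
        echo tcg
    fi
}

# Example:
#   qemu-system-x86_64 -hda mydisk.img -m 4G -accel "$(pick_accel)"
```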
Let's say we want to run Debian with 8 GB of RAM:
qemu-system-x86_64 -hda mydisk.img -m 8G -accel kvm
A VM can use a lot of resources, which slows it down. We can lighten the load by disabling the graphical interface. Open a terminal within the VM and run:
sudo systemctl set-default multi-user.target
sudo reboot
Just in case, you can re-enable it with:
sudo systemctl set-default graphical.target
sudo reboot
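With the graphical interface disabled, one way to reach the VM is SSH through the hostfwd=tcp::10022-:22 rule from the install command. This sketch assumes the VM is started with the same -device e1000/-netdev user options and that an SSH server is installed in the guest. The function name vm_ssh, the variable VM_SSH_PORT and the user name alice are hypothetical.

```shell
# Open an SSH session (or run a command) in the guest through the
# forwarded host port. $1 is the guest user, the rest is passed to ssh.
vm_ssh() {
    user=$1
    shift
    ssh -p "${VM_SSH_PORT:-10022}" "$user@localhost" "$@"
}

# Example:
#   vm_ssh alice              # interactive shell
#   vm_ssh alice uname -r     # run a single command in the guest
```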
Using qemu, let's give our VM 4 NUMA nodes, each with 4 CPUs, and with 4, 2, 1 and 1 GB of memory respectively:
qemu-system-x86_64 -hda mydisk.img -m 8G \
-accel kvm \
-smp cpus=16 \
-object memory-backend-ram,size=4G,id=ram0 \
-object memory-backend-ram,size=2G,id=ram1 \
-object memory-backend-ram,size=1G,id=ram2 \
-object memory-backend-ram,size=1G,id=ram3 \
-numa node,nodeid=0,memdev=ram0,cpus=0-3 \
-numa node,nodeid=1,memdev=ram1,cpus=4-7 \
-numa node,nodeid=2,memdev=ram2,cpus=8-11 \
-numa node,nodeid=3,memdev=ram3,cpus=12-15
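The -object and -numa arguments above follow a regular pattern, so they can be generated instead of typed by hand. The sketch below assumes every node gets the same number of CPUs and that backends are named ram0, ram1, and so on, as in the command above. The function name numa_args is hypothetical.

```shell
# Print the -object and -numa arguments for a set of NUMA nodes.
# $1: CPUs per node; remaining arguments: one memory size per node (e.g. 4G).
numa_args() {
    cpus_per_node=$1
    shift
    # One RAM backend per node: ram0, ram1, ...
    node=0
    for size in "$@"; do
        printf '%s\n' "-object memory-backend-ram,size=$size,id=ram$node"
        node=$(( node + 1 ))
    done
    # One NUMA node per backend, with a contiguous CPU range.
    node=0
    for size in "$@"; do
        first=$(( node * cpus_per_node ))
        last=$(( first + cpus_per_node - 1 ))
        printf '%s\n' "-numa node,nodeid=$node,memdev=ram$node,cpus=$first-$last"
        node=$(( node + 1 ))
    done
}

# Example (the substitution is left unquoted on purpose so the shell
# splits the output into separate arguments):
#   qemu-system-x86_64 -hda mydisk.img -m 8G -accel kvm -smp cpus=16 \
#       $(numa_args 4 4G 2G 1G 1G)
```

Inside the guest, numactl --hardware (if the numactl package is installed) should then list the nodes with their CPUs and memory.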
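Next, let's add an emulated NVDIMM: a 1 GB file-backed device (img/nvdimm.img) attached to a fifth NUMA node (node 4) that has no CPUs:
qemu-system-x86_64 -hda img/mydisk.img -accel kvm \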
-device e1000,netdev=net0,mac=52:54:00:12:34:56 -netdev user,id=net0,hostfwd=tcp::10022-:22 \
-machine pc,nvdimm=on \
-m 8G,slots=1,maxmem=9G \
-smp cpus=16 \
-object memory-backend-ram,size=4G,id=ram0 \
-object memory-backend-ram,size=2G,id=ram1 \
-object memory-backend-ram,size=1G,id=ram2 \
-object memory-backend-ram,size=1G,id=ram3 \
-device nvdimm,id=nvdimm1,memdev=nvdimm1,unarmed=off,node=4 \
-object memory-backend-file,id=nvdimm1,share=on,mem-path=img/nvdimm.img,size=1G \
-numa node,nodeid=0,memdev=ram0,cpus=0-3 \
-numa node,nodeid=1,memdev=ram1,cpus=4-7 \
-numa node,nodeid=2,memdev=ram2,cpus=8-11 \
-numa node,nodeid=3,memdev=ram3,cpus=12-15 \
-numa node,nodeid=4
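The memory-backend-file object points at img/nvdimm.img. One option, sketched below, is to pre-create that file as a sparse file so its size is explicit before qemu starts. This is an assumption about workflow rather than a requirement stated above, and the function name make_backing_file is hypothetical. It relies on truncate from GNU coreutils.

```shell
# Create a sparse backing file of the given size.
# $1: path of the file, $2: size understood by truncate (e.g. 1G).
make_backing_file() {
    mkdir -p "$(dirname "$1")"
    truncate -s "$2" "$1"
}

# Example, matching mem-path and size in the command above:
#   make_backing_file img/nvdimm.img 1G
```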
Inside the VM, the following command lists the active and enabled NVDIMM devices:
ndctl list -NRD
Its output:
{
  "dimms":[
    {
      "dev":"nmem0",
      "id":"8680-56341200",
      "handle":1,
      "phys_id":0
    }
  ],
  "regions":[
    {
      "dev":"region0",
      "size":1073741824,
      "align":16777216,
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "mappings":[
        {
          "dimm":"nmem0",
          "offset":0,
          "length":1073741824,
          "position":0
        }
      ],
      "persistence_domain":"unknown",
      "namespaces":[
        {
          "dev":"namespace0.0",
          "mode":"raw",
          "size":1073741824,
          "sector_size":512,
          "blockdev":"pmem0"
        }
      ]
    }
  ]
}
By default, the namespace (namespaceX.Y, here namespace0.0) is in raw mode, which means the NVDIMM device acts as a memory-backed disk that does not support DAX. We need to disable the namespace, recreate it in devdax mode, and finally reconfigure the resulting DAX device as system RAM with the following commands:
sudo ndctl disable-namespace namespace0.0
sudo ndctl create-namespace -m devdax
sudo daxctl reconfigure-device -m system-ram all --force
Node 4 is now configured as DAX:
{
  "dimms":[
    {
      "dev":"nmem0",
      "id":"8680-56341200",
      "handle":1,
      "phys_id":0
    }
  ],
  "regions":[
    {
      "dev":"region0",
      "size":1073741824,
      "align":16777216,
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "mappings":[
        {
          "dimm":"nmem0",
          "offset":0,
          "length":1073741824,
          "position":0
        }
      ],
      "persistence_domain":"unknown",
      "namespaces":[
        {
          "dev":"namespace0.0",
          "mode":"devdax",
          "map":"dev",
          "size":1054867456,
          "uuid":"ed8bb2a9-41fb-48e0-a0b2-7dbf0d9ca9ba",
          "chardev":"dax0.0",
          "align":2097152
        }
      ]
    }
  ]
}
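Two things are worth checking in this listing: the namespace mode, and the fact that the namespace is now slightly smaller than its region. The sketch below pulls the mode out of the ndctl output with sed and computes the size difference. My understanding is that with "map":"dev" the page metadata is stored on the device itself, which would account for the missing space, but treat that explanation as an assumption. The function names namespace_modes and overhead_mib are hypothetical.

```shell
# Read ndctl JSON on stdin and print the value of every "mode" field.
namespace_modes() {
    sed -n 's/.*"mode"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the difference between a region size and a namespace size in MiB.
# $1: region size in bytes, $2: namespace size in bytes.
overhead_mib() {
    echo $(( ($1 - $2) / 1024 / 1024 ))
}

# Examples:
#   ndctl list -N | namespace_modes
#   overhead_mib 1073741824 1054867456   # sizes taken from the listing
```

With the sizes from the listing, the difference is 18 MiB out of the 1 GiB region.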
First we need a CXL host bridge (PCI eXpander Bridge, i.e. pxb-cxl, "cxl.1" here), then we attach a root port (cxl-rp, "root_port13" here), then a Type 3 device.
In this case it is a pmem device, so it needs two memory-backend-file objects: one for the memory ("pmem0" here) and one for its label storage area (LSA, "cxl-lsa0" here). Finally, we need a Fixed Memory Window (FMW, i.e. cxl-fmw) to map that memory into the host physical address space:
qemu-system-x86_64 -hda img/mydisk.img -accel kvm \
-machine q35,nvdimm=on,cxl=on \
-device e1000,netdev=net0,mac=52:54:00:12:34:56 \
-netdev user,id=net0,hostfwd=tcp::10022-:22 \
-m 4G,slots=8,maxmem=8G \
-smp 4 \
-object memory-backend-ram,size=4G,id=mem0 \
-numa node,nodeid=0,cpus=0-3,memdev=mem0 \
-object memory-backend-file,id=pmem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,persistent-memdev=pmem0,lsa=cxl-lsa0,id=cxl-pmem0 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G
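As far as I know, the qemu documentation requires the size of a CXL Fixed Memory Window to be a multiple of 256 MiB. The sketch below checks a size such as the 4G used in cxl-fmw.0.size before it is put on the command line. The function names size_to_bytes and fmw_size_ok are hypothetical, and only the M and G suffixes are handled.

```shell
# Convert a qemu-style size (e.g. 256M or 4G) to bytes.
size_to_bytes() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Succeed when the given size is a multiple of 256 MiB.
fmw_size_ok() {
    bytes=$(size_to_bytes "$1") || return 1
    [ $(( bytes % (256 * 1024 * 1024) )) -eq 0 ]
}

# Example:
#   fmw_size_ok 4G && echo "4G is a valid window size"
```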