-
Notifications
You must be signed in to change notification settings - Fork 37
[RFC] Support hardware-accelerated nested translation via iommufd #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,7 +17,7 @@ use byteorder::{ByteOrder, NativeEndian}; | |
| #[cfg(feature = "vfio_cdev")] | ||
| use iommufd_bindings::*; | ||
| #[cfg(feature = "vfio_cdev")] | ||
| use iommufd_ioctls::IommuFd; | ||
| use iommufd_ioctls::{IommuFd, IommufdHwptData, IommufdVDevice, IommufdVIommu}; | ||
| use log::{debug, error, warn}; | ||
| use vfio_bindings::bindings::vfio::*; | ||
| use vm_memory::{Address, GuestMemory, GuestMemoryRegion, MemoryRegionAddress}; | ||
|
|
@@ -561,6 +561,7 @@ pub struct VfioIommufd { | |
| pub(crate) iommufd: Arc<IommuFd>, | ||
| pub(crate) ioas_id: u32, | ||
| common: VfioCommon, | ||
| s1_hwpt_data_type: Option<iommu_hwpt_data_type>, | ||
| } | ||
|
|
||
| #[cfg(feature = "vfio_cdev")] | ||
|
|
@@ -572,10 +573,14 @@ impl VfioIommufd { | |
| /// * `iommufd`: the iommufd to be bound with the VFIO device | ||
| /// * `ioas_id`: the IOAS id to be bound with the VFIO device | ||
| /// * `device_fd`: An optional file handle of the hypervisor VFIO device. | ||
| /// * `s1_hwpt_data_type`: An optional IOMMU hardware page table data type. | ||
| /// - If `None`, nested HWPT is disabled. | ||
| /// - If `Some`, nested HWPT is enabled with the provided data type. | ||
| pub fn new( | ||
| iommufd: Arc<IommuFd>, | ||
| ioas_id: Option<u32>, | ||
| device_fd: Option<VfioContainerDeviceHandle>, | ||
| s1_hwpt_data_type: Option<iommu_hwpt_data_type>, | ||
| ) -> Result<Self> { | ||
| let ioas_id = match ioas_id { | ||
| Some(ioas_id) => ioas_id, | ||
|
|
@@ -599,6 +604,7 @@ impl VfioIommufd { | |
| iommufd, | ||
| ioas_id, | ||
| common: VfioCommon { device_fd }, | ||
| s1_hwpt_data_type, | ||
| }; | ||
|
|
||
| Ok(vfio_iommufd) | ||
|
|
@@ -1120,6 +1126,194 @@ impl VfioDevice { | |
| }) | ||
| } | ||
|
|
||
| #[cfg(feature = "vfio_cdev")] | ||
| /// Creates a new VFIO device backed by IOMMUFD. | ||
| /// | ||
| /// This initializes a VFIO device with support for vIOMMU and vDevice abstractions | ||
| /// when nested hardware page tables (HWPT) are configured via the `VfioIommufd` instance. | ||
| /// | ||
| /// # Arguments | ||
| /// | ||
| /// * `sysfspath` - Path to the VFIO device in sysfs. | ||
| /// * Note: Future versions may support file descriptor interfaces to be more versatile. * | ||
| /// * `vfio_ops` - The VFIO operations wrapper (must be a `VfioIommufd` instance). | ||
| /// * `viommu` - An optional vIOMMU instance. | ||
| /// - If `None` and nested HWPT is enabled, a new vIOMMU instance is created and returned. | ||
| /// - If `Some`, the provided instance is reused. | ||
|
Comment on lines
+1141
to
+1142
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure we should allow so much flexibility. If we want to maintain a clear API, I'd rather expect the caller to always create the
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Very good discussion here. The goal is to provide a unified API that supports both use cases: standard mode (current behavior with iommufd where devices are not managed by a virtual IOMMU in userspace) and accelerated mode (nested HWPT with a hardware-accelerated virtual IOMMU). With the current design, the VMM maintains a consistent workflow. The only variation is whether the VfioIommufd instance is initialized with nested_hwpt enabled. While the interfaces could be decomposed into more primitive operations, this would significantly increase the management burden on the VMM without providing clear added value. Comparison of the workflows from the caller (e.g. userspace VMM): // 1. Current Proposal (Unified API)
// The VMM only handles high-level initialization.
let (vfio_device, iommufd_vdevice) = VfioDevice::new_with_iommufd(
vfio_path,
vfio_iommufd,
&mut iommufd_viommu,
virt_sid
);// 2. Hypothetical "Primitive" API
// This forces the VMM to manually glue the components together.
let vfio_device = VfioDevice::new(vfio_path, vfio_iommufd);
// The VMM must manually extract IDs and link objects:
// new API for the accelerated mode only
let vfio_dev_id = vfio_device.get_dev_id();
// new API from iommufd that VMM needs to interact with directly
let iommufd_viommu = IommufdVIommu::new(iommufd, vfio_dev_id);
// new API from iommufd that VMM needs to interact with directly
let iommufd_vdevice = IommufdVDevice::new(iommufd_viommu, virt_sid);
// And manually attach the Stage-1 page table:
vfio_device.attach_default_s1_hwpt(iommufd_viommu); // new API for the accelerated mode only
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I acknowledge that balancing API simplicity with effective encapsulation is always a trade-off. It will be easier to gauge the trade-off with this design once we have a concrete implementation. We are currently working on that reference case: integrating accelerated vSMMUv3 support into Cloud Hypervisor.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the decision comes down to the expectation we have from this crate. I always think of this crate as a simple Rust layer, which is why I'm expecting the implementation to be as simple as possible. But if others think it's a good idea to embed a bit more logic into it, I'm fine with it! |
||
| /// * Note: The reused vIOMMU must be associated with the same physical IOMMU as this device; | ||
| /// otherwise, vDevice creation will fail and error will be returned.* | ||
| /// * `virt_sid` - The Virtual Stream ID. This is required if `s1_hwpt_data_type` is | ||
| /// configured in the `VfioIommufd` instance (i.e., nested HWPT is active). | ||
| /// | ||
| /// # Returns | ||
| /// | ||
| /// A tuple containing the initialized `VfioDevice` and an optional `IommufdVDevice` | ||
| /// (present only when nested HWPT is configured). | ||
| pub fn new_with_iommufd( | ||
| sysfspath: &Path, | ||
| vfio_ops: Arc<dyn VfioOps>, | ||
| viommu: &mut Option<Arc<IommufdVIommu>>, | ||
| virt_sid: Option<u64>, | ||
| ) -> Result<(Self, Option<IommufdVDevice>)> { | ||
| let vfio_iommufd = | ||
| if let Some(vfio_iommufd) = vfio_ops.as_any().downcast_ref::<VfioIommufd>() { | ||
| vfio_iommufd | ||
| } else { | ||
| return Err(VfioError::DowncastVfioOps); | ||
| }; | ||
|
|
||
| let (device_info, iommufd_vdevice) = { | ||
| // Open the vfio cdev file | ||
| let device = Self::get_device_cdev_from_path(sysfspath)?; | ||
|
|
||
| // Add the vfio cdev file to VFIO-KVM device tracking | ||
| vfio_iommufd | ||
| .common | ||
| .device_set_fd(device.as_raw_fd(), true)?; | ||
|
|
||
| // Bind the VFIO device to the iommufd file | ||
| let mut bind = vfio_device_bind_iommufd { | ||
| argsz: mem::size_of::<vfio_device_bind_iommufd>() as u32, | ||
| flags: 0, | ||
| iommufd: vfio_iommufd.iommufd.as_raw_fd(), | ||
| out_devid: 0, | ||
| }; | ||
| vfio_syscall::bind_device_iommufd(&device, &mut bind)?; | ||
|
|
||
| let iommufd_vdevice = match vfio_iommufd.s1_hwpt_data_type { | ||
| // When no s1 hwpt is used, associate the vfio device to the IOAS within the bound iommufd | ||
| None => { | ||
| let mut attach_data = vfio_device_attach_iommufd_pt { | ||
| argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32, | ||
| flags: 0, | ||
| pt_id: vfio_iommufd.ioas_id, | ||
| }; | ||
| vfio_syscall::attach_device_iommufd_pt(&device, &mut attach_data)?; | ||
|
|
||
| None | ||
| } | ||
| // When s1 hwpt is used, create and attach vIOMMU and vDevice for nested (s1+s2) hwpt setup | ||
| Some(s1_hwpt_data_type) => { | ||
| let virt_id = if let Some(virt_sid) = virt_sid { | ||
| virt_sid | ||
| } else { | ||
| return Err(VfioError::MissingVirtSid); | ||
| }; | ||
|
|
||
| let viommu = if let Some(viommu) = viommu { | ||
| // Reuse the passed in vIOMMU instance if available | ||
| viommu.clone() | ||
| } else { | ||
| // Allocate an instance of vIOMMU for the vfio device if no instance is passed in | ||
| let new_viommu = IommufdVIommu::new( | ||
| vfio_iommufd.iommufd.clone(), | ||
| vfio_iommufd.ioas_id, | ||
| bind.out_devid, | ||
| s1_hwpt_data_type, | ||
| ) | ||
| .map_err(VfioError::NewIommufdVIommu)?; | ||
|
|
||
| let viommu_arc = Arc::new(new_viommu); | ||
| *viommu = Some(viommu_arc.clone()); | ||
|
|
||
| viommu_arc | ||
| }; | ||
|
|
||
| // Allocate an instance of vDevice | ||
| let vdevice = IommufdVDevice::new(viommu.clone(), bind.out_devid, virt_id) | ||
| .map_err(VfioError::NewIommufdVDevice)?; | ||
|
|
||
| // Attach the vfio cdev device to the s1_bypass_hwpt | ||
| let mut attach_data = vfio_device_attach_iommufd_pt { | ||
| argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32, | ||
| flags: 0, | ||
| pt_id: viommu.bypass_hwpt_id, | ||
| }; | ||
| vfio_syscall::attach_device_iommufd_pt(&device, &mut attach_data)?; | ||
|
|
||
| Some(vdevice) | ||
| } | ||
| }; | ||
|
|
||
| let dev_info = VfioDeviceInfo::get_device_info(&device)?; | ||
| let dev_info = VfioDeviceInfo::new(device, &dev_info); | ||
|
|
||
| (dev_info, iommufd_vdevice) | ||
| }; | ||
|
|
||
| let regions = device_info.get_regions()?; | ||
| let irqs = device_info.get_irqs()?; | ||
|
|
||
| Ok(( | ||
| VfioDevice { | ||
| device: ManuallyDrop::new(device_info.device), | ||
| flags: device_info.flags, | ||
| regions, | ||
| irqs, | ||
| sysfspath: sysfspath.to_path_buf(), | ||
| vfio_ops, | ||
| }, | ||
| iommufd_vdevice, | ||
| )) | ||
| } | ||
|
|
||
| #[cfg(feature = "vfio_cdev")] | ||
| /// Uninstall s1 hwpt for the vfio device. | ||
| ///// # Parameters | ||
| /// * `vdevice`: the `IommufdVDevice` instance associated with the vfio device. | ||
| /// * `abort`: if true, use s1 abort_hwpt; if false, use s1 bypass_hwpt. | ||
| pub fn uninstall_s1_hwpt(&self, vdevice: &mut IommufdVDevice, abort: bool) -> Result<()> { | ||
| // Attach to bypass hwpt or abort hwpt based on the 'abort' flag | ||
| let hwpt_id = if abort { | ||
| vdevice.viommu.abort_hwpt_id | ||
| } else { | ||
| vdevice.viommu.bypass_hwpt_id | ||
| }; | ||
| let mut attach_data = vfio_device_attach_iommufd_pt { | ||
| argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32, | ||
| flags: 0, | ||
| pt_id: hwpt_id, | ||
| }; | ||
| vfio_syscall::attach_device_iommufd_pt(&self.device, &mut attach_data)?; | ||
|
|
||
| // Destroy s1 hwpt | ||
| vdevice | ||
| .destroy_s1_hwpt() | ||
| .map_err(VfioError::IommufdS1HwptDestroy)?; | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| #[cfg(feature = "vfio_cdev")] | ||
| /// Install s1 hwpt for the vfio device based on the input hwpt data. | ||
| //// | ||
| /// # Parameters | ||
| /// * `vdevice`: the `IommufdVDevice` instance associated with the vfio device. | ||
| /// * `hwpt_data`: the hwpt data to create s1 hwpt. | ||
| pub fn install_s1_hwpt( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: I'd define the install function above the uninstall one. |
||
| &self, | ||
| vdevice: &mut IommufdVDevice, | ||
| hwpt_data: &IommufdHwptData, | ||
| ) -> Result<()> { | ||
| // Uninstall existing s1 hwpt if exists | ||
| self.uninstall_s1_hwpt(vdevice, true)?; | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this should be part of the install function. The function should fail if some page tables are already there (meaning the caller should be in charge of installing/uninstalling). The API should be as simple as possible, which means it shouldn't perform too many tasks (I think the caller should be in charge of driving the creation/cleanup).
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this more or less falls into the same trade-off as discussed above, though I am much less opinionated in this case - given the |
||
|
|
||
| // Create s1 hwpt based on the input data | ||
| let s1_hwpt_id = vdevice | ||
| .allocate_s1_hwpt(hwpt_data) | ||
| .map_err(VfioError::IommufdS1HwptAlloc)?; | ||
|
|
||
| // Attach the vfio device to the newly created s1 hwpt | ||
| let mut attach_data = vfio_device_attach_iommufd_pt { | ||
| argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32, | ||
| flags: 0, | ||
| pt_id: s1_hwpt_id, | ||
| }; | ||
| vfio_syscall::attach_device_iommufd_pt(&self.device, &mut attach_data)?; | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| /// VFIO device reset only if the device supports being reset. | ||
| pub fn reset(&self) { | ||
| if self.flags & VFIO_DEVICE_FLAGS_RESET != 0 { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using
nested_hwptconveys more clearly that we're trying to useHWPT_NESTEDwith IOMMUFD.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good suggestion.