4 changes: 2 additions & 2 deletions vfio-ioctls/Cargo.toml
@@ -36,5 +36,5 @@ mshv-bindings = { version = "0.6.5", features = [
     "fam-wrappers",
 ], optional = true }
 mshv-ioctls = { version = "0.6.5", optional = true }
-iommufd-bindings = { git = "https://github.com/cloud-hypervisor/iommufd", rev = "083c016", optional = true }
-iommufd-ioctls = { git = "https://github.com/cloud-hypervisor/iommufd", rev = "083c016", optional = true }
+iommufd-bindings = { git = "https://github.com/likebreath/iommufd", branch = "0129/rfc_viommu_vdevice", optional = true }
+iommufd-ioctls = { git = "https://github.com/likebreath/iommufd", branch = "0129/rfc_viommu_vdevice", optional = true }
15 changes: 15 additions & 0 deletions vfio-ioctls/src/lib.rs
@@ -167,6 +167,21 @@ pub enum VfioError {
     #[cfg(feature = "vfio_cdev")]
     #[error("failed iommufd ioctl")]
     IommufdIoctlError(#[source] IommufdError),
+    #[cfg(feature = "vfio_cdev")]
+    #[error("missing virt_sid for S1 HWPT setup")]
+    MissingVirtSid,
+    #[cfg(feature = "vfio_cdev")]
+    #[error("failed to create iommufd vIOMMU")]
+    NewIommufdVIommu(#[source] IommufdError),
+    #[cfg(feature = "vfio_cdev")]
+    #[error("failed to create iommufd vDevice")]
+    NewIommufdVDevice(#[source] IommufdError),
+    #[cfg(feature = "vfio_cdev")]
+    #[error("failed to destroy s1 hwpt")]
+    IommufdS1HwptDestroy(#[source] IommufdError),
+    #[cfg(feature = "vfio_cdev")]
+    #[error("failed to allocate s1 hwpt")]
+    IommufdS1HwptAlloc(#[source] IommufdError),
 }

 /// Specialized version of `Result` for VFIO subsystem.
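The new variants follow the crate's existing `thiserror` pattern: an `#[error(...)]` display message plus an optional `#[source]` cause. A minimal self-contained sketch of how such a variant behaves, using a hand-rolled `Display`/`Error` impl and a hypothetical stand-in `IommufdError` (the real types live in vfio-ioctls and iommufd-ioctls):

```rust
use std::error::Error;
use std::fmt;

// Stand-in for iommufd_ioctls::IommufdError (hypothetical shape)
#[derive(Debug)]
struct IommufdError(i32);

impl fmt::Display for IommufdError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "iommufd errno {}", self.0)
    }
}
impl Error for IommufdError {}

// Hand-rolled equivalent of two of the thiserror-derived variants added in this diff
#[derive(Debug)]
enum VfioError {
    MissingVirtSid,
    NewIommufdVIommu(IommufdError),
}

impl fmt::Display for VfioError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VfioError::MissingVirtSid => write!(f, "missing virt_sid for S1 HWPT setup"),
            VfioError::NewIommufdVIommu(_) => write!(f, "failed to create iommufd vIOMMU"),
        }
    }
}

impl Error for VfioError {
    // Mirrors what #[source] generates: the wrapped error is exposed as the cause
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            VfioError::NewIommufdVIommu(e) => Some(e),
            _ => None,
        }
    }
}

fn main() {
    let err = VfioError::NewIommufdVIommu(IommufdError(22));
    // Top-level message, then the chained source
    println!("{}: {}", err, err.source().unwrap());
}
```

Callers can therefore walk the `source()` chain to recover the underlying iommufd errno instead of parsing the display string.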
196 changes: 195 additions & 1 deletion vfio-ioctls/src/vfio_device.rs
@@ -17,7 +17,7 @@ use byteorder::{ByteOrder, NativeEndian};
 #[cfg(feature = "vfio_cdev")]
 use iommufd_bindings::*;
 #[cfg(feature = "vfio_cdev")]
-use iommufd_ioctls::IommuFd;
+use iommufd_ioctls::{IommuFd, IommufdHwptData, IommufdVDevice, IommufdVIommu};
 use log::{debug, error, warn};
 use vfio_bindings::bindings::vfio::*;
 use vm_memory::{Address, GuestMemory, GuestMemoryRegion, MemoryRegionAddress};

@@ -561,6 +561,7 @@ pub struct VfioIommufd {
     pub(crate) iommufd: Arc<IommuFd>,
     pub(crate) ioas_id: u32,
     common: VfioCommon,
+    s1_hwpt_data_type: Option<iommu_hwpt_data_type>,
 }
 #[cfg(feature = "vfio_cdev")]
@@ -572,10 +573,14 @@ impl VfioIommufd {
     /// * `iommufd`: the iommufd to be bound with the VFIO device
     /// * `ioas_id`: the IOAS id to be bound with the VFIO device
     /// * `device_fd`: An optional file handle of the hypervisor VFIO device.
+    /// * `s1_hwpt_data_type`: An optional IOMMU hardware page table data type.
+    ///   - If `None`, nested HWPT is disabled.
+    ///   - If `Some`, nested HWPT is enabled with the provided data type.
     pub fn new(
         iommufd: Arc<IommuFd>,
         ioas_id: Option<u32>,
         device_fd: Option<VfioContainerDeviceHandle>,
+        s1_hwpt_data_type: Option<iommu_hwpt_data_type>,
[Review comment — Collaborator]
Suggested change:
-        s1_hwpt_data_type: Option<iommu_hwpt_data_type>,
+        nested_hwpt: Option<iommu_hwpt_data_type>,
I think using nested_hwpt conveys more clearly that we're trying to use HWPT_NESTED with IOMMUFD.

[Reply — Collaborator Author]
That's a good suggestion.
+    ) -> Result<Self> {
         let ioas_id = match ioas_id {
             Some(ioas_id) => ioas_id,
@@ -599,6 +604,7 @@
             iommufd,
             ioas_id,
             common: VfioCommon { device_fd },
+            s1_hwpt_data_type,
         };

         Ok(vfio_iommufd)
@@ -1120,6 +1126,194 @@ impl VfioDevice {
         })
     }

+    #[cfg(feature = "vfio_cdev")]
+    /// Creates a new VFIO device backed by IOMMUFD.
+    ///
+    /// This initializes a VFIO device with support for the vIOMMU and vDevice abstractions
+    /// when nested hardware page tables (HWPT) are configured via the `VfioIommufd` instance.
+    ///
+    /// # Arguments
+    ///
+    /// * `sysfspath` - Path to the VFIO device in sysfs.
+    ///   Note: future versions may support file descriptor interfaces to be more versatile.
+    /// * `vfio_ops` - The VFIO operations wrapper (must be a `VfioIommufd` instance).
+    /// * `viommu` - An optional vIOMMU instance.
+    ///   - If `None` and nested HWPT is enabled, a new vIOMMU instance is created and returned.
+    ///   - If `Some`, the provided instance is reused.
[Review comment — Collaborator, on lines +1141 to +1142]
Not sure we should allow so much flexibility. If we want to maintain a clear API, I'd rather expect the caller to always create the IommufdVIommu (when needed). That means this function should only return Result<Self>.

[Reply — Collaborator Author]
Very good discussion here.

The goal is to provide a unified API that supports both use cases: standard mode (the current behavior with iommufd, where devices are not managed by a virtual IOMMU in userspace) and accelerated mode (nested HWPT with a hardware-accelerated virtual IOMMU).

With the current design, the VMM maintains a consistent workflow. The only variation is whether the VfioIommufd instance is initialized with nested_hwpt enabled.

While the interfaces could be decomposed into more primitive operations, this would significantly increase the management burden on the VMM without providing clear added value.

Comparison of the workflows from the caller's perspective (e.g. a userspace VMM):

// 1. Current proposal (unified API)
// The VMM only handles high-level initialization.
let (vfio_device, iommufd_vdevice) = VfioDevice::new_with_iommufd(
    vfio_path,
    vfio_iommufd,
    &mut iommufd_viommu,
    virt_sid,
);

// 2. Hypothetical "primitive" API
// This forces the VMM to manually glue the components together.
let vfio_device = VfioDevice::new(vfio_path, vfio_iommufd);

// The VMM must manually extract IDs and link objects:
// new API for the accelerated mode only
let vfio_dev_id = vfio_device.get_dev_id();
// new API from iommufd that the VMM needs to interact with directly
let iommufd_viommu = IommufdVIommu::new(iommufd, vfio_dev_id);
// new API from iommufd that the VMM needs to interact with directly
let iommufd_vdevice = IommufdVDevice::new(iommufd_viommu, virt_sid);

// And manually attach the Stage-1 page table:
// new API for the accelerated mode only
vfio_device.attach_default_s1_hwpt(iommufd_viommu);

[Reply — Collaborator Author]
I acknowledge that balancing API simplicity with effective encapsulation is always a trade-off.

It will be easier to gauge the trade-off with this design once we have a concrete implementation. We are currently working on that reference case: integrating accelerated vSMMUv3 support into Cloud Hypervisor.

[Reply — Collaborator]
I think the decision comes down to the expectation we have from this crate. I always think of this crate as a simple Rust layer, which is why I'm expecting the implementation to be as simple as possible. But if others think it's a good idea to embed a bit more logic into it, I'm fine with it!

+    ///   Note: the reused vIOMMU must be associated with the same physical IOMMU as this
+    ///   device; otherwise, vDevice creation will fail and an error will be returned.
+    /// * `virt_sid` - The virtual stream ID. This is required if `s1_hwpt_data_type` is
+    ///   configured in the `VfioIommufd` instance (i.e., nested HWPT is active).
+    ///
+    /// # Returns
+    ///
+    /// A tuple containing the initialized `VfioDevice` and an optional `IommufdVDevice`
+    /// (present only when nested HWPT is configured).
+    pub fn new_with_iommufd(
+        sysfspath: &Path,
+        vfio_ops: Arc<dyn VfioOps>,
+        viommu: &mut Option<Arc<IommufdVIommu>>,
+        virt_sid: Option<u64>,
+    ) -> Result<(Self, Option<IommufdVDevice>)> {
+        let vfio_iommufd =
+            if let Some(vfio_iommufd) = vfio_ops.as_any().downcast_ref::<VfioIommufd>() {
+                vfio_iommufd
+            } else {
+                return Err(VfioError::DowncastVfioOps);
+            };
+
+        let (device_info, iommufd_vdevice) = {
+            // Open the vfio cdev file
+            let device = Self::get_device_cdev_from_path(sysfspath)?;
+
+            // Add the vfio cdev file to VFIO-KVM device tracking
+            vfio_iommufd
+                .common
+                .device_set_fd(device.as_raw_fd(), true)?;
+
+            // Bind the VFIO device to the iommufd file
+            let mut bind = vfio_device_bind_iommufd {
+                argsz: mem::size_of::<vfio_device_bind_iommufd>() as u32,
+                flags: 0,
+                iommufd: vfio_iommufd.iommufd.as_raw_fd(),
+                out_devid: 0,
+            };
+            vfio_syscall::bind_device_iommufd(&device, &mut bind)?;
+
+            let iommufd_vdevice = match vfio_iommufd.s1_hwpt_data_type {
+                // When no s1 hwpt is used, associate the vfio device with the IOAS within the bound iommufd
+                None => {
+                    let mut attach_data = vfio_device_attach_iommufd_pt {
+                        argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32,
+                        flags: 0,
+                        pt_id: vfio_iommufd.ioas_id,
+                    };
+                    vfio_syscall::attach_device_iommufd_pt(&device, &mut attach_data)?;
+
+                    None
+                }
+                // When s1 hwpt is used, create and attach vIOMMU and vDevice for nested (s1+s2) hwpt setup
+                Some(s1_hwpt_data_type) => {
+                    let virt_id = if let Some(virt_sid) = virt_sid {
+                        virt_sid
+                    } else {
+                        return Err(VfioError::MissingVirtSid);
+                    };
+
+                    let viommu = if let Some(viommu) = viommu {
+                        // Reuse the passed-in vIOMMU instance if available
+                        viommu.clone()
+                    } else {
+                        // Allocate an instance of vIOMMU for the vfio device if no instance is passed in
+                        let new_viommu = IommufdVIommu::new(
+                            vfio_iommufd.iommufd.clone(),
+                            vfio_iommufd.ioas_id,
+                            bind.out_devid,
+                            s1_hwpt_data_type,
+                        )
+                        .map_err(VfioError::NewIommufdVIommu)?;
+
+                        let viommu_arc = Arc::new(new_viommu);
+                        *viommu = Some(viommu_arc.clone());
+
+                        viommu_arc
+                    };
+
+                    // Allocate an instance of vDevice
+                    let vdevice = IommufdVDevice::new(viommu.clone(), bind.out_devid, virt_id)
+                        .map_err(VfioError::NewIommufdVDevice)?;
+
+                    // Attach the vfio cdev device to the s1 bypass hwpt
+                    let mut attach_data = vfio_device_attach_iommufd_pt {
+                        argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32,
+                        flags: 0,
+                        pt_id: viommu.bypass_hwpt_id,
+                    };
+                    vfio_syscall::attach_device_iommufd_pt(&device, &mut attach_data)?;
+
+                    Some(vdevice)
+                }
+            };
+
+            let dev_info = VfioDeviceInfo::get_device_info(&device)?;
+            let dev_info = VfioDeviceInfo::new(device, &dev_info);
+
+            (dev_info, iommufd_vdevice)
+        };
+
+        let regions = device_info.get_regions()?;
+        let irqs = device_info.get_irqs()?;
+
+        Ok((
+            VfioDevice {
+                device: ManuallyDrop::new(device_info.device),
+                flags: device_info.flags,
+                regions,
+                irqs,
+                sysfspath: sysfspath.to_path_buf(),
+                vfio_ops,
+            },
+            iommufd_vdevice,
+        ))
+    }
+
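The vIOMMU handling above is a get-or-create pattern over `&mut Option<Arc<...>>`: the first device with nested HWPT enabled allocates the vIOMMU and publishes it through the caller's slot, and subsequent devices behind the same physical IOMMU reuse that shared instance. A minimal self-contained sketch of just that pattern, with a hypothetical stand-in `VIommu` type in place of `IommufdVIommu`:

```rust
use std::sync::Arc;

// Hypothetical stand-in for IommufdVIommu; the real type wraps an iommufd handle
struct VIommu {
    id: u32,
}

// Mirrors the reuse-or-allocate logic in new_with_iommufd
fn get_or_create(slot: &mut Option<Arc<VIommu>>, new_id: u32) -> Arc<VIommu> {
    if let Some(viommu) = slot {
        // Reuse the passed-in instance
        viommu.clone()
    } else {
        // Allocate a new instance and publish it through the caller's slot
        let viommu = Arc::new(VIommu { id: new_id });
        *slot = Some(viommu.clone());
        viommu
    }
}

fn main() {
    let mut slot: Option<Arc<VIommu>> = None;
    let a = get_or_create(&mut slot, 1); // first call allocates
    let b = get_or_create(&mut slot, 2); // second call reuses; new_id is ignored
    assert_eq!(a.id, 1);
    assert_eq!(b.id, 1);
    // slot, a, and b all point at the same allocation
    println!("{}", Arc::strong_count(&a));
}
```

This is why `viommu` is `&mut Option<Arc<IommufdVIommu>>` rather than a plain argument: the function both consumes and, on first use, produces the shared instance.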
+    #[cfg(feature = "vfio_cdev")]
+    /// Uninstall the s1 hwpt for the vfio device.
+    ///
+    /// # Parameters
+    /// * `vdevice`: the `IommufdVDevice` instance associated with the vfio device.
+    /// * `abort`: if true, use the s1 abort_hwpt; if false, use the s1 bypass_hwpt.
+    pub fn uninstall_s1_hwpt(&self, vdevice: &mut IommufdVDevice, abort: bool) -> Result<()> {
+        // Attach to the bypass hwpt or abort hwpt based on the 'abort' flag
+        let hwpt_id = if abort {
+            vdevice.viommu.abort_hwpt_id
+        } else {
+            vdevice.viommu.bypass_hwpt_id
+        };
+        let mut attach_data = vfio_device_attach_iommufd_pt {
+            argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32,
+            flags: 0,
+            pt_id: hwpt_id,
+        };
+        vfio_syscall::attach_device_iommufd_pt(&self.device, &mut attach_data)?;
+
+        // Destroy the s1 hwpt
+        vdevice
+            .destroy_s1_hwpt()
+            .map_err(VfioError::IommufdS1HwptDestroy)?;
+
+        Ok(())
+    }
+
+    #[cfg(feature = "vfio_cdev")]
+    /// Install an s1 hwpt for the vfio device based on the input hwpt data.
+    ///
+    /// # Parameters
+    /// * `vdevice`: the `IommufdVDevice` instance associated with the vfio device.
+    /// * `hwpt_data`: the hwpt data used to create the s1 hwpt.
+    pub fn install_s1_hwpt(
[Review comment — Collaborator]
nit: I'd define the install function above the uninstall one.

+        &self,
+        vdevice: &mut IommufdVDevice,
+        hwpt_data: &IommufdHwptData,
+    ) -> Result<()> {
+        // Uninstall the existing s1 hwpt if one exists
+        self.uninstall_s1_hwpt(vdevice, true)?;
[Review comment — Collaborator]
I don't think this should be part of the install function. The function should fail if some page tables are already there (meaning the caller should be in charge of installing/uninstalling). The API should be as simple as possible, which means it shouldn't perform too many tasks (I think the caller should be in charge of driving the creation/cleanup).

[Reply — Collaborator Author]
I think this more or less falls into the same trade-off as discussed above, though I am much less opinionated in this case, given that uninstall_s1_hwpt() is always exposed to the caller and is very simple to use.

+        // Create the s1 hwpt based on the input data
+        let s1_hwpt_id = vdevice
+            .allocate_s1_hwpt(hwpt_data)
+            .map_err(VfioError::IommufdS1HwptAlloc)?;
+
+        // Attach the vfio device to the newly created s1 hwpt
+        let mut attach_data = vfio_device_attach_iommufd_pt {
+            argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32,
+            flags: 0,
+            pt_id: s1_hwpt_id,
+        };
+        vfio_syscall::attach_device_iommufd_pt(&self.device, &mut attach_data)?;
+
+        Ok(())
+    }
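Every attach path in this diff builds a `vfio_device_attach_iommufd_pt` with `argsz` set to the struct's own size; this is the standard VFIO uapi convention that lets the kernel validate the argument size across ABI versions. A self-contained sketch of that convention, with the struct layout written out locally (assumed to match the three-`u32` layout from vfio_bindings):

```rust
use std::mem;

// Local mirror of the VFIO uapi struct (assumed layout: three u32 fields)
#[allow(non_camel_case_types)]
#[repr(C)]
struct vfio_device_attach_iommufd_pt {
    argsz: u32,
    flags: u32,
    pt_id: u32,
}

fn main() {
    // argsz carries the userspace struct size so the kernel can reject
    // argument structures from a mismatched ABI version
    let attach_data = vfio_device_attach_iommufd_pt {
        argsz: mem::size_of::<vfio_device_attach_iommufd_pt>() as u32,
        flags: 0,
        pt_id: 42, // whichever HWPT or IOAS id is being attached (hypothetical value)
    };
    assert_eq!(attach_data.argsz, 12); // 3 fields x 4 bytes
    println!("argsz = {}", attach_data.argsz);
}
```

Note how install, uninstall, and the two `new_with_iommufd` branches differ only in which `pt_id` they attach: an IOAS, the bypass/abort HWPT, or a freshly allocated s1 HWPT.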

     /// VFIO device reset only if the device supports being reset.
     pub fn reset(&self) {
         if self.flags & VFIO_DEVICE_FLAGS_RESET != 0 {