
I/O virtual memory (IOMMU) support #327


Open
wants to merge 12 commits into base: main
183 changes: 183 additions & 0 deletions src/io_memory.rs
@@ -0,0 +1,183 @@
// Copyright (C) 2025 Red Hat. All rights reserved.
//
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE-BSD-3-Clause file.
//
// SPDX-License-Identifier: Apache-2.0 OR BSD-3-Clause

//! Provides a trait for virtual I/O memory.
//!
//! This trait is more stripped down than `GuestMemory` because the fragmented nature of virtual
//! memory does not allow a direct translation to long contiguous regions.
//!
//! In addition, any access to virtual memory must be annotated with the intended access mode (i.e.
//! reading and/or writing).
use crate::guest_memory::Result;
use crate::{bitmap, GuestAddress, GuestMemory, MemoryRegionAddress, VolatileSlice};

/// Permissions for accessing virtual memory.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Permissions {
Member:

What about defining this like:

pub struct Permissions(u8);
impl Permissions {
    pub const No: Permissions = Permissions(0);
    pub const Read: Permissions = Permissions(1);
    pub const Write: Permissions = Permissions(2);
    pub const ReadWrite: Permissions = Permissions(3);
}

and implementing & and | as just a bitwise operation on x.0? (Basically reinventing the bitflags crate to avoid introducing a new dependency)

Member:

(and also reinventing it to hide the underlying implementation of the fake enum)
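
For illustration, the operators described here could look like the following on that struct (a minimal sketch extending the snippet above; `&` and `|` just forward to the inner u8):

impl std::ops::BitAnd for Permissions {
    type Output = Permissions;

    // Intersection of two permission sets: plain bitwise AND on the inner u8.
    fn bitand(self, rhs: Permissions) -> Permissions {
        Permissions(self.0 & rhs.0)
    }
}

impl std::ops::BitOr for Permissions {
    type Output = Permissions;

    // Union of two permission sets: plain bitwise OR on the inner u8.
    fn bitor(self, rhs: Permissions) -> Permissions {
        Permissions(self.0 | rhs.0)
    }
}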

Author:

I’m not dead-set; for bitfields, such a struct with constants can make more sense. On the other hand, I like using an enum if anything just to check that match is exhaustive (e.g. when we need to translate Permissions into VhostAccess).

FWIW, starting from opt-level = 2, this does get optimized properly:

   0:   48 89 f8                mov    rax,rdi
   3:   48 21 f0                and    rax,rsi
   6:   c3                      ret

As a compromise, I could do repr(u8) with bitand(x, y) { ((x as u8) & (y as u8)).try_into().unwrap() } and an accompanying TryFrom<u8> implementation. Even at opt-level = 0, that would yield:

   0:   48 89 f8                mov    rax,rdi
   3:   48 89 44 24 f0          mov    QWORD PTR [rsp-0x10],rax
   8:   48 89 74 24 f8          mov    QWORD PTR [rsp-0x8],rsi
   d:   48 21 f0                and    rax,rsi
  10:   c3                      ret

(opt-level = 1 then also removes the stack accesses, as above)
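
For illustration, the kind of exhaustive translation meant above could look like the following sketch; VhostAccess here is only a stand-in enum, not the actual vhost type:

// Stand-in for the vhost-side access type; illustrative only.
enum VhostAccess {
    No,
    ReadOnly,
    WriteOnly,
    ReadWrite,
}

impl From<Permissions> for VhostAccess {
    fn from(perm: Permissions) -> Self {
        // With a real enum, the compiler checks this match for exhaustiveness:
        // adding a new Permissions variant becomes a compile error here.
        match perm {
            Permissions::No => VhostAccess::No,
            Permissions::Read => VhostAccess::ReadOnly,
            Permissions::Write => VhostAccess::WriteOnly,
            Permissions::ReadWrite => VhostAccess::ReadWrite,
        }
    }
}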

Member:

> On the other hand, I like using an enum if anything just to check that match is exhaustive (e.g. when we need to translate Permissions into VhostAccess).

Ah, you're right. Exhaustiveness is important.

Let's go for the repr, but with a private panicking fn from_repr(x: u8) -> Self instead of the TryFrom trait.
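
A rough sketch of that compromise (discriminant values and the panic message are illustrative; BitOr would be analogous, OR-ing the two reprs):

#[repr(u8)]
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum Permissions {
    No = 0,
    Read = 1,
    Write = 2,
    ReadWrite = 3,
}

impl Permissions {
    // Private helper: panics on anything outside 0..=3, which the bitwise
    // operators can never produce.
    fn from_repr(x: u8) -> Self {
        match x {
            0 => Permissions::No,
            1 => Permissions::Read,
            2 => Permissions::Write,
            3 => Permissions::ReadWrite,
            _ => panic!("invalid Permissions repr: {}", x),
        }
    }
}

impl std::ops::BitAnd for Permissions {
    type Output = Permissions;

    fn bitand(self, rhs: Permissions) -> Permissions {
        Permissions::from_repr((self as u8) & (rhs as u8))
    }
}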

Author:

Sounds good!

    /// No permissions
    No,
    /// Read-only
    Read,
    /// Write-only
    Write,
    /// Allow both reading and writing
    ReadWrite,
}

impl Permissions {
    /// Check whether the permissions `self` allow the given `access`.
    pub fn allow(&self, access: Self) -> bool {
        *self & access == access
    }
}

impl std::ops::BitOr for Permissions {
    type Output = Permissions;

    /// Return the union of `self` and `rhs`.
    fn bitor(self, rhs: Permissions) -> Self::Output {
        use Permissions::*;

        match (self, rhs) {
            (No, rhs) => rhs,
            (lhs, No) => lhs,
            (ReadWrite, _) | (_, ReadWrite) => ReadWrite,
            (Read, Read) => Read,
            (Read, Write) | (Write, Read) => ReadWrite,
            (Write, Write) => Write,
        }
    }
}

impl std::ops::BitAnd for Permissions {
    type Output = Permissions;

    /// Return the intersection of `self` and `rhs`.
    fn bitand(self, rhs: Permissions) -> Self::Output {
        use Permissions::*;

        match (self, rhs) {
            (No, _) | (_, No) => No,
            (ReadWrite, rhs) => rhs,
            (lhs, ReadWrite) => lhs,
            (Read, Read) => Read,
            (Read, Write) | (Write, Read) => No,
            (Write, Write) => Write,
        }
    }
}

/// Represents virtual I/O memory.
///
/// `IoMemory` is generally backed by some “physical” `GuestMemory`, which then consists of
/// `GuestMemoryRegion` objects. However, the mapping from I/O virtual addresses (IOVAs) to
/// physical addresses may be arbitrarily fragmented. Translation is done via an IOMMU.
///
/// Note in contrast to `GuestMemory`:
/// - Any IOVA range may consist of arbitrarily many underlying ranges in physical memory.
/// - Accessing an IOVA requires passing the intended access mode, and the IOMMU will check whether
///   the given access mode is permitted for the given IOVA.
/// - The translation result for a given IOVA may change over time (i.e. the physical address
///   associated with an IOVA may change).
pub trait IoMemory {
    /// Underlying `GuestMemory` type.
    type PhysicalMemory: GuestMemory;

    /// Return `true` if `addr..(addr + count)` is accessible with `access`.
    fn range_accessible(&self, addr: GuestAddress, count: usize, access: Permissions) -> bool;

    /// Invokes callback `f` to handle data in the address range `[addr, addr + count)`, with
    /// permissions `access`.
    ///
    /// The address range `[addr, addr + count)` may span more than one
    /// [`GuestMemoryRegion`](trait.GuestMemoryRegion.html) object, or even have holes in it.
    /// So [`try_access()`](trait.IoMemory.html#method.try_access) invokes the callback `f`
    /// for each [`GuestMemoryRegion`](trait.GuestMemoryRegion.html) object involved and returns:
    /// - the error code returned by the callback `f`
    /// - the size of the already handled data when encountering the first hole
    /// - the size of the already handled data when the whole range has been handled
    fn try_access<'a, F>(
        &'a self,
        count: usize,
        addr: GuestAddress,
        access: Permissions,
        f: F,
    ) -> Result<usize>
    where
        F: FnMut(
            usize,
            usize,
            MemoryRegionAddress,
            &'a <Self::PhysicalMemory as GuestMemory>::R,
        ) -> Result<usize>;

    /// Returns a [`VolatileSlice`](struct.VolatileSlice.html) of `count` bytes starting at
    /// `addr`.
    ///
    /// Note that because of the fragmented nature of virtual memory, it can easily happen that the
    /// range `[addr, addr + count)` is not backed by a contiguous region in our own virtual
    /// memory, which will make generating the slice impossible.
    fn get_slice(
        &self,
        addr: GuestAddress,
        count: usize,
        access: Permissions,
    ) -> Result<VolatileSlice<bitmap::MS<Self::PhysicalMemory>>>;

    /// If this virtual memory is just a plain `GuestMemory` object underneath without an IOMMU
    /// translation layer in between, return that `GuestMemory` object.
    fn physical_memory(&self) -> Option<&Self::PhysicalMemory> {
        None
    }
}

impl<M: GuestMemory> IoMemory for M {
Member:

I'd consider just changing GuestMemory. Is anyone using try_access directly? It's not private, but almost so.

Author:

I did consider that, and I put the reasoning why I decided not to do so (and indeed failed to do so) into the opening comment: The most immediate reason is that anyone accessing memory needs to specify the required access permissions. Yes, most accesses go through the Bytes interface, i.e. already specify the intended access, but not all of them:

Which gets me to the second question: There are users (like descriptor_utils used by virtiofsd or vm-virtio/virtio-queue) that currently use get_slice(). get_slice() is dangerous because it cannot cope with fragmented memory, which becomes especially apparent when using paged virtual memory. So in fact my current draft replaces those get_slice() usages by try_access().

We could of course replace get_slice() by something like get_slices() -> impl Iterator<Item = VolatileSlice>, but really, that wouldn’t offer more than the try_access() we already have, just replace the interior iterator by an exterior one. (Though I’m open to the idea.)

Besides the access mode, there’s also the general idea I outlined in the original comment that GuestMemory is quite open about how it is structured internally into a limited set of GuestMemoryRegion objects. For virtual memory, that’s no longer applicable: Pages and guest memory regions are not only conceptually different; access to single pages is also quite useless. (Whereas GuestMemoryRegion is fundamental to how the vhost code constructs the GuestMemory from the information it gets from the front-end, so it must be visible.)

If we continued to use GuestMemory, I’m afraid it would be impossible to verify at compile time that all accesses are sound. For example, a user could try to access virtual memory’s GuestMemoryRegion objects, and that would only return an error at runtime.
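
To make that concrete, here is a rough sketch of how a get_slice() user could go through try_access() with this PR’s trait; read_fragmented is a hypothetical helper, not part of this PR:

// Hypothetical helper; assumes:
//   use vm_memory::{Bytes, GuestAddress, IoMemory, Permissions};
//   use vm_memory::guest_memory::Result;
fn read_fragmented<M: IoMemory>(mem: &M, iova: GuestAddress, buf: &mut [u8]) -> Result<usize> {
    let total = buf.len();
    mem.try_access(total, iova, Permissions::Read, |offset, len, region_addr, region| {
        // `offset` bytes are already handled; this region provides the next `len` bytes,
        // starting at `region_addr`. GuestMemoryRegion implements Bytes<MemoryRegionAddress>,
        // so read() works per region even if the IOVA range is fragmented.
        region.read(&mut buf[offset..offset + len], region_addr)
    })
}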

Member:

Yes, that's pretty convincing. The one thing I'd change, however, is to always expose IoMemory and only put IommuMemory/Iommu/Iotlb under the new feature.

This way, the Bytes<GuestAddress> implementation can simply move to IoMemory.

Author:

That’s how it already is, though. I’m happy to keep it that way 😅 but by putting IoMemory under the iommu feature, users could remain unchanged until they need/want IOMMU support.

(The problem is that the blanket impl Bytes<_> for IoMemory plus blanket impl IoMemory for GuestMemory does not auto-extend to Bytes<_> for GuestMemory, so with the introduction of IoMemory, users need to be changed to at least import IoMemory.)

Member:

I think you'd get it just by importing Bytes...

The problem with moving IoMemory to the new feature is that features are additive. You can't make the iommu feature pick whether Bytes<> is added to GuestMemory or IoMemory.

Unrelated to this, maybe we should bite the bullet and add a prelude...
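
For reference, such a prelude could be little more than a re-export block; the module name and its contents below are just the obvious candidates, not something this PR adds:

// Hypothetical prelude module; illustrative only.
pub mod prelude {
    pub use crate::{Bytes, GuestAddress, GuestMemory, IoMemory};
}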

Member:

Testcase:

lib.rs:

pub trait GuestMemory {}
pub trait IoMemory {}

impl<T: GuestMemory> IoMemory for T {}

pub trait HelloWorld {
    fn hello_world(&self);
}

impl<T: IoMemory> HelloWorld for T {
     fn hello_world(&self) {}
}

test.rs:

use lib::{GuestMemory, HelloWorld};

pub struct Foo;
impl GuestMemory for Foo {}

fn main() {
    Foo.hello_world();
}
$ rustc --crate-type=rlib lib.rs --edition=2021
$ rustc test.rs --extern lib=liblib.rlib --edition=2021

Author (@XanClic, Aug 12, 2025):

Interesting. It doesn’t work for vm-virtio/virtio-queue because that has a relaxed ?Sized requirement. Extending your example:

use lib::{GuestMemory, HelloWorld};

pub struct Foo;
impl GuestMemory for Foo {}

pub fn hello<T>(foo: &T)
where
    T: GuestMemory + ?Sized,
{
    foo.hello_world();
}

fn main() {
    hello(&Foo);
}

It works fine without the ?Sized.

Removing the ?Sized from virtio-queue and adding + Sized to <M as Deref>::Target: GuestMemory where that appears does indeed make it work. But I’m not sure whether they’ll be happy about that…

Author:

Great news! If I just add + ?Sized to the IoMemory for GuestMemory impl, it works:

pub trait IoMemory {
    type PhysicalMemory: GuestMemory + ?Sized;
    /* ... */
}

impl<M: GuestMemory + ?Sized> IoMemory for M {
    /* ... */
}

    type PhysicalMemory = M;

    fn range_accessible(&self, addr: GuestAddress, count: usize, _access: Permissions) -> bool {
        if count <= 1 {
            <M as GuestMemory>::address_in_range(self, addr)
        } else if let Some(end) = addr.0.checked_add(count as u64 - 1) {
            <M as GuestMemory>::address_in_range(self, addr)
                && <M as GuestMemory>::address_in_range(self, GuestAddress(end))
        } else {
            false
        }
    }

    fn try_access<'a, F>(
        &'a self,
        count: usize,
        addr: GuestAddress,
        _access: Permissions,
        f: F,
    ) -> Result<usize>
    where
        F: FnMut(
            usize,
            usize,
            MemoryRegionAddress,
            &'a <Self::PhysicalMemory as GuestMemory>::R,
        ) -> Result<usize>,
    {
        <M as GuestMemory>::try_access(self, count, addr, f)
    }

    fn get_slice(
        &self,
        addr: GuestAddress,
        count: usize,
        _access: Permissions,
    ) -> Result<VolatileSlice<bitmap::MS<M>>> {
        <M as GuestMemory>::get_slice(self, addr, count)
    }

    fn physical_memory(&self) -> Option<&Self::PhysicalMemory> {
        Some(self)
    }
}
3 changes: 3 additions & 0 deletions src/lib.rs
@@ -58,6 +58,9 @@ pub use region::{
pub mod io;
pub use io::{ReadVolatile, WriteVolatile};

pub mod io_memory;
pub use io_memory::{IoMemory, Permissions};

#[cfg(feature = "backend-mmap")]
pub mod mmap;
