Conversation

@davidkleymann davidkleymann commented Sep 3, 2025

Summary of the PR

Add support for dirty page tracking via the dirty ring interface. Adds a KvmDirtyLogRing structure that keeps track of the ring indices and the base pointer to the shared memory buffer, implements iterating over dirty pages (thereby harvesting them), implements reset_dirty_rings on VmFd to have the kernel recycle dirty ring buffer elements after processing, and adds a dirty_log_ring field to VcpuFd.

This is a draft that still needs review and improvement; I'm asking for suggestions on the following remaining weakness:

  1. If any vCPU is running while dirty pages are being harvested, the user of KvmDirtyLogRing needs to call reset_dirty_rings before reading the contents of dirty pages. This is currently the user's responsibility, which allows deferring reset_dirty_rings until after all dirty pages have been read by the VMM in cases where all vCPUs are halted, but it may be a confusing interface in other cases.

More info on the interface:
https://www.kernel.org/doc/html/latest/virt/kvm/api.html#kvm-cap-dirty-log-ring-kvm-cap-dirty-log-ring-acq-rel
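
The harvest/reset protocol behind point 1 can be sketched with a self-contained mock of the ring. The flag values match the kernel's kvm_dirty_gfn ABI, but the struct and method here are illustrative stand-ins, not the PR's actual API:

```rust
// Mock of a dirty ring entry (kernel struct kvm_dirty_gfn). Flag values match
// the KVM ABI; everything else stands in for the mmap'd shared buffer that the
// real KvmDirtyLogRing wraps.
const KVM_DIRTY_GFN_F_DIRTY: u32 = 1 << 0;
const KVM_DIRTY_GFN_F_RESET: u32 = 1 << 1;

#[derive(Clone, Copy, Default)]
struct DirtyGfn {
    flags: u32,
    slot: u32,   // memslot the dirty page belongs to
    offset: u64, // page offset within that memslot
}

struct MockDirtyRing {
    entries: Vec<DirtyGfn>, // stands in for the shared-memory buffer
    next_dirty: usize,      // user-space index tracked across harvests
}

impl MockDirtyRing {
    /// Harvest all currently dirty entries: collect (slot, offset) pairs and
    /// mark each entry for reset. In the real interface, the VMM then calls
    /// KVM_RESET_DIRTY_RINGS so the kernel can recycle the marked slots --
    /// and, per point 1 above, only after that reset may page contents be
    /// safely read while vCPUs are still running.
    fn harvest(&mut self) -> Vec<(u32, u64)> {
        let mut harvested = Vec::new();
        let len = self.entries.len();
        loop {
            let entry = &mut self.entries[self.next_dirty % len];
            // Stop at the first entry that is not freshly dirty (real code
            // uses an acquire load here for the ACQ_REL ring variant).
            if entry.flags != KVM_DIRTY_GFN_F_DIRTY {
                break;
            }
            harvested.push((entry.slot, entry.offset));
            entry.flags |= KVM_DIRTY_GFN_F_RESET; // real code: release store
            self.next_dirty += 1;
        }
        harvested
    }
}
```

Note that next_dirty survives across calls, which is exactly the user-space state the Send/Sync discussion below revolves around.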

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@roypat roypat left a comment


Please squash your commits so that later commits are not fixups for previous ones.


roypat commented Sep 9, 2025

Also, do we need to do anything about KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP?


roypat commented Sep 9, 2025

Send and Sync are probably not safe on KvmDirtyLogRing because accesses are not stateless, due to the state we need to keep track of in user space (next_dirty)

As for this, next_dirty doesn't imply that it cannot be Send/Sync, because the traits don't mean that multiple threads can access the struct at the same time (that's always unsound in Rust's memory model), just that you can transfer ownership between threads, and that read-only references can be shared between threads. A problem would only arise if we could also Clone this structure, because then iterators on different Rust threads could cause data races inside the mmap'd area. But as long as we only allow creating a single instance of the iterator per vcpu (using safe code, that is), there are no problems.

Now, whether Send and Sync will actually be useful is a different matter: I'm assuming you want them so you can harvest the dirty ring while the vcpu is still running, but with the current API in this PR you can only get a &mut reference to the ring buffer structure, which you cannot send to another thread anyway. So in this case, you'd need some API that lets you take ownership of the KvmDirtyLogRing structure (maybe store the one that gets created at vcpu creation time in an Option, and then have a function that just wraps .take() on that Option, keeping in mind that we must never allow safe code to get two owned versions of it for the same vcpu).
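
The .take()-based design described above could look roughly like this. The types are illustrative stand-ins (the real KvmDirtyLogRing wraps the mmap'd ring and the real VcpuFd holds file descriptors), and take_dirty_log_ring is a hypothetical method name:

```rust
// Illustrative stand-ins for the real types in the PR.
struct KvmDirtyLogRing {
    next_dirty: usize, // plus the mmap'd ring base pointer in real code
}

struct VcpuFd {
    dirty_log_ring: Option<KvmDirtyLogRing>,
}

impl VcpuFd {
    /// Hands out the owned ring at most once: the first call returns Some,
    /// every later call returns None. Safe code can therefore never obtain
    /// two owned handles to the same vcpu's ring, so it can never race on
    /// the mmap'd area from two threads.
    fn take_dirty_log_ring(&mut self) -> Option<KvmDirtyLogRing> {
        self.dirty_log_ring.take()
    }
}
```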


roypat commented Sep 9, 2025

Also sorry for the late response, we were all preparing for / traveling to KVM Forum last week!


davidkleymann commented Oct 7, 2025

Also, do we need to do anything about KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP?

Turns out we do, and if it is supported, we also need to enable KVM_CAP_DIRTY_LOG_RING_ACQ_REL prior to enabling _WITH_BITMAP. Correct me if I'm wrong.
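
If that's right, the enable ordering could be sketched like this. The enum mirrors the kernel's capability names, but the helper and its policy are an assumption based on the comment above, not verified against the kernel:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DirtyRingCap {
    Ring,       // KVM_CAP_DIRTY_LOG_RING
    RingAcqRel, // KVM_CAP_DIRTY_LOG_RING_ACQ_REL
    WithBitmap, // KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
}

/// Returns the capabilities to enable, in order: prefer the acq/rel ring
/// variant when the host supports it, and only enable the bitmap variant
/// after a ring variant has been enabled.
fn enable_order(supported: &[DirtyRingCap]) -> Vec<DirtyRingCap> {
    let has = |c| supported.contains(&c);
    let mut order = Vec::new();
    if has(DirtyRingCap::RingAcqRel) {
        order.push(DirtyRingCap::RingAcqRel);
    } else if has(DirtyRingCap::Ring) {
        order.push(DirtyRingCap::Ring);
    }
    // _WITH_BITMAP comes last, after ACQ_REL (or the plain ring) is enabled.
    if !order.is_empty() && has(DirtyRingCap::WithBitmap) {
        order.push(DirtyRingCap::WithBitmap);
    }
    order
}
```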

@davidkleymann davidkleymann force-pushed the main branch 5 times, most recently from 66f0d63 to 362f0f5 Compare October 7, 2025 11:33
@davidkleymann

Why is enable_cap not available on aarch64?
#[cfg(any(target_arch = "x86_64", target_arch = "s390x", target_arch = "powerpc"))]

@davidkleymann davidkleymann force-pushed the main branch 8 times, most recently from 11c85a5 to ec0da8b Compare October 7, 2025 12:10

roypat commented Oct 8, 2025

Why is enable_cap not available in aarch64? #[cfg(any(target_arch = "x86_64", target_arch = "s390x", target_arch = "powerpc"))]

According to the docs, it doesn't exist on arm: https://docs.kernel.org/virt/kvm/api.html#kvm-enable-cap. At first it seems the docs are wrong there, but they're not: there are two ioctls, one for vcpus and one for VMs, with the vcpu one being arch-specific and the VM one documented as "architectures: all". There is a generic implementation in the kernel for some capabilities (including the dirty log related ones) which is available on all arches, so kvm-ioctls is doing it wrong (probably whoever wrote it also misread the docs). Can you change Vm::enable_cap to be available always (i.e. without any cfgs) and add a changelog entry under "fixed"?
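
Concretely, that would mean dropping the arch gate on the VM-level method; roughly (a non-compiling sketch with the body elided, following the enable_cap signature in kvm-ioctls):

```rust
// Before: gated as if the VM ioctl shared the vcpu ioctl's arch restrictions.
#[cfg(any(target_arch = "x86_64", target_arch = "s390x", target_arch = "powerpc"))]
pub fn enable_cap(&self, cap: &kvm_enable_cap) -> Result<()> { /* ... */ }

// After: the VM-level KVM_ENABLE_CAP is documented as "architectures: all",
// so the cfg attribute is simply removed.
pub fn enable_cap(&self, cap: &kvm_enable_cap) -> Result<()> { /* ... */ }
```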

@davidkleymann davidkleymann force-pushed the main branch 3 times, most recently from b00ead3 to 483342a Compare October 18, 2025 20:52
@davidkleymann

The enable_cap doctest uses KVM_CAP_SPLIT_IRQCHIP as an example, which is x86-only. This broke the test on other architectures, so I disabled the enable_cap call in the doctest there.

@davidkleymann

Send and Sync are probably not safe on KvmDirtyLogRing because accesses are not stateless, due to the state we need to keep track of in user space (next_dirty)

As for this, next_dirty doesn't imply that it cannot be Send/Sync, because the traits don't mean that multiple threads can access the struct at the same time (that's always unsound in Rust's memory model), just that you can transfer ownership between threads, and that read-only references can be shared between threads. A problem would only arise if we could also Clone this structure, because then iterators on different Rust threads could cause data races inside the mmap'd area. But as long as we only allow creating a single instance of the iterator per vcpu (using safe code, that is), there are no problems.

Now, whether Send and Sync will actually be useful is a different matter: I'm assuming you want them so you can harvest the dirty ring while the vcpu is still running, but with the current API in this PR you can only get a &mut reference to the ring buffer structure, which you cannot send to another thread anyway. So in this case, you'd need some API that lets you take ownership of the KvmDirtyLogRing structure (maybe store the one that gets created at vcpu creation time in an Option, and then have a function that just wraps .take() on that Option, keeping in mind that we must never allow safe code to get two owned versions of it for the same vcpu).

I needed Send and Sync to move the VcpuFds to separate threads, but I only implemented what the Rust compiler suggested and am still looking for a better solution to this unrelated problem.

@davidkleymann davidkleymann force-pushed the main branch 5 times, most recently from 6ccb6da to 1d09123 Compare October 22, 2025 10:14

roypat commented Oct 28, 2025

Everything should be ready

Please still squash your commits (and there's also conflicts, although GitHub isn't telling me what they are for some reason 🤔)

@davidkleymann

So you want everything squashed into 1 commit?


roypat commented Nov 2, 2025

So you want everything squashed into 1 commit?

I will tentatively say "yes" (sorry, it's been a while since I looked at the actual changes here). But please have a look and see if there's anything that can be logically separated into its own commit (e.g. preliminary refactors, or changes to tests).

@davidkleymann

I did change quite a few things, so I already rebased to separate the commits logically, but if you require a single large commit I can change that as well. I can't really see the conflicts either; I suspect it's just because the commit introducing the DirtyLogRing cap has already been merged to main, while this PR contains my unmerged version of it.

@davidkleymann davidkleymann force-pushed the main branch 3 times, most recently from ec352ad to 851fe82 Compare November 11, 2025 15:48
@davidkleymann

@roypat I squashed everything as requested.

@davidkleymann davidkleymann force-pushed the main branch 2 times, most recently from 84007a9 to 3449081 Compare November 11, 2025 15:57

zolutal commented Nov 21, 2025

Hi @davidkleymann , thanks for your work on this feature! I was testing it out and noticed that no VcpuExit is currently defined for KVM_EXIT_DIRTY_RING_FULL. Currently you just get a VcpuExit::Unsupported(31) when the ring buffer fills up. If it doesn't make it in this PR I can make one myself to add it, but I figured I'd mention it here.
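
To illustrate the gap with a mock (not the real kvm_ioctls enum): exit reason 31 is KVM_EXIT_DIRTY_RING_FULL, and without a dedicated variant it falls through to Unsupported:

```rust
// Raw exit reason from the kernel; 31 is the number behind the
// VcpuExit::Unsupported(31) mentioned above.
const KVM_EXIT_DIRTY_RING_FULL: u32 = 31;

#[derive(Debug, PartialEq)]
enum VcpuExit {
    DirtyRingFull,
    Unsupported(u32),
}

/// Decoding with the proposed variant: a full ring becomes DirtyRingFull,
/// which a VMM would handle by harvesting the rings, calling
/// KVM_RESET_DIRTY_RINGS, and re-entering the vcpu.
fn decode_exit(reason: u32) -> VcpuExit {
    match reason {
        KVM_EXIT_DIRTY_RING_FULL => VcpuExit::DirtyRingFull,
        other => VcpuExit::Unsupported(other),
    }
}
```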


roypat commented Nov 24, 2025

Hi @davidkleymann , thanks for your work on this feature! I was testing it out and noticed that no VcpuExit is currently defined for KVM_EXIT_DIRTY_RING_FULL. Currently you just get a VcpuExit::Unsupported(31) when the ring buffer fills up. If it doesn't make it in this PR I can make one myself to add it, but I figured I'd mention it here.

that is a good catch, thanks! This should definitely be added in this PR :)

davidkleymann and others added 11 commits November 24, 2025 13:12
Implement dirty log ring interface with `enable_dirty_log_ring` and
`dirty_log_ring_iter` methods. Enable `VmFd` `enable_cap` and ioctl imports
on all architectures. Add memory fences in the iterator for proper
synchronization on weakly ordered architectures.

Signed-off-by: David Kleymann <[email protected]>
Bumps [rust-vmm-ci](https://github.com/rust-vmm/rust-vmm-ci) from `0b1cb86` to `fc4584d`.
- [Commits](rust-vmm/rust-vmm-ci@0b1cb86...fc4584d)

---
updated-dependencies:
- dependency-name: rust-vmm-ci
  dependency-version: fc4584d8b805be4e8193d43d9ba6519f9a0f43a5
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Make the function more fail-safe.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
No longer needed since the Rust edition 2024 upgrade.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
No longer needed since the Rust edition 2024 upgrade.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
Failed because of needless fn main() in doctest

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
Turns out that there is a flaw in upstream kvm-bindings that
hadn't been caught so far: some parts of the serde feature also
need the fam-wrappers feature.

This was first discovered here [0] [1].

[0] rust-vmm#350
[1] https://buildkite.com/rust-vmm/kvm-ci/builds/1463/steps/canvas?jid=01995738-26ed-4e8c-9e14-c3f23d6de434

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
Bumps [rust-vmm-ci](https://github.com/rust-vmm/rust-vmm-ci) from `fc4584d` to `6f7d365`.
- [Commits](rust-vmm/rust-vmm-ci@fc4584d...6f7d365)

---
updated-dependencies:
- dependency-name: rust-vmm-ci
  dependency-version: 6f7d3656a84b771f381ddf0b120062dcbacb4c04
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Two bare links and two broken intra-doc links.

Signed-off-by: Patrick Roy <[email protected]>
They were all related to testing specific features, but since the CI by
default now uses cargo-all-features, this is redundant.

Leave custom-tests.json around, as removing it would mean we also have
to modify the buildkite pipeline definition itself, and that's always a
bit messy to coordinate with the merging of PRs.

Signed-off-by: Patrick Roy <[email protected]>
Test cases introduced by the SBI related changes require SBI_V01 to
be enabled during kernel compilation. The tree at `042b206` already
incorporates `c41370c`, which covers this case.

Signed-off-by: Ruoqing He <[email protected]>

davidkleymann commented Nov 24, 2025

I can't re-request review, but I added the VcpuExit variant. @zolutal @roypat
