RHEL-149878: viostor: Cleanups around msix_one_vector and MessageId#1503
RHEL-149878: viostor: Cleanups around msix_one_vector and MessageId#1503YanVugenfirer merged 5 commits intovirtio-win:masterfrom
Conversation
|
[Jenkins CI]: Can one of the admins verify this patch? |
Summary of ChangesHello @kevmw, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing the clarity and robustness of the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly refactors MSIX vector handling and the device reset mechanism, replacing msix_one_vector with msix_config_vector and introducing helper functions like QueueToMessageId and MessageToDpcIdx for better maintainability. The reset logic has been improved with reset_in_progress_count and VioStorResetBus to prevent I/O during resets. However, a critical security regression was identified: the validation of MessageID in the interrupt handling logic is too loose when the configuration vector is disabled. This could lead to out-of-bounds memory access or NULL pointer dereferences. It is crucial to tighten the validation in VirtIoMSInterruptRoutine to ensure MessageID is strictly within the expected range for all configuration modes to prevent this vulnerability and ensure the overall stability and correctness of the driver.
|
I like the approach here, Kevin. Could you consider my kvm-guest-drivers-windows/viostor/virtio_stor_hw_helper.c Lines 678 to 747 in 7229eab Anyway, I'm off to the land of nod... |
|
@benyamin-codez I don't think your changes are in conflict with mine, except that the |
For adaptExt->msix_one_vector = TRUE, the current implementation relies on the fact that effectively only the first queue is used: VioStorCompleteRequest() is only ever called with MessageID = 1, so it checks for completions only in the first queue. The reason why this assumption actually holds true is not obvious: num_queues will match the number of available msix_vectors, which may be more than 1. However, the StorPortInitializePerfOpts() call that would tell StorPort to use all of these vectors fails because perfData.LastRedirectionMessageNumber is msix_vectors + 1 and therefore out of bounds. Without a successful StorPortInitializePerfOpts() call, StorPortGetStartIoPerfParams() will just use MessageNumber = 0 for all requests. Don't rely on such subtle behaviour and instead make it explicit that we can only use a single queue for msix_one_vector = TRUE. (This restriction may be lifted later, but it matches what the code does currently.) Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The queue number is where everything starts, MessageID is just derived from it. The reverse mapping won't be as trivial as the current QueueNumber = MessageID - 1 once we start using the real vector number for msix_one_vector or even map multiple queues to a single vector, so store the queue number instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit b1ed558 move an '#include "virtio_stor_hw_helper.h"' from virtio_stor.c to the main header file virtio_stor.h for no real reason. Unfortunately, virtio_stor_hw_helper.h already included virtio_stor.h, so depending on the inclusion order, declarations from one header file are unavailable in the other. Move the #include back to virtio_stor.c so that virtio_stor_hw_helper.h can use declarations from virtio_stor.h again. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| if (adaptExt->msix_one_vector) | ||
| if (adaptExt->msix_config_vector && MessageID == VIRTIO_BLK_MSIX_CONFIG_VECTOR) | ||
| { | ||
| MessageID = 1; | ||
| } | ||
| else | ||
| { | ||
| if (MessageID == VIRTIO_BLK_MSIX_CONFIG_VECTOR) | ||
| { | ||
| RhelGetDiskGeometry(DeviceExtension); | ||
| adaptExt->sense_info.senseKey = SCSI_SENSE_UNIT_ATTENTION; | ||
| adaptExt->sense_info.additionalSenseCode = SCSI_ADSENSE_PARAMETERS_CHANGED; | ||
| adaptExt->sense_info.additionalSenseCodeQualifier = SCSI_SENSEQ_CAPACITY_DATA_CHANGED; | ||
| adaptExt->check_condition = TRUE; | ||
| DeviceChangeNotification(DeviceExtension, TRUE); | ||
| return TRUE; | ||
| } | ||
| RhelGetDiskGeometry(DeviceExtension); | ||
| adaptExt->sense_info.senseKey = SCSI_SENSE_UNIT_ATTENTION; | ||
| adaptExt->sense_info.additionalSenseCode = SCSI_ADSENSE_PARAMETERS_CHANGED; | ||
| adaptExt->sense_info.additionalSenseCodeQualifier = SCSI_SENSEQ_CAPACITY_DATA_CHANGED; | ||
| adaptExt->check_condition = TRUE; | ||
| DeviceChangeNotification(DeviceExtension, TRUE); | ||
| return TRUE; | ||
| } |
There was a problem hiding this comment.
I've been very curious how we do this when msix_config_vector = FALSE.
Is it just not possible? A limitation of running with only a single vector?
There was a problem hiding this comment.
I didn't explicitly test it, but I think it will just fall back to the non-MSI codepath in VirtIoInterrupt().
There was a problem hiding this comment.
Ok. I'll do some explicit testing to confirm.
There was a problem hiding this comment.
This appears broken for me, even on master - in vioscsi too (nothing hits the events queue)...
When trying to do a size change, I don't get an interrupt at all, definitely not on MessageId 0 anyway. Disk refresh and rescan in diskmgmt.msc don't produce any changes. The only way I can hit DeviceChangeNotification() is via the VirtIoStartIo() pathway on surprise removal. On reattaching the disk, the capacity changes.
This is on Proxmox at least... I noticed the same thing on a FreeBSD guest too in the last 12 hours.
# pveversion -v | grep qemu | grep -v backup
pve-qemu-kvm: 10.1.2-5
qemu-server: 9.1.3
# kvm --version
QEMU emulator version 10.1.2 (pve-qemu-kvm_10.1.2-5)
# qm showcmd <VMID> --pretty | grep machine
-machine 'hpet=off,type=pc-q35-10.1+pve0' \Maybe this is a QEMU bug. Any chance you could please check on a RH flavour?
There was a problem hiding this comment.
@kevmw - Just to be clear, this was with the config vector enabled.
There was a problem hiding this comment.
It seems like a lot of noise around the real change. But more importantly, does this really work? You still seem to use NO_VECTOR for config changes, so I don't think we'd ever get the config interrupt. Ah, or does it work only accidentally because the next queue interrupt might still see the old ISR bit and do the config update then?
This is the fix I have: kevmw/kvm-guest-drivers-windows@msg-cleanup...notify-config (based on this PR, so I haven't created a PR yet - still not sure how to handle dependent PRs best in Github, it doesn't seem to be supported very well).
There was a problem hiding this comment.
Yeah, mine works too because it still is MessageId = 0...
... and then I only skip config update if the ISR bit-field says it's for a queue vector.
In any case, I like your proposal:
- Proper implementation of the ISR Status bit-field checks avoids potential
0x01b\0x10bproblems. - The
u8forisrinVirtIoMSI...()is also the right type (although we seem to preferUCHARhere). - The new
VirtIoConfigUpdated()routine you carved out cleans it up too.
If you want to run with this, I might do a relocation refactor (to group the ISR routines together) plus introduce an MSI-X ISR worker routine on top of your PR. I can also update intReason in VirtIoInterrupt() to use the UCHAR instead of ULONG - unless you want to do it here. Some standardisation on relevant local variable names would be good, e.g. I used isr_status for intReason / isr and then irq_serviced rather than isInterruptServiced.
Alternatively, I could introduce your improvements into mine and do it in one step.
Let me know what you prefer.
There was a problem hiding this comment.
I think I'd like to go with my notify-config branch, once this PR is merged.
For the additional changes, maybe let's ask @YanVugenfirer first, because I understood that he doesn't like commits just for renaming things etc. My own approach is to just clean up the code that I'm touching anyway. But if there is something small we agree on (like moving both interrupt handlers next to each other), I can include it in my branch, too. In this case, you could rebase and then I could cherry-pick from your branch so that you're still credited as the author, but we get everything done in a single PR.
As for VirtIoMSInterruptWorker(), I'm not sure what the purpose of that one is. To me it looks like an arbitrary split of VirtIoMSInterruptRoutine(), which wasn't even a very long function?
In general, instead of trying to make the code a bit prettier here and there, I feel that maybe we should focus more on the things that have practical effects e.g. on performance. The locking changes you mentioned the other day would be something that seems interesting to get separated into a commit and benchmarked. Maybe we should try and coordinate on Slack which things to address next?
There was a problem hiding this comment.
I am OK with commits that renames stuff, but let's make them part of something useful. I agree regarding the coordination first.
There was a problem hiding this comment.
I will rebase ASAP, and now this has merged, my other PRs too.
|
I pushed a rebased version now that #1493 has been merged and it's no longer just a draft. |
PR virtio-win#1502 - [viostor] Use ULONG for num_queues PR virtio-win#1503 - viostor: Cleanups around msix_one_vector and MessageId These PRs will be dropped in rebase once they merge. Signed-off-by: benyamin-codez <115509179+benyamin-codez@users.noreply.github.com>
PR virtio-win#1502 - [viostor] Use ULONG for num_queues PR virtio-win#1503 - viostor: Cleanups around msix_one_vector and MessageId These PRs will be dropped in rebase once they merge. Signed-off-by: benyamin-codez <115509179+benyamin-codez@users.noreply.github.com>
MSI vector 0 is the vector that is actually used for msix_one_vector, but the code artificially used MessageId = 1 instead so that conversions between vector, queue number and index in the adaptExt->dpc array can stay the same both when msix_one_vector is TRUE and when it's FALSE. This convention is very confusing, though, because it misleads the reader into thinking that vector 1 is what is actually used. Switch to the more accurate MessageId = 0 instead and add helper functions for the conversions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
What makes msix_one_vector really different from other cases in the code is just that it means that MSI isn't used for config changes and so the vector indices move by one (because the config vector is 0 and all the queue vectors start from 1 then, whereas without the config vector, the vectors for queues already start from 0). If we wanted, we could have multiple vectors in use even without a config vector, and that's actually a very good idea when we have less vectors than num_queues + 1. msix_one_vector isn't a very good name any more then. So let's have an adaptExt->msix_has_config_vector instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
The only active conversations here don't seem to be about this PR itself any more, but more about possible future work on top of it. Does this mean we should move ahead and merge this? |
As previous discussions have shown, the way msix_one_vector works (and especially that it uses MessageId = 1 in the code while it actually uses vector 0) is quite confusing. In fact, it almost looks like it works only accidentally, not by design. This PR cleans up the code to make it more obvious how things work. A functional change is not intended.
There is a clear next step after this: Add support for mapping multiple queues to a single vector (primarily to
VioStorCompleteRequest()) and updateVirtIoHwInitialize()to make use of all available vectors (and give up the config vector first) instead of immediately falling back to a single queue mode if they don't match. However, as this is essentially feature work, I decided to keep it separate from the cleanups. If you'd prefer to have a single PR for the full thing, let me know.This is a draft because it's based on the current, non-final version of #1493