-
Notifications
You must be signed in to change notification settings - Fork 796
[UR][CUDA][HIP] Unify queue handling between adapters #17641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The CUDA and HIP adapters are both using a nearly identical complicated queue that handles creating an out-of-order UR queue from in-order CUDA/HIP streams. This patch extracts all of the queue logic into a separate templated class that can be used by both adapters. Beyond removing a lot of duplicated code, it also makes it a lot easier to maintain. There was a few functional differences between the queues in both adapters, but mostly due to fixes done in the CUDA adapter that were not ported to the HIP adapter. There might be more but I found at least one race condition (intel#15100) and one performance issue (intel#6333) that weren't fixed in the HIP adapter. This patch uses the CUDA version of the queue as a base for the generic queue, and will thus fix for HIP the race condition and performance issue mentioned above. This code is quite complex, so this patch also aimed to minimize any other changes beyond the structural changes needed to share the code. However it did do the following changes in the two adapters: CUDA: * Rename `ur_stream_guard_` to `ur_stream_guard` * Rename `getNextEventID` to `getNextEventId` * Remove duplicate `get_device` getter, use `getDevice` instead HIP: * Fix queue finish so it doesn't fail when no streams need to be synchronized
Capturing the result is no longer needed
cb3bdac to
8b178a0
Compare
| LastSyncComputeStreams{0}, LastSyncTransferStreams{0}, Flags(Flags), | ||
| URFlags(URFlags), Priority(Priority), HasOwnership{BackendOwns} { | ||
| urContextRetain(Context); | ||
| urDeviceRetain(Device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeviceRetain/Release can be removed, it's a no-op unless we expect Device to be a subdevice (I have a ticket open to rename and better document those entry points)
Device retain release is a no-op
aelovikov-intel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CODEOWNERS LGTM
|
@intel/llvm-gatekeepers I believe this is ready to merge
And this patch only affects CUDA and HIP, so missing PVC and Arc testing shouldn't be an issue. |
The CUDA and HIP adapters are both using a nearly identical complicated queue that handles creating an out-of-order UR queue from in-order CUDA/HIP streams.
This patch extracts all of the queue logic into a separate templated class that can be used by both adapters. Beyond removing a lot of duplicated code, it also makes it a lot easier to maintain.
There was a few functional differences between the queues in both adapters, but mostly due to fixes done in the CUDA adapter that were not ported to the HIP adapter. There might be more but I found at least one race condition (#15100) and one performance issue (#6333) that weren't fixed in the HIP adapter.
This patch uses the CUDA version of the queue as a base for the generic queue, and will thus fix for HIP the race condition and performance issue mentioned above.
This code is quite complex, so this patch also aimed to minimize any other changes beyond the structural changes needed to share the code. However it did do the following changes in the two adapters:
stream_queue.hpp:urDeviceRetain/Release: essentially a no-opCUDA:
ur_stream_guard_tour_stream_guardgetNextEventIDtogetNextEventIdget_devicegetter, usegetDeviceinsteadHIP: