@npmiller npmiller commented Mar 25, 2025

The CUDA and HIP adapters both use a nearly identical, complicated queue implementation that builds an out-of-order UR queue from in-order CUDA/HIP streams.

This patch extracts all of the queue logic into a separate templated class that can be used by both adapters. Beyond removing a lot of duplicated code, this also makes the queue a lot easier to maintain.
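To illustrate the idea of sharing the queue through a template, here is a minimal sketch. It is not the code from this patch: `stream_queue_sketch`, `FakeCudaStream`, `FakeHipStream` and the round-robin policy are hypothetical names and simplifications chosen for illustration, and the sketch assumes the stream pool is never empty.

```cpp
#include <atomic>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the native handles (CUstream / hipStream_t).
struct FakeCudaStream { int Id; };
struct FakeHipStream { int Id; };

// Hypothetical generic queue: the native stream type is the only thing that
// differs between the backends here, so it becomes a template parameter.
template <typename StreamT> class stream_queue_sketch {
public:
  // Assumption: Streams is non-empty.
  explicit stream_queue_sketch(std::vector<StreamT> Streams)
      : ComputeStreams(std::move(Streams)) {}

  // Hand out the in-order streams round-robin so that independent commands
  // can land on different streams, emulating an out-of-order queue.
  StreamT &getNextComputeStream() {
    std::size_t Idx = NextStream.fetch_add(1) % ComputeStreams.size();
    return ComputeStreams[Idx];
  }

  std::size_t getNumComputeStreams() const { return ComputeStreams.size(); }

private:
  std::vector<StreamT> ComputeStreams;
  std::atomic<std::size_t> NextStream{0};
};

// Each adapter instantiates the shared template with its own stream type.
using cuda_queue_sketch = stream_queue_sketch<FakeCudaStream>;
using hip_queue_sketch = stream_queue_sketch<FakeHipStream>;
```

With this shape, a fix made once in the shared template applies to both instantiations, which is the maintenance benefit described above.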

There were a few functional differences between the queues in the two adapters, mostly due to fixes made in the CUDA adapter that were never ported to the HIP adapter. There might be more, but I found at least one race condition (#15100) and one performance issue (#6333) that weren't fixed in the HIP adapter.

This patch uses the CUDA version of the queue as the base for the generic queue, and thus fixes the race condition and performance issue mentioned above for HIP.

This code is quite complex, so this patch also aims to minimize any changes beyond the structural ones needed to share the code. However, it does make the following changes in the two adapters:

stream_queue.hpp:

  • Remove urDeviceRetain/Release: essentially a no-op

CUDA:

  • Rename ur_stream_guard_ to ur_stream_guard
  • Rename getNextEventID to getNextEventId
  • Remove duplicate get_device getter, use getDevice instead

HIP:

  • Fix queue finish so it doesn't fail when no streams need to be synchronized
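As a rough illustration of the HIP queue-finish item, the sketch below shows one way a finish routine can treat "nothing to synchronize" as success. The names `finishSketch` and `result_sketch` are hypothetical, and the shape of the original failure is an assumption; the actual fix in the adapter may look different.

```cpp
#include <vector>

// Hypothetical result type standing in for ur_result_t.
enum class result_sketch { success, error };

// Synchronize every stream that still has pending work. The result starts as
// success, so an empty list of pending streams reports success rather than
// an error. SyncFn is any callable taking a stream and returning a result.
template <typename StreamT, typename SyncFn>
result_sketch finishSketch(const std::vector<StreamT> &PendingStreams,
                           SyncFn Sync) {
  result_sketch Result = result_sketch::success;
  for (const StreamT &S : PendingStreams) {
    if (Sync(S) != result_sketch::success)
      Result = result_sketch::error; // keep going, but remember the failure
  }
  return Result;
}
```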

Capturing the result is no longer needed
LastSyncComputeStreams{0}, LastSyncTransferStreams{0}, Flags(Flags),
URFlags(URFlags), Priority(Priority), HasOwnership{BackendOwns} {
urContextRetain(Context);
urDeviceRetain(Device);

DeviceRetain/Release can be removed; it's a no-op unless we expect Device to be a subdevice. (I have a ticket open to rename and better document those entry points.)

@aarongreig aarongreig changed the title [UR][CUDA][HIP] Unifiy queue handling between adapters [UR][CUDA][HIP] Unify queue handling between adapters Mar 26, 2025

@aelovikov-intel aelovikov-intel left a comment

CODEOWNERS LGTM

npmiller commented Apr 2, 2025

@intel/llvm-gatekeepers I believe this is ready to merge

  • Jenkins/Precommit: CI failed to start properly (it passed in the previous run before I merged the sycl branch, so should be fine)
  • PVC: issue with the PVC node (no gpu found)
  • Arc: issue with the Arc node (no gpu found)

And this patch only affects CUDA and HIP, so missing PVC and Arc testing shouldn't be an issue.

@sarnex sarnex merged commit 24b7bc3 into intel:sycl Apr 2, 2025
38 of 44 checks passed