Add new device descriptor to query 2D block array capabilities of the Intel GPU #2261
steffenlarsen
left a comment
HIP and CUDA changes look good!
nrspruit
left a comment
LGTM for L0, thanks!
e5e7fec to 680a2f3
@oneapi-src/unified-runtime-opencl-write @oneapi-src/unified-runtime-native-cpu-write Could you please take a look?
Converting to a draft for now, as a question was raised in intel/llvm#15905 about whether we need to expose this at a higher level.
10d1bd2 to 17ae982
4f092a1 to 28b1429
The intel/llvm PR has been approved, so could you please help merge this PR?
28b1429 to c79df59
The urEnqueueKernelLaunchIncrementMultiDeviceMultiThreadTest.Success/NoUseEventsNoQueuePerThread failure is unrelated, as it has been seen in other PRs as well, for example here:
I added this to a match file: #2388
Add an ESIMD device descriptor to check whether 2D block operations are supported by the device. UR counterpart: oneapi-src/unified-runtime#2261
Some Intel GPU devices support 2D block array operations, which can be used to optimize applications on Intel GPUs.
This extension provides a device descriptor that allows querying the 2D block array capabilities of a device.
This is exposing the following low-level compute runtime experimental extension:
https://github.com/intel/compute-runtime/blob/master/level_zero/doc/experimental_extensions/2D_BLOCK_TRANSPOSE.md
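As a rough illustration of how a caller might consume such a descriptor, the sketch below models the capabilities as a bitmask and decodes it. Everything in it is a hypothetical stand-in: the `Block2DCapability` enum, its load/store bits and their values, `FakeDevice`, and `queryBlock2DCaps` are assumptions made for illustration and are not copied from the UR or compute-runtime headers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical capability bits. The names and values are assumptions for
// illustration only; the real descriptor is defined by the UR extension.
enum Block2DCapability : std::uint32_t {
    kBlock2DLoad  = 1u << 0,  // assumed: device supports 2D block loads
    kBlock2DStore = 1u << 1,  // assumed: device supports 2D block stores
};

// Hypothetical stand-in for a device handle. A real caller would obtain the
// bitmask through the runtime's device-info query instead.
struct FakeDevice {
    std::uint32_t block2DCaps;
};

// Hypothetical query helper: returns the capability bitmask for a device.
std::uint32_t queryBlock2DCaps(const FakeDevice &dev) {
    return dev.block2DCaps;
}

// Tests a single capability bit in the mask.
bool supports(std::uint32_t caps, Block2DCapability bit) {
    assert((bit & (bit - 1)) == 0 && "expects a single capability bit");
    return (caps & bit) != 0;
}

// Builds a short human-readable summary of the known capability bits.
std::string describe(std::uint32_t caps) {
    std::string out;
    if (supports(caps, kBlock2DLoad)) {
        out += "load";
    }
    if (supports(caps, kBlock2DStore)) {
        if (!out.empty()) {
            out += "+";
        }
        out += "store";
    }
    return out.empty() ? "none" : out;
}
```

An application would typically check the relevant bit once at startup and fall back to a regular load/store code path when the bit is not set.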
Since this is an experimental extension at the GPU runtime level, I think it's reasonable to make it experimental at the UR level as well. I tried to follow the UR documentation and added the new descriptor as an experimental extension; please advise if it should be done differently.
The headers for this experimental extension are only available in the compute-runtime repo, not in https://github.com/oneapi-src/level-zero. So I introduced changes to sparsely fetch the corresponding headers from the compute-runtime repo: only the required headers are fetched, not the entire repo.
Corresponding intel/llvm PR: intel/llvm#15905