-
Notifications
You must be signed in to change notification settings - Fork 791
[SYCL] do not set zeroed GlobalOffset in kernel parameters #20082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3ab05aa
to
2b0efb7
Compare
2b0efb7
to
d89391f
Compare
for "SYCL :: Basic/submit_time.cpp" fail there is an issue #20248 @intel/sycl-graphs-reviewers @intel/llvm-reviewers-runtime please review |
const bool HasOffset = NDRDesc.GlobalOffset[0] != 0 || | ||
NDRDesc.GlobalOffset[1] != 0 || | ||
NDRDesc.GlobalOffset[2] != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be dependent on the dimensionality? I.e. say I have a 2-dimensional launch, wouldn't the offset be zero of either of the first 2 dimensions are 0? Also, in a case like that, it shouldn't matter what the 3rd dimension is, right?
What I am suggesting is basically:
const bool HasOffset = NDRDesc.GlobalOffset[0] != 0 || | |
NDRDesc.GlobalOffset[1] != 0 || | |
NDRDesc.GlobalOffset[2] != 0; | |
const bool HasOffset = NDRDesc.GlobalOffset[0] != 0 && | |
(NDRDesc.Dims < 2 || NDRDesc.GlobalOffset[1] != 0) && | |
(NDRDesc.Dims < 3 || NDRDesc.GlobalOffset[2] != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right - applied in both places where "HasOffset" is calculated and additionally to the "EnforcedLocalSize" flag, which uses the same code pattern
d89391f
to
f068bce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Adapter.call_nocheck<UrApiKind::urCommandBufferAppendKernelLaunchExp>( | ||
CommandBuffer, UrKernel, NDRDesc.Dims, &NDRDesc.GlobalOffset[0], | ||
CommandBuffer, UrKernel, NDRDesc.Dims, | ||
HasOffset ? &NDRDesc.GlobalOffset[0] : nullptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specification of urCommandBufferAppendKernelLaunchExp seems to suggest that passing nullptr to the global offset should return UR_RESULT_ERROR_INVALID_NULL_POINTER
. Does the UR implementation just not align with the specification here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right—I've updated the specification, so now global offset is optional. This is consistent with EnqueueKernelLaunch, where this parameter is also optional.
f068bce
to
6b54f63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Do not set a zeroed GlobalOffset in kernel parameters - the Unified Runtime layer will handle this correctly.
Note that there is already a similar optimization in https://github.com/intel/llvm/blob/sycl/sycl/source/detail/scheduler/commands.cpp#L2477