[SYCL][Doc] Add spec to wait on a device #20266
base: sycl
Add a proposed extension specification which allows the application to wait for all commands submitted to a device to complete.
class device {
  // ...
  void ext_oneapi_wait_and_throw(async_handler h);
  void ext_oneapi_wait_and_throw();
The semantics of these operations are close to CUDA's but not exact. I think the precise definition of `cudaDeviceSynchronize` is that it waits for all tasks submitted to the current context of the current device to complete. (Credit to Jinghui for noticing this a couple of weeks ago.) Do we think it's important for migrating CUDA code to have these same semantics?
If so, I think we could:
- Add a new overload `ext_oneapi_wait_and_throw(const context& ctxt, async_handler h)`, which waits only for commands submitted to this device AND that context to complete.
- Redefine the other two overloads to mean that they wait only for commands submitted to this device AND its default context to complete.
This would require changing the proposed Level Zero API to also take a context.
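To make the difference between the two semantics concrete, here is a minimal sketch in plain C++. It is a toy model only: `ModelCommand`, `ModelRuntime`, `wait_device`, and `wait_device_context` are hypothetical names invented for illustration, not SYCL, Level Zero, or CUDA APIs, and "waiting" is modeled by simply marking commands as done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in, NOT a SYCL class: a command only records which
// device and context it was submitted to, and whether it has finished.
struct ModelCommand {
  int device_id;
  int context_id;
  bool done;
};

// Hypothetical toy runtime that tracks every submitted command.
struct ModelRuntime {
  std::vector<ModelCommand> commands;

  void submit(int device_id, int context_id) {
    commands.push_back(ModelCommand{device_id, context_id, false});
  }

  // Device-wide wait (the semantics in the proposed spec): completes every
  // pending command on the device, regardless of context.
  // Returns how many commands this call completed.
  std::size_t wait_device(int device_id) {
    std::size_t completed = 0;
    for (auto &c : commands) {
      if (c.device_id == device_id && !c.done) {
        c.done = true;
        ++completed;
      }
    }
    return completed;
  }

  // Device-and-context wait (the alternative raised above): completes only
  // the pending commands that match BOTH the device and the context.
  std::size_t wait_device_context(int device_id, int context_id) {
    std::size_t completed = 0;
    for (auto &c : commands) {
      if (c.device_id == device_id && c.context_id == context_id && !c.done) {
        c.done = true;
        ++completed;
      }
    }
    return completed;
  }
};
```

With two commands on device 0 in different contexts, the context-scoped wait leaves one of them pending, while the device-wide wait completes both. That remaining pending work is the behavioral gap under discussion.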
In the CUDA adapter, the device owns the CUDA context, i.e. there is only one `CUcontext` per device, which is then set as the active context whenever necessary. As such, I think the current design matches the behavior quite well.
This doesn't really say anything, though, about how people use the CUDA API or what semantics they would want in SYCL in order to migrate their code to SYCL. I think you are just pointing out that the currently proposed API would be easy to implement on our current design of the UR CUDA adapter.
After discussing various designs, we think we will need to keep a list of unconsumed async errors in the device object anyway. Therefore, it won't be hard to implement the same `wait`, `wait_and_throw`, and `throw_asynchronous` APIs as `queue`.
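As a rough illustration of that idea, the sketch below models a device that keeps its own list of unconsumed async errors and exposes the three operations on top of it. Everything here is an assumption made for illustration: `ModelDevice`, `report_async_error`, and `pending` are hypothetical names, the handler is passed explicitly to keep the model small, and `wait` is a no-op because the model has no real commands. It is not the DPC++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for sycl::exception_list and sycl::async_handler.
using ModelExceptionList = std::vector<std::exception_ptr>;
using ModelAsyncHandler = std::function<void(const ModelExceptionList &)>;

// Hypothetical device model that owns the unconsumed async errors.
class ModelDevice {
public:
  // Called by the (modeled) runtime when a command fails asynchronously.
  void report_async_error(std::exception_ptr e) {
    unconsumed_.push_back(std::move(e));
  }

  // A real wait would block until all submitted commands complete. The model
  // has no commands, so this does nothing and leaves errors unconsumed.
  void wait() {}

  // Hands all pending errors to the handler in one batch and clears the
  // list. The handler is not invoked when nothing is pending.
  void throw_asynchronous(const ModelAsyncHandler &handler) {
    if (unconsumed_.empty())
      return;
    ModelExceptionList batch;
    batch.swap(unconsumed_);
    handler(batch);
  }

  // Waits first, then reports whatever errors are pending.
  void wait_and_throw(const ModelAsyncHandler &handler) {
    wait();
    throw_asynchronous(handler);
  }

  std::size_t pending() const { return unconsumed_.size(); }

private:
  ModelExceptionList unconsumed_;
};
```

In this model, `wait` alone never consumes errors; only `throw_asynchronous` and `wait_and_throw` do, which mirrors how the three operations are distinguished on `queue`.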
Discussed offline. I believe we can implement this, following some fixes to the current behavior.
@intel/llvm-gatekeepers please consider merging
This commit makes the following changes to the behavior of asynchronous exception handling:
1. The death of a queue should not consume asynchronous exceptions.
2. Calling wait_and_throw on an event after the associated queue has died should still consume exceptions that were originally associated with the queue. This should respect the async_handler priority to the best of its ability.
3. Calling wait_and_throw or throw_asynchronous on a queue without an async_handler should fall back to using the async_handler of the associated context, then the default async_handler if none was attached to the context.

Additionally, this lays the groundwork for intel#20266 by moving the tracking of unconsumed asynchronous exceptions to the devices.

Signed-off-by: Larsen, Steffen <[email protected]>
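The handler fallback order and the "queue death does not consume" rule described in this commit message can be sketched with a small model in plain C++. All names here (`ModelDevice`, `ModelContext`, `ModelQueue`, `pick_handler`, `flush`, `fail_async`) are hypothetical and exist only for illustration. The default handler is passed in as a parameter so the fallback can be observed, which is an assumption of the model and not how the runtime is structured.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for sycl::exception_list and sycl::async_handler.
using ModelExceptionList = std::vector<std::exception_ptr>;
using ModelAsyncHandler = std::function<void(const ModelExceptionList &)>;

// The context may or may not carry a handler (empty std::function = none).
struct ModelContext {
  ModelAsyncHandler handler;
};

// Errors are tracked on the device, so they outlive any single queue.
struct ModelDevice {
  ModelExceptionList unconsumed;
};

// Priority order from the commit message: queue, then context, then default.
inline const ModelAsyncHandler &pick_handler(const ModelAsyncHandler &queue_h,
                                             const ModelAsyncHandler &context_h,
                                             const ModelAsyncHandler &default_h) {
  if (queue_h)
    return queue_h;
  if (context_h)
    return context_h;
  return default_h;
}

// Delivers all pending errors to the handler in one batch, then clears them.
inline void flush(ModelDevice &dev, const ModelAsyncHandler &handler) {
  if (dev.unconsumed.empty())
    return;
  ModelExceptionList batch;
  batch.swap(dev.unconsumed);
  handler(batch);
}

class ModelQueue {
public:
  ModelQueue(std::shared_ptr<ModelDevice> dev, std::shared_ptr<ModelContext> ctx,
             ModelAsyncHandler handler = ModelAsyncHandler())
      : dev_(std::move(dev)), ctx_(std::move(ctx)), handler_(std::move(handler)) {}

  // No user-defined destructor: destroying the queue leaves the device's
  // unconsumed errors untouched (change 1 in the commit message).

  // Simulates a command failing asynchronously on this queue's device.
  void fail_async(const std::string &what) {
    dev_->unconsumed.push_back(
        std::make_exception_ptr(std::runtime_error(what)));
  }

  // Reports pending errors using the highest-priority available handler.
  void throw_asynchronous(const ModelAsyncHandler &default_h) {
    flush(*dev_, pick_handler(handler_, ctx_->handler, default_h));
  }

private:
  std::shared_ptr<ModelDevice> dev_;
  std::shared_ptr<ModelContext> ctx_;
  ModelAsyncHandler handler_;
};
```

Because the error list lives on the device in this sketch, an error raised through a queue that has since been destroyed is still delivered by the next consumer on the same device.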