gmlueck
Contributor

@gmlueck gmlueck commented Oct 1, 2025

Add a proposed extension specification which allows the application to wait for all commands submitted to a device to complete.

@gmlueck gmlueck requested a review from a team as a code owner October 1, 2025 19:26
```c++
class device {
  // ...
  void ext_oneapi_wait_and_throw(async_handler h);
  void ext_oneapi_wait_and_throw();
  // ...
};
```
Contributor Author

The semantics of these operations are close to CUDA's but not exact. I think the precise definition of `cudaDeviceSynchronize` is that it waits for all tasks submitted to the current context of the current device to complete. (Credit to Jinghui for noticing this a couple of weeks ago.) Is it important, for migrating CUDA code, that SYCL have these same semantics?

If so, I think we could:

  • Add a new overload ext_oneapi_wait_and_throw(const context& ctxt, async_handler h), which waits only for commands submitted to this device AND that context to complete.
  • Redefine the other two overloads to mean that they wait only for commands submitted to this device AND its default context to complete.

This would require changing the proposed Level Zero API to also take a context.
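To make the difference between the two candidate semantics concrete, here is a minimal, self-contained C++ model. It is not SYCL code: `model_device`, `submit`, `wait_all`, and `wait_context` are hypothetical names, a "context" is reduced to an integer id, and a "command" is a callable that only runs when someone waits on it. `wait_all` stands in for the semantics currently proposed for `ext_oneapi_wait_and_throw()`, while `wait_context` stands in for the suggested context-filtered overload.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model, not SYCL: a command is a callable tagged with the id
// of the context it was submitted under.
struct model_command {
  int context_id;
  std::function<void()> work;
  bool done = false;
};

class model_device {
 public:
  void submit(int context_id, std::function<void()> work) {
    pending_.push_back({context_id, std::move(work)});
  }

  // Currently proposed semantics: complete every command on this device,
  // regardless of which context it was submitted to.
  void wait_all() {
    for (auto& cmd : pending_) run(cmd);
  }

  // Alternative semantics (closer to cudaDeviceSynchronize): complete only
  // the commands submitted to this device AND the given context.
  void wait_context(int context_id) {
    for (auto& cmd : pending_)
      if (cmd.context_id == context_id) run(cmd);
  }

  // Number of submitted commands that have not completed yet.
  std::size_t outstanding() const {
    std::size_t n = 0;
    for (const auto& cmd : pending_) n += cmd.done ? 0 : 1;
    return n;
  }

 private:
  static void run(model_command& cmd) {
    if (cmd.done) return;
    assert(cmd.work && "submitted an empty command");
    cmd.work();
    cmd.done = true;
  }

  std::vector<model_command> pending_;
};
```

For example, after submitting one command under context 1 and one under context 2, `wait_context(1)` leaves the second command outstanding, whereas `wait_all()` completes both.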

Contributor

In the CUDA adapter, the device owns the CUDA context; that is, there is only one `CUcontext` per device, which is made the active context whenever necessary. As such, I think the current design matches CUDA's behavior quite well.

Contributor Author

This doesn't really say anything, though, about how people use the CUDA API or what semantics they would want in SYCL when migrating their code. I think you are just pointing out that the currently proposed API would be easy to implement given the current design of the UR CUDA adapter.

After discussing various designs, we think we will need to keep a list
of unconsumed async errors in the device object anyway. Therefore, it
won't be hard to implement the same `wait`, `wait_and_throw`, and
`throw_asynchronous` APIs that `queue` provides.
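A minimal sketch of why device-side tracking makes the `queue`-like APIs straightforward, assuming the device simply owns a mutex-protected list of `std::exception_ptr`. The names `error_tracking_device`, `record_async_error`, `pending_errors`, `model_exception_list`, and `model_async_handler` are hypothetical stand-ins; this is not the DPC++ implementation, and the aliases only approximate `sycl::exception_list` and `sycl::async_handler`.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for sycl::exception_list and sycl::async_handler.
using model_exception_list = std::vector<std::exception_ptr>;
using model_async_handler = std::function<void(model_exception_list)>;

// Hypothetical model of a device that owns its unconsumed async errors, so
// the errors outlive any queue that produced them.
class error_tracking_device {
 public:
  // Called by the (modelled) runtime when a command fails asynchronously.
  void record_async_error(std::exception_ptr e) {
    std::lock_guard<std::mutex> lock(mutex_);
    unconsumed_.push_back(std::move(e));
  }

  std::size_t pending_errors() {
    std::lock_guard<std::mutex> lock(mutex_);
    return unconsumed_.size();
  }

  // Report each unconsumed error to the handler exactly once, then drop it.
  void throw_asynchronous(const model_async_handler& h) {
    assert(h && "this model requires a non-empty handler");
    model_exception_list errors;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      errors.swap(unconsumed_);
    }
    if (!errors.empty()) h(std::move(errors));
  }

  // A real implementation would first block until every command submitted
  // to the device has completed; this model has no commands, so it only
  // reports the accumulated errors.
  void wait_and_throw(const model_async_handler& h) { throw_asynchronous(h); }

 private:
  std::mutex mutex_;
  model_exception_list unconsumed_;
};
```

Because the list is swapped out under the lock before the handler runs, each error is reported at most once even if two threads call `throw_asynchronous` concurrently, and the handler never runs while the lock is held.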
Contributor

@steffenlarsen steffenlarsen left a comment

Discussed offline. I believe we can implement this, following some fixes to the current behavior.

Contributor

github-actions bot commented Oct 3, 2025

@intel/llvm-gatekeepers please consider merging

steffenlarsen added a commit to steffenlarsen/llvm that referenced this pull request Oct 3, 2025
This commit makes the following changes to the behavior of asynchronous
exception handling:

 1. The death of a queue should not consume asynchronous exceptions.
 2. Calling wait_and_throw on an event after the associated queue has
    died should still consume exceptions that were originally associated
    with the queue. This should respect the async_handler priority to
    the best of its ability.
 3. Calling wait_and_throw or throw_asynchronous on a queue without an
    async_handler should fall back to using the async_handler of the
    associated context, then the default async_handler if none is
    attached to the context.

Additionally, this lays the groundwork for
intel#20266 by moving the tracking of
unconsumed asynchronous exceptions to the devices.

Signed-off-by: Larsen, Steffen <[email protected]>
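Point 3 of the commit message amounts to a simple priority order. The sketch below models it with plain `std::function` objects, where an empty function means no handler was attached at that level. `select_handler` and the `model_*` aliases are hypothetical and are not meant to reflect how the runtime actually stores or looks up handlers.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <vector>

// Hypothetical stand-ins for sycl::exception_list and sycl::async_handler.
using model_exception_list = std::vector<std::exception_ptr>;
using model_async_handler = std::function<void(model_exception_list)>;

// Hypothetical helper: choose which handler receives the errors when
// wait_and_throw/throw_asynchronous is called on a queue.
// Priority: queue's handler, then context's handler, then the default one.
model_async_handler select_handler(const model_async_handler& queue_handler,
                                   const model_async_handler& context_handler,
                                   const model_async_handler& default_handler) {
  assert(default_handler && "a default handler must always exist");
  if (queue_handler) return queue_handler;
  if (context_handler) return context_handler;
  return default_handler;
}
```

Passing `nullptr` for the queue handler therefore routes errors to the context's handler, and passing `nullptr` for both routes them to the default handler.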
Contributor

github-actions bot commented Oct 6, 2025

@intel/llvm-gatekeepers please consider merging
