Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
213 changes: 213 additions & 0 deletions sycl/doc/extensions/proposed/sycl_ext_oneapi_record_event.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
= sycl_ext_oneapi_record_event

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: —{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2025 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 10 specification.
All references below to the "core SYCL specification" or to section numbers in
the SYCL specification refer to that revision.

This extension also depends on the following other SYCL extensions:

* link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
sycl_ext_oneapi_enqueue_functions]


== Status

This is a proposed extension specification, intended to gather community
feedback.
Interfaces defined in this specification may not be implemented yet or may be in
a preliminary state.
The specification itself may also change in incompatible ways before it is
finalized.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension adds the ability to reuse the same `event` object in multiple
command submissions, rather than creating a new event for each submission.
This pattern may perform better on some implementations because fewer event
objects need to be created and destroyed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergey-semenov are you familiar with the way events are managed in the DPC++ SYCL runtime? If so, I'd be interested in your thoughts on whether this proposed API would allow us to implement events more efficiently. I have an intuition that it will help, but I don't know how the code really works today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we're not doing anything fancy with events. it's all very straightforward with no reuse of memory whatsoever. This extension would introduce such memory reusage, but we could also just implement a memory pool for events internally, so it shouldn't be the main point here.

This should allow us to get rid of UR event creation/release though, assuming UR and its adapters can provide and take advantage of enqueue API that reuses a UR event.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider counter based event? If not, the re-record will be hard to consider.

The pattern may also be more familiar to users porting CUDA code to SYCL.


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_RECORD_EVENT` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===

=== New kernel launch property

This extension adds a new kernel launch property:

[source,c++]
----
namespace sycl::ext::oneapi::experimental {

struct record_event {
record_event(event* evt); (1)
};
using record_event_key = record_event;

} // namespace sycl::ext::oneapi::experimental
----

This property may be passed as a launch property to the following command
submission functions from
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
sycl_ext_oneapi_enqueue_functions]:

* The `submit` overload that takes parameters of type `queue` and `Properties`.
* The `single_task` overloads that take parameters of type `queue` and
`Properties`.
* The `parallel_for` overloads that take parameters of type `queue` and
`launch_config`.
* The `nd_launch` overloads that take parameters of type `queue` and
`launch_config`.
* The `memcpy` overload that takes parameters of type `queue` and `Properties`.
* The `copy` overload that takes parameters of type `queue` and `Properties`.
* The `memset` overload that takes parameters of type `queue` and `Properties`.
* The `fill` overload that takes parameters of type `queue` and `Properties`.
* The `prefetch` overload that takes parameters of type `queue` and
`Properties`.
* The `mem_advise` overload that takes parameters of type `queue` and
`Properties`.
* The `barrier` overload that takes parameters of type `queue` and `Properties`.
* The `partial_barrier` overload that takes parameters of type `queue` and
`Properties`.
* The `execute_graph` overload that takes parameters of type `queue` and
`Properties`.

_Effects (1)_: Constructs a `record_event` property with a pointer to an `event`
object.
When `evt` is not null, the following happens.
The status of the event is disassociated with any previously submitted command,
and its status is reset to `info::event_command_status::submitted`.
For the `submit` function, this happens when the command group function returns
back to `submit`.
The event is then associated with the newly submitted command.
Assuming the event remains associated with this command, the event's status
changes according to the execution status of that command.
When `evt` is null, the property has no effect on the command submission.

_Remarks:_

* If a recorded event is used as a command dependency for some other command
_C2_ (e.g. via `handler::depends_on`), the dependency is captured at the point
when _C2_ is submitted.
The dependency does _not_ change if the event is subsequently overwritten via
`record_event`.
Comment on lines +149 to +153
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that supposed to work in scenarios where C2 submission to UR is delayed on the SYCL runtime side (current implementation of host tasks, for example)?

I think technically we could create a "proxy" event to capture the "state" of the original event at the point of C2 submission to SYCL. But then in order to support this extension, we would have to do that any time a command is delayed, which seems problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would not be a problem. The API allows the runtime to reuse the backend event, but it does not mandate this. Therefore, if there are extenuating circumstances in some cases, the runtime is still allowed to create a new backend event if it wants.

In the case you describe, couldn't the runtime do this:

  • Drop the reference to the existing backend event. This will get cleaned up however we currently clean up backend events.
  • Create a new backend event and associate it with the SYCL event the user passes in.
  • When the runtime eventually submits command C2 to the backend, it uses the backend event it created above in step 1.

We'll have to do the same sort of thing in other cases too. For example, consider the case when the event passed in via record_event is from a different backend than the new command being submitted. In this case also, the runtime will need to drop the reference to the old backend event and create a new one. This is the situation I describe below under "Implementation notes".


* If another host thread is blocked in a call to `event::wait` when that same
event is associated with a new command via `record_event`, it is unspecified
whether the call to `event:wait` unblocks.


== Example

[source,c++]
----
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;

int main() {
sycl::queue q1;
sycl::queue q2;
sycl::event e;
sycl::range r{GLOBAL};

// Launch a command and record an event which tracks its completion.
syclex::launch_config cfg{r, syclex::record_event{&e}};
syclex::parallel_for(q1, cfg, [=](sycl::item<> it) { /* ... */ });

// Launch another command which depends on that event and also
// record completion of this new command using the same event.
syclex::submit(q2, syclex::record_event{&e}, [&](sycl::handler cgh) {
cgh.depends_on(e);
syclex::parallel_for(cgh, r, [=](sycl::item<> it) { /* ... */ });
});

// Wait for both commands to complete.
e.wait();
}
----


== Implementation notes

It is expected that the implementation will often be able to reuse the
underlying backend event object when a SYCL event is passed to `record_event`.
However, there will still be cases when the implementation needs to release the
underlying backend event and create a new one.
For example, this will happen when the existing backend event is from a
different backend or from a different context than the command being submitted.
In these cases, we expect that the implementation will release the backend event
and associate the SYCL event with a new backend event.


== Issues

* Is it possible to implement the behavior specified above regarding
`event::wait` and `record_event`?
What if the implementation needs to release the backend event when another
host thread is blocked in a call to `event:wait`?
Can we guarantee that the call to `event::wait` either remains blocked or
becomes unblocked?
(Either is fine.)
Or, is it possible that this will lead to a crash?
If a crash is possible, we need to weaken the specification to say this
condition is UB.