Commit a543e94

author
Fabio Mestre
committed
Add design documentation
1 parent 81e0b00 commit a543e94

2 files changed

+57
-3
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 57 additions & 3 deletions
@@ -337,6 +337,60 @@ Backends which are implemented currently are: [Level Zero](#level-zero),
337337

338338
### Level Zero
339339

340+
The command-buffer implementation for the level-zero adapter has 2 different
341+
implementation paths which are chosen depending on the device and level-zero
342+
version:
343+
344+
- Immediate Append path - Relies on
345+
[zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
346+
to submit the command-buffer.
347+
- Wait event path - Relies on
348+
[zeCommandQueueExecuteCommandLists](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandqueueexecutecommandlists)
349+
to submit the command-buffer work. However, this level-zero function has
350+
limitations and, as such, this path is used only when the immediate append
351+
path is unavailable.
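
The choice between the two paths can be pictured with a minimal sketch. The `DeviceCaps` struct, its fields, and `selectSubmissionPath` are hypothetical names introduced here for illustration only; the real adapter queries the device and driver for these capabilities and may apply further conditions.

```cpp
// Hypothetical capability flags (illustrative only, not the adapter's types).
struct DeviceCaps {
  // The device supports immediate command-lists.
  bool SupportsImmediateCommandLists;
  // zeCommandListImmediateAppendCommandListsExp is available.
  bool SupportsImmediateAppendExp;
};

enum class SubmissionPath { ImmediateAppend, WaitEvent };

// Prefer the immediate append path; fall back to the wait event path
// only when the immediate append path is unavailable.
constexpr SubmissionPath selectSubmissionPath(const DeviceCaps &Caps) {
  return (Caps.SupportsImmediateCommandLists &&
          Caps.SupportsImmediateAppendExp)
             ? SubmissionPath::ImmediateAppend
             : SubmissionPath::WaitEvent;
}
```

Under these assumptions, a device that lacks either capability ends up on the wait event path.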
352+
353+
#### Immediate Append Path implementation details
354+
355+
This path is only available when the device supports immediate command-lists
356+
and the [zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
357+
API. This API has a `phWaitEvents` argument which allows for a cleaner and more efficient
358+
implementation than what can be achieved when using the wait-event path
359+
(see [this section](#wait-event-path-implementation-details) for
360+
more details about the wait-event path).
361+
362+
This path relies on 3 different command-lists in order to execute the
363+
command-buffer:
364+
365+
- `ComputeCommandList` - Used to submit command-buffer work that requires
366+
the compute engine.
367+
- `CopyCommandList` - Used to submit command-buffer work that requires the
368+
[copy engine](#copy-engine). This command-list is not created when none of the
369+
nodes require the copy engine.
370+
- `EventResetCommandList` - Used to reset the level-zero events that are
371+
needed for every submission of the command-buffer. This is executed after
372+
the compute and copy command-lists have finished executing. For the first
373+
execution, this command-list is skipped since there is no need to reset events
374+
at this point. When counter-based events are enabled (i.e. the command-buffer
375+
is in-order), this command-list is not created since counter-based events do
376+
not need to be reset.
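
The rules above for which command-lists exist, and when the event reset command-list runs, can be summarised in a short sketch. `GraphInfo`, `CommandListPlan`, `planCommandLists` and `runsEventReset` are hypothetical names used only to restate the rules; they do not correspond to the adapter's actual data structures.

```cpp
// Hypothetical summary of a finalized command-buffer (illustrative only).
struct GraphInfo {
  bool UsesCopyEngine; // at least one node requires the copy engine
  bool IsInOrder;      // in-order, i.e. counter-based events are enabled
};

// Which of the three command-lists are created.
struct CommandListPlan {
  bool HasCompute;
  bool HasCopy;
  bool HasEventReset;
};

constexpr CommandListPlan planCommandLists(const GraphInfo &G) {
  return CommandListPlan{
      true,             // ComputeCommandList
      G.UsesCopyEngine, // CopyCommandList only if some node needs the copy engine
      !G.IsInOrder};    // counter-based events do not need to be reset
}

// The reset command-list is skipped on the first execution
// (SubmissionIndex == 0), since there is nothing to reset yet.
constexpr bool runsEventReset(const CommandListPlan &Plan,
                              unsigned SubmissionIndex) {
  return Plan.HasEventReset && SubmissionIndex > 0;
}
```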
377+
378+
The following diagram illustrates which commands are executed on
379+
each command-list when the command-buffer is enqueued:
380+
![L0 command-buffer diagram](images/diagram_immediate_append.png)
381+
382+
Additionally,
383+
[zeCommandListImmediateAppendCommandListsExp](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/api.html#zecommandlistimmediateappendcommandlistsexp)
384+
requires an extra command-list which is used to submit the other
385+
command-lists. This command-list has a specific engine type
386+
associated with it (i.e. compute or copy engine). Hence, for our implementation,
387+
we need 2 of these helper command-lists:
388+
- The `CommandListHelper` command-list is used to submit the
389+
`ComputeCommandList`, `EventResetCommandList` and profiling queries.
390+
- The `ZeCopyEngineImmediateListHelper` command-list is used to submit the
391+
`CopyCommandList`.
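
As a rough sketch of this mapping, the following shows which helper would submit each command-list. The enum and function names are hypothetical and only mirror the names used in the text; they are not the adapter's real types.

```cpp
// The three command-lists described above (illustrative enum).
enum class CommandList { Compute, Copy, EventReset };

// The two helper command-lists, one per engine type.
enum class Helper { CommandListHelper, ZeCopyEngineImmediateListHelper };

// Only the copy command-list goes through the copy-engine helper;
// compute work and event resets go through the compute-engine helper.
constexpr Helper helperFor(CommandList List) {
  return List == CommandList::Copy ? Helper::ZeCopyEngineImmediateListHelper
                                   : Helper::CommandListHelper;
}
```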
392+
393+
#### Wait event path implementation details
340394
The UR `urCommandBufferEnqueueExp` interface for submitting a command-buffer
341395
takes a list of events to wait on, and returns an event representing the
342396
completion of that specific submission of the command-buffer.
@@ -364,7 +418,7 @@ is made only once (during the command-buffer finalization stage). This allows
364418
the adapter to save time when submitting the command-buffer, by executing only
365419
this command-list (i.e. without enqueuing any commands of the graph workload).
366420

367-
#### Prefix
421+
##### Prefix
368422

369423
The prefix's commands aim to:
370424
1. Handle the list of events to wait on, which is passed by the runtime
@@ -409,7 +463,7 @@ and another reset command for resetting the signal we use to signal the
409463
completion of the graph workload. This signal is called *SignalEvent* and is
410464
defined in the `ur_exp_command_buffer_handle_t` class.
411465

412-
#### Suffix
466+
##### Suffix
413467

414468
The suffix's commands aim to:
415469
1) Handle the completion of the graph workload and signal a UR return event.
@@ -435,7 +489,7 @@ with extra commands associated with *CB*, and the other after *CB*. These new
435489
command-lists are retrieved from the UR queue, which will likely reuse existing
436490
command-lists and only create a new one in the worst case.
437491

438-
#### Drawbacks
492+
##### Drawbacks
439493

440494
There are three drawbacks of this approach to implementing UR command-buffers for
441495
Level Zero: