[UR][L0v2] add support for batched queue submissions #19769
base: sycl
Conversation
Force-pushed: 649dc1d → 77ab98d → cd7dace → 0caa66d → 9115f1f → b2e0362
Have you been able to run the SubmitKernel benchmarks? If so, can you please share results?
Resolved review threads (outdated):
- unified-runtime/source/adapters/level_zero/v2/command_list_manager.cpp (2)
- unified-runtime/source/adapters/level_zero/v2/queue_batched.cpp (4)
Force-pushed: c55074e → c9accf7 → 12f9354
namespace v2 {
...
struct batch_manager {
You use three different styles of multi-line comments in this struct. I think the most commonly used style in the adapter codebase is:
//
//
//
But if you want to use block-style comments, do:
/*
 * ...
 * ...
 */
I have unified the format for multi-line comments in this file, although it still differs slightly from your recommendation (this is the version produced by the formatter).
Can you do the same for all the comments in the patch?
this is the version from the formatter
AFAIK clang-format's LLVM multi-line comment style is this:
/*
 * first line
 * second line
 */
It'd be odd if it changed it to something else.
Force-pushed: 423c951 → 704356d → c79c8ea
Batched queues enable submission of operations to the driver in batches, reducing the overhead of submitting every single operation individually. Similarly to command buffers in L0v2, they use regular command lists (referred to below as 'batches'). Operations enqueued on regular command lists are not executed immediately, but only after the regular command list is enqueued on an immediate command list. However, in contrast to command buffers, batched queues do not merely collect enqueued operations: they also handle submission of batches (regular command lists) themselves, through an internal immediate command list.

Batched queues introduce:
- batch_manager, which stores the current batch, a command list manager with an immediate command list for batch submissions, the vector of submitted batches, and the generation number of the current batch.
- The current batch is a command list manager with a regular command list; operations requested by users are enqueued on it. The current batch may be submitted for execution on the immediate command list, replaced by a new regular command list, and stored in the vector of submitted batches until execution completes.
- The number of regular command lists stored for execution is limited.
- The generation number of the current batch is assigned to events associated with operations enqueued on that batch, and is incremented on every replacement of the current batch. When an event created by a batched queue appears in an eventWaitList, the batch assigned to that event might not have been executed yet, so the event might never be signalled. Comparing generation numbers determines whether the current batch should be submitted for execution: if the generation number of the current batch is higher than the number assigned to the given event, the batch associated with the event has already been submitted for execution, and an additional submission of the current batch is not needed.
- Regular command lists use the regular pool cache type, whereas immediate command lists use the immediate pool cache type. Since user-requested operations are enqueued on regular command lists and immediate command lists are only used internally by the batched queue implementation, events are not created for immediate command lists.
- wait_list_view is modified. Previously, it only stored the waitlist (as a ze_event_handle buffer created from events) and the corresponding event count in a single container, which could be passed as an argument to the driver API. Now the constructor also ensures that all associated operations will eventually be executed: since regular command lists are executed only after being enqueued on immediate lists, the regular command list associated with a given event must be enqueued; otherwise, the event would never be signalled.

Additionally, support for UR_QUEUE_INFO_FLAGS in urQueueGetInfo has been added for Native CPU, which is required by the enqueueTimestampRecording tests. Currently, enqueueTimestampRecording is not supported by batched queues.

Batched queues can be enabled by setting UR_QUEUE_FLAG_SUBMISSION_BATCHED in ur_queue_flags_t, or globally through the environment variable UR_L0_FORCE_BATCHED=1.

Benchmark results for default in-order queues (sycl branch, commit hash: b76f12e) and batched queues:
api_overhead_benchmark_ur SubmitKernel in order: 20.839 μs
api_overhead_benchmark_ur SubmitKernel batched: 12.183 μs
/*
  Support for UR_QUEUE_INFO_FLAGS in urQueueGetInfo is required by the
  enqueueTimestampRecording tests after introducing batched queues, since
  batched queues do not support enqueueTimestampRecording.
*/
This sounds more like a commit message (context why a change is made), rather than a comment (why a piece of code does something).
Is it better to move this change to another commit, remove this comment or do you mean something else?
I'd just remove the comment. Ideally this would be a separate commit.
Force-pushed: ba858d4 → 5ca596c → 89178cc
Adding a new feature: batched queue submissions.
For CI testing, batched queues are enabled by default, which may cause failures in tests dedicated to other queue types, e.g. in-order. This will be reverted after the CI runs.