
Conversation


@EuphoricThinking EuphoricThinking commented Aug 12, 2025

Adding a new feature: batched queue submissions.

Batched queues enable submission of operations to the driver in batches, thereby reducing the overhead of submitting every single operation individually. Similar to command buffers in L0v2, they use regular command lists (referred to below as 'batches'). Operations enqueued on a regular command list are not executed immediately, but only after the regular command list is enqueued on an immediate command list. In contrast to command buffers, however, batched queues do not merely collect enqueued operations: they also handle the submission of batches (regular command lists) themselves, using an internal immediate command list.
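
To illustrate, here is a minimal sketch of the underlying Level Zero flow, assuming the experimental zeCommandListImmediateAppendCommandListsExp entry point is what executes a closed regular command list on an immediate one; the helper name and setup are illustrative, not the actual adapter code:

#include <level_zero/ze_api.h>

// Minimal sketch; error handling omitted, all handles assumed valid
// and created elsewhere (context, device, internal immediate list, kernel).
void submitBatchSketch(ze_context_handle_t hContext,
                       ze_device_handle_t hDevice,
                       ze_command_list_handle_t hImmediate,
                       ze_kernel_handle_t hKernel,
                       const ze_group_count_t &groupCount) {
  // 1. Create a regular command list: the 'batch'.
  ze_command_list_desc_t desc = {};
  desc.stype = ZE_STRUCTURE_TYPE_COMMAND_LIST_DESC;
  ze_command_list_handle_t hBatch = nullptr;
  zeCommandListCreate(hContext, hDevice, &desc, &hBatch);

  // 2. Enqueue user operations on the batch; nothing executes yet.
  zeCommandListAppendLaunchKernel(hBatch, hKernel, &groupCount,
                                  /*hSignalEvent=*/nullptr,
                                  /*numWaitEvents=*/0, nullptr);

  // 3. Close the batch and hand it to the internal immediate command
  //    list; only now do the batched operations start executing.
  zeCommandListClose(hBatch);
  zeCommandListImmediateAppendCommandListsExp(hImmediate, 1, &hBatch,
                                              /*hSignalEvent=*/nullptr,
                                              /*numWaitEvents=*/0, nullptr);
}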

Batched queues introduce:

  • batch_manager, which stores the current batch, a command list manager with an immediate command list for batch submissions, a vector of submitted batches, and the generation number of the current batch.
  • The current batch is a command list manager with a regular command list; operations requested by users are enqueued on the current batch. The current batch may be submitted for execution on the immediate command list, after which it is replaced by a new regular command list and stored in the vector of submitted batches until its execution completes.
  • The number of regular command lists stored for execution is limited.
  • The generation number of the current batch is assigned to events associated with operations enqueued on that batch, and it is incremented on every replacement of the current batch. When an event created by a batched queue appears in an eventWaitList, the batch associated with that event might not have been executed yet, in which case the event would never be signalled. Comparing generation numbers determines whether the current batch must be submitted for execution: if the generation number of the current batch is higher than the number assigned to the given event, the batch associated with the event has already been submitted, and no additional submission of the current batch is needed (see the sketch after this list).
  • Regular command lists use the regular pool cache type, whereas immediate command lists use the immediate pool cache type. Since user-requested operations are enqueued on regular command lists and immediate command lists are used only internally by the batched queue implementation, events are not created for immediate command lists.
  • wait_list_view is modified. Previously, it only stored the waitlist (as a ze_event_handle buffer created from events) and the corresponding event count in a single container that could be passed as an argument to the driver API. Now the constructor also ensures that all associated operations will eventually be executed: since regular command lists are not executed immediately, but only after being enqueued on immediate lists, the regular command list associated with a given event must be submitted; otherwise, the event would never be signalled.
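
To make the generation-number bookkeeping concrete, here is a hypothetical, heavily simplified sketch; the real batch_manager also owns the command list managers and the vector of submitted batches, and all names here are illustrative:

#include <cstdint>

// Illustrative stand-in for an event created by a batched queue.
struct event_sketch {
  uint64_t batchGeneration; // generation of the batch this event belongs to
};

struct batch_manager_sketch {
  uint64_t currentGeneration = 0;

  // Called whenever the current batch is submitted on the immediate
  // command list and replaced by a fresh regular command list.
  void onCurrentBatchReplaced() { ++currentGeneration; }

  // An event in an eventWaitList forces a submission only if its batch
  // is still the current, not-yet-submitted one. If currentGeneration
  // is already higher, the event's batch was submitted earlier and the
  // event will eventually be signalled without further action.
  bool mustSubmitCurrentBatch(const event_sketch &e) const {
    return e.batchGeneration == currentGeneration;
  }
};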

Additionally, support for UR_QUEUE_INFO_FLAGS in urQueueGetInfo has been added for Native CPU, since the enqueueTimestampRecording tests require it. Currently, enqueueTimestampRecording is not supported by batched queues.
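
A query through this path might look roughly like the following sketch, where hQueue is assumed to be a queue created earlier:

#include <ur_api.h>

// Sketch: read back the flags the queue was created with.
ur_queue_flags_t flags{};
urQueueGetInfo(hQueue, UR_QUEUE_INFO_FLAGS, sizeof(flags), &flags,
               /*pPropSizeRet=*/nullptr);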

Batched queues can be enabled per queue, by setting UR_QUEUE_FLAG_SUBMISSION_BATCHED in ur_queue_flags_t, or globally, through the environment variable UR_L0_FORCE_BATCHED=1.
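
For example, opting in per queue could look like the following sketch; hContext and hDevice are assumed to come from the usual UR adapter/device setup:

#include <ur_api.h>

// Request batched submissions at queue creation time.
ur_queue_properties_t props{};
props.stype = UR_STRUCTURE_TYPE_QUEUE_PROPERTIES;
props.pNext = nullptr;
props.flags = UR_QUEUE_FLAG_SUBMISSION_BATCHED;

ur_queue_handle_t hQueue = nullptr;
urQueueCreate(hContext, hDevice, &props, &hQueue);

Alternatively, exporting UR_L0_FORCE_BATCHED=1 enables batching for all queues without any code changes.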

Benchmark results for default in-order queues (sycl branch, commit hash: b76f12e) and batched queues:
api_overhead_benchmark_ur SubmitKernel in order: 20.839 μs
api_overhead_benchmark_ur SubmitKernel batched:  12.183 μs

For CI testing, batched queues are enabled by default, which may cause failures in tests dedicated to other queue types, e.g. in-order. This default will be reverted once CI testing is complete.


@pbalcer pbalcer left a comment

Have you been able to run the SubmitKernel benchmarks? If so, can you please share results?

@EuphoricThinking EuphoricThinking marked this pull request as ready for review October 16, 2025 11:31
@EuphoricThinking EuphoricThinking requested review from a team as code owners October 16, 2025 11:31

namespace v2 {

struct batch_manager {
Contributor

You use three different styles of multi-line comments in this struct. I think the most commonly used style in the adapter codebase is:

//
//
//

But if you want to use block-style comments, do:

/*
 * ...
 * ...
 */

Contributor Author

I have unified the format in this file for multi-line comments, although it still differs slightly from your recommendation (this is the version from the formatter)

Contributor

Can you do the same for all the comments in the patch?

> this is the version from the formatter

AFAIK clang-format's LLVM multi-line comment style is this:

/*
 * first line
 * second line
 */

It'd be odd if it changed it to something else.

Comment on lines +36 to +40
/*
Support for UR_QUEUE_INFO_FLAGS in urQueueGetInfo is required by the
enqueueTimestampRecording tests after introducing batched queues, since
batched queues do not support enqueueTimestampRecording.
*/
Contributor

Suggested change
/*
Support for UR_QUEUE_INFO_FLAGS in urQueueGetInfo is required by the
enqueueTimestampRecording tests after introducing batched queues, since
batched queues do not support enqueueTimestampRecording.
*/

This sounds more like a commit message (context for why a change is made) rather than a comment (why a piece of code does something).

Contributor Author

Is it better to move this change to another commit, remove this comment or do you mean something else?

Contributor

I'd just remove the comment. Ideally this would be a separate commit.

