Skip to content

Commit 11aff75

Browse files
author
Ewan Crawford
committed
[SYCL][UR][Graph] Require OpenCL simultaneous use
To support the SYCL-Graph extension on an OpenCL backend, we currently only require the presence of the `cl_khr_command_buffer` extension. This PR introduces an extra requirement on the [CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR) capability being present. This is based on the [graph execution wording](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc#765-new-handler-member-functions) on the definition of `handler::ext_oneapi_graph()` that: > Only one instance of graph will execute at any time. If graph is submitted multiple times, dependencies are automatically added by the runtime to prevent concurrent executions of an identical graph. Such usage results in multiple calls by the SYCL runtime to `urEnqueueCommandBufferExp` with the same UR command-buffer and event dependencies to prevent concurrent execution. Without support for simultaneous-use the OpenCL adapter code cannot guarantee that the first command-buffer submission has finished execution before it makes following `clEnqueueCommandBufferKHR` calls with the `cl_event` decencies. If the first submission is still executing, then an error will be reported. Workarounds like adding blocking host waits to the OpenCL UR adapter are possible, but requiring simultaneous use reflects the vendor requirements as they are for the currently implementation. I've tried to document this all in the UR spec and SYCL-Graph design docs, which also includes a couple of cleanups I found along the way.
1 parent 752678e commit 11aff75

File tree

3 files changed

+105
-75
lines changed

3 files changed

+105
-75
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 71 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -28,30 +28,26 @@ document for details of support of different SYCL backends.
2828
### UR Command-Buffer Experimental Feature
2929

3030
The command-buffer concept has been introduced to UR as an
31-
[experimental feature](https://oneapi-src.github.io/unified-runtime/core/api.html#command-buffer-experimental)
32-
with the following entry-points:
33-
34-
| Function | Description |
35-
| -------------------------------------------- | ----------- |
36-
| `urCommandBufferCreateExp` | Create a command-buffer. |
37-
| `urCommandBufferRetainExp` | Incrementing reference count of command-buffer. |
38-
| `urCommandBufferReleaseExp` | Decrementing reference count of command-buffer. |
39-
| `urCommandBufferFinalizeExp` | No more commands can be appended, makes command-buffer ready to enqueue on a command-queue. |
40-
| `urCommandBufferAppendKernelLaunchExp` | Append a kernel execution command to command-buffer. |
41-
| `urCommandBufferAppendUSMMemcpyExp` | Append a USM memcpy command to the command-buffer. |
42-
| `urCommandBufferAppendUSMFillExp` | Append a USM fill command to the command-buffer. |
43-
| `urCommandBufferAppendMemBufferCopyExp` | Append a mem buffer copy command to the command-buffer. |
44-
| `urCommandBufferAppendMemBufferWriteExp` | Append a memory write command to a command-buffer object. |
45-
| `urCommandBufferAppendMemBufferReadExp` | Append a memory read command to a command-buffer object. |
46-
| `urCommandBufferAppendMemBufferCopyRectExp` | Append a rectangular memory copy command to a command-buffer object. |
47-
| `urCommandBufferAppendMemBufferWriteRectExp` | Append a rectangular memory write command to a command-buffer object. |
48-
| `urCommandBufferAppendMemBufferReadRectExp` | Append a rectangular memory read command to a command-buffer object. |
49-
| `urCommandBufferAppendMemBufferFillExp` | Append a memory fill command to a command-buffer object. |
50-
| `urEnqueueCommandBufferExp` | Submit command-buffer to a command-queue for execution. |
51-
| `urCommandBufferUpdateKernelLaunchExp` | Updates the parameters of a previous kernel launch command. |
52-
31+
[experimental feature](https://oneapi-src.github.io/unified-runtime/core/api.html#command-buffer-experimental).
5332
See the [UR EXP-COMMAND-BUFFER](https://oneapi-src.github.io/unified-runtime/core/EXP-COMMAND-BUFFER.html)
54-
specification for more details.
33+
specification for details.
34+
35+
Device support for SYCL-Graph is communicated to the user via two aspects.
36+
The `aspect::ext_oneapi_limited_graph` aspect for basic graph support and
37+
the `aspect::ext_oneapi_graph` aspect for full graph support.
38+
39+
The `UR_DEVICE_INFO_COMMAND_BUFFER_SUPPORT_EXP` query result is used by the
40+
SYCL-RT to inform whether to report `aspect::ext_oneapi_limited_graph`.
41+
42+
Reporting of the `aspect::ext_oneapi_graph` aspect is based on the
43+
`UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP` query result. For
44+
a device to report this aspect, the UR query must report support for all of:
45+
46+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_KERNEL_ARGUMENTS`
47+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_LOCAL_WORK_SIZE`
48+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_GLOBAL_WORK_SIZE`
49+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_GLOBAL_WORK_OFFSET`
50+
* `UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_KERNEL_HANDLE`
5551

5652
## Design
5753

@@ -608,43 +604,14 @@ SYCL-Graph is only enabled for an OpenCL backend when the
608604
extension is available, however this information isn't available until runtime
609605
due to OpenCL implementations being loaded through an ICD.
610606

611-
The `ur_exp_command_buffer` string is conditionally returned from the OpenCL
612-
command-buffer UR backend at runtime based on `cl_khr_command_buffer` support
613-
to indicate that the graph extension should be enabled. This is information
614-
is propagated to the SYCL user via the
615-
`device.get_info<info::device::graph_support>()` query for graph extension
616-
support.
617-
618-
#### Limitations
619-
620-
Due to the API mapping gaps documented in the following section, OpenCL as a
621-
SYCL backend cannot fully support the graph API. Instead, there are
622-
limitations in the types of nodes which a user can add to a graph, using
623-
an unsupported node type will cause a SYCL exception to be thrown in graph
624-
finalization with error code `sycl::errc::feature_not_supported` and a message
625-
mentioning the unsupported command. For example,
626-
627-
```
628-
terminate called after throwing an instance of 'sycl::_V1::exception'
629-
what(): USM copy command not supported by graph backend
630-
```
631-
632-
The types of commands which are unsupported, and lead to this exception are:
633-
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
634-
This corresponds to a memory buffer read command.
635-
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
636-
This corresponds to a memory buffer write command.
637-
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
638-
`dest` are USM pointers. This corresponds to a USM copy command.
639-
* `handler::fill(ptr, pattern, count)` - This corresponds to a USM memory
640-
fill command.
641-
* `handler::memset(ptr, value, numBytes)` - This corresponds to a USM memory
642-
fill command.
643-
* `handler::prefetch()`.
644-
* `handler::mem_advise()`.
645-
646-
Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
647-
is supported, as a memory buffer copy command exists in the OpenCL extension.
607+
The `UR_DEVICE_INFO_COMMAND_BUFFER_SUPPORT_EXP` UR query returns true in the
608+
OpenCL UR adapter based on
609+
the presence of `cl_khr_command_buffer`, and the OpenCL device reporting
610+
support for
611+
[CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR).
612+
The later is required to enable multiple submissions of the same executable
613+
`command_graph` object without having to do a blocking wait on prior submissions
614+
in-between.
648615

649616
#### UR API Mapping
650617

@@ -678,18 +645,56 @@ adapter where there is matching support for each function in the list.
678645
| | clGetCommandBufferInfoKHR | No |
679646
| | clCommandSVMMemcpyKHR | No |
680647
| | clCommandSVMMemFillKHR | No |
681-
| urCommandBufferUpdateKernelLaunchExp | clUpdateMutableCommandsKHR | Yes[1] |
648+
| urCommandBufferUpdateKernelLaunchExp | clUpdateMutableCommandsKHR | Partial [See Update Section](#update-support) |
682649

683650
We are looking to address these gaps in the future so that SYCL-Graph can be
684651
fully supported on a `cl_khr_command_buffer` backend.
685652

686-
[1] Support for `urCommandBufferUpdateKernelLaunchExp` used to update the
653+
#### Unsupported Command Types
654+
655+
Due to the API mapping gaps documented in the previous section, OpenCL as a
656+
SYCL backend cannot fully support the graph API. Instead, there are
657+
limitations in the types of nodes which a user can add to a graph, using
658+
an unsupported node type will cause a SYCL exception to be thrown in graph
659+
finalization with error code `sycl::errc::feature_not_supported` and a message
660+
mentioning the unsupported command. For example,
661+
662+
```
663+
terminate called after throwing an instance of 'sycl::_V1::exception'
664+
what(): USM copy command not supported by graph backend
665+
```
666+
667+
The types of commands which are unsupported, and lead to this exception are:
668+
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
669+
This corresponds to a memory buffer read command.
670+
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
671+
This corresponds to a memory buffer write command.
672+
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
673+
`dest` are USM pointers. This corresponds to a USM copy command.
674+
* `handler::fill(ptr, pattern, count)` - This corresponds to a USM memory
675+
fill command.
676+
* `handler::memset(ptr, value, numBytes)` - This corresponds to a USM memory
677+
fill command.
678+
* `handler::prefetch()`.
679+
* `handler::mem_advise()`.
680+
681+
Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
682+
is supported, as a memory buffer copy command exists in the OpenCL extension.
683+
684+
#### Update Support
685+
686+
Support for `urCommandBufferUpdateKernelLaunchExp` used to update the
687687
configuration of kernel commands requires an OpenCL implementation with the
688688
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
689-
extension. The optional capabilities that are reported by this extension must
690-
include all of of `CL_MUTABLE_DISPATCH_GLOBAL_OFFSET_KHR`,
691-
`CL_MUTABLE_DISPATCH_GLOBAL_SIZE_KHR`, `CL_MUTABLE_DISPATCH_LOCAL_SIZE_KHR`,
692-
`CL_MUTABLE_DISPATCH_ARGUMENTS_KHR`, and `CL_MUTABLE_DISPATCH_EXEC_INFO_KHR`.
689+
extension.
690+
691+
However, the OpenCL adapter can not report `aspect::ext_oneapi_graph` for full
692+
SYCL-Graph support. As the `cl_khr_command_buffer_mutable_dispatch` extension
693+
has no concept of updating the `cl_kernel` objects in kernel commands, and so
694+
can't report the
695+
`UR_DEVICE_COMMAND_BUFFER_UPDATE_CAPABILITY_FLAG_KERNEL_HANDLE` capability.
696+
This extension limitation is tracked in by the OpenCL Working Group in an
697+
[OpenCL-Docs Issue](https://github.com/KhronosGroup/OpenCL-Docs/issues/1279).
693698

694699
#### UR Command-Buffer Implementation
695700

unified-runtime/scripts/core/EXP-COMMAND-BUFFER.rst

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -58,11 +58,11 @@ to provide additional properties for how the command-buffer should be
5858
constructed. The members defined in ${x}_exp_command_buffer_desc_t are:
5959

6060
* ``isUpdatable``, which should be set to ``true`` to support :ref:`updating
61-
command-buffer commands`.
61+
command-buffer commands`.
6262
* ``isInOrder``, which should be set to ``true`` to enable commands enqueued to
63-
a command-buffer to be executed in an in-order fashion where possible.
63+
a command-buffer to be executed in an in-order fashion where possible.
6464
* ``enableProfiling``, which should be set to ``true`` to enable profiling of
65-
the command-buffer.
65+
the command-buffer.
6666

6767
Command-buffers are reference counted and can be retained and released by
6868
calling ${x}CommandBufferRetainExp and ${x}CommandBufferReleaseExp respectively.
@@ -226,15 +226,30 @@ Enqueueing Command-Buffers
226226
Command-buffers are submitted for execution on a ${x}_queue_handle_t with an
227227
optional list of dependent events. An event is returned which tracks the
228228
execution of the command-buffer, and will be complete when all appended commands
229-
have finished executing. It is adapter specific whether command-buffers can be
230-
enqueued or executed simultaneously, and submissions may be serialized.
229+
have finished executing.
231230

232231
.. parsed-literal::
233232
${x}_event_handle_t executionEvent;
234-
235233
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
236234
&executionEvent);
237235
236+
A command-buffer can be submitted for execution while a previous submission
237+
of the same command-buffer is still awaiting completion. That is, the user is not
238+
required to do a blocking wait on the completion of the first command-buffer
239+
submission before making a second submission of the command-buffer.
240+
241+
Submissions of the same command-buffer should be synchronized to prevent
242+
concurrent execution. For example, by using events, barriers, or in-order queue
243+
dependencies. The behavior of multiple submissions of the same command-buffer
244+
that can execute concurrently is undefined.
245+
246+
.. parsed-literal::
247+
// Valid usage if hQueue is in-order but undefined behavior is out-of-order
248+
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
249+
nullptr);
250+
${x}EnqueueCommandBufferExp(hQueue, hCommandBuffer, 0, nullptr,
251+
nullptr);
252+
238253
239254
Updating Command-Buffer Commands
240255
--------------------------------------------------------------------------------

unified-runtime/source/adapters/opencl/device.cpp

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1524,9 +1524,19 @@ UR_APIEXPORT ur_result_t UR_APICALL urDeviceGetInfo(ur_device_handle_t hDevice,
15241524
CL_RETURN_ON_FAILURE(clGetDeviceInfo(Dev, CL_DEVICE_EXTENSIONS, ExtSize,
15251525
ExtStr.data(), nullptr));
15261526

1527-
std::string SupportedExtensions(ExtStr.c_str());
1528-
return ReturnValue(ExtStr.find("cl_khr_command_buffer") !=
1529-
std::string::npos);
1527+
// cl_khr_command_buffer is required for UR command-buffer support
1528+
cl_device_command_buffer_capabilities_khr Caps = 0;
1529+
if (ExtStr.find("cl_khr_command_buffer") != std::string::npos) {
1530+
// A UR command-buffer user needs to be able to enqueue another
1531+
// submission of the same UR command-buffer without having to manually
1532+
// check if the first submission has completed.
1533+
CL_RETURN_ON_FAILURE(
1534+
clGetDeviceInfo(Dev, CL_DEVICE_COMMAND_BUFFER_CAPABILITIES_KHR,
1535+
sizeof(Caps), &Caps, nullptr));
1536+
}
1537+
1538+
return ReturnValue(
1539+
0 != (Caps & CL_COMMAND_BUFFER_CAPABILITY_SIMULTANEOUS_USE_KHR));
15301540
}
15311541
case UR_DEVICE_INFO_COMMAND_BUFFER_UPDATE_CAPABILITIES_EXP: {
15321542
cl_device_id Dev = cl_adapter::cast<cl_device_id>(hDevice);

0 commit comments

Comments
 (0)