Skip to content

Commit 3e95c0c

Browse files
[SYCL][UR][MemExport][Exp] Initial spec and L0 implementation (#19018)
This patch introduces new SYCL and UR extensions for exporting memory. - SYCL: `sycl_ext_oneapi_memory_export` - UR: `EXP-MEMORY-EXPORT` The exported memory handles can be imported by external APIs and operated on without the need to perform costly copy operations. Three new SYCL APIs are added by this extension: - `alloc_exportable_device_mem` - `export_device_mem_handle` - `free_exportable_mem` Currently we can only export "linear" memory layouts. Exporting other memory layouts, such as "optimal" (Vulkan equivalent `VK_IMAGE_TILING_OPTIMAL`) is not currently supported by any UR adapter APIs.
1 parent 8e347de commit 3e95c0c

File tree

71 files changed

+3312
-266
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

71 files changed

+3312
-266
lines changed

llvm/include/llvm/SYCLLowerIR/DeviceConfigFile.td

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,7 @@ def AspectExt_intel_power_limits : Aspect<"ext_intel_power_limits">;
9494
def AspectExt_oneapi_async_memory_alloc : Aspect<"ext_oneapi_async_memory_alloc">;
9595
def AspectExt_intel_device_info_luid : Aspect<"ext_intel_device_info_luid">;
9696
def AspectExt_intel_device_info_node_mask : Aspect<"ext_intel_device_info_node_mask">;
97+
def Aspectext_oneapi_exportable_device_mem : Aspect<"ext_oneapi_exportable_device_mem">;
9798

9899
// Deprecated aspects
99100
def AspectInt64_base_atomics : Aspect<"int64_base_atomics">;
@@ -167,7 +168,8 @@ def : TargetInfo<"__TestAspectList",
167168
AspectExt_intel_power_limits,
168169
AspectExt_oneapi_async_memory_alloc,
169170
AspectExt_intel_device_info_luid,
170-
AspectExt_intel_device_info_node_mask],
171+
AspectExt_intel_device_info_node_mask,
172+
Aspectext_oneapi_exportable_device_mem],
171173
[]>;
172174
// This definition serves the only purpose of testing whether the deprecated aspect list defined in here and in SYCL RT
173175
// match.
Lines changed: 354 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,354 @@
1+
= sycl_ext_oneapi_memory_export
2+
3+
:source-highlighter: coderay
4+
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
7+
:doctype: book
8+
:toc2:
9+
:toc: left
10+
:encoding: utf-8
11+
:lang: en
12+
:dpcpp: pass:[DPC++]
13+
:endnote: &#8212;{nbsp}end{nbsp}note
14+
15+
// Set the default source code type in this document to C++,
16+
// for syntax highlighting purposes. This is needed because
17+
// docbook uses c++ and html5 uses cpp.
18+
:language: {basebackend@docbook:c++:cpp}
19+
20+
21+
== Notice
22+
23+
[%hardbreaks]
24+
Copyright (C) Codeplay. All rights reserved.
25+
26+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
27+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
28+
permission by Khronos.
29+
30+
31+
== Contact
32+
33+
To report problems with this extension, please open a new issue at:
34+
35+
https://github.com/intel/llvm/issues
36+
37+
== Dependencies
38+
39+
This extension is written against the SYCL 2020 revision 10 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
42+
43+
== Status
44+
45+
This is an experimental extension specification, intended to provide early
46+
access to features and gather community feedback. Interfaces defined in this
47+
specification are implemented in {dpcpp}, but they are not finalized and may
48+
change incompatibly in future versions of {dpcpp} without prior notice.
49+
*Shipping software products should not rely on APIs defined in this
50+
specification.*
51+
52+
== Backend support status
53+
54+
This extension is currently implemented in {dpcpp} only for GPU devices and
55+
only when using the Level Zero backend.
56+
57+
== Overview
58+
59+
This extension provides new APIs for allocating and deallocating exportable
60+
device memory in SYCL, and obtaining a handle to that memory which can be used
61+
in external APIs. This is useful when applications want to share device memory
62+
with other third-party APIs.
63+
64+
Without the ability to allocate exportable memory and obtain an interoperable
65+
handle, applications would have to copy device memory allocated by one API to
66+
the host, then copy that host memory back to the device in a memory region
67+
allocated by a second API. If the second API modifies that memory, then this
68+
process would have to be repeated in the opposite direction in order for the
69+
first API to see the changes made to that memory.
70+
71+
This extension enables copy-free sharing of SYCL allocated device memory with
72+
external APIs.
73+
74+
== Specification
75+
76+
=== Feature test macro
77+
78+
This extension provides a feature-test macro as described in the core SYCL
79+
specification. An implementation supporting this extension must predefine the
80+
macro `SYCL_EXT_ONEAPI_MEMORY_EXPORT` to one of the values defined in the
81+
table below. Applications can test for the existence of this macro to
82+
determine if the implementation supports this feature, or applications can test
83+
the macro's value to determine which of the extension's features the
84+
implementation supports.
85+
86+
[frame="none",options="header"]
87+
|======================
88+
|Rev | Description
89+
|1 | Initial draft of the proposal
90+
|======================
91+
92+
=== Querying device support
93+
94+
We provide the following device aspect to query for support of exporting memory.
95+
96+
[frame="none",options="header"]
97+
|======================
98+
|Device descriptor |Description
99+
|`aspect::ext_oneapi_exportable_device_mem` | Indicates if the device supports
100+
the allocation of exportable linear memory and exporting that memory to an
101+
interoperable handle.
102+
|======================
103+
104+
=== External Memory Resource Handle Types [[external_mem_res_handles]]
105+
106+
This extension provides an enum `external_mem_handle_type` that defines several
107+
external memory resource handle types that can be used as interoperable
108+
handles to import SYCL allocated memory into external APIs.
109+
110+
[_Note:_ Not all of the handle types defined in this enum may be supported for
111+
exporting memory by the implementation. Currently, the {dpcpp} implementation
112+
only supports exporting memory with the `opaque_fd` and `win32_nt_handle` handle
113+
types. This enum is shared with the memory import functionality defined in the
114+
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images]
115+
extension, where more handle types may be supported for importing memory into
116+
SYCL.
117+
_{endnote}_]
118+
119+
```c++
120+
namespace sycl::ext::oneapi::experimental {
121+
122+
// External memory resource handle types.
123+
enum class external_mem_handle_type {
124+
opaque_fd = 0,
125+
win32_nt_handle = 1,
126+
win32_nt_dx12_resource = 2,
127+
dma_buf = 3,
128+
win32_nt_dx11_resource = 4,
129+
};
130+
131+
}
132+
```
133+
134+
The `external_mem_handle_type` enum class defines the types of external memory
135+
resource handles that can be exported by this extension. The `opaque_fd` and
136+
`win32_nt_handle` values are used during allocation of exportable memory to
137+
indicate the type of handle that will later be returned by the
138+
`export_device_mem_handle` function.
139+
140+
The `opaque_fd` handle type corresponds to a POSIX file descriptor, which is
141+
represented by an `int`.
142+
143+
The `win32_nt_handle` handle type corresponds to a Windows NT handle, which is
144+
represented by a `void *`.
145+
146+
=== API of the extension
147+
148+
```c++
149+
150+
namespace sycl::ext::oneapi::experimental {
151+
152+
void *alloc_exportable_device_mem(
153+
size_t alignment, size_t size,
154+
external_mem_handle_type externalMemHandleType,
155+
const sycl::device &syclDevice, const sycl::context &syclContext,
156+
const property_list& propList = {});
157+
158+
void *alloc_exportable_device_mem(
159+
size_t alignment, size_t size,
160+
external_mem_handle_type externalMemHandleType,
161+
const sycl::queue &syclQueue,
162+
const property_list& propList = {});
163+
}
164+
```
165+
166+
The `alloc_exportable_device_mem` function allocates memory on the device marked
167+
as having the ability to later export that memory to an external memory resource
168+
handle.
169+
170+
Memory allocated through this function must only be freed using
171+
`free_exportable_mem`. Using `sycl::free` to deallocate memory allocated with
172+
this function results in undefined behavior.
173+
174+
With the exception of the `sycl::free` function from the core SYCL
175+
specification, pointers to memory allocated through this function may be passed
176+
to any core SYCL specification API accepting device USM memory pointers.
177+
178+
Memory allocated through this function is only available on device.
179+
180+
Memory allocated through this function has a linear memory layout on the device
181+
(which is the same as memory allocated by other USM allocation functions like
182+
`sycl::malloc_device`).
183+
184+
Zero or more properties can be passed in the `propList` parameter via an
185+
instance of `sycl::property_list`. Currently, this extension does not define
186+
any properties that can be used with this function, so the `propList` parameter
187+
is ignored and reserved for future use.
188+
189+
Only two values of `externalMemHandleType` are supported by this extension:
190+
191+
- `external_mem_handle_type::opaque_fd` is supported when the host is a Posix
192+
compliant operating system.
193+
194+
- `external_mem_handle_type::win32_nt_handle`` is supported when the host is
195+
Windows.
196+
197+
No other values are supported. This function will throw a `sycl::exception` with
198+
the `errc::feature_not_supported` code if an unsupported value is passed.
199+
200+
This function will throw a `sycl::exception` with `errc::feature_not_supported`
201+
if the device `syclDevice` does not have
202+
`aspect::ext_oneapi_exportable_device_mem`.
203+
204+
This function will throw a `sycl::exception` with the `errc::runtime` code if
205+
any error occurs while allocating the memory.
206+
207+
```c++
208+
209+
namespace sycl::ext::oneapi::experimental {
210+
211+
template <external_mem_handle_type ExternalMemHandleType>
212+
__return_type__
213+
export_device_mem_handle(void *deviceMemory, const sycl::device &syclDevice,
214+
const sycl::context &syclContext);
215+
216+
template <external_mem_handle_type ExternalMemHandleType>
217+
__return_type__
218+
export_device_mem_handle(void *deviceMemory, const sycl::queue &syclQueue);
219+
220+
}
221+
```
222+
223+
Constraints: `ExternalMemHandleType` is either
224+
`external_mem_handle_type::opaque_fd` or
225+
`external_mem_handle_type::win32_nt_handle`.
226+
227+
When `ExternalMemHandleType` is `external_mem_handle_type::opaque_fd`, the
228+
`+__return_type__+` is `int`.
229+
230+
When `ExternalMemHandleType` is `external_mem_handle_type::win32_nt_handle`, the
231+
`+__return_type__+` is `void *`.
232+
233+
The `export_device_mem_handle` function accepts a `void *` representing a device
234+
allocation made using `alloc_exportable_device_mem`.
235+
236+
The value of `ExternalMemHandleType` must match the value passed to
237+
`alloc_exportable_device_mem` when the memory was allocated. Passing an
238+
`ExternalMemHandleType` value that not match the value passed to
239+
`alloc_exportable_device_mem` results in undefined behavior.
240+
241+
The `syclDevice` and `syclContext` passed to `export_device_mem_handle` must
242+
match the device and context used when the `deviceMemory` was allocated using
243+
`alloc_exportable_device_mem`. If a `syclQueue` is passed, it must also be
244+
associated with the same SYCL device and context used when the memory was
245+
allocated.
246+
247+
This function will throw a `sycl::exception` with the `errc::runtime` code if
248+
any error occurs while exporting the memory handle.
249+
250+
[_Note:_ The returned handle may be used to import the SYCL allocated memory
251+
into an external API, such as Vulkan or DirectX.
252+
_{endnote}_]
253+
254+
```c++
255+
256+
namespace sycl::ext::oneapi::experimental {
257+
258+
void free_exportable_mem(void *deviceMemory,
259+
const sycl::device &syclDevice,
260+
const sycl::context &syclContext);
261+
262+
void free_exportable_mem(void *deviceMemory,
263+
const sycl::queue &syclQueue);
264+
}
265+
```
266+
267+
The `free_exportable_mem` function deallocates memory, represented by the
268+
`void *` parameter, which has been previously allocated through
269+
`alloc_exportable_device_mem`.
270+
271+
Using `free_exportable_mem` on memory allocated through any function other
272+
than `alloc_exportable_device_mem` results in undefined behavior.
273+
274+
Using `free_exportable_mem` on a memory region invalidates the handle
275+
returned by `export_device_mem_handle` for that region. The handle must not be
276+
used after the memory has been freed.
277+
278+
The `syclDevice` and `syclContext` passed to `free_exportable_mem` must
279+
match the device and context used when the `deviceMemory` was allocated using
280+
`alloc_exportable_device_mem`. If a `syclQueue` is passed, it must also be
281+
associated with the same SYCL device and context used when the memory was
282+
allocated.
283+
284+
This function will throw a `sycl::exception` with the `errc::runtime` code if
285+
any error occurs while freeing the memory.
286+
287+
== Issues and Limitations
288+
289+
=== Memory Layout
290+
291+
This extension is currently limited to exporting memory with a linear layout. It
292+
does not support exporting memory with a non-linear layout, such as the
293+
"optimal" layout which would have an equivalent in Vulkan as
294+
`VK_IMAGE_LAYOUT_OPTIMAL`, or in CUDA as `cudaArray`. These "optimal" layouts
295+
are typically optimized for texture access.
296+
297+
The reason for this limitation is that currently, no backend supported by
298+
{dpcpp} supports exporting memory with a non-linear layout. This may change in
299+
the future, and if it does, we could then amend the extension to support
300+
exporting memory with a non-linear layout.
301+
302+
=== Closing OS Handles
303+
304+
When a call is made to `export_device_mem_handle`, the {dpcpp} implementation
305+
will internally create an OS specific handle to the memory region. Both CUDA and
306+
Level Zero allow the user to specify the type of handle to be created. However,
307+
this is not always respected by the Level Zero driver. For this reason, if the
308+
user wishes to close the OS handle returned by `export_device_mem_handle`
309+
without freeing the memory, they must call the appropriate OS specific API to
310+
close the type of handle returned by the function.
311+
312+
When exporting a file descriptor handle on Linux, our testing has shown that the
313+
`close` Linux API should work.
314+
315+
On Windows systems, the type of OS handle returned by `export_device_mem_handle`
316+
may not be an NT handle (e.g. it may be a KMT handle), and therefore the user
317+
may experience issues when trying to close the handle using the `CloseHandle`
318+
Windows API.
319+
320+
The issue of closing OS handles returned by `export_device_mem_handle` is
321+
something we are aware of and want to address in future versions of this
322+
extension. Once we have a solution, we will update this specification with a
323+
SYCL API that will close the OS handles returned by `export_device_mem_handle`
324+
without freeing the memory.
325+
326+
=== Using `sycl::malloc_device ` and `sycl::free` for exportable memory
327+
328+
As this is an initial draft of an experimental extension, we provide explicit
329+
APIs for the allocation and deallocation of exportable memory. However, there
330+
is nothing in principle that should prevent this extensions from using
331+
`sycl::malloc_device` with a `sycl::property` to allocate exportable memory,
332+
and `sycl::free` to deallocate it. While the implementation of this in {dpcpp}
333+
would involve minor overhead, it would allow the user to use the same
334+
allocation and deallocation APIs for both exportable and non-exportable memory.
335+
336+
We are considering this approach for future versions of this extension, but for
337+
this initial draft we've have decided to provide explicit APIs to simplify the
338+
implementation and gather early feedback.
339+
340+
=== Querying Supported External Memory Handle Types
341+
342+
Currently, there is no way to query which external memory handle types are
343+
supported by the implementation. As this is an initial draft of an
344+
experimental extension intended to gather early feedback, we have not
345+
implemented this functionality yet. However, we are aware of this limitation
346+
and plan to address it in future versions of this extension.
347+
348+
== Revision History
349+
350+
[frame="none",options="header"]
351+
|===============================================================================
352+
|Rev |Date | Author | Changes
353+
|1.0 |2025-07-18 | Przemek Malon | Initial draft
354+
|===============================================================================

sycl/include/sycl/ext/oneapi/bindless_images_interop.hpp

Lines changed: 3 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -8,23 +8,15 @@
88

99
#pragma once
1010

11-
#include <ur_api.h>
11+
#include "interop_common.hpp" // For external_mem_handle_type.
1212

13-
#include <stddef.h> // for size_t
13+
#include <stddef.h> // For size_t.
14+
#include <ur_api.h>
1415

1516
namespace sycl {
1617
inline namespace _V1 {
1718
namespace ext::oneapi::experimental {
1819

19-
// Types of external memory handles
20-
enum class external_mem_handle_type {
21-
opaque_fd = 0,
22-
win32_nt_handle = 1,
23-
win32_nt_dx12_resource = 2,
24-
dma_buf = 3,
25-
win32_nt_dx11_resource = 4,
26-
};
27-
2820
// Types of external semaphore handles
2921
enum class external_semaphore_handle_type {
3022
opaque_fd = 0,

0 commit comments

Comments
 (0)