|
| 1 | += sycl_ext_oneapi_memory_export |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | +:endnote: —{nbsp}end{nbsp}note |
| 14 | + |
| 15 | +// Set the default source code type in this document to C++, |
| 16 | +// for syntax highlighting purposes. This is needed because |
| 17 | +// docbook uses c++ and html5 uses cpp. |
| 18 | +:language: {basebackend@docbook:c++:cpp} |
| 19 | + |
| 20 | + |
| 21 | +== Notice |
| 22 | + |
| 23 | +[%hardbreaks] |
| 24 | +Copyright (C) Codeplay. All rights reserved. |
| 25 | + |
| 26 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 27 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 28 | +permission by Khronos. |
| 29 | + |
| 30 | + |
| 31 | +== Contact |
| 32 | + |
| 33 | +To report problems with this extension, please open a new issue at: |
| 34 | + |
| 35 | +https://github.com/intel/llvm/issues |
| 36 | + |
| 37 | +== Dependencies |
| 38 | + |
| 39 | +This extension is written against the SYCL 2020 revision 10 specification. All |
| 40 | +references below to the "core SYCL specification" or to section numbers in the |
| 41 | +SYCL specification refer to that revision. |
| 42 | + |
| 43 | +== Status |
| 44 | + |
| 45 | +This is an experimental extension specification, intended to provide early |
| 46 | +access to features and gather community feedback. Interfaces defined in this |
| 47 | +specification are implemented in {dpcpp}, but they are not finalized and may |
| 48 | +change incompatibly in future versions of {dpcpp} without prior notice. |
| 49 | +*Shipping software products should not rely on APIs defined in this |
| 50 | +specification.* |
| 51 | + |
| 52 | +== Backend support status |
| 53 | + |
| 54 | +This extension is currently implemented in {dpcpp} only for GPU devices and |
| 55 | +only when using the Level Zero backend. |
| 56 | + |
| 57 | +== Overview |
| 58 | + |
| 59 | +This extension provides new APIs for allocating and deallocating exportable |
| 60 | +device memory in SYCL, and obtaining a handle to that memory which can be used |
| 61 | +in external APIs. This is useful when applications want to share device memory |
| 62 | +with other third-party APIs. |
| 63 | + |
| 64 | +Without the ability to allocate exportable memory and obtain an interoperable |
| 65 | +handle, applications would have to copy device memory allocated by one API to |
| 66 | +the host, then copy that host memory back to the device in a memory region |
| 67 | +allocated by a second API. If the second API modifies that memory, then this |
| 68 | +process would have to be repeated in the opposite direction in order for the |
| 69 | +first API to see the changes made to that memory. |
| 70 | + |
| 71 | +This extension enables copy-free sharing of SYCL allocated device memory with |
| 72 | +external APIs. |
| 73 | + |
| 74 | +== Specification |
| 75 | + |
| 76 | +=== Feature test macro |
| 77 | + |
| 78 | +This extension provides a feature-test macro as described in the core SYCL |
| 79 | +specification. An implementation supporting this extension must predefine the |
| 80 | +macro `SYCL_EXT_ONEAPI_MEMORY_EXPORT` to one of the values defined in the |
| 81 | +table below. Applications can test for the existence of this macro to |
| 82 | +determine if the implementation supports this feature, or applications can test |
| 83 | +the macro's value to determine which of the extension's features the |
| 84 | +implementation supports. |
| 85 | + |
| 86 | +[frame="none",options="header"] |
| 87 | +|====================== |
| 88 | +|Rev | Description |
| 89 | +|1 | Initial draft of the proposal |
| 90 | +|====================== |
| 91 | + |
| 92 | +=== Querying device support |
| 93 | + |
| 94 | +We provide the following device aspect to query for support of exporting memory. |
| 95 | + |
| 96 | +[frame="none",options="header"] |
| 97 | +|====================== |
| 98 | +|Device descriptor |Description |
| 99 | +|`aspect::ext_oneapi_exportable_device_mem` | Indicates if the device supports |
| 100 | +the allocation of exportable linear memory and exporting that memory to an |
| 101 | +interoperable handle. |
| 102 | +|====================== |
| 103 | + |
| 104 | +=== External Memory Resource Handle Types [[external_mem_res_handles]] |
| 105 | + |
| 106 | +This extension provides an enum `external_mem_handle_type` that defines several |
| 107 | +external memory resource handle types that can be used as interoperable |
| 108 | +handles to import SYCL allocated memory into external APIs. |
| 109 | + |
| 110 | +[_Note:_ Not all of the handle types defined in this enum may be supported for |
| 111 | +exporting memory by the implementation. Currently, the {dpcpp} implementation |
| 112 | +only supports exporting memory with the `opaque_fd` and `win32_nt_handle` handle |
| 113 | +types. This enum is shared with the memory import functionality defined in the |
| 114 | +https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images] |
| 115 | +extension, where more handle types may be supported for importing memory into |
| 116 | +SYCL. |
| 117 | +_{endnote}_] |
| 118 | + |
| 119 | +```c++ |
| 120 | +namespace sycl::ext::oneapi::experimental { |
| 121 | + |
| 122 | +// External memory resource handle types. |
| 123 | +enum class external_mem_handle_type { |
| 124 | + opaque_fd = 0, |
| 125 | + win32_nt_handle = 1, |
| 126 | + win32_nt_dx12_resource = 2, |
| 127 | + dma_buf = 3, |
| 128 | + win32_nt_dx11_resource = 4, |
| 129 | +}; |
| 130 | + |
| 131 | +} |
| 132 | +``` |
| 133 | + |
| 134 | +The `external_mem_handle_type` enum class defines the types of external memory |
| 135 | +resource handles that can be exported by this extension. The `opaque_fd` and |
| 136 | +`win32_nt_handle` values are used during allocation of exportable memory to |
| 137 | +indicate the type of handle that will later be returned by the |
| 138 | +`export_device_mem_handle` function. |
| 139 | + |
| 140 | +The `opaque_fd` handle type corresponds to a POSIX file descriptor, which is |
| 141 | +represented by an `int`. |
| 142 | + |
| 143 | +The `win32_nt_handle` handle type corresponds to a Windows NT handle, which is |
| 144 | +represented by a `void *`. |
| 145 | + |
| 146 | +=== API of the extension |
| 147 | + |
| 148 | +```c++ |
| 149 | + |
| 150 | +namespace sycl::ext::oneapi::experimental { |
| 151 | + |
| 152 | +void *alloc_exportable_device_mem( |
| 153 | + size_t alignment, size_t size, |
| 154 | + external_mem_handle_type externalMemHandleType, |
| 155 | + const sycl::device &syclDevice, const sycl::context &syclContext, |
| 156 | + const property_list& propList = {}); |
| 157 | + |
| 158 | +void *alloc_exportable_device_mem( |
| 159 | + size_t alignment, size_t size, |
| 160 | + external_mem_handle_type externalMemHandleType, |
| 161 | + const sycl::queue &syclQueue, |
| 162 | + const property_list& propList = {}); |
| 163 | +} |
| 164 | +``` |
| 165 | + |
| 166 | +The `alloc_exportable_device_mem` function allocates memory on the device marked |
| 167 | +as having the ability to later export that memory to an external memory resource |
| 168 | +handle. |
| 169 | + |
| 170 | +Memory allocated through this function must only be freed using |
| 171 | +`free_exportable_mem`. Using `sycl::free` to deallocate memory allocated with |
| 172 | +this function results in undefined behavior. |
| 173 | + |
| 174 | +With the exception of the `sycl::free` function from the core SYCL |
| 175 | +specification, pointers to memory allocated through this function may be passed |
| 176 | +to any core SYCL specification API accepting device USM memory pointers. |
| 177 | + |
| 178 | +Memory allocated through this function is only available on device. |
| 179 | + |
| 180 | +Memory allocated through this function has a linear memory layout on the device |
| 181 | +(which is the same as memory allocated by other USM allocation functions like |
| 182 | +`sycl::malloc_device`). |
| 183 | + |
| 184 | +Zero or more properties can be passed in the `propList` parameter via an |
| 185 | +instance of `sycl::property_list`. Currently, this extension does not define |
| 186 | +any properties that can be used with this function, so the `propList` parameter |
| 187 | +is ignored and reserved for future use. |
| 188 | + |
| 189 | +Only two values of `externalMemHandleType` are supported by this extension: |
| 190 | + |
| 191 | +- `external_mem_handle_type::opaque_fd` is supported when the host is a Posix |
| 192 | + compliant operating system. |
| 193 | + |
| 194 | +- `external_mem_handle_type::win32_nt_handle`` is supported when the host is |
| 195 | + Windows. |
| 196 | + |
| 197 | +No other values are supported. This function will throw a `sycl::exception` with |
| 198 | +the `errc::feature_not_supported` code if an unsupported value is passed. |
| 199 | + |
| 200 | +This function will throw a `sycl::exception` with `errc::feature_not_supported` |
| 201 | +if the device `syclDevice` does not have |
| 202 | +`aspect::ext_oneapi_exportable_device_mem`. |
| 203 | + |
| 204 | +This function will throw a `sycl::exception` with the `errc::runtime` code if |
| 205 | +any error occurs while allocating the memory. |
| 206 | + |
| 207 | +```c++ |
| 208 | + |
| 209 | +namespace sycl::ext::oneapi::experimental { |
| 210 | + |
| 211 | +template <external_mem_handle_type ExternalMemHandleType> |
| 212 | +__return_type__ |
| 213 | +export_device_mem_handle(void *deviceMemory, const sycl::device &syclDevice, |
| 214 | + const sycl::context &syclContext); |
| 215 | + |
| 216 | +template <external_mem_handle_type ExternalMemHandleType> |
| 217 | +__return_type__ |
| 218 | +export_device_mem_handle(void *deviceMemory, const sycl::queue &syclQueue); |
| 219 | + |
| 220 | +} |
| 221 | +``` |
| 222 | + |
| 223 | +Constraints: `ExternalMemHandleType` is either |
| 224 | +`external_mem_handle_type::opaque_fd` or |
| 225 | +`external_mem_handle_type::win32_nt_handle`. |
| 226 | + |
| 227 | +When `ExternalMemHandleType` is `external_mem_handle_type::opaque_fd`, the |
| 228 | +`+__return_type__+` is `int`. |
| 229 | + |
| 230 | +When `ExternalMemHandleType` is `external_mem_handle_type::win32_nt_handle`, the |
| 231 | +`+__return_type__+` is `void *`. |
| 232 | + |
| 233 | +The `export_device_mem_handle` function accepts a `void *` representing a device |
| 234 | +allocation made using `alloc_exportable_device_mem`. |
| 235 | + |
| 236 | +The value of `ExternalMemHandleType` must match the value passed to |
| 237 | +`alloc_exportable_device_mem` when the memory was allocated. Passing an |
| 238 | +`ExternalMemHandleType` value that not match the value passed to |
| 239 | +`alloc_exportable_device_mem` results in undefined behavior. |
| 240 | + |
| 241 | +The `syclDevice` and `syclContext` passed to `export_device_mem_handle` must |
| 242 | +match the device and context used when the `deviceMemory` was allocated using |
| 243 | +`alloc_exportable_device_mem`. If a `syclQueue` is passed, it must also be |
| 244 | +associated with the same SYCL device and context used when the memory was |
| 245 | +allocated. |
| 246 | + |
| 247 | +This function will throw a `sycl::exception` with the `errc::runtime` code if |
| 248 | +any error occurs while exporting the memory handle. |
| 249 | + |
| 250 | +[_Note:_ The returned handle may be used to import the SYCL allocated memory |
| 251 | +into an external API, such as Vulkan or DirectX. |
| 252 | +_{endnote}_] |
| 253 | + |
| 254 | +```c++ |
| 255 | + |
| 256 | +namespace sycl::ext::oneapi::experimental { |
| 257 | + |
| 258 | +void free_exportable_mem(void *deviceMemory, |
| 259 | + const sycl::device &syclDevice, |
| 260 | + const sycl::context &syclContext); |
| 261 | + |
| 262 | +void free_exportable_mem(void *deviceMemory, |
| 263 | + const sycl::queue &syclQueue); |
| 264 | +} |
| 265 | +``` |
| 266 | + |
| 267 | +The `free_exportable_mem` function deallocates memory, represented by the |
| 268 | +`void *` parameter, which has been previously allocated through |
| 269 | +`alloc_exportable_device_mem`. |
| 270 | + |
| 271 | +Using `free_exportable_mem` on memory allocated through any function other |
| 272 | +than `alloc_exportable_device_mem` results in undefined behavior. |
| 273 | + |
| 274 | +Using `free_exportable_mem` on a memory region invalidates the handle |
| 275 | +returned by `export_device_mem_handle` for that region. The handle must not be |
| 276 | +used after the memory has been freed. |
| 277 | + |
| 278 | +The `syclDevice` and `syclContext` passed to `free_exportable_mem` must |
| 279 | +match the device and context used when the `deviceMemory` was allocated using |
| 280 | +`alloc_exportable_device_mem`. If a `syclQueue` is passed, it must also be |
| 281 | +associated with the same SYCL device and context used when the memory was |
| 282 | +allocated. |
| 283 | + |
| 284 | +This function will throw a `sycl::exception` with the `errc::runtime` code if |
| 285 | +any error occurs while freeing the memory. |
| 286 | + |
| 287 | +== Issues and Limitations |
| 288 | + |
| 289 | +=== Memory Layout |
| 290 | + |
| 291 | +This extension is currently limited to exporting memory with a linear layout. It |
| 292 | +does not support exporting memory with a non-linear layout, such as the |
| 293 | +"optimal" layout which would have an equivalent in Vulkan as |
| 294 | +`VK_IMAGE_LAYOUT_OPTIMAL`, or in CUDA as `cudaArray`. These "optimal" layouts |
| 295 | +are typically optimized for texture access. |
| 296 | + |
| 297 | +The reason for this limitation is that currently, no backend supported by |
| 298 | +{dpcpp} supports exporting memory with a non-linear layout. This may change in |
| 299 | +the future, and if it does, we could then amend the extension to support |
| 300 | +exporting memory with a non-linear layout. |
| 301 | + |
| 302 | +=== Closing OS Handles |
| 303 | + |
| 304 | +When a call is made to `export_device_mem_handle`, the {dpcpp} implementation |
| 305 | +will internally create an OS specific handle to the memory region. Both CUDA and |
| 306 | +Level Zero allow the user to specify the type of handle to be created. However, |
| 307 | +this is not always respected by the Level Zero driver. For this reason, if the |
| 308 | +user wishes to close the OS handle returned by `export_device_mem_handle` |
| 309 | +without freeing the memory, they must call the appropriate OS specific API to |
| 310 | +close the type of handle returned by the function. |
| 311 | + |
| 312 | +When exporting a file descriptor handle on Linux, our testing has shown that the |
| 313 | +`close` Linux API should work. |
| 314 | + |
| 315 | +On Windows systems, the type of OS handle returned by `export_device_mem_handle` |
| 316 | +may not be an NT handle (e.g. it may be a KMT handle), and therefore the user |
| 317 | +may experience issues when trying to close the handle using the `CloseHandle` |
| 318 | +Windows API. |
| 319 | + |
| 320 | +The issue of closing OS handles returned by `export_device_mem_handle` is |
| 321 | +something we are aware of and want to address in future versions of this |
| 322 | +extension. Once we have a solution, we will update this specification with a |
| 323 | +SYCL API that will close the OS handles returned by `export_device_mem_handle` |
| 324 | +without freeing the memory. |
| 325 | + |
| 326 | +=== Using `sycl::malloc_device ` and `sycl::free` for exportable memory |
| 327 | + |
| 328 | +As this is an initial draft of an experimental extension, we provide explicit |
| 329 | +APIs for the allocation and deallocation of exportable memory. However, there |
| 330 | +is nothing in principle that should prevent this extensions from using |
| 331 | +`sycl::malloc_device` with a `sycl::property` to allocate exportable memory, |
| 332 | +and `sycl::free` to deallocate it. While the implementation of this in {dpcpp} |
| 333 | +would involve minor overhead, it would allow the user to use the same |
| 334 | +allocation and deallocation APIs for both exportable and non-exportable memory. |
| 335 | + |
| 336 | +We are considering this approach for future versions of this extension, but for |
| 337 | +this initial draft we've have decided to provide explicit APIs to simplify the |
| 338 | +implementation and gather early feedback. |
| 339 | + |
| 340 | +=== Querying Supported External Memory Handle Types |
| 341 | + |
| 342 | +Currently, there is no way to query which external memory handle types are |
| 343 | +supported by the implementation. As this is an initial draft of an |
| 344 | +experimental extension intended to gather early feedback, we have not |
| 345 | +implemented this functionality yet. However, we are aware of this limitation |
| 346 | +and plan to address it in future versions of this extension. |
| 347 | + |
| 348 | +== Revision History |
| 349 | + |
| 350 | +[frame="none",options="header"] |
| 351 | +|=============================================================================== |
| 352 | +|Rev |Date | Author | Changes |
| 353 | +|1.0 |2025-07-18 | Przemek Malon | Initial draft |
| 354 | +|=============================================================================== |
0 commit comments