Skip to content
Open
Show file tree
Hide file tree
Changes from 39 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
eec1fe5
[SYCL][Docs] Add sycl_ext_oneapi_inter_process_communication
steffenlarsen Sep 4, 2025
e4aefbf
Set ptracer permission in test
steffenlarsen Sep 15, 2025
7c85049
Document known issues
steffenlarsen Sep 15, 2025
6debadb
Disable on Windows L0
steffenlarsen Sep 15, 2025
d3b6c56
Add tracker
steffenlarsen Sep 16, 2025
29cbc37
Open handle directly from data
steffenlarsen Sep 17, 2025
baa805f
Adjust for UR changes
steffenlarsen Sep 17, 2025
7170da9
Address source-check issues
steffenlarsen Sep 17, 2025
ce89cfe
Address more source checks
steffenlarsen Sep 17, 2025
49ce33d
Add missing symbols
steffenlarsen Sep 17, 2025
c3f139b
Remove known issue
steffenlarsen Sep 18, 2025
d1ba0de
Add put explicit internal release arg
steffenlarsen Sep 22, 2025
4070634
Adjust sycl impl
steffenlarsen Sep 22, 2025
5d5dc0f
Merge remote-tracking branch 'intel/sycl' into ext_oneapi_inter_proce…
steffenlarsen Sep 22, 2025
db03d7b
Remove unused param
steffenlarsen Sep 22, 2025
0075694
Add missing Windows symbol
steffenlarsen Sep 22, 2025
7a332b9
Add missing newline
steffenlarsen Sep 24, 2025
ce651ce
Merge remote-tracking branch 'intel/sycl' into ext_oneapi_inter_proce…
steffenlarsen Sep 24, 2025
acf286f
Change IPC API (SYCL)
steffenlarsen Oct 6, 2025
fa44def
Change IPC API (UR)
steffenlarsen Oct 6, 2025
7873e13
Fix formatting
steffenlarsen Oct 6, 2025
2001baa
Fix failures
steffenlarsen Oct 6, 2025
3555827
Merge remote-tracking branch 'intel/sycl' into ext_oneapi_inter_proce…
steffenlarsen Oct 13, 2025
68ea80e
Address feedback
steffenlarsen Oct 13, 2025
d1d9a1a
Fix unittest
steffenlarsen Oct 13, 2025
01e3a65
Switch to std::byte
steffenlarsen Oct 13, 2025
3bf5099
Fix include
steffenlarsen Oct 13, 2025
ebdc9e3
Fix tests
steffenlarsen Oct 14, 2025
02e9613
Fix stylistic issues
steffenlarsen Oct 14, 2025
eeb0a8a
Make device lookup when failing
steffenlarsen Oct 15, 2025
0103ab4
Merge remote-tracking branch 'intel/sycl' into ext_oneapi_inter_proce…
steffenlarsen Oct 16, 2025
0f3b66f
Fix typo and add shortcut
steffenlarsen Oct 17, 2025
31aa6a3
Address code comments
steffenlarsen Oct 17, 2025
33d34a9
Address spec comments
steffenlarsen Oct 17, 2025
a1a7945
Clarify put result
steffenlarsen Oct 17, 2025
3bc26d0
Add note about matching open and close
steffenlarsen Oct 17, 2025
6cb2cf3
Add note about validity of opened ptrs after put
steffenlarsen Oct 17, 2025
7883b93
Clarify last comment a little more
steffenlarsen Oct 17, 2025
4410d55
Address newest comments
steffenlarsen Oct 17, 2025
63315d9
Make arguments const and make work-around
steffenlarsen Oct 17, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion llvm/include/llvm/SYCLLowerIR/DeviceConfigFile.td
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ def Aspectext_oneapi_clock_sub_group : Aspect<"ext_oneapi_clock_sub_group">;
def Aspectext_oneapi_clock_work_group : Aspect<"ext_oneapi_clock_work_group">;
def Aspectext_oneapi_clock_device : Aspect<"ext_oneapi_clock_device">;
def Aspectext_oneapi_is_integrated_gpu : Aspect<"ext_oneapi_is_integrated_gpu">;
def Aspectext_oneapi_ipc_memory : Aspect<"ext_oneapi_ipc_memory">;

// Deprecated aspects
def AspectInt64_base_atomics : Aspect<"int64_base_atomics">;
Expand Down Expand Up @@ -176,7 +177,8 @@ def : TargetInfo<"__TestAspectList",
Aspectext_oneapi_clock_sub_group,
Aspectext_oneapi_clock_work_group,
Aspectext_oneapi_clock_device,
Aspectext_oneapi_is_integrated_gpu],
Aspectext_oneapi_is_integrated_gpu,
Aspectext_oneapi_ipc_memory],
[]>;
// This definition serves the only purpose of testing whether the deprecated aspect list defined in here and in SYCL RT
// match.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,359 @@
= sycl_ext_oneapi_inter_process_communication

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: &#8212;{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2025 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

:khr-default-context: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:khr-default-context

This extension is written against the SYCL 2020 revision 10 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

[_Note:_ The APIs in this extension uses the concept of a per-platform
default context as specified in section 4.6.2 "Platform class" of the core SYCL
specification.
As a convenience, this extension specification describes the behavior of its
APIs by using the `khr_get_default_context` function from {khr-default-context}[
sycl_khr_default_context], however there is no true dependency on that
extension.
An implementation could still implement
sycl_ext_oneapi_inter_process_communication even without implementing
sycl_khr_default_context because the core SYCL specification still requires
there to be a per-platform default context even if the core SYCL specification
does not provide a convenient way to get it.
_{endnote}_]


== Status

This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in this
specification are implemented in {dpcpp}, but they are not finalized and may
change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension adds the ability for SYCL programs to share device USM memory
allocations between processes. This is done by the allocating process creating
a new IPC memory handle through the new free functions and transferring the
returned handle data to the other processes. The other processes can use the
handle data to retrieve the corresponding device USM memory.


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_INTER_PROCESS_COMMUNICATION` to one of the values defined
in the table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro's value to determine which of the extension's features the
implementation supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===

=== Extension to `enum class aspect`

[source]
----
namespace sycl {
enum class aspect {
...
ext_oneapi_ipc_memory
}
}
----

If a SYCL device has this aspect, that device supports the `get` and `open`
functions specified in the following section.


=== Inter-process communicable memory


This extension adds new free functions under the `ipc_memory` experimental
namespace.

```
namespace sycl::ext::oneapi::experimental::ipc_memory {

using handle_data_t = std::vector<std::byte>;

handle_data_t get(void *ptr, const sycl::context &ctx);

handle_data_t get(void *ptr);

void put(handle_data_t &handle_data, const sycl::context &ctx);

void put(handle_data_t &handle_data);

void *open(handle_data_t &handle_data, const sycl::context &ctx,
const sycl::device &dev);

void *open(handle_data_t &handle_data, const sycl::device &dev);

void *open(handle_data_t &handle_data);

void close(void *ptr, const sycl::context &ctx);

void close(void *ptr);

}
```

|====
a|
[frame=all,grid=none]
!====
a!
[source]
----
handle_data_t get(void *ptr, const sycl::context &ctx)
----
!====

_Preconditions:_ `ptr` is a pointer to USM device memory on some device _D_, and
`ctx` is the same context as `ptr` was allocated against, using the USM device
memory allocation routines.

_Returns:_ An IPC "handle" to this USM memory allocation. The bytes of this
handle can be transferred to another process on the same system, and the other
process can use the handle to get a pointer to the same USM allocation through a
call to the `open` function.

_Throws:_ An exception with the `errc::feature_not_supported` error code if
device _D_ does not have `aspect::ext_oneapi_ipc_memory`.

!====
a!
[source]
----
handle_data_t get(void *ptr)
----
!====

_Effects_: Equivalent to:

[source,c++,indent=2]
----
sycl::device d;
sycl::context ctxt = d.get_platform().khr_get_default_context();
return ipc_memory::get(ptr, ctxt);
----

!====
a!
[source]
----
void put(handle_data_t &handle_data, const sycl::context &ctx)
----
!====

_Preconditions:_ `handle_data` is the IPC "handle" to USM device memory that was
returned from a call to `get` either in this process or in some other process on
the same system. The USM device memory has not yet been freed in this process.

_Effects:_ Deallocates resources associated with the handle. These resources are
automatically deallocated when the USM device memory is freed, so it is not
strictly necessary to call the `put` function. After the resources associated
with the handle have been deallocated, i.e. through a call to the `put` function
or through freeing the USM device memory, the handle data is invalid and using
it in the `put` and `open` functions will result in undefined behavior.

[_Note:_ Any pointers retrieved through a call to the `open` function in any
process on the system will still be valid after a call to the `put` function and
must still be freed through calls to the `close` function.
_{endnote}_]

!====
a!
[source]
----
void put(handle_data_t &handle_data)
----
!====

_Effects_: Equivalent to:

[source,c++,indent=2]
----
sycl::device d;
sycl::context ctxt = d.get_platform().khr_get_default_context();
ipc_memory::put(handle_data, ctxt);
----

!====
a!
[source]
----
void *open(handle_data_t &handle_data, const sycl::context &ctx,
const sycl::device &dev)
----
!====

_Preconditions:_ `handle_data` is the IPC "handle" to USM device memory that was
returned from a call to the `get` function either in this process or in some
other process on the same system. That USM device memory is accessible on device
`dev`.

_Returns:_ A pointer to the same USM device memory represented by `handle_data`.
The returned pointer is associated with context `ctx`. It can be used wherever a
USM device pointer for device `dev` and context `ctx` is expected, except it
cannot be passed to `sycl::free`. Instead, use the `close` function to free this
memory pointer.

[_Note:_ The `open` function can be called multiple times on the same handle
within the same process. The number of calls to the `close` function must be
equal to the number of calls to the `open` function to free the memory pointer.
_{endnote}_]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking at the Level Zero documentation, I saw that it says this:

Multiple calls to this function with the same IPC handle will return unique pointers

Therefore, I think we should change this note to say:

[Note: The open function can be called multiple times on the same handle
within the same process. Each call to open may return a unique pointer value
even for the same handle, therefore each call to open must have a matching call
to close.
{endnote}]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinser52 - Does UMF let the calls go through to Level Zero each time, or will repeat calls to open return the same UMF handle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter what UMF currently does? Even if UMF "squashes" the repeated zeMemOpenIpcHandle calls now, it might not in the future. The proposed wording gives us the freedom to implement it either way.


[_Note:_ The pointer returned from a call to the `open` function is no longer
valid if the associated USM device memory is freed through a call to the
`sycl::free` function.
_{endnote}_]

_Throws:_

* An exception with the `errc::feature_not_supported` error code if device
`dev` does not have `aspect::ext_oneapi_ipc_memory`.
* An exception with the `errc::invalid` error code if the handle data
`handle_data` has an unexpected number of bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the number of bytes is correct, but the content is garbage? Can we broaden this exception like this:

  • An exception with the errc::invalid error code if the handle data handle_data does not represent a valid IPC handle for USM memory on this host system.

That would also cover the case when the handle had the wrong number of bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reluctant to promise an error code if handle data is corrupt. The byte size is something we can check at a SYCL/UR level, but the validity of the data is up to UMF to decide and the API documentation doesn't specify which error is returned if the data is off. Maybe a precondition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should push this requirement down to the UMF and down to Level Zero if necessary. They should be able to define an error code for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open an issue against UMF, but I think it might be tricky. What happens if the data looks correct to UMF, but using it causes UB or invalid access in the implementation? I suppose that's something they can look into though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UMF shouldn't look at the handle content at all. It should just pass it to the underlying driver API, and that API should return an error code if the handle content is invalid. This assumes that each driver is able to diagnose an error in such a case, but that seems quite reasonable to me. For example, cudaIpcOpenMemHandle is defined to return cudaErrorInvalidResourceHandle in this case. I don't see an error code listed for Level Zero, but that seems like a bug. They should be able to return an error code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be some wrapper information in the UMF IPC handle, but I don't know the exact details. Maybe @vinser52 has some insight into this?


!====
a!
[source]
----
void *open(handle_data_t &handle_data, const sycl::device &dev)
----
!====

_Effects_: Equivalent to:

[source,c++,indent=2]
----
sycl::context ctxt = dev.get_platform().khr_get_default_context();
return ipc_memory::put(handle_data, ctxt, dev);
----

!====
a!
[source]
----
void *open(handle_data_t &handle_data, const sycl::context &ctx,
const sycl::device &dev)
----
!====

_Effects_: Equivalent to:

[source,c++,indent=2]
----
sycl::device d;
sycl::context ctxt = d.get_platform().khr_get_default_context();
return ipc_memory::open(handle_data, ctxt, d);
----

!====
a!
[source]
----
void close(void *ptr, const sycl::context &ctx)
----
!====

_Precondition:_ `ptr` was previously returned from a call to the `open` function
in this same process, where `ctx` was passed as the context. This `ptr` value
has not yet been closed by calling the `close` function.

_Effects:_ Closes a device USM pointer previously returned by a call to
the `open` function.

!====
a!
[source]
----
void close(void *ptr)
----
!====

_Effects_: Equivalent to:

[source,c++,indent=2]
----
sycl::device d;
sycl::context ctxt = d.get_platform().khr_get_default_context();
ipc_memory::close(ptr, ctxt);
----

|====


== Issues

=== Level Zero file descriptor duplication dependency

The IPC memory APIs in Level Zero on Linux currently requires the ability to
duplicate file descriptors between processes. For security this is not allowed
by default on Linux-based systems, so in order for the IPC memory APIs to work
with Level Zero on Linux the user must either call `prctl(PR_SET_PTRACER, ...)`
in the IPC handle owner process or enable the functionality globally using

```bash
sudo bash -c "echo 0 > /proc/sys/kernel/yama/ptrace_scope"
```

See also https://github.com/oneapi-src/unified-memory-framework/tree/main?tab=readme-ov-file#level-zero-memory-provider.


=== Level Zero IPC memory Windows support

The new IPC memory APIs are not currently supported on the Level Zero backend on
Windows systems.

Loading
Loading