Merged
30 commits
17576dd
[SYCL][Docs] Add SYCLBIN feature and format design document
steffenlarsen Feb 3, 2025
60ff95f
Move to Markdown format
steffenlarsen Feb 4, 2025
b54afa8
Fix tables, links and titles
steffenlarsen Feb 4, 2025
12a6cad
Fix xrefs
steffenlarsen Feb 4, 2025
0aad200
Use link for clang design
steffenlarsen Feb 4, 2025
1277bd9
Address first set of comments
steffenlarsen Feb 5, 2025
1d2b5c8
Remove redundant description
steffenlarsen Feb 5, 2025
73b9c07
Update sycl/doc/design/SYCLBINDesign.md
steffenlarsen Feb 6, 2025
c7c1512
Update sycl/doc/design/SYCLBINDesign.md
steffenlarsen Feb 6, 2025
edca48e
Add kernel names back and fix array types
steffenlarsen Feb 6, 2025
1361d48
Switch to headers-based structure and add property set design document
steffenlarsen Feb 24, 2025
533e901
Address PropertySets.md comments
steffenlarsen Feb 26, 2025
63d0f9a
Address SYCLBIN design comments
steffenlarsen Feb 26, 2025
fbf54ad
Removed unfinished line
steffenlarsen Feb 26, 2025
05481f1
Move alignment and size guarantees
steffenlarsen Mar 3, 2025
f59fcab
Specify offset is in the byte table
steffenlarsen Mar 3, 2025
ad8251c
Be more specific in property sets
steffenlarsen Mar 3, 2025
63aa572
Add new docs to TOC
steffenlarsen Mar 3, 2025
f7e905d
Update sycl/doc/design/SYCLBINDesign.md
steffenlarsen Mar 6, 2025
72f62ac
Add note about whitespaces and minor editorial changes
steffenlarsen Mar 11, 2025
71892bc
Add motivation
steffenlarsen Mar 12, 2025
3d76c2a
Add target and specify arch
steffenlarsen Mar 12, 2025
6e9a6b0
Remove undocumented option
steffenlarsen Mar 14, 2025
ef7f2a2
Update sycl/doc/design/SYCLBINDesign.md
steffenlarsen Mar 26, 2025
0bc59d5
Apply suggestions from code review
steffenlarsen Mar 27, 2025
13fad43
Address comments
steffenlarsen Mar 28, 2025
1ca7ec0
Expand post-processing
steffenlarsen Apr 1, 2025
874c3bf
Specify -fsyclbin being ignored if used with -fsycl-device-only
steffenlarsen Apr 1, 2025
bd71eb5
ignored -> unused
steffenlarsen Apr 1, 2025
e61efb2
Update sycl/doc/design/SYCLBINDesign.md
steffenlarsen Apr 2, 2025
237 changes: 237 additions & 0 deletions sycl/doc/design/SYCLBINDesign.md
@@ -0,0 +1,237 @@
# SYCLBIN - A format for separately compiled SYCL device code

This design document details the SYCLBIN binary format used for storing SYCL
device binaries to be loaded dynamically by the SYCL runtime. It also details
how the toolchain produces, links and packages these binaries, as well as how
the SYCL runtime library handles files of this format.

(syclbin_format)=
## SYCLBIN binary format
Contributor

Testing and debugging would require some way of searching and extracting information from files of this format, so we would have to provide such tools/utilities.
I think that basing this format on ELF would allow us to reuse some available utilities, such as standard GNU packages (readelf, objdump, etc.) and LLVM utilities (llvm-objdump and the rich LLVM library). The LLVM library's ELF support could be reused in development as well.

A custom binary format requires custom support in many ways, which significantly burdens development and maintenance. I think we should strive toward the generic Offloading Format from LLVM as much as we can.

What do you think?

Contributor Author

I do agree with the general sentiment and I am open to the idea of reusing ELF as the format. However, I don't see how this format fits with ELF, as we would just be fitting bogus data into a lot of the pre-defined ELF headers and sections. SYCLBIN is not an executable format per se, and to do appropriate linking the linker will have to consider the binary metadata too, which we would have to retrofit into some text section of the ELF file.

For tooling, I could see it, but are there any other tools than llvm-objdump (and readelf and objdump) that we would get "for free" if we used ELF? Even if we do, would users be able to get much out of the metadata without us adding additional functionality to these tool or entirely new tooling? When you mention "rich LLVM library" could you be more specific?

I've previously tried and failed to fit the SYCLBIN format into the existing ELF format in a way that isn't just what the current design is but separated into best-effort chunks in the ELF format, so please, if you have a suggestion of how to structure the format based off ELF, please do explain your thoughts.


The files produced by the new compilation path will follow the format described
in this section. The intention of defining a new format for these is to give
the DPC++ implementation an extendable and lightweight wrapper around the
multiple modules and corresponding metadata captured in the SYCLBIN file.
The content of the SYCLBIN may be contained as an entry in the offloading binary
format produced by the clang-offload-packager, as described in
[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst).

The following illustration gives an overview of how the file format is
structured.

![SYCLBIN binary file format illustration](SYCLBIN_file_format_illustration.svg)

### Header

The header segment appears as the first part of the SYCLBIN file. Like many
other file-formats, it defines a magic number to help identify the format, which
is 0x53594249 (or "SYBI"). Immediately following the magic number is the version
number, which is used by SYCLBIN consumers when parsing data in the rest of the
file.

| Type | Description | Value variable |
| ---------- | ------------------------------------------------------------------ | -------------- |
| `uint32_t` | Magic number. (0x53594249) | |
| `uint32_t` | SYCLBIN version number. | |
| `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | |

The `sycl::bundle_state` is an integer with the values as follows:

| `sycl::bundle_state` | Value |
| -------------------- | ----- |
| `input` | 0 |
| `object` | 1 |
| `executable` | 2 |
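
The header layout above could be read roughly as follows. This is a minimal sketch, not the implementation in LLVM: it assumes the fields are tightly packed and stored in little-endian byte order, neither of which is stated in this revision of the document. The names `SyclbinHeader`, `BundleState`, `readU32LE`, and `parseHeader` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names: the design document does not define a C++ API.
constexpr uint32_t SyclbinMagic = 0x53594249;

// Mirrors the sycl::bundle_state values listed in the table above.
enum class BundleState : uint8_t { Input = 0, Object = 1, Executable = 2 };

struct SyclbinHeader {
  uint32_t Version;
  BundleState State;
};

// Assumption: multi-byte fields are little-endian and tightly packed.
inline uint32_t readU32LE(const std::vector<uint8_t> &Buf, size_t Offset) {
  return static_cast<uint32_t>(Buf[Offset]) |
         static_cast<uint32_t>(Buf[Offset + 1]) << 8 |
         static_cast<uint32_t>(Buf[Offset + 2]) << 16 |
         static_cast<uint32_t>(Buf[Offset + 3]) << 24;
}

// Returns true and fills Out if Buf starts with a well-formed header.
inline bool parseHeader(const std::vector<uint8_t> &Buf, SyclbinHeader &Out) {
  constexpr size_t HeaderSize = 4 + 4 + 1; // magic + version + bundle state
  if (Buf.size() < HeaderSize)
    return false;
  if (readU32LE(Buf, 0) != SyclbinMagic)
    return false;
  const uint8_t RawState = Buf[8];
  if (RawState > 2) // Only input, object, and executable are defined.
    return false;
  Out.Version = readU32LE(Buf, 4);
  Out.State = static_cast<BundleState>(RawState);
  return true;
}
```

A consumer would typically call `parseHeader` first and reject the file if it returns `false` or if `Version` is newer than the consumer understands.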


### Body

Immediately after the header is the body of the SYCLBIN file. The body consists
of a list of abstract modules.

| Type | Description | Value variable |
| ---------- | ------------------------------------------ | -------------- |
| `uint64_t` | Byte size of the list of abstract modules. | `B` |
| `B` | List of abstract modules. | |


#### Abstract module

Each abstract module represents a set of kernels, the corresponding metadata, 0
or more IR modules containing these kernels, and 0 or more native device code
images containing the kernels.

| Type | Description | Value variable |
| ---------- | ----------------------------------------------- | -------------- |
| `uint64_t` | Byte size of the module metadata. | `M` |
| `M` | Module metadata. | |
| `uint64_t` | Byte size of list of IR modules. | `IR` |
| `IR` | List of IR modules. | |
| `uint64_t` | Byte size of list of native device code images. | `ND` |
| `ND` | List of native device code images. | |
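
Because each part of an abstract module is prefixed by its byte size, a reader can locate the three parts without understanding their contents. The sketch below illustrates this under the assumption of little-endian, tightly packed fields (not stated in this revision); `Span`, `AbstractModuleLayout`, `readBlock`, and `splitAbstractModule` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types: an (offset, size) pair into the file buffer.
struct Span {
  size_t Offset;
  size_t Size;
};

struct AbstractModuleLayout {
  Span Metadata;     // `M` bytes of module metadata
  Span IRModules;    // `IR` bytes holding the list of IR modules
  Span NativeImages; // `ND` bytes holding the native device code images
};

// Reads a little-endian uint64_t size at Cursor, then records the block that
// follows it. Assumption: byte order and packing are not specified here.
inline bool readBlock(const std::vector<uint8_t> &Buf, size_t &Cursor,
                      Span &Out) {
  if (Cursor > Buf.size() || Buf.size() - Cursor < 8)
    return false;
  uint64_t Size = 0;
  for (int I = 0; I < 8; ++I)
    Size |= static_cast<uint64_t>(Buf[Cursor + I]) << (8 * I);
  Cursor += 8;
  if (Size > Buf.size() - Cursor) // Block would run past the end of the data.
    return false;
  Out = Span{Cursor, static_cast<size_t>(Size)};
  Cursor += static_cast<size_t>(Size);
  return true;
}

// Splits one abstract module starting at Cursor; on success Cursor points at
// the next abstract module (or the end of the list).
inline bool splitAbstractModule(const std::vector<uint8_t> &Buf,
                                size_t &Cursor, AbstractModuleLayout &Out) {
  size_t Pos = Cursor;
  AbstractModuleLayout Layout{};
  if (!readBlock(Buf, Pos, Layout.Metadata) ||
      !readBlock(Buf, Pos, Layout.IRModules) ||
      !readBlock(Buf, Pos, Layout.NativeImages))
    return false;
  Out = Layout;
  Cursor = Pos;
  return true;
}
```

Calling `splitAbstractModule` repeatedly until `Cursor` reaches the end of the `B` bytes of the body would enumerate all abstract modules.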


##### Module metadata

The module metadata contains the following information about the contents of the
module.

| Type | Description | Value variable |
| ---------- | -------------------------------------------------------------- | -------------- |
| `uint32_t` | Byte size of the list of kernel names. | `K` |
| `K` | List of kernel names. (String list) | |
| `uint32_t` | Byte size of the list of imported symbols. | `I` |
| `I` | List of imported symbols. (String list) | |
| `uint32_t` | Byte size of the list of exported symbols. | `E` |
| `E` | List of exported symbols. (String list) | |
| `uint32_t` | Byte size of property set data. | `P` |
| `P` | Property set data. | |


*NOTE:* The optional features used are embedded in the property set data.

*TODO:* Consolidate and/or document the property set data in this document.

##### String list

A string list in this binary format consists of a `uint32_t` at the beginning
containing the number of elements in the list, followed by that number of
entries with the format:

| Type | Description | Value variable |
| ---------- | ------------------------ | -------------- |
| `uint32_t` | Byte size of the string. | `S` |
| `S` | String bytes. | |
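
As an illustration, a string list could be decoded as sketched below. The sketch assumes little-endian `uint32_t` fields with no padding between entries, which this revision does not specify; `readU32LE` and `parseStringList` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a little-endian uint32_t at Cursor and advances Cursor past it.
// Assumption: byte order is not specified in this revision of the format.
inline bool readU32LE(const std::vector<uint8_t> &Buf, size_t &Cursor,
                      uint32_t &Value) {
  if (Cursor > Buf.size() || Buf.size() - Cursor < 4)
    return false;
  Value = static_cast<uint32_t>(Buf[Cursor]) |
          static_cast<uint32_t>(Buf[Cursor + 1]) << 8 |
          static_cast<uint32_t>(Buf[Cursor + 2]) << 16 |
          static_cast<uint32_t>(Buf[Cursor + 3]) << 24;
  Cursor += 4;
  return true;
}

// Decodes a string list: an element count followed by (size, bytes) entries.
inline bool parseStringList(const std::vector<uint8_t> &Buf,
                            std::vector<std::string> &Out) {
  size_t Cursor = 0;
  uint32_t Count = 0;
  if (!readU32LE(Buf, Cursor, Count))
    return false;
  Out.clear();
  for (uint32_t I = 0; I < Count; ++I) {
    uint32_t Size = 0;
    if (!readU32LE(Buf, Cursor, Size) || Size > Buf.size() - Cursor)
      return false; // Truncated entry.
    Out.emplace_back(Buf.begin() + Cursor, Buf.begin() + Cursor + Size);
    Cursor += Size;
  }
  return true;
}
```

The same routine would apply to the kernel name, imported symbol, and exported symbol lists in the module metadata, since all three are string lists.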


##### IR module

An IR module contains the binary data for the corresponding module compiled to a
given IR representation, identified by the IR type field.

| Type | Description | Value variable |
| ---------- | ------------------------------ | -------------- |
| `uint8_t` | IR type. | |
| `uint32_t` | Byte size of the raw IR bytes. | `IB` |
| `IB` | Raw IR bytes. | |

*TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA
we may want to embed the SM version.
Contributor

If we are talking about a specific CUDA property, is it still IR module? I suppose that I don't understand well enough what is PTX and what is its place in a toolchain.

Target-specific for me means "native", i.e. as if PTX is incorrectly assumed as IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, then maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules?

Contributor Author

> If we are talking about a specific CUDA property, is it still IR module?

Because of the forward-compatibility of SM archs, my understanding is that we want PTX to be considered an IR type.

> Target-specific for me means "native", i.e. as if PTX is incorrectly assumed as IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, then maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules?

If we were to put the SM architecture information at abstract module level, I don't see how an abstract module would ever have more than one IR module and more than one native device code image. Granted, having the exact same properties is somewhat rare, but I would expect it to be the case if the user was to compile for multiple SM versions.

Contributor

I agree that it seems like we would probably want to annotate the IR module with the CUDA virtual architecture in the case when the IR is PTX. I was thinking that we would use the IR-level metadata for stuff like this.

Contributor Author

I've added a "target" field to the IR module metadata which contains the value of -fsycl-targets used for the given module, similar to "arch" in native device code images, with the exception that this key may be missing from the metadata in the case that the option wasn't specified, in which case the IR type should be enough to infer from.



##### IR types

The IR types must be one of the following values:

| IR type | Value |
| ------- | ----- |
| SPIR-V | 0 |
| PTX | 1 |
| AMDGCN | 2 |
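
Combining the IR module layout with the IR type values above, one entry in the list of IR modules could be decoded as sketched here. Little-endian, tightly packed fields are an assumption, and `IRType`, `IRModule`, and `parseIRModule` are hypothetical names rather than the identifiers used in LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values taken from the IR types table above; the enum name is hypothetical.
enum class IRType : uint8_t { SPIRV = 0, PTX = 1, AMDGCN = 2 };

struct IRModule {
  IRType Type;
  std::vector<uint8_t> Bytes; // Raw IR bytes (`IB` bytes in the table).
};

// Parses one IR module at Cursor. On success, Cursor is advanced to the next
// entry; on failure, Cursor and Out are left unchanged.
// Assumption: the uint32_t size is little-endian and fields are unpadded.
inline bool parseIRModule(const std::vector<uint8_t> &Buf, size_t &Cursor,
                          IRModule &Out) {
  size_t Pos = Cursor;
  if (Pos > Buf.size() || Buf.size() - Pos < 5) // 1 type byte + 4 size bytes
    return false;
  const uint8_t RawType = Buf[Pos];
  if (RawType > 2) // Not one of the IR types listed above.
    return false;
  const uint32_t Size = static_cast<uint32_t>(Buf[Pos + 1]) |
                        static_cast<uint32_t>(Buf[Pos + 2]) << 8 |
                        static_cast<uint32_t>(Buf[Pos + 3]) << 16 |
                        static_cast<uint32_t>(Buf[Pos + 4]) << 24;
  Pos += 5;
  if (Size > Buf.size() - Pos)
    return false;
  Out.Type = static_cast<IRType>(RawType);
  Out.Bytes.assign(Buf.begin() + Pos, Buf.begin() + Pos + Size);
  Cursor = Pos + Size;
  return true;
}
```

Rejecting unknown type values keeps a consumer from misinterpreting IR kinds added in a later format version.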


##### Native device code image

A native device code image contains the binary data for the corresponding
module AOT compiled for a specific device, identified by the architecture
string.
Contributor

I think that we need to specify what is the architecture string here.

Is it target triple? Is it value passed to -fsycl-targets? Is it value from architecture enum from our device architecture extension?

It is not clear how RT can use this field without such specification.

Contributor Author

That is actually a good question. I am not sure what the architecture string would be for cases like SASS binaries. For example, let's say we've compiled to PTX through our compiler, then loaded that into a kernel bundle, compiled that kernel bundle to native device code and then serialized that to SYCLBIN. The -fsycl-targets value would not be enough to express the architecture here, I believe.

Contributor

In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets values.

I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.

In any case, I agree with @AlexeySachkov. I think the set of possible architectures should be specified in the file format.

Contributor Author

> In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets values.

I will have to do some research here. I know PTX can be associated with SM architectures, but I don't know if the same applies to the native device code produced from PTX. It may be device-specific and as such more strict than the SM version.

> I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.

Since the compiler will need to know about these architectures too, I am reluctant to try and match enum values between the runtime and library for this purpose.

Contributor Author

I've specified that the "arch" value is the string that corresponds to the -fsycl-targets value used for the binary. The runtime should generally be able to convert that to the enums in sycl_ext_oneapi_device_architecture.

Contributor

> I've specified that the "arch" value is the string that corresponds to the -fsycl-targets value used for the binary. The runtime should generally be able to convert that to the enums in sycl_ext_oneapi_device_architecture.

The approach works for me, but I feel like the clarification is actually missing from the doc.

Contributor Author

Good point! I've added the reference to the SYCL extension with a note about the runtime attempting to make the appropriate mapping.


| Type | Description | Value variable |
| ---------- | ------------------------------------------------ | -------------- |
| `uint32_t` | Byte size of the architecture string. | `A` |
| `A` | Architecture string. | |
| `uint32_t` | Byte size of the native device code image bytes. | `NB` |
| `NB` | Native device code image bytes. | |
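
To show the layout from the producer side, the following sketch serializes one native device code image. It assumes little-endian, unpadded fields and that both sizes fit in a `uint32_t`; `appendU32LE` and `serializeNativeImage` are hypothetical names, and the architecture string used in any call is whatever the toolchain associates with the image.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Appends Value in little-endian byte order.
// Assumption: byte order is not specified in this revision of the format.
inline void appendU32LE(std::vector<uint8_t> &Out, uint32_t Value) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<uint8_t>(Value >> (8 * I)));
}

// Emits: size of arch string (`A`), arch string, size of image (`NB`), image.
// Assumption: both sizes fit in a uint32_t; this sketch does not check that.
inline std::vector<uint8_t>
serializeNativeImage(const std::string &Arch,
                     const std::vector<uint8_t> &Image) {
  std::vector<uint8_t> Out;
  appendU32LE(Out, static_cast<uint32_t>(Arch.size()));
  Out.insert(Out.end(), Arch.begin(), Arch.end());
  appendU32LE(Out, static_cast<uint32_t>(Image.size()));
  Out.insert(Out.end(), Image.begin(), Image.end());
  return Out;
}
```

Note that the architecture string is written without a terminating null byte, since its length is carried by the preceding size field.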


### SYCLBIN version changelog

The SYCLBIN format is subject to change, but any such changes must come with an
increment to the version number in the header and a subsection to this section
describing the change.

#### Version 1

* Initial version of the layout.


## Clang driver changes

The clang driver needs to accept the following new flags:

<table>
<tr>
<th>Option</th>
<th>Description</th>
</tr>
<tr>
<td>`-fsyclbin`</td>
<td>
If this option is set, the output of the invocation is a SYCLBIN file with the
.syclbin file extension. This skips the host-compilation invocation of the typical
`-fsycl` pipeline, instead passing the output of the clang-offload-packager
invocation to clang-linker-wrapper together with the new `--syclbin` flag.

Setting this option will override `-fsycl` and `-fsycl-device-only`.

This option currently requires `--offload-new-driver` to be set.
</td>
</tr>
<tr>
<td>`--offload-ir`</td>
<td>*TODO*</td>
</tr>
<tr>
<td>`--offload-rdc`</td>
<td>This is an alias of `-fgpu-rdc`.</td>
</tr>
</table>

Additionally, `-fsycl-link` should work with .syclbin files. The semantics of how
SYCLBIN files are linked together are yet to be specified.


## clang-linker-wrapper changes
Contributor

We are in the process of moving most of the SYCL specific functionality from clang-linker-wrapper into a new tool called clang-sycl-linker. So, this documentation will need to be updated based on that. For the purposes of this PR, we can use clang-linker-wrapper.
Just a heads up.

Contributor Author

Thanks for the heads up! Would it make sense to change it now? From a documentation POV, is it as simple as a search-and-replace or is there an important semantic difference between the tools?


The clang-linker-wrapper is responsible for doing post-processing and linking of
Contributor

What is meant by post-processing here? Thanks

Contributor Author

Module-splitting and metadata analysis/extraction mainly. I've expanded it a bit.

device binaries, as described in [OffloadDesign.md](OffloadDesign.md).
However, to support SYCLBIN files, the clang-linker-wrapper must be able to
unpack an offload binary (as described in
[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst))
directly, instead of extracting it from a host binary. This should be done when
a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is
Comment on lines +252 to +253
Contributor

So, .syclbin files cannot be used if the output is not .syclbin, right?

I'm not sure if I have a use case for that, just wanted to double-check the intent.

A potential use-case, though, is the ability to embed .syclbin into an application as if that device code was originally compiled as part of the application. I.e. you had your dynamically loadable .syclbin, but at some point decided to embed it and stop shipping it separately. But that will have some implications on the API, I assume: we would then need to design how to use such an embedded SYCLBIN.

responsible for packaging the resulting device binaries and produced metadata into
the format described in [SYCLBIN binary format section](#syclbin_format).
Additionally, in this case the clang-linker-wrapper will skip the wrapping of
the device code and the host code linking stage, as there is no host code to
wrap the device code in and link.

*TODO:* Describe the details of linking SYCLBIN files.


## SYCL runtime library changes

Using the interfaces from the
[sycl_ext_oneapi_syclbin](../extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc)
extension, the runtime must be able to parse the SYCLBIN format, as described in
the [SYCLBIN binary format section](#syclbin_format). To avoid large amounts of
code duplication, the runtime reuses the SYCLBIN reading and writing support
implemented in LLVM.

When creating a `kernel_bundle` from a SYCLBIN file, the runtime reads the
contents of the SYCLBIN file and creates the corresponding data structure from
it. In order for the SYCL runtime library's existing logic to use the binaries,
the runtime then creates a collection of `sycl_device_binary_struct` objects and
their constituents, pointing to the data in the parsed SYCLBIN object. Passing
these objects to the runtime library's `ProgramManager` allows it to reuse the
logic for compiling, linking and building SYCL binaries.

In the other direction, users can request the "contents" of a `kernel_bundle`.
When this is done, the runtime library must ensure that a SYCLBIN file is
available for the contents of the `kernel_bundle` and must then write the
SYCLBIN object to the corresponding binary representation in the format
described in the [SYCLBIN binary format section](#syclbin_format). In cases
where the `kernel_bundle` was created with a SYCLBIN file, the SYCLBIN
representation is immediately available and can be serialized directly. In other
cases, the runtime library creates a new SYCLBIN object from the binaries
associated with the `kernel_bundle`, then serializes it and returns the result.

1 change: 1 addition & 0 deletions sycl/doc/design/SYCLBIN_file_format_illustration.svg