-
Notifications
You must be signed in to change notification settings - Fork 794
[SYCL][Docs] Add SYCLBIN feature and format design document #16872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
17576dd
60ff95f
b54afa8
12a6cad
0aad200
1277bd9
1d2b5c8
73b9c07
c7c1512
edca48e
1361d48
533e901
63d0f9a
fbf54ad
05481f1
f59fcab
ad8251c
63aa572
f7e905d
72f62ac
71892bc
3d76c2a
6e9a6b0
ef7f2a2
0bc59d5
13fad43
1ca7ec0
874c3bf
bd71eb5
e61efb2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,237 @@ | ||
| # SYCLBIN - A format for separately compiled SYCL device code | ||
|
|
||
| This design document details the SYCLBIN binary format used for storing SYCL | ||
| device binaries to be loaded dynamically by the SYCL runtime. It also details | ||
| how the toolchain produces, links and packages these binaries, as well as how | ||
| the SYCL runtime library handles files of this format. | ||
|
|
||
| (syclbin_format)= | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| ## SYCLBIN binary format | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Testing and debugging would require some capabilities of searching/extracting information from files of this format. Custom binary format requires custom support in many ways that significantly burdens the development and maintaining. I think we should strive to the generic Offloading Format from LLVM as much as we can. What do you think?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do agree with the general sentiment and I am open to the idea of reusing ELF as the format. However, I don't see how this format fits with ELF, as we would just be fitting bogus into a lot of the pre-defined ELF headers and sections. SYCLBIN is not an executable format per-se and to do appropriate linking the linker will have to consider the binary metadata too, which we would have to retrofit into some text section of the ELF file. For tooling, I could see it, but are there any other tools than I've previously tried and failed to fit the SYCLBIN format into the existing ELF format in a way that isn't just what the current design is but separated into best-effort chunks in the ELF format, so please, if you have a suggestion of how to structure the format based off ELF, please do explain your thoughts. |
||
|
|
||
| The files produced by the new compilation path will follow the format described | ||
| in this section. The intention of defining a new format for these is to give | ||
| the DPC++ implementation an extendable and lightweight wrapper around the | ||
| multiple modules and corresponding metadata captured in the SYCLBIN file. | ||
| The content of the SYCLBIN may be contained as an entry in the offloading binary | ||
| format produced by the clang-offload-packager, as described in | ||
| [ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst). | ||
|
|
||
| The following illustration gives an overview of how the file format is | ||
| structured. | ||
|
|
||
|  | ||
gmlueck marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ### Header | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| The header segment appears as the first part of the SYCLBIN file. Like many | ||
| other file-formats, it defines a magic number to help identify the format, which | ||
| is 0x53594249 (or "SYBI".) Immediately following the magic number is the version | ||
| number, which is used by SYCLBIN consumers when parsing data in the rest of the | ||
| file. | ||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | ------------------------------------------------------------------ | -------------- | | ||
| | `uint32_t` | Magic number. (0x53594249) | | | ||
| | `uint32_t` | SYCLBIN version number. | | | ||
| | `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | | | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
cperkinsintel marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| The `sycl::bundle_state` is an integer with the values as follows: | ||
|
|
||
| | `sycl::bundle_state` | Value | | ||
| | -------------------- | ----- | | ||
| | `input` | 0 | | ||
| | `object` | 1 | | ||
| | `executable` | 2 | | ||
|
|
||
|
|
||
| ### Body | ||
|
|
||
| Immediately after the header is the body of the SYCLBIN file. The body consists | ||
| of a list of abstract modules. | ||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | ------------------------------------------ | -------------- | | ||
| | `uint64_t` | Byte size of the list of abstract modules. | `B` | | ||
| | `B` | List of abstract modules. | | | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
|
|
||
| #### Abstract module | ||
|
|
||
| Each abstract module represents a set of kernels, the corresponding metadata, 0 | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| or more IR modules containing these kernels, and 0 or more native device code | ||
| images containing the kernels. | ||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | ----------------------------------------------- | -------------- | | ||
| | `uint64_t` | Byte size of the list of the metadata. | `M` | | ||
| | `M` | Module metadata. | | | ||
| | `uint64_t` | Byte size of list of IR modules. | `IR` | | ||
| | `IR` | List of IR modules. | | | ||
| | `uint64_t` | Byte size of list of native device code images. | `ND` | | ||
| | `ND` | List of native device code images. | | | ||
|
|
||
|
|
||
| ##### Module metadata | ||
|
|
||
| The module metadata contains the following information about the contents of the | ||
| module. | ||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | -------------------------------------------------------------- | -------------- | | ||
| | `uint32_t` | Byte size of the list of kernel names. | `K` | | ||
| | `K` | List of kernel names. (String list) | | | ||
| | `uint32_t` | Byte size of the list of imported symbols. | `I` | | ||
| | `I` | List of imported symbols. (String list) | | | ||
| | `uint32_t` | Byte size of the list of exported symbols. | `E` | | ||
| | `E` | List of exported symbols. (String list) | | | ||
| | `uint32_t` | Byte size of property set data. | `P` | | ||
| | `P` | Property set data. | | | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
|
|
||
| *NOTE:* Optional features used is embedded in the property set data. | ||
| *TODO:* Consolidate and/or document the property set data in this document. | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ##### String list | ||
|
|
||
| A string list in this binary format consists of a `uint32_t` at the beginning | ||
| containing the number of elements in the list, followed by that number of | ||
| entries with the format: | ||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | ------------------------ | -------------- | | ||
| | `uint32_t` | Byte size of the string. | `S` | | ||
| | `S` | String bytes. | | | ||
|
|
||
|
|
||
| ##### IR module | ||
|
|
||
| An IR module contains the binary data for the corresponding module compiled to a | ||
| given IR representation, identified by the IR type field. | ||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | ------------------------------ | -------------- | | ||
| | `uint8_t` | IR type. | | | ||
| | `uint32_t` | Byte size of the raw IR bytes. | `IB` | | ||
| | `IB` | Raw IR bytes. | | | ||
|
|
||
| *TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA | ||
| we may want to embed the SM version. | ||
|
||
|
|
||
|
|
||
| ##### IR types | ||
|
|
||
| The IR types must be one of the following values: | ||
|
|
||
| | IR type | Value | | ||
| | ------- | ----- | | ||
| | SPIR-V | 0 | | ||
| | PTX | 1 | | ||
| | AMDGCN | 2 | | ||
|
|
||
|
|
||
| ##### Native device code image | ||
|
|
||
| An native device code image contains the binary data for the corresponding | ||
| module AOT compiled for a specific device, identified by the architecture | ||
| string. | ||
|
||
|
|
||
| | Type | Description | Value variable | | ||
| | ---------- | ------------------------------------------------ | -------------- | | ||
| | `uint32_t` | Byte size of the architecture string. | `A` | | ||
| | `A` | Architecture string. | | | ||
| | `uint32_t` | Byte size of the native device code image bytes. | `NB` | | ||
| | NB | Native device code image bytes. | | | ||
|
|
||
|
|
||
| ### SYCLBIN version changelog | ||
|
|
||
| The SYCLBIN format is subject to change, but any such changes must come with an | ||
| increment to the version number in the header and a subsection to this section | ||
| describing the change. | ||
|
|
||
| #### Version 1 | ||
|
|
||
| * Initial version of the layout. | ||
|
|
||
|
|
||
| ## Clang driver changes | ||
|
|
||
| The clang driver needs to accept the following new flags: | ||
|
|
||
| <table> | ||
| <tr> | ||
| <th>Option</th> | ||
| <th>Description</th> | ||
| </tr> | ||
| <tr> | ||
| <td>`-fsyclbin`</td> | ||
| <td> | ||
| If this option is set, the output of the invocation is a SYCLBIN file with the | ||
| .syclbin file extension. This skips the host-compilation invocation of the typical | ||
| `-fsycl` pipeline, instead passing the output of the clang-offloat-packager | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| invocation to clang-linker-wrapper together with the new `--syclbin` flag. | ||
mdtoguchi marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Setting this option will override `-fsycl` and `-fsycl-device-only`. | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| This option currently requires `--offload-new-driver` to be set. | ||
steffenlarsen marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| </td> | ||
| </tr> | ||
| <tr> | ||
| <td>`--offload-ir`</td> | ||
| <td>*TODO*</td> | ||
| </tr> | ||
| <tr> | ||
| <td>`--offload-rdc`</td> | ||
| <td>This is an alias of `-fgpu-rdc`.</td> | ||
mdtoguchi marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| </tr> | ||
| </table> | ||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| Additionally, `-fsycl-link` should work with .syclbin files. Semantics of how | ||
| SYCLBIN files are linked together is yet to be specified. | ||
|
|
||
|
|
||
| ## clang-linker-wrapper changes | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are in the process of moving most of the SYCL specific functionality from clang-linker-wrapper into a new tool called clang-sycl-linker. So, this documentation will need to be updated based on that. For the purposes of this PR, we can use clang-linker-wrapper.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the heads up! Would it make sense to change it now? From a documentation POV, is it as simple as a search-and-replace or is there an important semantic difference between the tools? |
||
|
|
||
| The clang-linker-wrapper is responsible for doing post-processing and linking of | ||
|
||
| device binaries, as described in [OffloadDesign.md](OffloadDesign.md). | ||
| However, to support SYCLBIN files, the clang-linker-wrapper must be able to | ||
| unpack an offload binary (as described in | ||
| [ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst)) | ||
| directly, instead of extracting it from a host binary. This should be done when | ||
| a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is | ||
|
Comment on lines
+252
to
+253
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, I'm not sure if I have a use case for that, just wanted to double-check the intent. A potential use-case, though, is ability to embed |
||
| responsible to package the resulting device binaries and produced metadata into | ||
| the format described in [SYCLBIN binary format section](#syclbin_format). | ||
| Additionally, in this case the clang-linker-wrapper will skip the wrapping of | ||
| the device code and the host code linking stage, as there is no host code to | ||
| wrap the device code in and link. | ||
|
|
||
| *TODO:* Describe the details of linking SYCLBIN files. | ||
|
|
||
|
|
||
| ## SYCL runtime library changes | ||
|
|
||
| Using the interfaces from the | ||
| [sycl_ext_oneapi_syclbin](../extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) | ||
| extension, the runtime must be able to parse the SYCLBIN format, as described in | ||
| the [SYCLBIN binary format section](#syclbin_format). To avoid large amounts of | ||
| code duplication, the runtime uses the implementation of SYCLBIN reading and | ||
| writing implemented in LLVM. | ||
|
|
||
| When creating a `kernel_bundle` from a SYCLBIN file, the runtime reads the | ||
| contents of the SYCLBIN file and creates the corresponding data structure from | ||
| it. In order for the SYCL runtime library's existing logic to use the binaries, | ||
| the runtime then creates a collection of `sycl_device_binary_struct` objects and | ||
| its constituents, pointing to the data in the parsed SYCLBIN object. Passing | ||
| these objects to the runtime library's `ProgramManager` allows it to reuse the | ||
| logic for compiling, linking and building SYCL binaries. | ||
|
|
||
| In the other direction, users can request the "contents" of a `kernel_bundle`. | ||
| When this is done, the runtime library must ensure that a SYCLBIN file is | ||
| available for the contents of the `kernel_bundle` and must then write the | ||
| SYCLBIN object to the corresponding binary representation in the format | ||
| described in the [SYCLBIN binary format section](#syclbin_format). In cases | ||
| where the `kernel_bundle` was created with a SYCLBIN file, the SYCLBIN | ||
| representation is immediately available and can be serialized directly. In other | ||
| cases, the runtime library creates a new SYCLBIN object from the binaries | ||
| associated with the `kernel_bundle`, then serializes it and returns the result. | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.