| 1 | +# [RFC] SYCLBIN - A format for SYCL device code |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +This RFC proposes the addition of SYCLBIN, a new binary format for storing SYCL device code. The format provides a lightweight, extensible wrapper around device modules and their corresponding SYCL-specific metadata, to be produced and consumed by tools and the SYCL runtime. |
| 6 | + |
| 7 | +## Purpose of this RFC |
| 8 | + |
| 9 | +This RFC seeks community feedback on the proposed SYCLBIN binary format design, including: |
| 10 | + |
| 11 | +- The binary format specification and layout |
| 12 | +- Toolchain integration approach and new compiler flags |
| 13 | + |
| 14 | +Community input is particularly valuable regarding potential integration challenges with existing LLVM offloading implementations. |
| 15 | + |
| 16 | +## Motivation and alternatives considered |
| 17 | + |
| 18 | +### Metadata unique to the SYCL programming model |
| 19 | + |
| 20 | +- [ ] This section needs re-writing... |
| 21 | + |
| 22 | +The LLVM offloading infrastructure supports the following binary formats, each of which can be placed into the OffloadBinary format: Object, Bitcode, Cubin, Fatbinary, PTX and SPIRV. None of them satisfies the needs of the SYCL programming model. |
| 23 | + |
| 24 | +- [ ] Steffen, I need to discuss with you, why other existing formats did not satisfy our needs. I think we need to provide short summary why each format doesn't work for us somewhere in this section. |
| 25 | + |
| 26 | +Specifically, SYCL needs to keep the following metadata for the SYCL runtime, which is not supported by any of the existing formats: |
| 27 | + |
| 28 | +1. Device target triple (e.g. spirv64_unknown_unknown). |
| 29 | +2. Compiler and linker options to pass to the JIT compiler when device code is JIT-compiled. |
| 30 | +3. List of entry points exposed by an image. |
| 31 | +4. Arrays of property sets. |
| 32 | + |
| 33 | +While #1 and #2 can be saved to the StringData of OffloadBinary, #3 requires additional handling. The StringData serialization infrastructure assumes that each value is a single null-terminated string, so storing multiple strings in one value requires concatenating them with a separator symbol and splitting them again during deserialization. |
| 34 | + |
| 35 | +[Property sets](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/PropertySets.md) (#4) would be even more complicated to encode this way. |
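
The workaround described above for entry points (#3) can be sketched as follows. This is only an illustration of the join-and-split step, not code from any LLVM tool: the separator character, the helper names and the kernel names are all hypothetical.

```python
# Hypothetical separator: the RFC does not say which symbol would be used.
SEPARATOR = ";"


def pack_entry_points(names):
    """Join entry point names into the single string value StringData expects."""
    for name in names:
        # A name containing the separator or a null byte could not be
        # recovered after splitting, so reject it up front.
        if SEPARATOR in name or "\0" in name:
            raise ValueError(f"entry point name cannot be encoded: {name!r}")
    return SEPARATOR.join(names)


def unpack_entry_points(value):
    """Split the single string value back into the list of entry point names."""
    return value.split(SEPARATOR) if value else []


entry_points = ["_ZTS7KernelA", "_ZTS7KernelB"]  # made-up kernel names
packed = pack_entry_points(entry_points)
restored = unpack_entry_points(packed)
```

The extra validation step shows the cost of this approach: every producer and consumer has to agree on the separator and on which characters are forbidden in names.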
| 36 | + |
| 37 | +### Abstraction: simplify support in offloading tools |
| 38 | + |
| 39 | +Another motivation for adding the SYCLBIN format is to encapsulate SYCL-specific logic in the SYCL-specific parts of the toolchain (clang-sycl-linker, SYCL runtime) and hide SYCL specifics from offloading tools intended to support multiple programming models. Without this format, we would need to use the following workflow to pass metadata (#1 - #4) from the compiler to the runtime: |
| 40 | + |
| 41 | +1. clang-sycl-linker would use OffloadingImage’s StringData to save metadata #1-#4. |
| 42 | +Problem: OffloadingImage’s StringData is not intended for composite objects like arrays or property sets. |
| 43 | +2. clang-linker-wrapper would open the OffloadingImages prepared by clang-sycl-linker and generate a device image binary descriptor for each image in some format that the SYCL runtime could read. |
| 44 | +Problem: clang-linker-wrapper would need to maintain the SYCL-specific formats required by the SYCL runtime, which means unnecessary duplication. |
| 45 | +3. The SYCL runtime would then use this format to decode the metadata. |
| 46 | + |
| 47 | +If SYCLBIN is accepted, the scheme could be simplified, resolving the problems highlighted above: |
| 48 | + |
| 49 | +1. clang-sycl-linker would prepare a SYCLBIN with all metadata encoded and put it inside an OffloadingImage as the image. |
| 50 | +2. clang-linker-wrapper would generate only the host register and unregister calls, and would know nothing about what’s inside the SYCLBIN. |
| 51 | +3. The SYCL runtime would work with the SYCLBIN directly. |
| 52 | + |
| 53 | +### Enable modular dynamic loading of device binaries at runtime |
| 54 | + |
| 55 | +Some applications may want to dynamically load device binaries at runtime, e.g. for modularity and to avoid having to recompile the entire application. To facilitate that, the SYCLBIN format defines the interface between |
| 56 | +the compiler-produced binaries and the runtime's handling of them. |
| 57 | + |
| 58 | +## Detailed Design |
| 59 | + |
| 60 | +### SYCLBIN binary format |
| 61 | + |
| 62 | +The SYCLBIN format consists of: |
| 63 | + |
| 64 | +- A [file header](#file-header) with magic number (0x53594249 "SYBI") and version information |
| 65 | +- Three lists of headers: the [abstract module header](#abstract-module-header) list, the |
| 66 | +[IR module header](#ir-module-header) list and the |
| 67 | +[native device code image header](#native-device-code-image-header) list, |
| 68 | +containing information about the [abstract modules](#abstract-module), |
| 69 | +[IR modules](#ir-module) and |
| 70 | +[native device code images](#native-device-code-image) respectively. |
| 71 | +- Two byte tables containing metadata and binary data |
| 72 | + |
| 73 | +#### File Structure |
| 74 | + |
| 75 | +| | |
| 76 | +| --------------------------------------------------------------------- | |
| 77 | +| [File header](#file-header) | |
| 78 | +| [Abstract module header](#abstract-module-header) 1 | |
| 79 | +| ... | |
| 80 | +| [Abstract module header](#abstract-module-header) N | |
| 81 | +| [IR module header](#ir-module-header) 1 | |
| 82 | +| ... | |
| 83 | +| [IR module header](#ir-module-header) M | |
| 84 | +| [Native device code image header](#native-device-code-image-header) 1 | |
| 85 | +| ... | |
| 86 | +| [Native device code image header](#native-device-code-image-header) L | |
| 87 | +| Metadata byte table | |
| 88 | +| Binary byte table | |
| 89 | + |
| 90 | +The headers and each byte table are all aligned to 8 bytes. The fields in the |
| 91 | +headers use C/C++ type notation, including the fixed-size integer types defined |
| 92 | +in the `<cstdint>` header, and will have the same size and alignment. For |
| 93 | +consistency, all of these types use a little-endian layout. |
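
As a minimal sketch of the two layout rules above, the snippet below rounds an offset up to the 8-byte alignment and shows how the magic number looks as a little-endian `uint32_t`. The helper name `align_up` is an assumption made for illustration; only the alignment value and the magic number come from this RFC.

```python
import struct

ALIGNMENT = 8  # headers and byte tables are aligned to 8 bytes
MAGIC = 0x53594249  # magic number from the file header


def align_up(offset, alignment=ALIGNMENT):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# "<" selects little-endian encoding without implicit padding.
magic_little_endian = struct.pack("<I", MAGIC)

# The "SYBI" spelling corresponds to the most-significant-byte-first reading.
magic_big_endian = MAGIC.to_bytes(4, "big")
```

Note that, if the magic number is stored as a little-endian `uint32_t` like the other fields, the first four bytes of a file read `IBYS` in a hex dump, while the numeric value reads `SYBI`.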
| 94 | + |
| 95 | +#### Component Details |
| 96 | + |
| 97 | +- [ ] Do we want to provide that level of details in the RFC, or it is better to clean it up to |
| 98 | +keep only key info and for details provide reference to design document? |
| 99 | + |
| 100 | +##### File header |
| 101 | + |
| 102 | +| Type | Description | |
| 103 | +| ---------- | ----------------------------------------------------------------------------- | |
| 104 | +| `uint32_t` | Magic number. (0x53594249) | |
| 105 | +| `uint32_t` | SYCLBIN version number. | |
| 106 | +| `uint32_t` | Number of abstract modules. | |
| 107 | +| `uint32_t` | Number of IR modules. | |
| 108 | +| `uint32_t` | Number of native device code images. | |
| 109 | +| `uint64_t` | Byte size of the metadata byte table. | |
| 110 | +| `uint64_t` | Byte size of the binary byte table. | |
| 111 | +| `uint64_t` | Byte offset of the global metadata from the start of the metadata byte table. | |
| 112 | +| `uint64_t` | Byte size of the global metadata. | |
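
A reader for the file header could look roughly like the sketch below. It assumes natural C alignment, which places 4 padding bytes between the fifth `uint32_t` and the first `uint64_t` and gives a 56-byte header; the RFC text implies this but does not state it explicitly. The field and function names are hypothetical.

```python
import struct
from collections import namedtuple

MAGIC = 0x53594249

# Assumption: natural C alignment, so 4 padding bytes ("4x") follow the five
# uint32_t fields before the first uint64_t field. "<" means little-endian.
FILE_HEADER = struct.Struct("<5I4x4Q")

FileHeader = namedtuple("FileHeader", [
    "magic", "version", "abstract_module_count", "ir_module_count",
    "native_image_count", "metadata_table_size", "binary_table_size",
    "global_metadata_offset", "global_metadata_size",
])


def parse_file_header(data):
    """Decode and sanity-check the header at the start of a SYCLBIN buffer."""
    if len(data) < FILE_HEADER.size:
        raise ValueError("buffer too small for a SYCLBIN file header")
    header = FileHeader._make(FILE_HEADER.unpack_from(data, 0))
    if header.magic != MAGIC:
        raise ValueError(f"bad magic number: {header.magic:#x}")
    end = header.global_metadata_offset + header.global_metadata_size
    if end > header.metadata_table_size:
        raise ValueError("global metadata lies outside the metadata byte table")
    return header


# Round trip with made-up values: version 1, one abstract module, one IR module.
sample = FILE_HEADER.pack(MAGIC, 1, 1, 1, 0, 64, 128, 0, 16)
header = parse_file_header(sample)
```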
| 113 | + |
| 114 | +##### Global metadata |
| 115 | + |
| 116 | +The global metadata entry contains a single property set with the identifying |
| 117 | +name "SYCLBIN/global metadata", as described in the |
| 118 | +[PropertySets.md](PropertySets.md#syclbinglobal-metadata) design document. |
| 119 | + |
| 120 | +##### Abstract module |
| 121 | + |
| 122 | +An abstract module is a collection of device binaries that share properties, |
| 123 | +including, but not limited to: kernel names, imported symbols, exported symbols, |
| 124 | +aspect requirements, and specialization constants. |
| 125 | + |
| 126 | +The device binaries contained inside an abstract module must either be an IR |
| 127 | +module or a native device code image. IR modules contain device binaries in some |
| 128 | +known intermediate representation, such as SPIR-V, while the native device code |
| 129 | +images can be an architecture-specific binary format. There is no requirement |
| 130 | +that all device binaries in an abstract module are usable on the same device or |
| 131 | +are specific to a single vendor. |
| 132 | + |
| 133 | +##### Abstract module header |
| 134 | + |
| 135 | +An abstract module header contains the following fields in the stated order: |
| 136 | + |
| 137 | +| Type | Description | |
| 138 | +| ---------- | ------------------------------------------------------------------------------------------ | |
| 139 | +| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table. | |
| 140 | +| `uint64_t` | Byte size of the metadata in the metadata byte table. | |
| 141 | +| `uint32_t` | Number of IR modules. | |
| 142 | +| `uint32_t` | Index of the first IR module header in the IR module header array. | |
| 143 | +| `uint32_t` | Number of native device code images. | |
| 144 | +| `uint32_t` | Index of the first native device code image header in the native device code image header array. | |
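
The count and first-index pairs in the table above suggest that the headers belonging to one abstract module are stored contiguously in the corresponding header list. Under that assumption, a consumer could resolve them as in this sketch; the field and function names are hypothetical.

```python
import struct
from collections import namedtuple

# Two uint64_t fields followed by four uint32_t fields, little-endian: 32 bytes.
ABSTRACT_MODULE_HEADER = struct.Struct("<2Q4I")

AbstractModuleHeader = namedtuple("AbstractModuleHeader", [
    "metadata_offset", "metadata_size",
    "ir_module_count", "first_ir_module",
    "native_image_count", "first_native_image",
])


def parse_abstract_module_header(data, offset=0):
    """Decode one abstract module header starting at the given byte offset."""
    fields = ABSTRACT_MODULE_HEADER.unpack_from(data, offset)
    return AbstractModuleHeader._make(fields)


def ir_module_indices(header):
    """Indices into the IR module header list owned by this abstract module."""
    start = header.first_ir_module
    return range(start, start + header.ir_module_count)


def native_image_indices(header):
    """Indices into the native device code image header list for this module."""
    start = header.first_native_image
    return range(start, start + header.native_image_count)


# Made-up example: two IR modules starting at index 3, one native image at 5.
raw = ABSTRACT_MODULE_HEADER.pack(0, 24, 2, 3, 1, 5)
module = parse_abstract_module_header(raw)
```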
| 145 | + |
| 146 | +##### Abstract module metadata |
| 147 | + |
| 148 | +An abstract module metadata entry contains any number of property sets, as |
| 149 | +described in [PropertySets.md](PropertySets.md), excluding: |
| 150 | + |
| 151 | +- ["SYCLBIN/global metadata"](PropertySets.md#syclbinglobal-metadata) |
| 152 | +- ["SYCLBIN/ir module metadata"](PropertySets.md#syclbinir-module-metadata) |
| 153 | +- ["SYCLBIN/native device code image module metadata"](PropertySets.md#syclbinnative-device-code-image-metadata) |
| 154 | + |
| 155 | +##### IR module |
| 156 | + |
| 157 | +An IR module contains the binary data for the corresponding module compiled to a |
| 158 | +given IR representation, identified by the IR type field. |
| 159 | + |
| 160 | +##### IR module header |
| 161 | + |
| 162 | +An IR module header contains the following fields in the stated order: |
| 163 | + |
| 164 | +| Type | Description | |
| 165 | +| ---------- | ------------------------------------------------------------------------ | |
| 166 | +| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table. | |
| 167 | +| `uint64_t` | Byte size of the metadata in the metadata byte table. | |
| 168 | +| `uint64_t` | Byte offset of the raw IR bytes from the start of the binary byte table. | |
| 169 | +| `uint64_t` | Byte size of the raw IR bytes in the binary byte table. | |
| 170 | + |
| 171 | +##### IR module metadata |
| 172 | + |
| 173 | +An IR module metadata entry contains a single property set with the identifying |
| 174 | +name "SYCLBIN/ir module metadata", as described in the |
| 175 | +[PropertySets.md](PropertySets.md#syclbinir-module-metadata) design document. |
| 176 | + |
| 177 | +##### Native device code image |
| 178 | + |
| 179 | +A native device code image contains the binary data for the corresponding |
| 180 | +module AOT compiled for a specific device, identified by the architecture |
| 181 | +string. The runtime library will attempt to map these to the architecture |
| 182 | +enumerators in the |
| 183 | +[sycl_ext_oneapi_device_architecture](../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc) |
| 184 | +extension. |
| 185 | + |
| 186 | +##### Native device code image header |
| 187 | + |
| 188 | +A native device code image header contains the following fields in the stated |
| 189 | +order: |
| 190 | + |
| 191 | +| Type | Description | |
| 192 | +| ---------- | ----------------------------------------------------------------------------------- | |
| 193 | +| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table. | |
| 194 | +| `uint64_t` | Byte size of the metadata in the metadata byte table. | |
| 195 | +| `uint64_t` | Byte offset of the device code image bytes from the start of the binary byte table. | |
| 196 | +| `uint64_t` | Byte size of the device code image bytes in the binary byte table. | |
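
The IR module header and the native device code image header share the same four-field layout, so one helper can resolve either kind against the two byte tables. The sketch below assumes the tables are already loaded as byte strings; all names and the sample bytes are placeholders for illustration.

```python
import struct
from collections import namedtuple

# Four little-endian uint64_t fields: 32 bytes for either header kind.
IMAGE_HEADER = struct.Struct("<4Q")

ImageHeader = namedtuple("ImageHeader", [
    "metadata_offset", "metadata_size", "binary_offset", "binary_size",
])


def slice_table(table, offset, size):
    """Return table[offset:offset + size], rejecting out-of-range entries."""
    if offset + size > len(table):
        raise ValueError("entry lies outside the byte table")
    return table[offset:offset + size]


def load_image(header_bytes, metadata_table, binary_table):
    """Resolve one header into its metadata bytes and its binary bytes."""
    header = ImageHeader._make(IMAGE_HEADER.unpack_from(header_bytes, 0))
    metadata = slice_table(metadata_table, header.metadata_offset,
                           header.metadata_size)
    binary = slice_table(binary_table, header.binary_offset,
                         header.binary_size)
    return metadata, binary


# Placeholder tables: two 6-byte metadata entries padded to 8 bytes each,
# and a 4-byte binary blob padded to 8 bytes.
metadata_table = b"meta-A\0\0meta-B\0\0"
binary_table = b"\x03\x02\x23\x07" + b"\0" * 4
header_bytes = IMAGE_HEADER.pack(8, 6, 0, 4)
metadata, binary = load_image(header_bytes, metadata_table, binary_table)
```

Because headers only carry offsets and sizes, the byte tables can be memory-mapped and sliced without copying or parsing the payloads.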
| 197 | + |
| 198 | +##### Native device code image metadata |
| 199 | + |
| 200 | +A native device code image metadata entry contains a single property set with |
| 201 | +the identifying name "SYCLBIN/native device code image module metadata", as |
| 202 | +described in the |
| 203 | +[PropertySets.md](PropertySets.md#syclbinnative-device-code-image-metadata) |
| 204 | +design document. |
| 205 | + |
| 206 | +##### Byte tables |
| 207 | + |
| 208 | +A byte table contains dynamic data, such as metadata and binary blobs. Its |
| 209 | +contents are generally referenced by an offset specified in the headers. |
| 210 | + |
| 211 | +## Toolchain integration |
| 212 | + |
| 213 | +The content of the SYCLBIN may be contained as an image in the [offload binary](https://github.com/llvm/llvm-project/blame/main/llvm/include/llvm/Object/OffloadBinary.h) produced by the [clang-sycl-linker](https://github.com/llvm/llvm-project/tree/main/clang/tools/clang-sycl-linker). |
| 214 | + |
| 215 | +### Clang driver changes |
| 216 | + |
| 217 | +- [ ] This needs to be rewritten... |
| 218 | + |
| 219 | +The clang driver needs to accept the following new flags: |
| 220 | + |
| 221 | +<table> |
| 222 | +<tr> |
| 223 | +<th>Option</th> |
| 224 | +<th>Description</th> |
| 225 | +</tr> |
| 226 | +<tr> |
| 227 | +<td>`-fsyclbin`</td> |
| 228 | +<td> |
| 229 | +If this option is set, the output of the invocation is a SYCLBIN file with the |
| 230 | +.syclbin file extension. This skips the host-compilation invocation of the |
| 231 | +typical `-fsycl` pipeline, instead passing the output of the |
| 232 | +clang-offload-packager invocation to clang-linker-wrapper together with the new |
| 233 | +`--syclbin` flag. |
| 234 | + |
| 235 | +Setting this option will override `-fsycl`. Passing `-fsycl-device-only` with |
| 236 | +`-fsyclbin` will cause `-fsyclbin` to be considered unused. |
| 237 | + |
| 238 | +The behavior is dependent on using the clang-linker-wrapper. |
| 239 | +</td> |
| 240 | +</tr> |
| 241 | +<tr> |
| 242 | +<td>`--offload-rdc`</td> |
| 243 | +<td>This is an alias of `-fgpu-rdc`.</td> |
| 244 | +</tr> |
| 245 | +</table> |
| 246 | + |
| 247 | +Additionally, `-fsycl-link` should work with .syclbin files. The semantics of |
| 248 | +how SYCLBIN files are linked together are yet to be specified. |
| 249 | + |
| 250 | +### clang-sycl-linker changes |
| 251 | + |
| 252 | +- [ ] to be updated... |
| 253 | + |
| 254 | +The clang-linker-wrapper is responsible for doing module-splitting, metadata |
| 255 | +extraction and linking of device binaries, as described in |
| 256 | +[OffloadDesign.md](OffloadDesign.md). However, to support SYCLBIN files, the |
| 257 | +clang-linker-wrapper must be able to unpack an offload binary (as described in |
| 258 | +[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst)) |
| 259 | +directly, instead of extracting it from a host binary. This should be done when |
| 260 | +a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is |
| 261 | +responsible for packaging the resulting device binaries and produced metadata into |
| 262 | +the format described in the [SYCLBIN binary format section](#syclbin-binary-format). |
| 263 | + |
| 264 | +### clang-linker-wrapper changes |
| 265 | + |
| 266 | +- [ ] to be updated... |
| 267 | + |
| 268 | +Additionally, in this case the clang-linker-wrapper will skip the wrapping of |
| 269 | +the device code and the host code linking stage, as there is no host code to |
| 270 | +wrap the device code in and link. |
| 271 | + |
| 272 | +### SYCL runtime library changes |
| 273 | + |
| 274 | +- [ ] do we want to provide details in RFC or limit it to basic info? |
| 275 | + |
| 276 | +Using the interfaces from the |
| 277 | +[sycl_ext_oneapi_syclbin](../extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) |
| 278 | +extension, the runtime must be able to parse the SYCLBIN format, as described in |
| 279 | +the [SYCLBIN binary format section](#syclbin-binary-format). To avoid large |
| 280 | +amounts of code duplication, the runtime reuses the SYCLBIN reader and writer |
| 281 | +implemented in LLVM. |
| 282 | + |
| 283 | +When creating a `kernel_bundle` from a SYCLBIN file, the runtime reads the |
| 284 | +contents of the SYCLBIN file and creates the corresponding data structure from |
| 285 | +it. |
| 286 | + |
| 287 | +- [ ] this part below needs to be rewritten I think.... |
| 288 | + |
| 289 | +In order for the SYCL runtime library's existing logic to use the binaries, |
| 290 | +the runtime then creates a collection of `sycl_device_binary_struct` objects and |
| 291 | +their constituents, pointing to the data in the parsed SYCLBIN object. Passing |
| 292 | +these objects to the runtime library's `ProgramManager` allows it to reuse the |
| 293 | +logic for compiling, linking and building SYCL binaries. |
| 294 | + |
| 295 | +In the other direction, users can request the "contents" of a `kernel_bundle`. |
| 296 | +When this is done, the runtime library must ensure that a SYCLBIN file is |
| 297 | +available for the contents of the `kernel_bundle` and must then write the |
| 298 | +SYCLBIN object to the corresponding binary representation in the format |
| 299 | +described in the [SYCLBIN binary format section](#syclbin-binary-format). In cases |
| 300 | +where the `kernel_bundle` was created with a SYCLBIN file, the SYCLBIN |
| 301 | +representation is immediately available and can be serialized directly. In other |
| 302 | +cases, the runtime library creates a new SYCLBIN object from the binaries |
| 303 | +associated with the `kernel_bundle`, then serializes it and returns the result. |
| 304 | + |
| 305 | +## Versioning and Extensibility |
| 306 | + |
| 307 | +The SYCLBIN format is subject to change, but any such changes must come with an |
| 308 | +increment to the version number in the header. |
| 309 | +Additionally, any changes to the property set structure that affect the way the |
| 310 | +runtime has to parse the contained property sets will require an increase in the |
| 311 | +SYCLBIN version. Adding new property set names or new predefined properties only |
| 312 | +requires a SYCLBIN version change if the SYCLBIN consumer cannot safely |
| 313 | +ignore the property. |
| 314 | + |
| 315 | +## Upstreaming Plan |
| 316 | + |
| 317 | +- Phase 1: Upstream SYCLBIN format specification, including parsing/writing |
| 318 | +- Phase 2: Add clang driver, sycl-linker and linker-wrapper support |
| 319 | +- Phase 3: Integrate SYCLBIN support into SYCL runtime |
| 320 | + |
| 321 | +## Opens, common todos |
| 322 | + |
| 323 | +- [ ] Do we want to extend SYCL spec with SYCLBIN format? Do we want to somehow mention it in the RFC? Is it relevant? |
| 324 | +- [ ] Need to add/update links in this RFC |
| 325 | +- [ ] Polish formatting, fix typos, etc... |