
Commit 14e2cac

[SYCL][Doc] RFC draft for SYCLBIN format
Not to be merged to `sycl` branch. Created for discussion and feedback.
1 parent 9898e9a commit 14e2cac


1 file changed

+325
-0
lines changed

# [RFC] SYCLBIN - A format for SYCL device code
2+
3+
## Summary
4+
5+
This RFC proposes the addition of SYCLBIN, a new binary format for storing SYCL device code. The format provides a lightweight, extensible wrapper around device modules and their corresponding SYCL-specific metadata, to be produced and consumed by both the toolchain and the SYCL runtime.
6+
7+
## Purpose of this RFC
8+
9+
This RFC seeks community feedback on the proposed SYCLBIN binary format design, including:
10+
11+
- The binary format specification and layout
12+
- Toolchain integration approach and new compiler flags
13+
14+
Community input is particularly valuable regarding potential integration challenges with existing LLVM offloading implementations.
15+
16+
## Motivation and alternatives considered
17+
18+
### Metadata unique for SYCL programming model
19+
20+
- [ ] This section needs re-writing...
21+
22+
The LLVM offloading infrastructure supports the following binary formats, each of which can be placed into the OffloadBinary format: Object, Bitcode, Cubin, Fatbinary, PTX and SPIRV. None of them satisfies the needs of the SYCL programming model.
23+
24+
- [ ] Steffen, I need to discuss with you, why other existing formats did not satisfy our needs. I think we need to provide short summary why each format doesn't work for us somewhere in this section.
25+
26+
Specifically, SYCL needs to keep the following metadata for the SYCL runtime, which is not supported by any of the existing formats:
27+
28+
1. Device target triple (e.g. spirv64_unknown_unknown).
29+
2. Compiler and linker options to pass to JIT compiler in case of JITing.
30+
3. List of entry points exposed by an image.
31+
4. Arrays of property sets.
32+
33+
While #1 and #2 can be saved to the StringData of an OffloadBinary, #3 requires additional handling. The StringData serialization infrastructure assumes that each value is a single null-terminated string, so storing multiple strings requires concatenating them with a separator symbol during serialization and splitting them again during deserialization.
34+
35+
[Property sets](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/PropertySets.md) (#4) would be even more complicated.
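
To make the cost of the workaround for #3 concrete, here is a minimal C++ sketch of the concatenate-and-split round trip. It does not use the real OffloadBinary API; the function names and the `;` separator are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical separator; it must never occur inside an entry point name.
constexpr char kSeparator = ';';

// Serialization: flatten the list of entry point names into a single value.
std::string joinEntryPoints(const std::vector<std::string> &Names) {
  std::string Joined;
  for (std::size_t I = 0; I < Names.size(); ++I) {
    assert(Names[I].find(kSeparator) == std::string::npos &&
           "separator must not occur in a name");
    if (I != 0)
      Joined += kSeparator;
    Joined += Names[I];
  }
  return Joined;
}

// Deserialization: split the single value back into individual names.
std::vector<std::string> splitEntryPoints(std::string_view Joined) {
  std::vector<std::string> Names;
  if (Joined.empty())
    return Names;
  std::size_t Begin = 0;
  while (true) {
    const std::size_t End = Joined.find(kSeparator, Begin);
    if (End == std::string_view::npos) {
      Names.emplace_back(Joined.substr(Begin));
      break;
    }
    Names.emplace_back(Joined.substr(Begin, End - Begin));
    Begin = End + 1;
  }
  return Names;
}
```

Even in this small form, the producer and the consumer have to agree on a separator that can never appear in a name, and an empty list cannot be told apart from a list holding one empty name.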
36+
37+
### Abstraction: simplify support in offloading tools
38+
39+
Another motivation for adding the SYCLBIN format is to encapsulate SYCL-specific logic in the SYCL-specific parts of the toolchain (clang-sycl-linker, SYCL runtime) and hide SYCL specifics from offloading tools intended to support multiple programming models. Without this format, we would need to use the following workflow to pass metadata (#1 - #4) from the compiler to the runtime:
40+
41+
1. clang-sycl-linker would use OffloadingImage’s StringData to save metadata #1-#4.
42+
Problem: OffloadingImage’s StringData is not intended for composite objects like arrays or property sets.
43+
2. clang-linker-wrapper would open OffloadingImages prepared by clang-sycl-linker and generate device image binary descriptor for each image in some format that SYCL runtime could read.
44+
Problem: clang-linker-wrapper needs to maintain SYCL-specific formats necessary for SYCL runtime, which means unnecessary duplication.
45+
3. Then SYCL runtime would use this format to decode metadata.
46+
47+
If SYCLBIN is accepted, then the scheme could be simplified, resolving problems highlighted above:
48+
49+
1. clang-sycl-linker would prepare a SYCLBIN with all metadata encoded and put it inside an OffloadingImage as the image.
50+
2. clang-linker-wrapper would generate only host register and unregister calls, but would know nothing about what’s inside SYCLBIN.
51+
3. SYCL runtime would work with SYCLBIN directly.
52+
53+
### Enable modular dynamic loading of device binaries at runtime
54+
Some applications may want to dynamically load device binaries at runtime, e.g. for modularity and to avoid having to recompile the entire application. To facilitate this, the SYCLBIN format defines the interface between the compiler-produced binaries and the runtime's handling of them.
57+
58+
## Detailed Design
59+
60+
### SYCLBIN binary format
61+
62+
The SYCLBIN format consists of:
63+
- A [file header](#file-header) with a magic number (0x53594249 "SYBI") and version information
- Three lists of headers: the [abstract module header](#abstract-module-header) list, the [IR module header](#ir-module-header) list and the [native device code image header](#native-device-code-image-header) list, containing information about the [abstract modules](#abstract-module), [IR modules](#ir-module) and [native device code images](#native-device-code-image) respectively.
- Two byte tables containing metadata and binary data
72+
73+
#### File Structure
74+
|                                                                       |
| --------------------------------------------------------------------- |
| [File header](#file-header)                                           |
| [Abstract module header](#abstract-module-header) 1                   |
| ...                                                                   |
| [Abstract module header](#abstract-module-header) N                   |
| [IR module header](#ir-module-header) 1                               |
| ...                                                                   |
| [IR module header](#ir-module-header) M                               |
| [Native device code image header](#native-device-code-image-header) 1 |
| ...                                                                   |
| [Native device code image header](#native-device-code-image-header) L |
| Metadata byte table                                                   |
| Binary byte table                                                     |
89+
The headers and each byte table are all aligned to 8 bytes. The fields in the headers use C/C++ type notation, including the fixed-size integer types defined in the `<cstdint>` header, and have the same size and alignment as those types. For consistency, all these types use little-endian layout.
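
The layout rules above can be sketched as a small offset calculation. This is only an illustration under stated assumptions: the regions are taken to follow each other with no gaps other than 8-byte alignment padding, and the header sizes are derived from the field tables under "Component Details" assuming natural C/C++ alignment. All names are hypothetical.

```cpp
#include <cstdint>

// Header sizes in bytes, derived from the field tables in this document and
// assuming natural alignment (which gives the file header 4 bytes of padding
// before its first 64-bit field).
constexpr uint64_t kFileHeaderSize = 56;
constexpr uint64_t kAbstractModuleHeaderSize = 32;
constexpr uint64_t kIRModuleHeaderSize = 32;
constexpr uint64_t kNativeImageHeaderSize = 32;

// Rounds Offset up to the next multiple of 8.
constexpr uint64_t alignTo8(uint64_t Offset) {
  return (Offset + 7) & ~uint64_t{7};
}

// Byte offsets of each region from the start of the file.
struct Layout {
  uint64_t AbstractModuleHeaders;
  uint64_t IRModuleHeaders;
  uint64_t NativeImageHeaders;
  uint64_t MetadataByteTable;
  uint64_t BinaryByteTable;
};

// Computes where each region starts, given the counts and the metadata byte
// table size that the file header records.
constexpr Layout computeLayout(uint32_t NumAbstractModules,
                               uint32_t NumIRModules,
                               uint32_t NumNativeImages,
                               uint64_t MetadataTableSize) {
  Layout L{};
  L.AbstractModuleHeaders = alignTo8(kFileHeaderSize);
  L.IRModuleHeaders = alignTo8(
      L.AbstractModuleHeaders + NumAbstractModules * kAbstractModuleHeaderSize);
  L.NativeImageHeaders =
      alignTo8(L.IRModuleHeaders + NumIRModules * kIRModuleHeaderSize);
  L.MetadataByteTable =
      alignTo8(L.NativeImageHeaders + NumNativeImages * kNativeImageHeaderSize);
  L.BinaryByteTable = alignTo8(L.MetadataByteTable + MetadataTableSize);
  return L;
}
```

Because every header size in this sketch is already a multiple of 8, only the binary byte table can actually be shifted by padding, namely when the metadata byte table size is not a multiple of 8.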
94+
95+
#### Component Details
96+
97+
- [ ] Do we want to provide that level of details in the RFC, or it is better to clean it up to
98+
keep only key info and for details provide reference to design document?
99+
100+
##### File header
101+
| Type       | Description                                                                   |
| ---------- | ----------------------------------------------------------------------------- |
| `uint32_t` | Magic number. (0x53594249)                                                    |
| `uint32_t` | SYCLBIN version number.                                                       |
| `uint32_t` | Number of abstract modules.                                                   |
| `uint32_t` | Number of IR modules.                                                         |
| `uint32_t` | Number of native device code images.                                          |
| `uint64_t` | Byte size of the metadata byte table.                                         |
| `uint64_t` | Byte size of the binary byte table.                                           |
| `uint64_t` | Byte offset of the global metadata from the start of the metadata byte table. |
| `uint64_t` | Byte size of the global metadata.                                             |
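
As a rough illustration of how a consumer might read this header, the following C++ sketch decodes the fields byte by byte in little-endian order and rejects inputs with the wrong magic number. The struct and function names are hypothetical, and the 56-byte size assumes that natural alignment places 4 bytes of padding after the fifth 32-bit field; neither is defined by this RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr uint32_t kSyclbinMagic = 0x53594249;

// Field order follows the table above; the names are illustrative.
struct FileHeader {
  uint32_t Magic;
  uint32_t Version;
  uint32_t AbstractModuleCount;
  uint32_t IRModuleCount;
  uint32_t NativeDeviceCodeImageCount;
  uint64_t MetadataByteTableSize;
  uint64_t BinaryByteTableSize;
  uint64_t GlobalMetadataOffset;
  uint64_t GlobalMetadataSize;
};

// Decodes a little-endian unsigned integer of Width bytes, independent of the
// byte order of the host.
inline uint64_t readLE(const unsigned char *P, std::size_t Width) {
  assert(Width <= 8 && "at most 64 bits can be decoded");
  uint64_t Value = 0;
  for (std::size_t I = 0; I < Width; ++I)
    Value |= static_cast<uint64_t>(P[I]) << (8 * I);
  return Value;
}

// Parses the file header from the start of a buffer. Returns std::nullopt if
// the buffer is too short or the magic number does not match.
std::optional<FileHeader> parseFileHeader(const unsigned char *Data,
                                          std::size_t Size) {
  constexpr std::size_t kHeaderSize = 56; // assumed, see the note above
  if (Data == nullptr || Size < kHeaderSize)
    return std::nullopt;
  FileHeader H{};
  H.Magic = static_cast<uint32_t>(readLE(Data + 0, 4));
  if (H.Magic != kSyclbinMagic)
    return std::nullopt;
  H.Version = static_cast<uint32_t>(readLE(Data + 4, 4));
  H.AbstractModuleCount = static_cast<uint32_t>(readLE(Data + 8, 4));
  H.IRModuleCount = static_cast<uint32_t>(readLE(Data + 12, 4));
  H.NativeDeviceCodeImageCount = static_cast<uint32_t>(readLE(Data + 16, 4));
  // Bytes 20..23 are assumed to be alignment padding.
  H.MetadataByteTableSize = readLE(Data + 24, 8);
  H.BinaryByteTableSize = readLE(Data + 32, 8);
  H.GlobalMetadataOffset = readLE(Data + 40, 8);
  H.GlobalMetadataSize = readLE(Data + 48, 8);
  return H;
}
```

Decoding field by field, rather than casting the buffer to the struct, keeps the sketch independent of host endianness and of the alignment of the input buffer.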
113+
114+
##### Global metadata
115+
The global metadata entry contains a single property set with the identifying name "SYCLBIN/global metadata", as described in the [PropertySets.md](PropertySets.md#syclbinglobal-metadata) design document.
119+
120+
##### Abstract module
121+
An abstract module is a collection of device binaries that share properties, including, but not limited to: kernel names, imported symbols, exported symbols, aspect requirements, and specialization constants.

Each device binary contained in an abstract module must be either an IR module or a native device code image. IR modules contain device binaries in some known intermediate representation, such as SPIR-V, while native device code images can be in an architecture-specific binary format. There is no requirement that all device binaries in an abstract module are usable on the same device or are specific to a single vendor.
132+
133+
##### Abstract module header
134+
An abstract module header contains the following fields in the stated order:

| Type       | Description                                                                                       |
| ---------- | ------------------------------------------------------------------------------------------------- |
| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table.                            |
| `uint64_t` | Byte size of the metadata in the metadata byte table.                                             |
| `uint32_t` | Number of IR modules.                                                                             |
| `uint32_t` | Index of the first IR module header in the IR module header array.                                |
| `uint32_t` | Number of native device code images.                                                              |
| `uint32_t` | Index of the first native device code image header in the native device code image header array. |
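
The count and first-index pairs suggest that the headers belonging to one abstract module are stored contiguously in their header list. Under that assumption, a reader could select the IR modules of an abstract module as in the following C++ sketch. The struct and function names are hypothetical, and native device code images would be handled the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field order follows the tables in this document; the names are illustrative.
struct AbstractModuleHeader {
  uint64_t MetadataOffset;
  uint64_t MetadataSize;
  uint32_t IRModuleCount;
  uint32_t FirstIRModuleIndex;
  uint32_t NativeImageCount;
  uint32_t FirstNativeImageIndex;
};

struct IRModuleHeader {
  uint64_t MetadataOffset;
  uint64_t MetadataSize;
  uint64_t RawIROffset;
  uint64_t RawIRSize;
};

// True if the IR module range named by AM lies inside a list of Total headers.
// The sum is formed in 64 bits so that it cannot wrap around.
constexpr bool hasValidIRRange(const AbstractModuleHeader &AM, uint64_t Total) {
  return uint64_t{AM.FirstIRModuleIndex} + AM.IRModuleCount <= Total;
}

// Collects the IR module headers that belong to AM. Callers are expected to
// check hasValidIRRange first when the input is untrusted.
std::vector<IRModuleHeader>
irModulesOf(const AbstractModuleHeader &AM,
            const std::vector<IRModuleHeader> &All) {
  assert(hasValidIRRange(AM, All.size()) && "validate untrusted headers first");
  const auto First = All.begin() + AM.FirstIRModuleIndex;
  return std::vector<IRModuleHeader>(First, First + AM.IRModuleCount);
}
```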
145+
146+
##### Abstract module metadata
147+
An abstract module metadata entry contains any number of property sets, as described in [PropertySets.md](PropertySets.md), excluding:

- ["SYCLBIN/global metadata"](PropertySets.md#syclbinglobal-metadata)
- ["SYCLBIN/ir module metadata"](PropertySets.md#syclbinir-module-metadata)
- ["SYCLBIN/native device code image module metadata"](PropertySets.md#syclbinnative-device-code-image-metadata)
154+
155+
##### IR module
156+
An IR module contains the binary data for the corresponding module compiled to a given IR representation, identified by the IR type field.
159+
160+
##### IR module header
161+
An IR module header contains the following fields in the stated order:

| Type       | Description                                                              |
| ---------- | ------------------------------------------------------------------------ |
| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table.   |
| `uint64_t` | Byte size of the metadata in the metadata byte table.                    |
| `uint64_t` | Byte offset of the raw IR bytes from the start of the binary byte table. |
| `uint64_t` | Byte size of the raw IR bytes in the binary byte table.                  |
170+
171+
##### IR module metadata
172+
An IR module metadata entry contains a single property set with the identifying name "SYCLBIN/ir module metadata", as described in the [PropertySets.md](PropertySets.md#syclbinir-module-metadata) design document.
176+
177+
##### Native device code image
178+
A native device code image contains the binary data for the corresponding module AOT-compiled for a specific device, identified by the architecture string. The runtime library will attempt to map these to the architecture enumerators in the [sycl_ext_oneapi_device_architecture](../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc) extension.
185+
186+
##### Native device code image header
187+
A native device code image header contains the following fields in the stated order:

| Type       | Description                                                                         |
| ---------- | ----------------------------------------------------------------------------------- |
| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table.              |
| `uint64_t` | Byte size of the metadata in the metadata byte table.                               |
| `uint64_t` | Byte offset of the device code image bytes from the start of the binary byte table. |
| `uint64_t` | Byte size of the device code image bytes in the binary byte table.                  |
197+
198+
##### Native device code image metadata
199+
A native device code image metadata entry contains a single property set with the identifying name "SYCLBIN/native device code image module metadata", as described in the [PropertySets.md](PropertySets.md#syclbinnative-device-code-image-metadata) design document.
205+
206+
##### Byte tables
207+
A byte table contains dynamic data, such as metadata and binary blobs. Its contents are generally referenced by offsets specified in the headers.
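
Since every header refers into a byte table through an offset and a size, a reader has to bounds-check each pair before using it. The following C++ sketch shows one way to do that. It represents a byte table as a `std::string_view` over raw bytes, and the function name is hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Returns the bytes of one entry in a byte table, or std::nullopt if the
// (Offset, Size) pair taken from a header does not fit inside the table.
constexpr std::optional<std::string_view>
getEntry(std::string_view ByteTable, uint64_t Offset, uint64_t Size) {
  // Compare against the remaining length instead of computing Offset + Size,
  // which could wrap around for corrupted or hostile inputs.
  if (Offset > ByteTable.size() || Size > ByteTable.size() - Offset)
    return std::nullopt;
  return ByteTable.substr(Offset, Size);
}
```

The same check would apply to both the metadata byte table and the binary byte table, since headers address them in the same way.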
210+
211+
## Toolchain integration
212+
213+
The content of the SYCLBIN may be contained as an image in the [offload binary](https://github.com/llvm/llvm-project/blame/main/llvm/include/llvm/Object/OffloadBinary.h) produced by the [clang-sycl-linker](https://github.com/llvm/llvm-project/tree/main/clang/tools/clang-sycl-linker).
214+
215+
### Clang driver changes
216+
217+
- [ ] This needs to be rewritten...
218+
219+
The clang driver needs to accept the following new flags:
220+
<table>
<tr>
<th>Option</th>
<th>Description</th>
</tr>
<tr>
<td>`-fsyclbin`</td>
<td>
If this option is set, the output of the invocation is a SYCLBIN file with the
.syclbin file extension. This skips the host-compilation invocation of the
typical `-fsycl` pipeline, instead passing the output of the
clang-offload-packager invocation to clang-linker-wrapper together with the new
`--syclbin` flag.

Setting this option will override `-fsycl`. Passing `-fsycl-device-only` with
`-fsyclbin` will cause `-fsyclbin` to be considered unused.

The behavior is dependent on using the clang-linker-wrapper.
</td>
</tr>
<tr>
<td>`--offload-rdc`</td>
<td>This is an alias of `-fgpu-rdc`.</td>
</tr>
</table>
246+
Additionally, `-fsycl-link` should work with .syclbin files. The semantics of how SYCLBIN files are linked together are yet to be specified.
249+
250+
### clang-sycl-linker changes
251+
252+
- [ ] to be updated...
253+
The clang-linker-wrapper is responsible for doing module-splitting, metadata extraction and linking of device binaries, as described in [OffloadDesign.md](OffloadDesign.md). However, to support SYCLBIN files, the clang-linker-wrapper must be able to unpack an offload binary (as described in [ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst)) directly, instead of extracting it from a host binary. This should be done when a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is responsible for packaging the resulting device binaries and the produced metadata into the format described in the [SYCLBIN binary format section](#syclbin-binary-format).
263+
264+
### clang-linker-wrapper changes
265+
266+
- [ ] to be updated...
267+
Additionally, when `--syclbin` is passed, the clang-linker-wrapper will skip the wrapping of the device code and the host code linking stage, as there is no host code to wrap the device code in and link.
271+
272+
### SYCL runtime library changes
273+
274+
- [ ] do we want to provide details in RFC or limit it to basic info?
275+
Using the interfaces from the [sycl_ext_oneapi_syclbin](../extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) extension, the runtime must be able to parse the SYCLBIN format, as described in the [SYCLBIN binary format section](#syclbin-binary-format). To avoid large amounts of code duplication, the runtime reuses the SYCLBIN reader and writer implemented in LLVM.

When creating a `kernel_bundle` from a SYCLBIN file, the runtime reads the contents of the SYCLBIN file and creates the corresponding data structure from it.
286+
287+
- [ ] this part below needs to be rewritten I think....
288+
In order for the SYCL runtime library's existing logic to use the binaries, the runtime then creates a collection of `sycl_device_binary_struct` objects and their constituents, pointing to the data in the parsed SYCLBIN object. Passing these objects to the runtime library's `ProgramManager` allows it to reuse the logic for compiling, linking and building SYCL binaries.
294+
In the other direction, users can request the "contents" of a `kernel_bundle`. When this is done, the runtime library must ensure that a SYCLBIN file is available for the contents of the `kernel_bundle` and must then write the SYCLBIN object to the corresponding binary representation in the format described in the [SYCLBIN binary format section](#syclbin-binary-format). In cases where the `kernel_bundle` was created with a SYCLBIN file, the SYCLBIN representation is immediately available and can be serialized directly. In other cases, the runtime library creates a new SYCLBIN object from the binaries associated with the `kernel_bundle`, then serializes it and returns the result.
304+
305+
## Versioning and Extensibility
306+
The SYCLBIN format is subject to change, but any such changes must come with an increment to the version number in the header. Additionally, any changes to the property set structure that affect the way the runtime has to parse the contained property sets will require an increase in the SYCLBIN version. Adding new property set names or new predefined properties requires a SYCLBIN version change only if the SYCLBIN consumer cannot safely ignore the property.
314+
315+
## Upstreaming Plan
316+
317+
- Phase 1: Upstream SYCLBIN format specification, including parsing/writing
318+
- Phase 2: Add clang driver, sycl-linker and linker-wrapper support
319+
- Phase 3: Integrate SYCLBIN support into SYCL runtime
320+
321+
## Opens, common todos
322+
323+
- [ ] Do we want to extend SYCL spec with SYCLBIN format? Do we want to somehow mention it in the RFC? Is it relevant?
324+
- [ ] Need to add/update links in this RFC
325+
- [ ] Polish formatting, fix typos, etc...
