diff --git a/sycl/doc/design/PropertySets.md b/sycl/doc/design/PropertySets.md new file mode 100644 index 0000000000000..25b95d6c91337 --- /dev/null +++ b/sycl/doc/design/PropertySets.md @@ -0,0 +1,296 @@ +# SYCL binary property sets + +To communicate information about SYCL binaries to the SYCL runtime, the +implementation produces sets of properties. The intention of this design +document is to describe the structure of the property sets and define the +representation and meaning of pre-defined property set names. + + +## Property sets structure + +A property set consists of a reserved name, enclosed in square brackets, +followed by a series of string key and value pairs. The set name and each entry +in the set are separated by a newline. + +The string key and value pairs have the following format: +``` +=| +``` + +The value type is a string and the value of it has the following meaning for the +corresponding value: + +| Value type | Description | +| ---------- | ----------------------------------------- | +| "1" | The value is a 32 bit integer. | +| "2" | The value is a base64 encoded byte array. | + +__Note:__ Whitespaces are __not__ ignored and are treated like any other +characters. As such, `some_key=1|1` is not the same as `some_key = 1 | 1` and +`[some property set]` is not the same as `[ some property set ]`. + + +## Property sets + +This section describes the known property sets. + + +### [SYCL/specialization constants] + +__Key:__ Specialization constant name. + +__Value type:__ Byte array. ("2") + +__Value:__ Information about the specialization constant with the following +fields: + +```c++ +// Encodes ID of a scalar specialization constants which is a leaf of some +// composite specialization constant. +unsigned ID; +// Encodes offset from the beginning of composite, where scalar resides, i.e. +// location of the scalar value within a byte-array containing the whole +// composite specialization constant. If descriptor is used to represent a +// whole scalar specialization constant instead of an element of a composite, +// this field should be contain zero. +unsigned Offset; +// Encodes size of scalar specialization constant. +unsigned Size; +``` + +See also [SYCL2020-SpecializationConstants.md](./SYCL2020-SpecializationConstants.md). + + +### [SYCL/specialization constants default values] + +__Key:__ Specialization constant name. + +__Value type:__ Byte array. ("2") + +__Value:__ Byte representation of the default value for the specialization +constant. + +See also [SYCL2020-SpecializationConstants.md](./SYCL2020-SpecializationConstants.md). + + +### [SYCL/devicelib req mask] + +__Key:__ At most one entry with "DeviceLibReqMask". + +__Value type:__ 32 bit integer. ("1") + +__Value:__ A bitmask of which device libraries the binary uses. + +__Notes:__ + +1. If this property set is missing, no device libraries are used by the binary. + + +### [SYCL/kernel param opt] + +__Key:__ Kernel name. + +__Value type:__ Byte array. ("2") + +__Value:__ A bitmask identifying the arguments of the kernel that have been +removed by the dead-argument-elimination optimization pass. + +__Notes:__ + +1. If no entry is present for a given kernel in the binary, no arguments have +been eliminated. +2. If this property set is missing, no kernels in the binary have any eliminated +arguments. + + +### [SYCL/program metadata] + +Program metadata properties: + +| Key | Value type | Value | +| ---------------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------- | +| `kernel` + "@reqd_work_group_size" | Byte array. ("2") | Specifies the required work-group size for the kernel identified by the name `kernel`. | +| `kernel` + "@work_group_num_dim" | Byte array. ("2") | Specifies the work-group dimensionality of the kernel identified by the name `kernel`. | +| `kernel` + "@max_work_group_size" | Byte array. ("2") | Specifies the max work-group size for the kernel identified by the name `kernel`. | +| `kernel` + "@max_linear_work_group_size" | Byte array. ("2") | Specifies the max linear work-group size for the kernel identified by the name `kernel`. | +| `variable` + "@global_id_mapping" | Byte array. ("2") | Specifies the mapping between the global variable with unique identifier `variable` and its name in the binary. | + + +### [SYCL/misc properties] + +Miscellaneous properties: + +| Key | Value type | Value | +| ------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------- | +| "isEsimdImage" | 32 bit integer. ("1") | 1 if the image is ESIMD and 0 or missing otherwise. | +| "sycl-register-alloc-mode" | 32 bit integer. ("1") | The register allocation mode: 2 for large and 0 or missing for automatic. | +| "sycl-grf-size" | 32 bit integer. ("1") | The GRF size. Automatic if 0 or missing. | +| "optLevel" | 32 bit integer. ("1") | Optimization level, corresponding to the `-O` option used during compilation. | +| "sanUsed" | Byte array. ("2") | Specifying if address sanitization ("asan") or memory sanitization ("msan") is used. Missing if neither is used. | +| "specConstsReplacedWithDefault" | 32 bit integer. ("1") | 1 if the specialization constants have been replaced by their default values and 0 or missing otherwise. | + + +### [SYCL/assert used] + +__Key:__ Kernel name. + +__Value type:__ 32 bit integer. ("1") + +__Value:__ 1 if the kernel uses assertions and 0 or missing otherwise. + + +### [SYCL/exported symbols] + +__Key:__ Symbol name. + +__Value type:__ 32 bit integer. ("1") + +__Value:__ 1 if the symbol is exported by the binary and 0 or missing otherwise. + +See also [SharedLibraries.md](SharedLibraries.md). + + +### [SYCL/imported symbols] + +__Key:__ Symbol name. + +__Value type:__ 32 bit integer. ("1") + +__Value:__ 1 if the symbol is imported by the binary and 0 or missing otherwise. + +See also [SharedLibraries.md](SharedLibraries.md). + + +### [SYCL/device globals] + +__Key:__ Device global variable name. + +__Value type:__ Byte array. ("2") + +__Value:__ Information about the device global variable with the following +fields: + +```c++ + // Encodes size of the underlying type T of the device global variable. + uint32_t Size; + + // Either 1 (true) or 0 (false), telling whether the device global variable + // was declared with the device_image_scope property. + // We use uint32_t for a boolean value to eliminate padding after the field + // and suppress false positive reports from MemorySanitizer. + uint32_t DeviceImageScope; +``` + +__Notes:__ + +1. If this property set is missing, the binary does not contain any device +global variables. + +See also [DeviceGlobal.md](./DeviceGlobal.md). + + +### [SYCL/device requirements] + +Set of device requirements for the entire module: + +| Key | Value type | Value | +| ------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| "aspects" | Byte array. ("2") | A collection of 32 bit integers representing the SYCL aspects used. These correspond 1:1 with the enum values of `sycl::aspect`. | +| "fixed_target" | Byte array. ("2") | The string literals specified in `-fsycl-fixed-targets`. | +| "reqd_work_group_size_uint64_t" | Byte array. ("2") | At most three 64 bit unsigned integers representing the required work-group size. If this entry is missing, there is no work-group size requirement. | +| "joint_matrix" | Byte array. ("2") | A string containing a semi-colon-separated list of comma-separated descriptors for used matrices. The descriptors in the order they appear are:
  • sycl-joint-matrix-type
  • sycl-joint-matrix-use
  • sycl-joint-matrix-rows
  • sycl-joint-matrix-cols
| +| "joint_matrix_mad" | Byte array. ("2") | A string containing a semi-colon-separated list of comma-separated descriptors for used matrix MAD operations. The descriptors in the order they appear are:
  • sycl-joint-matrix-mad-type-A
  • sycl-joint-matrix-mad-type-B
  • sycl-joint-matrix-mad-type-C
  • sycl-joint-matrix-mad-type-D
  • sycl-joint-matrix-mad-size-M
  • sycl-joint-matrix-mad-size-K
  • sycl-joint-matrix-mad-size-N
| +| "reqd_sub_group_size" | Byte array. ("2") | At most three 32 bit unsigned integers representing the required sub-group size. If this entry is missing, there is no sub-group size requirement. | +| "work_group_num_dim" | Byte array. ("2") | At most three 32 bit unsigned integers representing the work-group dimensionality. If this entry is missing, there is no specified work-group dimensionality. | + + +See also [OptionalDeviceFeatures.md](OptionalDeviceFeatures.md). + + +### [SYCL/host pipes] + +__Key:__ Host pipe variable name. + +__Value type:__ Byte array. ("2") + +__Value:__ Information about the host pipe variable with the following +fields: + +```c++ + // Encodes size of the underlying type T of the host pipe variable. + uint32_t Size; +``` + +__Notes:__ + +1. If this property set is missing, the binary does not contain any host pipe +variables. + + +### [SYCL/virtual functions] + +Set of information about virtual function usage in the module. + +| Key | Value type | Value | +| ---------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| "virtual-functions-set" | Byte array. ("2") | A string identifying the set of virtual functions contained in the module. If this is missing, the module does not contain any virtual function sets. | +| "uses-virtual-functions-set" | Byte array. ("2") | A string containing a comma-separated list of sets of virtual functions used by the module. If this is missing, the module does not use any virtual function sets. | + + +### [SYCL/implicit local arg] + +__Key:__ Kernel name. + +__Value type:__ 32 bit integer. ("1") + +__Value:__ Index of the implicit local memory argument. + +__Notes:__ + +1. If no entry is present for a given kernel in the binary, the kernel does not +have an implicit local memory argument. +2. If this property set is missing, no kernels in the binary have an implicit +local memory argument. + + +### [SYCL/registered kernels] + +__Key:__ "Registered" kernel name. + +__Value type:__ Byte array. ("2") + +__Value:__ The name of the kernel corresponding to the registered kernel name. + +__Notes:__ + +1. If this property set is missing, the binary does not have any registered +kernel names. + + +### [SYCLBIN/global metadata] + +Set of global information about a SYCLBIN file. + +| Key | Value type | Value | +| ------- | --------------------- | ----- | +| "state" | 32 bit integer. ("1") | Integer representation of one of the possible states of the file, corresponding to the `sycl::bundle_state` enum. It must be one of the following:
  1. `sycl::bundle_state::input`
  2. `sycl::bundle_state::object`
  3. `sycl::bundle_state::executable`
| + + +### [SYCLBIN/ir module metadata] + +Set of information about an IR module in a SYCLBIN file. + +| Key | Value type | Value | +| -------- | --------------------- | ----- | +| "type" | 32 bit integer. ("1") | Integer representation of one of the pre-defined IR types. It must be one of the following:
  1. SPIR-V
  2. PTX
  3. AMDGCN
| +| "target" | Byte array. ("2") | A string representing the architecture of the binary, corresponding to the value of `-fsycl-targets` option used when compiling this binary. This may be missing if no part of `-fsycl-targets` was used during the compilation of this binary or if `-fsycl-targets` was not used at all. | + + +### [SYCLBIN/native device code image metadata] + +Set of information about an native device code image in a SYCLBIN file. + +| Key | Value type | Value | +| ------ | ----------------- | ----- | +| "arch" | Byte array. ("2") | A string representing the architecture of the binary, corresponding to the value of `-fsycl-targets` option used when compiling this binary. | + diff --git a/sycl/doc/design/SYCLBINDesign.md b/sycl/doc/design/SYCLBINDesign.md new file mode 100644 index 0000000000000..bd32a59319de4 --- /dev/null +++ b/sycl/doc/design/SYCLBINDesign.md @@ -0,0 +1,289 @@ +# SYCLBIN - A format for separately compiled SYCL device code + +Some applications may want to dynamically load device binaries at runtime, e.g. +for modularity and to avoid having to recompile the entire application. However, +doing so through the +[sycl_ext_oneapi_kernel_compiler](https://github.com/intel/llvm/blob/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc) +extension can be unnecessarily expensive, if the systems utilizing this +modularity are able to compile the binaries separate from the application's +execution. + +To facilitate that a new SYCLBIN format is needed to define the interface between +the compiler-produced binaries and the runtime's handling of it. This necessity +comes from the overall design of our SYCL toolchain where runtime relies on +compiler-provided information/metadata to implement various features (like +support for specialization constants or shared libraries), i.e. device code alone +is not enough. + +This design document details the SYCLBIN binary format used for storing SYCL +device binaries to be loaded dynamically by the SYCL runtime. It also details +how the toolchain produces, links and packages these binaries, as well as how +the SYCL runtime library handles files of this format. + + +## SYCLBIN binary format + +The files produced by the new compilation path will follow the format described +in this section. The intention of defining a new format for these is to give +the DPC++ implementation an extendable and lightweight wrapper around the +multiple modules and corresponding metadata captured in the SYCLBIN file. +The content of the SYCLBIN may be contained as an entry in the offloading binary +format produced by the clang-offload-packager, as described in +[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst). + +The format of these files consist of a [file header](#file-header), containing +general information about the file, followed by three lists of headers: The +[abstract module header](#abstract-module-header) list, the +[IR module header](#ir-module-header) list and +[native device code image header](#native-device-code-image-header) list, +containing information about the [abstract modules](#abstract-module), +[IR modules](#ir-module) and +[native device code images](#native-device-code-image) respectively. + +| | +| --------------------------------------------------------------------- | +| [File header](#file-header) | +| [Abstract module header](#abstract-module-header) 1 | +| ... | +| [Abstract module header](#abstract-module-header) N | +| [IR module header](#ir-module-header) 1 | +| ... | +| [IR module header](#ir-module-header) M | +| [Native device code image header](#native-device-code-image-header) 1 | +| ... | +| [Native device code image header](#native-device-code-image-header) L | +| Metadata byte table | +| Binary byte table | + +The headers and each byte table are all aligned to 8 bytes. The fields in the +headers use C/C++ type notation, including the fixed-size integer types defined +in the `` header, and will have the same size and alignment. For +consistency, all these types use little endian layout. + +### File header + +The file header segment appears as the first part of the SYCLBIN file. Like many +other file-formats, it defines a magic number to help identify the format, which +is 0x53594249 (or "SYBI".) Immediately following the magic number is the version +number, which is used by SYCLBIN consumers when parsing data in the rest of the +file. + +| Type | Description | +| ---------- | ----------------------------------------------------------------------------- | +| `uint32_t` | Magic number. (0x53594249) | +| `uint32_t` | SYCLBIN version number. | +| `uint32_t` | Number of abstract modules. | +| `uint32_t` | Number of IR modules. | +| `uint32_t` | Number of native device code images. | +| `uint64_t` | Byte size of the metadata byte table. | +| `uint64_t` | Byte size of the binary byte table. | +| `uint64_t` | Byte offset of the global metadata from the start of the metadata byte table. | +| `uint64_t` | Byte size of the global metadata. | + + +#### Global metadata + +The global metadata entry contains a single property set with the identifying +name "SYCLBIN/global metadata", as described in the +[PropertySets.md](PropertySets.md#syclbinglobal-metadata) design document. + + +### Abstract module + +An abstract module is a collection of device binaries that share properties, +including, but not limited to: kernel names, imported symbols, exported symbols, +aspect requirements, and specialization constants. + +The device binaries contained inside an abstract module must either be an IR +module or a native device code image. IR modules contain device binaries in some +known intermediate representation, such as SPIR-V, while the native device code +images can be an architecture-specific binary format. There is no requirement +that all device binaries in an abstract module are usable on the same device or +are specific to a single vendor. + + +#### Abstract module header + +A abstract module header contains the following fields in the stated order: + +| Type | Description | +| ---------- | ------------------------------------------------------------------------------------------ | +| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table. | +| `uint64_t` | Byte size of the metadata in the metadata byte table. | +| `uint32_t` | Number of IR modules. | +| `uint32_t` | Index of the first IR module header in the IR module header array. | +| `uint32_t` | Number of native device code images. | +| `uint32_t` | Index of the first native device code images header native device code image header array. | + +#### Abstract module metadata + +An abstract module metadata entry contains any number of property sets, as +described in [PropertySets.md](PropertySets.md), excluding: + +* ["SYCLBIN/global metadata"](PropertySets.md#syclbinglobal-metadata) +* ["SYCLBIN/ir module metadata"](PropertySets.md#syclbinir-module-metadata) +* ["SYCLBIN/native device code image module metadata"](PropertySets.md#syclbinnative-device-code-image-metadata) + + +#### IR module + +An IR module contains the binary data for the corresponding module compiled to a +given IR representation, identified by the IR type field. + + +##### IR module header + +A IR module header contains the following fields in the stated order: + +| Type | Description | +| ---------- | ------------------------------------------------------------------------ | +| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table. | +| `uint64_t` | Byte size of the metadata in the metadata byte table. | +| `uint64_t` | Byte offset of the raw IR bytes from the start of the binary byte table. | +| `uint64_t` | Byte size of the raw IR bytes in the binary byte table. | + + +##### IR module metadata + +An IR module metadata entry contains a single property set with the identifying +name "SYCLBIN/ir module metadata", as described in the +[PropertySets.md](PropertySets.md#syclbinir-module-metadata) design document. + + +#### Native device code image + +An native device code image contains the binary data for the corresponding +module AOT compiled for a specific device, identified by the architecture +string. The runtime library will attempt to map these to the architecture +enumerators in the +[sycl_ext_oneapi_device_architecture](../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc) +extension. + + +##### Native device code image header + +A native device code image header contains the following fields in the stated +order: + +| Type | Description | +| ---------- | ----------------------------------------------------------------------------------- | +| `uint64_t` | Byte offset of the metadata from the start of the metadata byte table. | +| `uint64_t` | Byte size of the metadata in the metadata byte table. | +| `uint64_t` | Byte offset of the device code image bytes from the start of the binary byte table. | +| `uint64_t` | Byte size of the device code image bytes in the binary byte table. | + + +##### Native device code image metadata + +A native device code image metadata entry contains a single property set with +the identifying name "SYCLBIN/native device code image module metadata", as +described in the +[PropertySets.md](PropertySets.md#syclbinnative-device-code-image-metadata) +design document. + + +### Byte tables + +A byte table contains dynamic data, such as metadata and binary blobs. The +contents of it is generally referenced by an offset specified in the headers. + + +### SYCLBIN version changelog + +The SYCLBIN format is subject to change, but any such changes must come with an +increment to the version number in the header and a subsection to this section +describing the change. + +Additionally, any changes to the property set structure that affects the way the +runtime has to parse the contained property sets will require an increase in the +SYCLBIN version. Adding new property set names or new predefined properties only +require a SYCLBIN version change if the the SYCLBIN consumer cannot safely +ignore the property. + + +#### Version 1 + + * Initial version of the layout. + + +## Clang driver changes + +The clang driver needs to accept the following new flags: + + + + + + + + + + + + + + +
OptionDescription
`-fsyclbin` +If this option is set, the output of the invocation is a SYCLBIN file with the +.syclbin file extension. This skips the host-compilation invocation of the +typical `-fsycl` pipeline, instead passing the output of the +clang-offload-packager invocation to clang-linker-wrapper together with the new +`--syclbin` flag. + +Setting this option will override `-fsycl`. Passing`-fsycl-device-only` with +`-fsyclbin` will cause `-fsycl-device-only` to be considered unused. + +The behavior is dependent on using the clang-linker-wrapper. As the current +default offload compilation behavior is using the old offload model (driver +based), this option currently requires `--offload-new-driver` to be set. +
`--offload-rdc`This is an alias of `-fgpu-rdc`.
+ +Additionally, `-fsycl-link` should work with .syclbin files. Semantics of how +SYCLBIN files are linked together is yet to be specified. + + +## clang-linker-wrapper changes + +The clang-linker-wrapper is responsible for doing module-splitting, metadata +extraction and linking of device binaries, as described in +[OffloadDesign.md](OffloadDesign.md). However, to support SYCLBIN files, the +clang-linker-wrapper must be able to unpack an offload binary (as described in +[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst)) +directly, instead of extracting it from a host binary. This should be done when +a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is +responsible to package the resulting device binaries and produced metadata into +the format described in [SYCLBIN binary format section](#syclbin-binary-format). +Additionally, in this case the clang-linker-wrapper will skip the wrapping of +the device code and the host code linking stage, as there is no host code to +wrap the device code in and link. + +*TODO:* Describe the details of linking SYCLBIN files. + + +## SYCL runtime library changes + +Using the interfaces from the +[sycl_ext_oneapi_syclbin](../extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) +extension, the runtime must be able to parse the SYCLBIN format, as described in +the [SYCLBIN binary format section](#syclbin-binary-format). To avoid large +amounts of code duplication, the runtime uses the implementation of SYCLBIN +reading and writing implemented in LLVM. + +When creating a `kernel_bundle` from a SYCLBIN file, the runtime reads the +contents of the SYCLBIN file and creates the corresponding data structure from +it. In order for the SYCL runtime library's existing logic to use the binaries, +the runtime then creates a collection of `sycl_device_binary_struct` objects and +its constituents, pointing to the data in the parsed SYCLBIN object. Passing +these objects to the runtime library's `ProgramManager` allows it to reuse the +logic for compiling, linking and building SYCL binaries. + +In the other direction, users can request the "contents" of a `kernel_bundle`. +When this is done, the runtime library must ensure that a SYCLBIN file is +available for the contents of the `kernel_bundle` and must then write the +SYCLBIN object to the corresponding binary representation in the format +described in the [SYCLBIN binary format section](#syclbin-binary-format). In cases +where the `kernel_bundle` was created with a SYCLBIN file, the SYCLBIN +representation is immediately available and can be serialized directly. In other +cases, the runtime library creates a new SYCLBIN object from the binaries +associated with the `kernel_bundle`, then serializes it and returns the result. + diff --git a/sycl/doc/index.rst b/sycl/doc/index.rst index cc4961dd7f438..d49a788ace48d 100644 --- a/sycl/doc/index.rst +++ b/sycl/doc/index.rst @@ -54,6 +54,8 @@ Design Documents for the oneAPI DPC++ Compiler design/CommandGraph design/OffloadDesign design/PrivateAlloca + design/SYCLBINDesign + design/PropertySets New OpenCL Extensions New SPIR-V Extensions