|
| 1 | += sycl_ext_oneapi_range_type |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | +:endnote: —{nbsp}end{nbsp}note |
| 14 | + |
| 15 | +// Set the default source code type in this document to C++, |
| 16 | +// for syntax highlighting purposes. This is needed because |
| 17 | +// docbook uses c++ and html5 uses cpp. |
| 18 | +:language: {basebackend@docbook:c++:cpp} |
| 19 | + |
| 20 | + |
| 21 | +== Notice |
| 22 | + |
| 23 | +[%hardbreaks] |
| 24 | +Copyright (C) 2024 Intel Corporation. All rights reserved. |
| 25 | + |
| 26 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 27 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 28 | +permission by Khronos. |
| 29 | + |
| 30 | + |
| 31 | +== Contact |
| 32 | + |
| 33 | +To report problems with this extension, please open a new issue at: |
| 34 | + |
| 35 | +https://github.com/intel/llvm/issues |
| 36 | + |
| 37 | + |
| 38 | +== Dependencies |
| 39 | + |
| 40 | +This extension is written against the SYCL 2020 revision 9 specification. All |
| 41 | +references below to the "core SYCL specification" or to section numbers in the |
| 42 | +SYCL specification refer to that revision. |
| 43 | + |
| 44 | +This extension also depends on the following other SYCL extensions: |
| 45 | + |
| 46 | +* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ |
| 47 | + sycl_ext_oneapi_kernel_properties] |
| 48 | + |
| 49 | + |
| 50 | +== Status |
| 51 | + |
| 52 | +This is a proposed extension specification, intended to gather community |
| 53 | +feedback. Interfaces defined in this specification may not be implemented yet |
| 54 | +or may be in a preliminary state. The specification itself may also change in |
| 55 | +incompatible ways before it is finalized. *Shipping software products should |
| 56 | +not rely on APIs defined in this specification.* |
| 57 | + |
| 58 | + |
| 59 | +== Overview |
| 60 | + |
| 61 | +The maximum number of work-items that can be launched in a single kernel |
| 62 | +depends on multiple factors. |
| 63 | +SYCL 2020 says that the total number of work-items must be representable as a |
| 64 | +`size_t`, but several implementations (including {dpcpp}) provide optimization |
| 65 | +options to assert that kernels will not require the full range of a `size_t`. |
| 66 | + |
| 67 | +This extension proposes a new kernel property that allows developers to declare |
| 68 | +the range requirements of individual kernels, providing more fine-grained |
| 69 | +control than existing compiler options and improved error behavior. |
| 70 | + |
| 71 | +The property described in this extension is an advanced feature that most |
| 72 | +applications should not need to use. |
| 73 | +In most cases, applications get the best performance without using this |
| 74 | +property. |
| 75 | + |
| 76 | + |
| 77 | +== Specification |
| 78 | + |
| 79 | +=== Feature test macro |
| 80 | + |
| 81 | +This extension provides a feature-test macro as described in the core SYCL |
| 82 | +specification. An implementation supporting this extension must predefine the |
| 83 | +macro `SYCL_EXT_ONEAPI_RANGE_TYPE` to one of the values defined in the table |
| 84 | +below. Applications can test for the existence of this macro to determine if |
| 85 | +the implementation supports this feature, or applications can test the macro's |
| 86 | +value to determine which of the extension's features the implementation |
| 87 | +supports. |
| 88 | + |
| 89 | +[%header,cols="1,5"] |
| 90 | +|=== |
| 91 | +|Value |
| 92 | +|Description |
| 93 | + |
| 94 | +|1 |
| 95 | +|The APIs of this experimental extension are not versioned, so the |
| 96 | + feature-test macro always has this value. |
| 97 | +|=== |
| 98 | + |
| 99 | +=== New property |
| 100 | + |
| 101 | +```c++ |
| 102 | +namespace sycl::ext::oneapi::experimental { |
| 103 | + |
| 104 | +struct range_type_key { |
| 105 | + template <typename T> |
| 106 | + using value_t = property_value<range_type_key, T>; |
| 107 | +}; |
| 108 | + |
| 109 | +template <typename T> |
| 110 | +inline constexpr range_type_key::value_t<T> range_type; |
| 111 | + |
| 112 | +} // namespace sycl::ext::oneapi::experimental |
| 113 | +``` |
| 114 | + |
| 115 | +|=== |
| 116 | +|Property|Description |
| 117 | + |
| 118 | +|`range_type` |
| 119 | +|The `range_type` property is an assertion by the application that the kernel |
| 120 | +will never be launched with more than `std::numeric_limits<T>::max()` |
| 121 | +work-items. |
| 122 | +If the kernel is launched with more than this many work-items, the |
| 123 | +implementation must throw a synchronous `exception` with the `errc::nd_range` |
| 124 | +error code. |
| 125 | + |
| 126 | +`T` must be an integral type. |
| 127 | + |
| 128 | +|=== |
| 129 | + |
| 130 | +This property can also be associated with a device function using the |
| 131 | +`SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro. |
| 132 | + |
| 133 | +There are special requirements whenever a device function defined in one |
| 134 | +translation unit makes a call to a device function that is defined in a second |
| 135 | +translation unit. |
| 136 | +In such a case, the second device function is always declared using |
| 137 | +`SYCL_EXTERNAL`. |
| 138 | +If the kernel calling these device functions is defined using a `range_type` |
| 139 | +property, the functions declared using `SYCL_EXTERNAL` must be similarly |
| 140 | +decorated to ensure that a compatible `range_type` is used. |
| 141 | +This decoration must exist in both the translation unit making the call and |
| 142 | +also in the translation unit that defines the function. |
| 143 | +If the `range_type` property is missing in the translation unit that makes the |
| 144 | +call, or if the `range_type` of the called function is not compatible with the |
| 145 | +`range_type` of the calling function, the program is ill-formed and the |
| 146 | +compiler must raise a diagnostic. |
| 147 | +Two `range_type` properties are considered compatible if all values that can be |
| 148 | +represented by the `range_type` of the caller function can be represented by |
| 149 | +the `range_type` of the called function. |
| 150 | + |
| 151 | +== Usage example |
| 152 | + |
| 153 | +```c++ |
| 154 | +namespace syclex = sycl::ext::oneapi::experimental; |
| 155 | + |
| 156 | +struct SmallKernel |
| 157 | +{ |
| 158 | + // Declare that this kernel supports at most 2^31-1 work-items. |
| 159 | + auto get(syclex::properties_tag) const { |
| 160 | + return syclex::properties{syclex::range_type<int>}; |
| 161 | + } |
| 162 | +}; |
| 163 | + |
| 164 | +... |
| 165 | + |
| 166 | +// Throws an exception with errc::nd_range error code. |
| 167 | +// (because 2147483648 > 2147483647) |
| 168 | +q.parallel_for(2147483648, SmallKernel()); |
| 169 | +``` |
| 170 | + |
| 171 | +== Interaction with the {dpcpp} "-fsycl-id-queries-fit-in-int" option |
| 172 | + |
| 173 | +The `-fsycl-id-queries-fit-in-int` option is specific to the {dpcpp} |
| 174 | +implementation. |
| 175 | +Therefore, this section that describes the interaction between this extension |
| 176 | +and that option is non-normative and does not apply to other SYCL |
| 177 | +implementations that may support this extension. |
| 178 | + |
| 179 | +If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int` |
| 180 | +option, all kernels and `SYCL_EXTERNAL` functions without an explicitly |
| 181 | +specified `range_type` property are compiled as-if `range_type<int>` was |
| 182 | +specified as a property of that kernel or function. |
| 183 | + |
| 184 | + |
| 185 | +== Implementation notes |
| 186 | + |
| 187 | +This non-normative section provides information about one possible |
| 188 | +implementation of this extension. |
| 189 | +It is not part of the specification of the extension's API. |
| 190 | + |
| 191 | +There are several ways this extension could be implemented: |
| 192 | + |
| 193 | +- A dedicated LLVM-IR pass could recognize functions that compute an ID (e.g., |
| 194 | + `nd_item::get_global_id`), check whether the static call tree comes from a |
| 195 | + kernel or function decorated with `range_type`, and then conditionally insert |
| 196 | + an equivalent of `__builtin_assume`. |
| 197 | + |
| 198 | +- The existing `+__builtin_assume+` intrinsic in `+__SYCL_ASSUME_INT+` could be |
| 199 | + replaced by a new (but similar) SPIR-V intrinsic, shifting range-type |
| 200 | + optimizations to the device compiler. |
| 201 | + This could take the form of a dedicated `+__spirv_AssumeInRange+` intrinsic, |
| 202 | + or a general `+__spirv_Assume+` intrinsic coupled with a `+__spirv_MaxRange+` |
| 203 | + value. |
| 204 | + |
| 205 | +Ideally, the solution should not generate SPIR-V modules that require new |
| 206 | +extensions if `range_type` is not used. |
| 207 | + |
| 208 | + |
| 209 | +== Issues |
| 210 | + |
| 211 | +None. |
0 commit comments