| 1 | += sycl_ext_oneapi_clock |
| 2 | + |
| 3 | +:source-highlighter: coderay |
| 4 | +:coderay-linenums-mode: table |
| 5 | + |
| 6 | +// This section needs to be after the document title. |
| 7 | +:doctype: book |
| 8 | +:toc2: |
| 9 | +:toc: left |
| 10 | +:encoding: utf-8 |
| 11 | +:lang: en |
| 12 | +:dpcpp: pass:[DPC++] |
| 13 | +:endnote: —{nbsp}end{nbsp}note |
| 14 | + |
| 15 | +// Set the default source code type in this document to C++, |
| 16 | +// for syntax highlighting purposes. This is needed because |
| 17 | +// docbook uses c++ and html5 uses cpp. |
| 18 | +:language: {basebackend@docbook:c++:cpp} |
| 19 | + |
| 20 | + |
| 21 | +== Notice |
| 22 | + |
| 23 | +[%hardbreaks] |
| 24 | +Copyright (C) 2025 Intel Corporation. All rights reserved. |
| 25 | + |
| 26 | +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks |
| 27 | +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by |
| 28 | +permission by Khronos. |
| 29 | + |
| 30 | + |
| 31 | +== Contact |
| 32 | + |
| 33 | +To report problems with this extension, please open a new issue at: |
| 34 | + |
| 35 | +https://github.com/intel/llvm/issues |
| 36 | + |
| 37 | + |
| 38 | +== Dependencies |
| 39 | + |
| 40 | +This extension is written against the SYCL 2020 revision 10 specification. All |
| 41 | +references below to the "core SYCL specification" or to section numbers in the |
| 42 | +SYCL specification refer to that revision. |
| 43 | + |
| 44 | +== Status |
| 45 | + |
| 46 | +This is a proposed extension specification, intended to gather community |
| 47 | +feedback. Interfaces defined in this specification may not be implemented yet |
| 48 | +or may be in a preliminary state. The specification itself may also change in |
| 49 | +incompatible ways before it is finalized. *Shipping software products should |
| 50 | +not rely on APIs defined in this specification.* |
| 51 | + |
| 52 | +== Backend support status |
| 53 | + |
| 54 | +Each API in this extension may be used only on a device that has the |
| 55 | +corresponding aspect: `aspect::ext_oneapi_clock_sub_group`, |
| 56 | +`aspect::ext_oneapi_clock_work_group`, or `aspect::ext_oneapi_clock_device`. |
| 57 | +The application must check for that aspect before submitting a kernel that |
| 58 | +uses the API. If the application fails to do this, the implementation throws a |
| 59 | +synchronous exception with the `errc::kernel_not_supported` error code when the |
| 60 | +kernel is submitted to the queue. |
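
The aspect check described above could look like the following minimal sketch. It is not part of the specification: the helper name `sample_device_clock_delta` is hypothetical, and the sketch assumes that `<sycl/sycl.hpp>` makes the extension available whenever `SYCL_EXT_ONEAPI_CLOCK` is defined. When the macro is not defined, the helper compiles to a fallback that reports no value.

```c++
#include <cstdint>
#include <optional>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper (not defined by this extension). Returns the number of
// device-scope clock ticks observed around a measured region, or std::nullopt
// when the extension or the required aspect is unavailable.
std::optional<std::uint64_t> sample_device_clock_delta() {
#ifdef SYCL_EXT_ONEAPI_CLOCK
  namespace syclex = sycl::ext::oneapi::experimental;

  sycl::queue q;
  // Check the aspect before submitting a kernel that uses the device clock.
  if (!q.get_device().has(sycl::aspect::ext_oneapi_clock_device))
    return std::nullopt;

  std::uint64_t delta = 0;
  {
    sycl::buffer<std::uint64_t, 1> buf(&delta, sycl::range<1>(1));
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(buf, cgh, sycl::write_only);
      cgh.single_task([=] {
        std::uint64_t start = syclex::clock<syclex::clock_scope::device>();
        // ... code being measured would go here ...
        std::uint64_t end = syclex::clock<syclex::clock_scope::device>();
        acc[0] = end - start; // tick count, not a time duration
      });
    });
  } // The buffer destructor waits for the kernel and copies the result back.
  return delta;
#else
  return std::nullopt; // Extension not advertised by this implementation.
#endif
}
```

A caller would treat an empty result as "clock not available" and fall back to another measurement method, rather than submitting the kernel and catching the `errc::kernel_not_supported` exception.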
| 61 | + |
| 62 | +== Overview |
| 63 | + |
| 64 | +This extension introduces a new free function `clock<clock_scope>()`. This |
| 65 | +function allows the user to sample the value from one of three clocks provided |
| 66 | +by the compute units, depending on the value of the scope argument. The clocks |
| 67 | +in this extension do not necessarily count units of time. For example, they may |
| 68 | +count cycles instead. In addition, the cycle frequency may change as the kernel |
| 69 | +executes. As a result, there is no portable way to convert the values returned |
| 70 | +by these clocks into time durations. |
| 71 | + |
| 72 | +The `scope` template argument is an enumerator of the new `clock_scope` enum |
| 73 | +and selects the clock source. For example, |
| 74 | +`clock<clock_scope::sub_group>()` samples the value from a clock shared by all |
| 75 | +work-items executing in the same sub-group. |
| 76 | + |
| 77 | +This extension also adds new aspects: `ext_oneapi_clock_sub_group`, |
| 78 | +`ext_oneapi_clock_work_group` and `ext_oneapi_clock_device` indicating whether |
| 79 | +the device supports the corresponding clock scopes. |
| 80 | + |
| 81 | +== Specification |
| 82 | + |
| 83 | +=== Feature test macro |
| 84 | + |
| 85 | +This extension provides a feature-test macro as described in the core SYCL |
| 86 | +specification. An implementation supporting this extension must predefine the |
| 87 | +macro `SYCL_EXT_ONEAPI_CLOCK` to one of the values defined in the table |
| 88 | +below. Applications can test for the existence of this macro to determine if |
| 89 | +the implementation supports this feature, or applications can test the macro's |
| 90 | +value to determine which of the extension's features the implementation |
| 91 | +supports. |
| 92 | + |
| 93 | +[%header,cols="1,5"] |
| 94 | +|=== |
| 95 | +|Value |
| 96 | +|Description |
| 97 | + |
| 98 | +|1 |
| 99 | +|The APIs of this experimental extension are not versioned, so the feature-test |
| 100 | + macro always has this value. |
| 101 | +|=== |
| 102 | + |
| 103 | +=== New device aspects |
| 104 | + |
| 105 | +This extension adds new device aspects: |
| 106 | + |
| 107 | +```c++ |
| 108 | +namespace sycl { |
| 109 | + |
| 110 | +enum class aspect : /*unspecified*/ { |
| 111 | + ext_oneapi_clock_sub_group, |
| 112 | + ext_oneapi_clock_work_group, |
| 113 | + ext_oneapi_clock_device |
| 114 | +}; |
| 115 | + |
| 116 | +} // namespace sycl |
| 117 | +``` |
| 118 | + |
| 119 | +[width="100%",%header,cols="50%,50%"] |
| 120 | +|=== |
| 121 | +|Aspect |
| 122 | +|Description |
| 123 | + |
| 124 | +|`ext_oneapi_clock_sub_group` |
| 125 | +|Indicates that the device supports the `sycl::ext::oneapi::experimental::clock<clock_scope::sub_group>()` call. |
| 126 | +|`ext_oneapi_clock_work_group` |
| 127 | +|Indicates that the device supports the `sycl::ext::oneapi::experimental::clock<clock_scope::work_group>()` call. |
| 128 | +|`ext_oneapi_clock_device` |
| 129 | +|Indicates that the device supports the `sycl::ext::oneapi::experimental::clock<clock_scope::device>()` call. |
| 130 | +|=== |
| 131 | + |
| 132 | +=== New enum |
| 133 | + |
| 134 | +```c++ |
| 135 | +namespace sycl::ext::oneapi::experimental { |
| 136 | + |
| 137 | +enum class clock_scope : /* unspecified */ { |
| 138 | + sub_group, |
| 139 | + work_group, |
| 140 | + device |
| 141 | +}; |
| 142 | + |
| 143 | +} // namespace sycl::ext::oneapi::experimental |
| 144 | +``` |
| 145 | +An enumerator of `clock_scope`, passed as the template argument to the `clock()` |
| 146 | +function, selects the clock source: |
| 147 | + |
| 148 | +[width="100%",%header,cols="50%,50%"] |
| 149 | +|=== |
| 150 | +|Enumerator |
| 151 | +|Description |
| 152 | + |
| 153 | +|`sub_group` |
| 154 | +|`clock()` samples a clock shared by all work-items executing in the same sub-group. |
| 155 | + |
| 156 | +|`work_group` |
| 157 | +|`clock()` samples a clock shared by all work-items executing in the same work-group. |
| 158 | + |
| 159 | +|`device` |
| 160 | +|`clock()` samples a clock shared by all work-items executing on the device. |
| 161 | +|=== |
| 162 | + |
| 163 | +=== New free function |
| 164 | + |
| 165 | +```c++ |
| 166 | +namespace sycl::ext::oneapi::experimental { |
| 167 | + |
| 168 | +template <clock_scope scope> uint64_t clock(); |
| 169 | + |
| 170 | +} // namespace sycl::ext::oneapi::experimental |
| 171 | +``` |
| 172 | + |
| 173 | +This function may only be called from within a SYCL kernel function. |
| 174 | + |
| 175 | +All work-items within the `scope` read from the same source clock. There is no |
| 176 | +guarantee that two work-items get the same value. |
| 177 | + |
| 178 | +_Returns:_ The sample value of a clock as seen by the work-item. |
| 179 | +The clock is defined as an unbounded, unsigned integer counter that |
| 180 | +monotonically increments over time. The rate at which the clock advances is not |
| 181 | +guaranteed to be constant: it may vary over the lifetime of a work-item, differ |
| 182 | +between separate executions of the program, and be affected by conditions |
| 183 | +outside the control of the programmer. The value returned by this function |
| 184 | +corresponds to the least significant bits of the clock counter at the time of |
| 185 | +execution. Consequently, the sampled value may wrap around zero. |
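
Because the sampled value may wrap around zero, the difference between two samples is best computed with unsigned arithmetic. The following is a minimal sketch in plain C++, not part of the extension: the helper name `elapsed_ticks` is hypothetical, and it assumes that all 64 bits of the returned value are significant and that fewer than 2^64 ticks elapsed between the two samples.

```c++
#include <cstdint>

// Hypothetical helper (not part of this extension): number of ticks between
// two samples taken by the same work-item from the same clock scope.
// Assumes all 64 bits of the samples are significant and that fewer than
// 2^64 ticks elapsed between them.
constexpr std::uint64_t elapsed_ticks(std::uint64_t start, std::uint64_t end) {
  // Unsigned arithmetic is modulo 2^64, so the difference stays correct even
  // if the counter wrapped around zero once between the two samples.
  return end - start;
}

// No wraparound between the samples.
static_assert(elapsed_ticks(100, 250) == 150);
// The counter wrapped around zero between the samples.
static_assert(elapsed_ticks(UINT64_MAX - 1, 3) == 5);
```

The result is a tick count in unspecified units. As stated in the Overview, it cannot be portably converted into a time duration.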
| 186 | + |
| 187 | +== Issues |
| 188 | + |
| 189 | +. How can the result of the function be converted to seconds? |
| 190 | ++ |
| 191 | +*RESOLVED*: There is no portable way to convert the values returned by these |
| 192 | +clocks into time durations. |