@@ -123,10 +123,15 @@ of a `simd<T, N>` type (e.g. using multiple registers rather than an array).
123123We expect implementations of this ABI to build on Clang's `ext_vector_type`,
124124rather than specifying the register types directly.
125125
126- The value of `N` for each `simd` argument to a SIMD function must be the same,
127- and represents the sub-group size of the calling kernel. If a function accepts
128- `simd` arguments with multiple `N` values, it cannot be invoked from a SPMD
129- kernel and the implementation must issue a diagnostic in this case.
126+ Each of the following must hold true for a SIMD function and its invoke_simd
127+ call site:
128+ - the value of `N` of each `simd` or `simd_mask` formal parameter corresponding to a non-uniform
129+ SPMD actual argument (if any) must be the same
130+ - if the return type of the SIMD function is `simd<T, M>` or `simd_mask<T, M>`,
131+ then `M` must match `N` above
132+ Otherwise, the SIMD function cannot be invoked from an SPMD kernel and the
133+ implementation must issue a diagnostic in this case.
134+ The `N` value represents the sub-group size of the calling kernel.
130135
131136The example below shows a simple SIMD function that scales all elements of
132137an 8-wide SIMD type `x` by a scalar value `n`.
@@ -244,6 +249,12 @@ rules in reverse:
244249 are converted to `T`, and the value in element `i` of the SIMD type is
245250 returned to the work-item with sub-group local ID `i`.
246251
252+ - Return values of type `sycl::ext::oneapi::experimental::uniform<T>` are not anyhow converted,
253+ and broadcast to each work-item; every work-item in the sub-group receives
254+ the same value.
255+ NOTE: `sycl::ext::oneapi::experimental::uniform<T>` return type is the way to return
256+ a uniform value of `simd` or `simd_mask` type.
257+
247258- Return values of type `T` are converted to `sycl::ext::oneapi::experimental::uniform<T>`,
248259 and broadcast to each work-item; every work-item in the sub-group receives
249260 the same value.
@@ -263,6 +274,25 @@ variable should be passed via a dedicated argument (e.g. the value returned
263274by `sub_group::get_local_id()[0]` could be passed as an integer to a
264275`Callable` expecting a `sycl::ext::oneapi::experimental::simd<uint32_t, N>`).
265276
277+ NOTE: An implementation must be able to determine the sub-group size of the SIMD
278+ function object to perform argument type conversion.
279+ The determined subgroup size must be equal to the value of the caller's `sycl::reqd_sub_group_size`
280+ attribute (if present) and consistent with the argument conversion rules defined
281+ by this specification. Otherwise, the implementation must issue a proper diagnostic.
282+
283+ The following algorithm to determine the sub-group size may be employed:
284+ - If any of the actual SPMD arguments is non-uniform, the specification requires
285+ that the corresponding SIMD argument must have `simd<T, N>` or `simd_mask<bool, N>`
286+ type, and `N` must be equal to the the sub-group size. The implementation may
287+ iterate through the supported sub-group sizes at compile time and check whether the
288+ function object is invocable with given sub-group size. There must be
289+ exactly one such sub-group size and it must match `N` or the user program is not
290+ well-formed.
291+ - Otherwise, if the return type of the SIMD target function is `simd<T, N>` or
292+ `simd_mask<bool, N>`, then the sub-group size is `N`.
293+ - Otherwise (all arguments and the return type are uniform) the SIMD target is
294+ fully uniform and it can be invoked with any sub-group size.
295+
266296The `invoke_simd` function has the same requirements as other group functions
267297(as defined in Section 4.17.3 of the SYCL 2020 specification). A call to
268298`invoke_simd` must be encountered in converged control flow by all work-items
@@ -447,4 +477,5 @@ SYCL 2020 specification states that this only applies to inter-device transfers.
447477|2|2021-03-31|John Pennycook|*Rename extension and add feature test macro*
448478|3|2021-04-23|John Pennycook|*Split uniform wrapper into separate extension*
449479|4|2022-01-20|John Pennycook|*Clarify interaction with SYCL_EXTERNAL*
480+ |5|2022-08-15|Konst Bobrovskii|*Clarify sub-group size calculation*
450481|========================================
0 commit comments