[SYCL] Fix handling of subgroup info queries#8859
Conversation
Update `opencl` PI plugin info queries to prevent crashes when the device does not support subgroups. Signed-off-by: Michael Aziz <michael.aziz@intel.com>
| const cl_uint compileNumSg = | ||
| krn.get_info<info::kernel_device_specific::compile_num_sub_groups>(dev); | ||
| assert(compileNumSg <= maxNumSg); |
There was a problem hiding this comment.
Unrelated to the PR, but that's a bit weird query. I don't think that we have a way to specify amount of sub-groups a kernel should have. Do we really need that query in SYCL spec? Tagging @gmlueck here to comment.
There was a problem hiding this comment.
I need to ask @Pennycook what this query means. Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?
Returns the number of sub-groups per work-group specified in the kernel source or IL. If the sub-group count is not specified then 0 is returned.
How would a SYCL application specify the number of subgroups at the source code level? You can specify the maximum number of work-items in a sub-group via [[sycl::reqd_sub_group_size]], but that's not the same as the number of sub-groups.
There was a problem hiding this comment.
Is this supposed to correspond to CL_KERNEL_COMPILE_NUM_SUB_GROUPS in OpenCL?
I think so. But I have no idea what is the mechanism of specifying that value on kernels even for OpenCL. I think it is also undocumented in there.
There was a problem hiding this comment.
I've always assumed that an implementation could derive this from a combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] -- if both are specified, then the number of sub-groups has also been specified. Even though devices are free to choose how work-groups are divided into sub-groups, I think that once a kernel has been compiled for a specific device the number of sub-groups should be known.
Based on some quick searching, though, I'm not sure if that was the intent. In OpenCL-Docs#447, @bashbaug notes that there isn't a way to set this property in OpenCL unless you're providing SPIR-V (which defines a SubgroupsPerWorkgroup ExecutionMode).
There was a problem hiding this comment.
My pedantic reading of the spec is that [[sycl::reqd_sub_group_size]] only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of [[sycl::reqd_sub_group_size]] and [[sycl::reqd_work_group_size]] does not necessarily specify a particular number of sub-groups.
There was a problem hiding this comment.
I think you're right for SYCL, and that comes from the relaxations we put in for sub-group sizes.
Table 34 in OpenCL says:
All sub-groups must be the same size, while the last subgroup in any work-group (i.e. the subgroup with the maximum index) could be the same or smaller size.
...so I think what I said holds for OpenCL.
We deliberately made SYCL sub-groups very flexible to give implementations a lot of freedom, but we probably went too far. I think all implementations either: 1) have the behavior mandated by OpenCL; or 2) have similar guarantees applied to the inner-most dimension of the work-group. If we clarified in a future version of SYCL that only these two interpretations were legal, I think an implementation could still reason about the number of sub-groups given the work-group size.
There was a problem hiding this comment.
My pedantic reading of the spec is that
[[sycl::reqd_sub_group_size]]only guarantees a maximum on the sub-group size. I think an implementation is conformant even if it chooses a smaller sub-group size. Therefore, I think the combination of[[sycl::reqd_sub_group_size]]and[[sycl::reqd_work_group_size]]does not necessarily specify a particular number of sub-groups.
@gmlueck, can you please clarify why the annotation only guarantees a maximum sub-group size? I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.
There was a problem hiding this comment.
I created a Khronos issue to track this spec clarification:
https://gitlab.khronos.org/sycl/Specification/-/issues/651
(Requires Khronos access.)
There was a problem hiding this comment.
I understood that an implementation had to use the sub-group size that was specified and could not choose a smaller one.
When the work-group size is not evenly divisible by the sub-group size, some sub-group(s) must have a smaller size. The SYCL spec does not currently provide any requirements. With today's wording, an implementation could make several (or all) sub-groups with a smaller size. There isn't even a requirement in the spec about the size of sub-groups when the work-group size is evenly divisible by the sub-group size.
| cast<cl_kernel_sub_group_info>(param_name), input_value_size, input_value, | ||
| sizeof(size_t), &ret_val, param_value_size_ret)); | ||
|
|
||
| if (ret_err == CL_INVALID_OPERATION) { |
There was a problem hiding this comment.
I originally thought that there would be a check for OpenCL version supported by the device before calling clGetKernelSubGroupInfo at all. However, this approach seems simpler and more performant
Signed-off-by: Michael Aziz <michael.aziz@intel.com>
|
Test failures (unrelated to this change):
|
This reverts commit 2408035.
Update
openclPI plugin info queries to prevent crashes when the device does not support subgroups.