Skip to content

Commit dfd72fc

Browse files
authored
[SYCL][Doc] Simpler free function kernel enqueue (intel#19995)
Specify a simplified way to enqueue free function kernels without first getting a kernel bundle. The style follows the enqueue functions in "sycl_ext_oneapi_enqueue_functions". Once the KHR extension "sycl_khr_free_function_commands" is finalized, we may also add enqueue functions that follow that style. However, we will first need to extend the `requirements` in that KHR to accept kernel launch properties.
1 parent 52baa2b commit dfd72fc

File tree

1 file changed

+144
-66
lines changed

1 file changed

+144
-66
lines changed

sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc

Lines changed: 144 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,8 @@ This extension also depends on the following other SYCL extensions:
4949
sycl_ext_oneapi_properties]
5050
* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[
5151
sycl_ext_oneapi_kernel_properties]
52+
* link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
53+
sycl_ext_oneapi_enqueue_functions]
5254

5355

5456
== Status
@@ -241,6 +243,13 @@ A function decorated with one of these properties can still be called as a
241243
normal function in either host or device code.
242244
The property has no effect in such cases.
243245

246+
[_Note:_ Many of the APIs specified below have a template parameter `Func`,
247+
which identifies a free function kernel.
248+
This kernel function may be defined in any translation unit in the application.
249+
It is not necessary for the function to be defined in the same translation unit
250+
as the instantiation of the template taking the `Func` parameter.
251+
_{endnote}_]
252+
244253
=== New traits for kernel functions
245254

246255
This extension defines the following traits that can be used to tell whether a
@@ -323,16 +332,123 @@ Otherwise `value` is `false`.
323332
The helper trait `is_kernel_v` provides the value of `value`.
324333
|====
325334

335+
=== New free functions to launch a kernel
336+
337+
This extension adds the following helper which captures a kernel function
338+
address as a template parameter.
339+
340+
[frame=all,grid=none,separator="@"]
341+
!====
342+
a@
343+
[source,c++]
344+
----
345+
namespace sycl::ext::oneapi::experimental {
346+
347+
template<auto *Func>
348+
struct kernel_function_s {};
349+
350+
template<auto *Func>
351+
inline constexpr kernel_function_s<Func> kernel_function;
352+
353+
} // namespace sycl::ext::oneapi::experimental
354+
----
355+
!====
356+
357+
It also adds the following free functions which launch a free function kernel.
358+
359+
[frame=all,grid=none,separator="@"]
360+
!====
361+
a@
362+
[source,c++]
363+
----
364+
namespace sycl::ext::oneapi::experimental {
365+
366+
template <auto *Func, typename Args...>
367+
void single_task(queue q, kernel_function_s<Func> k, Args&&... args);
368+
369+
template <auto *Func, typename Args...>
370+
void single_task(handler &h, kernel_function_s<Func> k, Args&&... args);
371+
372+
} // namespace sycl::ext::oneapi::experimental
373+
----
374+
!====
375+
376+
_Constraints_: Available only if `is_single_task_kernel_v<Func>` is `true`.
377+
Available only if `+std::is_invocable_v<decltype(Func), Args...>+` is `true`.
378+
379+
_Effects_: Enqueues a kernel object to the `queue` or `handler` as a single task.
380+
Each value in the `args` pack is passed to the corresponding argument in
381+
`Func`, converting it to the argument's type if necessary.
382+
383+
'''
384+
385+
[frame=all,grid=none,separator="@"]
386+
!====
387+
a@
388+
[source,c++]
389+
----
390+
namespace sycl::ext::oneapi::experimental {
391+
392+
template <auto *Func, int Dimensions, typename... Args>
393+
void nd_launch(queue q, nd_range<Dimensions> r,
394+
kernel_function_s<Func> k, Args&&... args);
395+
396+
template <auto *Func, int Dimensions, typename... Args>
397+
void nd_launch(handler &h, nd_range<Dimensions> r,
398+
kernel_function_s<Func> k, Args&&... args);
399+
400+
} // namespace sycl::ext::oneapi::experimental
401+
----
402+
!====
403+
404+
_Constraints_: Available only if `is_nd_range_kernel_v<Func, Dimensions>` is
405+
`true`.
406+
Available only if `+std::is_invocable_v<decltype(Func), Args...>+` is `true`.
407+
408+
_Effects_: Enqueues a kernel object to the `queue` or `handler` as an ND-range
409+
kernel, using the number of work-items specified by the ND-range `r`.
410+
Each value in the `args` pack is passed to the corresponding argument in
411+
`Func`, converting it to the argument's type if necessary.
412+
413+
'''
414+
415+
[frame=all,grid=none,separator="@"]
416+
!====
417+
a@
418+
[source,c++]
419+
----
420+
namespace sycl::ext::oneapi::experimental {
421+
422+
template <auto *Func, int Dimensions,
423+
typename Properties, typename... Args>
424+
void nd_launch(queue q,
425+
launch_config<nd_range<Dimensions>, Properties> c,
426+
kernel_function_s<Func> k, Args&& args...);
427+
428+
template <auto *Func, int Dimensions,
429+
typename Properties, typename... Args>
430+
void nd_launch(handler &h,
431+
launch_config<nd_range<Dimensions>, Properties> c,
432+
kernel_function_s<Func> k, Args&& args...);
433+
434+
} // namespace sycl::ext::oneapi::experimental
435+
----
436+
!====
437+
438+
_Constraints_: Available only if `is_nd_range_kernel_v<Func, Dimensions>` is
439+
`true`.
440+
Available only if `+std::is_invocable_v<decltype(Func), Args...>+` is `true`.
441+
442+
_Effects_: Enqueues a kernel object to the `queue` or `handler` as an ND-range
443+
kernel, using the launch configuration specified by `c`.
444+
Each value in the `args` pack is passed to the corresponding argument in
445+
`Func`, converting it to the argument's type if necessary.
446+
326447
=== New kernel bundle member functions
327448

328449
This extension adds the following new functions which add kernel bundle support
329450
for free function kernels.
330451

331-
[_Note:_ Many of the functions in this section have a template parameter
332-
`Func`, which identifies a free function kernel.
333-
This kernel function may be defined in any translation unit in the application.
334-
_{endnote}_]
335-
336452
|====
337453
a|
338454
[frame=all,grid=none]
@@ -798,7 +914,8 @@ The allowed types are:
798914
The following example demonstrates how to define a free function kernel and then
799915
enqueue it on a device.
800916

801-
```
917+
[source,c++]
918+
----
802919
#include <sycl/sycl.hpp>
803920
namespace syclext = sycl::ext::oneapi;
804921
namespace syclexp = sycl::ext::oneapi::experimental;
@@ -818,23 +935,14 @@ int main() {
818935
sycl::queue q;
819936
sycl::context ctxt = q.get_context();
820937
821-
// Get a kernel bundle that contains the free function kernel "iota".
822-
auto exe_bndl =
823-
syclexp::get_kernel_bundle<iota, sycl::bundle_state::executable>(ctxt);
824-
825-
// Get a kernel object for the "iota" function from that bundle.
826-
sycl::kernel k_iota = exe_bndl.ext_oneapi_get_kernel<iota>();
827-
828938
float *ptr = sycl::malloc_shared<float>(NUM, q);
829-
q.submit([&](sycl::handler &cgh) {
830-
// Set the values of the kernel arguments.
831-
cgh.set_args(3.14f, ptr);
832939
833-
sycl::nd_range ndr{{NUM}, {WGSIZE}};
834-
cgh.parallel_for(ndr, k_iota);
835-
}).wait();
940+
sycl::nd_range ndr{{NUM}, {WGSIZE}};
941+
syclexp::nd_launch(q, ndr, syclexp::kernel_function<iota>, 3.14f, ptr);
942+
943+
q.wait();
836944
}
837-
```
945+
----
838946

839947
=== Free function kernels which are templates or overloaded
840948

@@ -843,7 +951,8 @@ It is also legal to define several overloads for a free function kernel.
843951
The following example demonstrates how to get a kernel identifier in such
844952
cases.
845953

846-
```
954+
[source,c++]
955+
----
847956
#include <sycl/sycl.hpp>
848957
namespace syclexp = sycl::ext::oneapi::experimental;
849958
@@ -864,17 +973,26 @@ void ping(int *x) {
864973
}
865974
866975
int main() {
976+
sycl::queue q;
977+
sycl::context ctxt = q.get_context();
978+
979+
float *fptr = sycl::malloc_shared<float>(NUM, q);
980+
int *iptr = sycl::malloc_shared<int>(NUM, q);
981+
sycl::nd_range ndr{{NUM}, {WGSIZE}};
982+
867983
// When the free function kernel is templated, pass the address of a
868984
// specific instantiation.
869-
sycl::kernel_id iota_float = syclexp::get_kernel_id<iota<float>>();
870-
sycl::kernel_id iota_int = syclexp::get_kernel_id<iota<int>>();
985+
syclexp::nd_launch(q, ndr, syclexp::kernel_function<iota<float>>, 3.14f, fptr);
986+
syclexp::nd_launch(q, ndr, syclexp::kernel_function<iota<int>>, 3, iptr);
871987
872988
// When there are multiple overloads of a free function kernel, use a cast
873989
// to disambiguate.
874-
sycl::kernel_id ping_float = syclexp::get_kernel_id<(void(*)(float))ping>();
875-
sycl::kernel_id ping_int = syclexp::get_kernel_id<(void(*)(int))ping>();
990+
syclexp::nd_launch(q, ndr, syclexp::kernel_function<(void(*)(float))ping>, fptr);
991+
syclexp::nd_launch(q, ndr, syclexp::kernel_function<(void(*)(int))ping>, iptr);
992+
993+
q.wait();
876994
}
877-
```
995+
----
878996

879997

880998
[[level-zero-and-opencl-compatibility]]
@@ -1065,46 +1183,6 @@ argument, effectively turning the call into a no-op.
10651183

10661184
== Issues
10671185

1068-
* We're pretty sure that we want to define some syntax that allows a free
1069-
function kernel to be enqueued using the APIs defined in
1070-
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
1071-
sycl_ext_oneapi_enqueue_functions], but we haven't settled on the exact API
1072-
yet.
1073-
One option is like this:
1074-
+
1075-
```
1076-
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>))
1077-
void iota(float start, float *ptr) { /*...*/ }
1078-
1079-
int main() {
1080-
sycl::queue q;
1081-
float *ptr = sycl::malloc_shared<float>(N, q);
1082-
sycl::nd_launch<iota>(q, sycl::nd_range{{N}, {WGS}}, 1.f, ptr);
1083-
}
1084-
```
1085-
+
1086-
Another option is like this:
1087-
+
1088-
```
1089-
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>))
1090-
void iota(float start, float *ptr) { /*...*/ }
1091-
1092-
int main() {
1093-
sycl::queue q;
1094-
float *ptr = sycl::malloc_shared<float>(N, q);
1095-
sycl::nd_launch(q, sycl::nd_range{{N}, {WGS}}, kfp<iota>, 1.f, ptr);
1096-
}
1097-
```
1098-
+
1099-
Where `kfp` would have some nicer name.
1100-
+
1101-
With either form above, it seems like we have enough type information for the
1102-
header to check that the types of the actual kernel arguments are implicitly
1103-
convertible to the types of the formal kernel parameters, and we can raise a
1104-
compile-time error if they are not.
1105-
In addition, the header can perform any necessary implicit conversions when
1106-
setting the kernel argument values.
1107-
11081186
* We are debating whether we should allow a free function kernel to be defined
11091187
with an initial "iteration index" parameter such as:
11101188
+

0 commit comments

Comments
 (0)