Handler-less kernel submit path (parallel_for with nd_range) #19294
In this PR, I would like to see at least one public interface implementation that utilizes this approach, just to ensure it works.
In the latest update, there are two public interfaces: the enqueue functions extension and queue.parallel_for. Both are enabled only if __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT is defined.
expose the new APIs as public under a new define
LGTM.
@intel/llvm-gatekeepers please consider merging
#19294 added a new _no_cgh version, so the macro needs to be passed to fix the build failures.
    sycl::detail::lambda_arg_type<KernelType, nd_item<Dims>>;
static_assert(
    std::is_convertible_v<sycl::nd_item<Dims>, LambdaArgType>,
    "Kernel argument of a sycl::parallel_for with sycl::nd_range "
Could the text be altered in the subsequent patches, as this code can be called not only from parallel_for?
Yes, the plan is to extend this to other functions once parallel_for(nd_range) is complete.
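For readers following the thread, the check under discussion can be sketched outside of SYCL roughly as follows. This is a simplified illustration, not the code in the PR: `sketch::nd_item`, `first_arg`, `accepts_nd_item`, and `check_nd_range_kernel` are hypothetical stand-ins, and the trait here only handles non-generic callables, whereas the real `sycl::detail::lambda_arg_type` also takes the expected item type as a second parameter. The assertion message is deliberately API-neutral, which is one way the wording concern above could be addressed.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for sycl::nd_item<Dims>.
template <int Dims> struct nd_item {};

// Extracts the single parameter type of a non-generic callable's operator().
template <typename F> struct first_arg;
template <typename C, typename R, typename A>
struct first_arg<R (C::*)(A) const> { using type = A; };
template <typename C, typename R, typename A>
struct first_arg<R (C::*)(A)> { using type = A; };

// Simplified, assumed analogue of sycl::detail::lambda_arg_type.
template <typename KernelType>
using lambda_arg_type =
    typename first_arg<decltype(&KernelType::operator())>::type;

// True when the kernel's parameter can be initialized from nd_item<Dims>.
template <int Dims, typename KernelType>
inline constexpr bool accepts_nd_item =
    std::is_convertible_v<nd_item<Dims>, lambda_arg_type<KernelType>>;

// Compile-time check that does not name a specific submission API.
template <int Dims, typename KernelType>
void check_nd_range_kernel(const KernelType &) {
  static_assert(accepts_nd_item<Dims, KernelType>,
                "Kernel argument of an nd_range kernel must be "
                "convertible from nd_item<Dims>");
}

} // namespace sketch
```

A kernel written as `[](sketch::nd_item<2>) {}` passes `check_nd_range_kernel<2>`, while one taking an `int` would fail to compile with the message above.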
This PR introduces a fully handler-less kernel submission path. The feature is not complete yet. For testing purposes, we introduce the __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT macro, which enables unit tests for the new handler-less path. Applications should not define this macro; when it is not defined, the legacy handler-based path is used. Once the handler-less path is fully implemented, we will switch the corresponding APIs to use it unconditionally and remove the macro.
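As a rough illustration of the gating described above, a compile-time switch between the two paths might look like the sketch below. Only the macro name comes from this PR; `submit_with_handler`, `submit_no_cgh`, and `parallel_for_path` are hypothetical names that return a label instead of submitting a kernel, and the actual implementation in the runtime may be organized differently.

```cpp
#include <string>

namespace sketch {

// Hypothetical stand-in for the legacy path that builds a command group
// through a handler.
inline std::string submit_with_handler() { return "handler-based"; }

// Hypothetical stand-in for the new path that submits without a handler.
inline std::string submit_no_cgh() { return "handler-less"; }

// Chooses the submission path at compile time. The handler-less path is
// taken only when the opt-in macro from this PR is defined.
inline std::string parallel_for_path() {
#ifdef __DPCPP_ENABLE_UNFINISHED_NO_CGH_SUBMIT
  return submit_no_cgh();
#else
  return submit_with_handler();
#endif
}

} // namespace sketch
```

Built without the macro, `sketch::parallel_for_path()` selects the handler-based branch, which mirrors the default behavior applications are expected to get until the feature is complete.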
This PR covers: