-
Notifications
You must be signed in to change notification settings - Fork 796
[SYCL][Graph] Add support for handler-less graph submission #20690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
5dc81c5 to
098a334
Compare
098a334 to
3617bc9
Compare
e3a6ba2 to
da7910d
Compare
da7910d to
7642921
Compare
sycl/source/detail/queue_impl.cpp
Outdated
| hasCommandGraph() ? getCommandGraph().get() : nullptr, | ||
| detail::CGType::Kernel); | ||
| hasCommandGraph() ? getCommandGraph().get() : nullptr, Type); | ||
| } else if (inOrder && MNoLastEventMode && CommandFuncContainsHostTask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that checking MNoLastEventMode here works (as this is what the handler path does). One of my longer term goals for the handler-less path however, was to avoid using MNoLastEventMode synchronization flag. Instead of checking that flag, my plan was to rely on the LastEvent being set. If SchedulerBypass is true, then we just unset it, since the lower layers take care of the ordering. If it is false, then we have to set it, since the kernel submission to the scheduler requires ordering on the SYCL layer.
The MNoLastEventMode is still used in queue_impl::wait, which is generic, so this may be a subject to future changes. The current handler-less path however updates that flag only for compatibility with the handler path.
I think there are three cases here:
- The LastEvent is set, then we simply add the dependency
- The LastEvent is not set - we are in the No Last Event Mode - we have to insert the barrier
- The LastEvent is not set - nothing was submitted to this queue yet - then we might use the MEmpty flag? But probably we have to move the MEmpty.store() to after this check.
Please let me know if this thinking makes sense and if it would be possible to replace the use of the MNoLastEventMode flag here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe replacing the MNoLastEventMode check with !MEmpty will get us the same behavior after moving around the assignment.
There is one scenario where we may be adding an additional barrier with in-order queue after looking deeper: If our queue timeline looks like kernel submit -> q.wait() -> graph with host task. I don't think a barrier is needed here since we have flushed the queue already. MEmpty is only set to true when the queue is initialized and is false after the first submission, so we will inject an unneeded barrier here. However, even with MNoLastEventMode, it should remain true after the wait and inject an unnecessary barrier as well. There is a queue_empty() operation we could use but it queries the UR adapter so probably has unnecessary overhead for the general case.
I think we can switch to use MEmpty since the above issue applies to both cases. Does this match your understanding of how the queue is scheduled?
Edit: I went ahead and added this suggestion.
sycl/source/detail/queue_impl.cpp
Outdated
| return {submit_kernel_scheduler_bypass(KData, CGData.MEvents, | ||
| CallerNeedsEvent, nullptr, nullptr, | ||
| CodeLoc, IsTopCodeLoc), | ||
| SchedulerBypass}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner to just return true here (and false in the other two cases)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be easier to read since we see the value of the variable. Changed.
sycl/source/detail/queue_impl.cpp
Outdated
| detail::EventImplPtr queue_impl::submit_direct( | ||
| bool CallerNeedsEvent, sycl::span<const event> DepEvents, | ||
| SubmitCommandFuncType &SubmitCommandFunc, detail::CGType Type, | ||
| bool CommandFuncContainsHostTask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this argument could be called something like "InsertBarrierForCommandOrdering", which would make submit_direct more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to InsertBarrierForInOrderCommand since this special case only applies to in-order queues.
slawekptak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that overall the changes look great. Added a few minor comments.
* Use !MEmpty instead of MNoLastEventMode in handler-less path * Styling / variable naming
submit_graph_direct_with_event_implandsubmit_graph_direct_without_event_implto the ABI which invoke a handler-less path for graph submission forqueue::ext_oneapi_graphand the free functionexecute_graph.submit_directutility to be more general: support submissions which may contain host task, move scheduler bypass logic to callback functor, and parameterize submission CGType.