Skip to content

Conversation

@mmichel11
Copy link
Contributor

@mmichel11 mmichel11 commented Nov 19, 2025

  • Adds submit_graph_direct_with_event_impl and submit_graph_direct_without_event_impl to the ABI which invoke a handler-less path for graph submission for queue::ext_oneapi_graph and the free function execute_graph.
  • Adjusts handler-less submit_direct utility to be more general: support submissions which may contain host task, move scheduler bypass logic to callback functor, and parameterize submission CGType.
  • Extends graph test coverage to cover identified gaps: recording handler-less graph submission and host task cases with dependencies.

@mmichel11 mmichel11 force-pushed the matt/handlerless_graph_submit branch from 5dc81c5 to 098a334 Compare November 19, 2025 19:49
@mmichel11 mmichel11 force-pushed the matt/handlerless_graph_submit branch from 098a334 to 3617bc9 Compare November 19, 2025 20:36
@mmichel11 mmichel11 force-pushed the matt/handlerless_graph_submit branch from e3a6ba2 to da7910d Compare November 20, 2025 04:43
@mmichel11 mmichel11 marked this pull request as ready for review November 20, 2025 19:12
@mmichel11 mmichel11 requested review from a team as code owners November 20, 2025 19:12
hasCommandGraph() ? getCommandGraph().get() : nullptr,
detail::CGType::Kernel);
hasCommandGraph() ? getCommandGraph().get() : nullptr, Type);
} else if (inOrder && MNoLastEventMode && CommandFuncContainsHostTask) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that checking MNoLastEventMode here works (as this is what the handler path does). One of my longer term goals for the handler-less path however, was to avoid using MNoLastEventMode synchronization flag. Instead of checking that flag, my plan was to rely on the LastEvent being set. If SchedulerBypass is true, then we just unset it, since the lower layers take care of the ordering. If it is false, then we have to set it, since the kernel submission to the scheduler requires ordering on the SYCL layer.
The MNoLastEventMode is still used in queue_impl::wait, which is generic, so this may be a subject to future changes. The current handler-less path however updates that flag only for compatibility with the handler path.
I think there are three cases here:

  1. The LastEvent is set, then we simply add the dependency
  2. The LastEvent is not set - we are in the No Last Event Mode - we have to insert the barrier
  3. The LastEvent is not set - nothing was submitted to this queue yet - then we might use the MEmpty flag? But probably we have to move the MEmpty.store() to after this check.

Please let me know if this thinking makes sense and if it would be possible to replace the use of the MNoLastEventMode flag here.

Copy link
Contributor Author

@mmichel11 mmichel11 Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe replacing the MNoLastEventMode check with !MEmpty will get us the same behavior after moving around the assignment.

There is one scenario where we may be adding an additional barrier with in-order queue after looking deeper: If our queue timeline looks like kernel submit -> q.wait() -> graph with host task. I don't think a barrier is needed here since we have flushed the queue already. MEmpty is only set to true when the queue is initialized and is false after the first submission, so we will inject an unneeded barrier here. However, even with MNoLastEventMode, it should remain true after the wait and inject an unnecessary barrier as well. There is a queue_empty() operation we could use but it queries the UR adapter so probably has unnecessary overhead for the general case.

I think we can switch to use MEmpty since the above issue applies to both cases. Does this match your understanding of how the queue is scheduled?

Edit: I went ahead and added this suggestion.

return {submit_kernel_scheduler_bypass(KData, CGData.MEvents,
CallerNeedsEvent, nullptr, nullptr,
CodeLoc, IsTopCodeLoc),
SchedulerBypass};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be cleaner to just return true here (and false in the other two cases)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be easier to read since we see the value of the variable. Changed.

detail::EventImplPtr queue_impl::submit_direct(
bool CallerNeedsEvent, sycl::span<const event> DepEvents,
SubmitCommandFuncType &SubmitCommandFunc, detail::CGType Type,
bool CommandFuncContainsHostTask) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this argument could be called something like "InsertBarrierForCommandOrdering", which would make submit_direct more generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed to InsertBarrierForInOrderCommand since this special case only applies to in-order queues.

Copy link
Contributor

@slawekptak slawekptak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that overall the changes look great. Added a few minor comments.

* Use !MEmpty instead of MNoLastEventMode in handler-less path
* Styling / variable naming
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants