Skip to content

Commit 59a1b4a

Browse files
author
iclsrc
committed
Merge from 'sycl' to 'sycl-web'
2 parents 5c56e0a + 56f8d38 commit 59a1b4a

35 files changed

+235
-253
lines changed

sycl/doc/design/CommandGraph.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -277,6 +277,13 @@ Level Zero:
277277
`waitForEvents` on the same command-list. Resulting in additional latency when
278278
executing a UR command-buffer.
279279

280+
3. Dependencies between multiple submissions must be handled by the runtime.
281+
Indeed, when a second submission is performed the signal conditions
282+
of *WaitEvent* are redefined by this second submission.
283+
Therefore, this can lead to an undefined behavior and potential
284+
hangs especially if the conditions of the first submissions were not yet
285+
satisfied and the event has not yet been signaled.
286+
280287
Future work will include exploring L0 API extensions to improve the mapping of
281288
UR command-buffer to L0 command-list.
282289

sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc

Lines changed: 13 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1059,9 +1059,10 @@ void
10591059
handler::ext_oneapi_graph(command_graph<graph_state::executable>& graph)
10601060
----
10611061

1062-
|Invokes the execution of a graph. Only one instance of `graph` may be executing,
1063-
or pending execution, at any time. Concurrent graph execution can be achieved by
1064-
finalizing a graph in modifiable state into multiple graphs in executable state.
1062+
|Invokes the execution of a graph. Only one instance of `graph` will
1063+
execute at any time. If `graph` is submitted multiple times, dependencies
1064+
are automatically added by the runtime to prevent concurrent executions of
1065+
an identical graph.
10651066

10661067
Parameters:
10671068

@@ -1073,8 +1074,6 @@ Exceptions:
10731074
to a queue which is associated with a device or context that is different
10741075
from the device and context used on creation of the graph.
10751076

1076-
* Throws synchronously with error code `invalid` if a previous submission of
1077-
`graph` has yet to complete execution.
10781077
|===
10791078

10801079
=== Thread Safety
@@ -1590,6 +1589,12 @@ outputs of the modifiable graph, a technique called _Whole Graph Update_. The
15901589
modifiable graph must have the same topology as the graph originally used to
15911590
create the executable graphs, with the nodes targeting the same devices and
15921591
added in the same order.
1592+
If a graph has been updated since its last submission, the sequential
1593+
execution constraint is no longer required.
1594+
The automatic addition of dependencies is disabled and updated graphs
1595+
can be submitted simultaneously.
1596+
Users are therefore responsible for explicitly managing potential dependencies
1597+
between these executions to avoid data races.
15931598

15941599
:sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function
15951600

@@ -1719,15 +1724,6 @@ runtime.
17191724

17201725
== Issues
17211726

1722-
=== Simultaneous Graph Submission
1723-
1724-
Enable an instance of a graph in executable state to be submitted for execution
1725-
when a previous submission of the same graph has yet to complete execution.
1726-
1727-
**UNRESOLVED:** Trending "yes". Backend support for this is inconsistent, but
1728-
the runtime could schedule the submissions sequentially for backends which don't
1729-
support it.
1730-
17311727
=== Multi Device Graph
17321728

17331729
Allow an executable graph to contain nodes targeting different devices.
@@ -1792,6 +1788,9 @@ if used in application code.
17921788
`sycl::ext::intel::property::queue::no_immediate_command_list`
17931789
should be set on construction to any queues an executable
17941790
graph is submitted to.
1791+
. Synchronization between multiple executions of the same command-buffer
1792+
must be handled in the host for level-zero backend, which may involve
1793+
extra latency for subsequent submissions.
17951794

17961795
== Revision History
17971796

sycl/include/sycl/queue.hpp

Lines changed: 2 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -412,9 +412,7 @@ class __SYCL_EXPORT queue : public detail::OwnerLessBase<queue> {
412412
/// \return a SYCL event object, which corresponds to the queue the command
413413
/// group is being enqueued on.
414414
event ext_oneapi_submit_barrier(
415-
const detail::code_location &CodeLoc = detail::code_location::current()) {
416-
return submit([=](handler &CGH) { CGH.ext_oneapi_barrier(); }, CodeLoc);
417-
}
415+
const detail::code_location &CodeLoc = detail::code_location::current());
418416

419417
/// Prevents any commands submitted afterward to this queue from executing
420418
/// until all events in WaitList have entered the complete state. If WaitList
@@ -427,10 +425,7 @@ class __SYCL_EXPORT queue : public detail::OwnerLessBase<queue> {
427425
/// group is being enqueued on.
428426
event ext_oneapi_submit_barrier(
429427
const std::vector<event> &WaitList,
430-
const detail::code_location &CodeLoc = detail::code_location::current()) {
431-
return submit([=](handler &CGH) { CGH.ext_oneapi_barrier(WaitList); },
432-
CodeLoc);
433-
}
428+
const detail::code_location &CodeLoc = detail::code_location::current());
434429

435430
/// Performs a blocking wait for the completion of all enqueued tasks in the
436431
/// queue.

sycl/source/detail/graph_impl.cpp

Lines changed: 33 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -772,12 +772,40 @@ exec_graph_impl::enqueue(const std::shared_ptr<sycl::detail::queue_impl> &Queue,
772772
CurrentPartition->MPiCommandBuffers[Queue->get_device()];
773773

774774
if (CommandBuffer) {
775-
if (!previousSubmissionCompleted()) {
776-
throw sycl::exception(
777-
make_error_code(errc::invalid),
778-
"This Graph cannot be submitted at the moment "
779-
"because the previous run has not yet completed.");
775+
// if previous submissions are incompleted, we automatically
776+
// add completion events of previous submissions as dependencies.
777+
// With Level-Zero backend we cannot resubmit a command-buffer until the
778+
// previous one has already completed.
779+
// Indeed, since a command-list does not accept a list a dependencies at
780+
// submission, we circumvent this lack by adding a barrier that waits on a
781+
// specific event and then define the conditions to signal this event in
782+
// another command-list. Consequently, if a second submission is
783+
// performed, the signal conditions of this single event are redefined by
784+
// this second submission. Thus, this can lead to an undefined behaviour
785+
// and potential hangs. We have therefore to expliclty wait in the host
786+
// for previous submission to complete before resubmitting the
787+
// command-buffer for level-zero backend.
788+
// TODO : add a check to release this constraint and allow multiple
789+
// concurrent submissions if the exec_graph has been updated since the
790+
// last submission.
791+
for (std::vector<sycl::detail::EventImplPtr>::iterator It =
792+
MExecutionEvents.begin();
793+
It != MExecutionEvents.end();) {
794+
auto Event = *It;
795+
if (!Event->isCompleted()) {
796+
if (Queue->get_device().get_backend() ==
797+
sycl::backend::ext_oneapi_level_zero) {
798+
Event->wait(Event);
799+
} else {
800+
CGData.MEvents.push_back(Event);
801+
}
802+
++It;
803+
} else {
804+
// Remove completed events
805+
It = MExecutionEvents.erase(It);
806+
}
780807
}
808+
781809
NewEvent = CreateNewEvent();
782810
sycl::detail::pi::PiEvent *OutEvent = &NewEvent->getHandleRef();
783811
// Merge requirements from the nodes into requirements (if any) from the

sycl/source/detail/queue_impl.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -356,6 +356,11 @@ event queue_impl::memcpyFromDeviceGlobal(
356356
return MDiscardEvents ? createDiscardedEvent() : ResEvent;
357357
}
358358

359+
event queue_impl::getLastEvent() const {
360+
std::lock_guard<std::mutex> Lock{MLastEventMtx};
361+
return MDiscardEvents ? createDiscardedEvent() : MLastEvent;
362+
}
363+
359364
void queue_impl::addEvent(const event &Event) {
360365
EventImplPtr EImpl = getSyclObjImpl(Event);
361366
assert(EImpl && "Event implementation is missing");

sycl/source/detail/queue_impl.hpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,8 @@ class queue_impl {
202202
#endif
203203
}
204204

205+
event getLastEvent() const;
206+
205207
private:
206208
void queue_impl_interop(sycl::detail::pi::PiQueue PiQueue) {
207209
if (has_property<ext::oneapi::property::queue::discard_events>() &&

sycl/source/queue.cpp

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -192,6 +192,38 @@ void queue::wait_and_throw_proxy(const detail::code_location &CodeLoc) {
192192
impl->wait_and_throw(CodeLoc);
193193
}
194194

195+
/// Prevents any commands submitted afterward to this queue from executing
196+
/// until all commands previously submitted to this queue have entered the
197+
/// complete state.
198+
///
199+
/// \param CodeLoc is the code location of the submit call (default argument)
200+
/// \return a SYCL event object, which corresponds to the queue the command
201+
/// group is being enqueued on.
202+
event queue::ext_oneapi_submit_barrier(const detail::code_location &CodeLoc) {
203+
if (is_in_order())
204+
return impl->getLastEvent();
205+
206+
return submit([=](handler &CGH) { CGH.ext_oneapi_barrier(); }, CodeLoc);
207+
}
208+
209+
/// Prevents any commands submitted afterward to this queue from executing
210+
/// until all events in WaitList have entered the complete state. If WaitList
211+
/// is empty, then ext_oneapi_submit_barrier has no effect.
212+
///
213+
/// \param WaitList is a vector of valid SYCL events that need to complete
214+
/// before barrier command can be executed.
215+
/// \param CodeLoc is the code location of the submit call (default argument)
216+
/// \return a SYCL event object, which corresponds to the queue the command
217+
/// group is being enqueued on.
218+
event queue::ext_oneapi_submit_barrier(const std::vector<event> &WaitList,
219+
const detail::code_location &CodeLoc) {
220+
if (is_in_order() && WaitList.empty())
221+
return impl->getLastEvent();
222+
223+
return submit([=](handler &CGH) { CGH.ext_oneapi_barrier(WaitList); },
224+
CodeLoc);
225+
}
226+
195227
template <typename Param>
196228
typename detail::is_queue_info_desc<Param>::return_type
197229
queue::get_info() const {

sycl/test-e2e/Basic/vector_byte.cpp renamed to sycl/test-e2e/Basic/vector/byte.cpp

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,9 +12,11 @@
1212
//
1313
//===----------------------------------------------------------------------===//
1414

15-
#define SYCL_SIMPLE_SWIZZLES
1615
#include <sycl/sycl.hpp>
1716

17+
#include <cstddef> // std::byte
18+
#include <tuple> // std::ignore
19+
1820
int main() {
1921
std::byte bt{7};
2022
// constructors
@@ -30,7 +32,7 @@ int main() {
3032
// operator[]
3133
assert(vb16[3] == std::byte{2});
3234
// explicit conversion
33-
std::byte(vb1.x());
35+
std::ignore = std::byte(vb1.x());
3436
std::byte b = vb1;
3537

3638
// operator=

0 commit comments

Comments
 (0)