Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion sycl/source/detail/queue_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -618,8 +618,9 @@ void queue_impl::wait(const detail::code_location &CodeLoc) {
WeakEvents.swap(MEventsWeak);
SharedEvents.swap(MEventsShared);

{
if (MAreCleanupRequestsMissed.load(std::memory_order_acquire)) {
std::lock_guard<std::mutex> RequestLock(MMissedCleanupRequestsMtx);
MAreCleanupRequestsMissed.store(false, std::memory_order_release);
for (auto &UpdatedGraph : MMissedCleanupRequests)
doUnenqueuedCommandCleanup(UpdatedGraph);
MMissedCleanupRequests.clear();
Expand Down Expand Up @@ -801,6 +802,7 @@ void queue_impl::revisitUnenqueuedCommandsState(
else {
std::lock_guard<std::mutex> RequestLock(MMissedCleanupRequestsMtx);
MMissedCleanupRequests.push_back(CompletedHostTask->getCommandGraph());
MAreCleanupRequestsMissed.store(true, std::memory_order_release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the following not an issue:

Thread A        Thread B
acquire mtx
                check atomic without mutex, empty
push_back
atomic_store_true

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is exactly the question I would like to discuss with the fellow sycl experts!

According to my understanding, the change is correct, because the situation you describing is undistinguishing from a situation when Thread A just staying, say, at the beginning of queue_impl::revisitUnenqueuedCommandsState(), i.e. there is no reason for Thread B to expect something in MMissedCleanupRequests at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergey-semenov , @KseniyaTikhomirova would be much more knowledgeable about this than I am.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @KseniyaTikhomirova about this today. Like Alexandr said, the case is indistinguishable from the first thread being earlier in its execution path.
With the synchronization being made less strict, it will be slightly more likely that we'll exit queue::wait with some dependency related data still present in the queue. But that's not a functional problem since that data will be handled during the next cleanup call.

}
}

Expand Down
20 changes: 15 additions & 5 deletions sycl/source/detail/queue_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -711,13 +711,18 @@ class queue_impl {

void setExternalEvent(const event &Event) {
std::lock_guard<std::mutex> Lock(MInOrderExternalEventMtx);
MHasValueInOrderExternalEvent.store(true, std::memory_order_release);
MInOrderExternalEvent = Event;
}

std::optional<event> popExternalEvent() {
std::lock_guard<std::mutex> Lock(MInOrderExternalEventMtx);
std::optional<event> Result = std::nullopt;
std::swap(Result, MInOrderExternalEvent);

if (MHasValueInOrderExternalEvent.load(std::memory_order_acquire)) {
std::lock_guard<std::mutex> Lock(MInOrderExternalEventMtx);
MHasValueInOrderExternalEvent.store(false, std::memory_order_release);
std::swap(Result, MInOrderExternalEvent);
}
return Result;
}

Expand Down Expand Up @@ -833,8 +838,10 @@ class queue_impl {
// dependency so in the case where some commands were not enqueued
// (blocked), we track them to prevent barrier from being enqueued
// earlier.
if (MAreCleanupRequestsMissed.load(std::memory_order_acquire))
{
std::lock_guard<std::mutex> RequestLock(MMissedCleanupRequestsMtx);
MAreCleanupRequestsMissed.store(false, std::memory_order_release);
for (auto &UpdatedGraph : MMissedCleanupRequests)
doUnenqueuedCommandCleanup(UpdatedGraph);
MMissedCleanupRequests.clear();
Expand All @@ -850,12 +857,12 @@ class queue_impl {
auto EventRet = Handler.finalize();
EventImplPtr EventRetImpl = getSyclObjImpl(EventRet);
if (Type == CGType::CodeplayHostTask)
Deps.UnenqueuedCmdEvents.push_back(EventRetImpl);
Deps.UnenqueuedCmdEvents.push_back(std::move(EventRetImpl));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSyclObjImpl(EventRet) vs std::move(EventRetImpl) is almost equal, I don't think the temporary variable is justified anymore. IMO, inline it and maybe add a comment for the future as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's turn out this was already modified on sycl branch (reference used instead). So, dropped from the patch.

else if (Type == CGType::Barrier || Type == CGType::BarrierWaitlist) {
Deps.LastBarrier = EventRetImpl;
Deps.LastBarrier = std::move(EventRetImpl);
Deps.UnenqueuedCmdEvents.clear();
} else if (!EventRetImpl->isEnqueued()) {
Deps.UnenqueuedCmdEvents.push_back(EventRetImpl);
Deps.UnenqueuedCmdEvents.push_back(std::move(EventRetImpl));
}

return EventRet;
Expand Down Expand Up @@ -1040,6 +1047,8 @@ class queue_impl {
// the fallback implementation of profiling info
bool MFallbackProfiling = false;

// Is value presented in MInOrderExternalEvent?
std::atomic_bool MHasValueInOrderExternalEvent = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name is confusing, maybe MInOrderExteranEventIsSet would be better?

Second, is it correct to assume that we still want the optional because default-constructed/empty event is heavy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second, is it correct to assume that we still want the optional because default-constructed/empty event is heavy?

Yep, one is heavy. I hope one day we able to implement an empty sycl::event with just empty std::shared_ptr<detail::event_impl> impl, then optional became unneeded.

// This event can be optionally provided by users for in-order queues to add
// an additional dependency for the subsequent submission in to the queue.
// Access to the event should be guarded with MInOrderExternalEventMtx.
Expand Down Expand Up @@ -1071,6 +1080,7 @@ class queue_impl {
std::deque<std::shared_ptr<ext::oneapi::experimental::detail::graph_impl>>
MMissedCleanupRequests;
std::mutex MMissedCleanupRequestsMtx;
std::atomic_bool MAreCleanupRequestsMissed = false;

friend class sycl::ext::oneapi::experimental::detail::node_impl;

Expand Down