Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 50 additions & 2 deletions offload/plugins-nextgen/amdgpu/src/rtl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1115,6 +1115,18 @@ struct AMDGPUStreamTy {
return Plugin::success();
}

/// Complete pending post actions until and including the event in target
/// slot.
Error completeUntil(uint32_t TargetSlot) {
for (uint32_t Slot = 0; Slot <= TargetSlot; ++Slot) {
// Take the post action of the operation if any.
if (auto Err = Slots[Slot].performAction())
return Err;
}

return Plugin::success();
}

/// Make the current stream wait on a specific operation of another stream.
/// The idea is to make the current stream waiting on two signals: 1) the last
/// signal of the current stream, and 2) the last signal of the other stream.
Expand Down Expand Up @@ -1502,6 +1514,11 @@ struct AMDGPUStreamTy {
return complete();
}

/// Synchronize the stream until the given event. The current thread waits
/// until the provided event is finalized, and it performs the pending post
/// actions for that and prior events.
Error synchronizeOn(AMDGPUEventTy &Event);

/// Query the stream and complete pending post actions if operations finished.
/// Return whether all the operations completed. This operation does not block
/// the calling thread.
Expand Down Expand Up @@ -1575,6 +1592,21 @@ struct AMDGPUEventTy {
return Stream.waitEvent(*this);
}

Error sync() {
std::lock_guard<std::mutex> Lock(Mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the need for these mutexes? I'm just a little concerned when we have mutexes combined with functions that handle callbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cribbed this function from wait above, which does a similar thing and also protects the event with a mutex. But looking at them both closely, I'm not sure why either need to be guarded. The only way I can see it being required is if you wait/sync on an event that is in the middle of a record... But I'm not sure if that can ever happen. If we do remove the mutex, I think it makes sense to remove it entirely and as a separate change just to make it easier to track down if it does cause issues.

For synchronizeOn, I think it is required because another thread (or callback handler) could add, synchronise, wait or finalize an event while synchronizeOn is running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a quick look. Libompt allows you to "record" events after they are constructed, which I assume can also happen while you wait/sync on them, so I think the mutex is required.


if (!RecordedStream)
return Plugin::error(ErrorCode::INVALID_ARGUMENT,
"event does not have any recorded stream");

// No need to wait on anything, the recorded stream already finished the
// corresponding operation.
if (RecordedSlot < 0)
return Plugin::success();

return RecordedStream->synchronizeOn(*this);
}

protected:
/// The stream registered in this event.
AMDGPUStreamTy *RecordedStream;
Expand Down Expand Up @@ -1630,6 +1662,22 @@ Error AMDGPUStreamTy::waitEvent(const AMDGPUEventTy &Event) {
return waitOnStreamOperation(RecordedStream, Event.RecordedSlot);
}

Error AMDGPUStreamTy::synchronizeOn(AMDGPUEventTy &Event) {
std::lock_guard<std::mutex> Lock(Mutex);

// Wait until the requested slot has completed
if (auto Err = Slots[Event.RecordedSlot].Signal->wait(
StreamBusyWaitMicroseconds, &Device))
return Err;

// If the event is the last one in the stream, just do a full finalize
if (Event.RecordedSlot == last())
return complete();

// Otherwise, only finalize until the appropriate event
return completeUntil(Event.RecordedSlot);
}

struct AMDGPUStreamManagerTy final
: GenericDeviceResourceManagerTy<AMDGPUResourceRef<AMDGPUStreamTy>> {
using ResourceRef = AMDGPUResourceRef<AMDGPUStreamTy>;
Expand Down Expand Up @@ -2540,8 +2588,8 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {

/// Synchronize the current thread with the event.
Error syncEventImpl(void *EventPtr) override {
return Plugin::error(ErrorCode::UNIMPLEMENTED,
"synchronize event not implemented");
AMDGPUEventTy *Event = reinterpret_cast<AMDGPUEventTy *>(EventPtr);
return Event->sync();
}

/// Print information about the device.
Expand Down
3 changes: 0 additions & 3 deletions offload/unittests/OffloadAPI/common/Fixtures.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -171,9 +171,6 @@ struct OffloadQueueTest : OffloadDeviceTest {
struct OffloadEventTest : OffloadQueueTest {
void SetUp() override {
RETURN_ON_FATAL_FAILURE(OffloadQueueTest::SetUp());
if (getPlatformBackend() == OL_PLATFORM_BACKEND_AMDGPU)
GTEST_SKIP() << "AMDGPU synchronize event not implemented";

// Get an event from a memcpy. We can still use it in olGetEventInfo etc
// after it has been waited on.
void *Alloc;
Expand Down
3 changes: 0 additions & 3 deletions offload/unittests/OffloadAPI/event/olWaitEvent.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,6 @@ using olWaitEventTest = OffloadQueueTest;
OFFLOAD_TESTS_INSTANTIATE_DEVICE_FIXTURE(olWaitEventTest);

TEST_P(olWaitEventTest, Success) {
if (getPlatformBackend() == OL_PLATFORM_BACKEND_AMDGPU)
GTEST_SKIP() << "AMDGPU synchronize event not implemented";

uint32_t Src = 42;
void *DstPtr;

Expand Down
Loading