[SYCL] Fix discarded enqueue function event markings #16223
This commit fixes an issue where memory operations enqueued through the enqueue free functions would not correctly mark the resulting events as discarded, breaking in-order barrier assumptions.
Signed-off-by: Larsen, Steffen <[email protected]>
|
Can confirm, this fixes #15606 as reported. But on the application level, I now get "wait method cannot be used for a discarded event" exception from Example trace: 16223.log (the "MPI rank" in the error message refers to a thread, we're using our thread-based MPI imitation). |
Thank you, @al42and! I will convert this to draft while I investigate. It looks like the problem comes after a mem-fill operation, so there must be some dependencies that don't know how to handle the case where they are discarded. Update: I have not yet been able to reproduce the failure. @al42and - Would you be able to provide a stack trace from the throw-site? My hope is that it can tell me what kind of dependencies are causing this. |
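To make the suspicion above concrete, here is a toy model in plain C++ of a dependency walker that does or does not tolerate discarded events. Everything in it (`ToyDependency`, `waitAllNaive`, `waitAllSkippingDiscarded`) is hypothetical and written only for illustration; it is not the DPC++ runtime code, and skipping discarded dependencies is just one plausible handling, assuming the in-order queue itself already guarantees ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a runtime event; not a DPC++ type.
struct ToyDependency {
  bool discarded = false;
  bool waited = false;

  // Mirrors the error text reported in this thread.
  void wait() {
    if (discarded)
      throw std::runtime_error(
          "wait method cannot be used for a discarded event.");
    waited = true;
  }
};

// Naive walker: waits on every dependency, so it throws as soon as it
// reaches a discarded one.
inline void waitAllNaive(std::vector<ToyDependency> &deps) {
  for (ToyDependency &dep : deps)
    dep.wait();
}

// Tolerant walker (assumed handling): skips discarded dependencies and
// returns how many dependencies were actually waited on.
inline std::size_t waitAllSkippingDiscarded(std::vector<ToyDependency> &deps) {
  std::size_t waited = 0;
  for (ToyDependency &dep : deps) {
    if (dep.discarded)
      continue; // ordering is assumed to come from the in-order queue
    dep.wait();
    ++waited;
  }
  assert(waited <= deps.size());
  return waited;
}
```

With three dependencies of which the middle one is discarded, `waitAllNaive` throws while `waitAllSkippingDiscarded` waits on the other two and returns 2.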
Since I'm too lazy to rebuild LLVM in debug mode, printf-based tracing suggests that:
Running with |
#include <array>
#include <iostream>
#include <sycl/sycl.hpp>
#include <thread>
#include <unistd.h>
#include <vector>

static constexpr int nthreads = 2;
static constexpr int niter = 20;

void threadFunction(int tid) {
  sycl::device dev(sycl::gpu_selector_v);
  std::cout << dev.get_info<sycl::info::device::name>() << std::endl;
  sycl::queue q{dev, {sycl::property::queue::in_order()}};
  constexpr int size = 128 * 128 * 128;
  int *d_buf = sycl::malloc_device<int>(size, q);
  int *h_buf = sycl::malloc_host<int>(size, q);
  const sycl::nd_range<1> range1D{{size}, {128}};
  std::vector<sycl::event> evs;
  for (int i = 0; i < niter; i++) {
    // Member-function memcpy: returns an event, which is kept alive in evs.
    evs.push_back(q.memcpy(h_buf, d_buf, size * sizeof(int)));
    // Enqueue free function: returns no event to the caller.
    sycl::ext::oneapi::experimental::submit(
        q, [&](sycl::handler &cgh) { cgh.fill<int>(d_buf, 1, size); });
  }
  q.wait_and_throw();
  std::cout << "After waiting for the queue" << std::endl;
  std::cout << h_buf[0] << std::endl;
  sycl::free(d_buf, q);
  sycl::free(h_buf, q);
}

int main() {
  std::array<std::thread, nthreads> threads;
  for (int i = 0; i < nthreads; i++) {
    threads[i] = std::thread{threadFunction, i};
  }
  for (int i = 0; i < nthreads; i++) {
    threads[i].join();
  }
  std::cout << "All threads have finished." << std::endl;
  return 0;
}

$ clang++ -fsycl 16223.cpp && ONEAPI_DEVICE_SELECTOR=opencl:gpu ./a.out
Intel(R) Arc(TM) A770 Graphics
Intel(R) Arc(TM) A770 Graphics
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): wait method cannot be used for a discarded event.
Aborted (core dumped)

Looks like it's not a specific operation, but some smart pointer lifetime issue? Also, this feels like a nasty unrelated issue of pointer re-use that only now is surfacing, but I could easily be wrong here. |
Signed-off-by: Larsen, Steffen <[email protected]>
|
Thanks a ton, @al42and! I believe the problematic cases have now been addressed and I have adapted your code into a smaller regression test. I could not find your signature on the previous commit, so please let me know if you would like to be added as co-author. |
Signed-off-by: Larsen, Steffen <[email protected]>
Thank you! Can confirm that it all works now.
|
cperkinsintel
left a comment
LGTM.
|
if (Event)
  MEvent->setHandle(*Event);
SetEventHandleOrDiscard();
<3 x N
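The reviewed snippet above calls a helper named `SetEventHandleOrDiscard`. Its body is not shown in this thread, so the following is only a guess at the idea, inferred from the name: attach the native handle when the backend produced one, otherwise mark the event as discarded. `ToyEvent`, `NativeHandle`, and `setEventHandleOrDiscard` are hypothetical names for a standalone sketch in plain C++, not the DPC++ runtime implementation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins; none of these are DPC++ runtime types.
using NativeHandle = int;

enum class EventState { Pending, HasHandle, Discarded };

class ToyEvent {
public:
  void setHandle(NativeHandle h) {
    handle_ = h;
    state_ = EventState::HasHandle;
  }

  void markDiscarded() { state_ = EventState::Discarded; }

  bool isDiscarded() const { return state_ == EventState::Discarded; }

  // Mirrors the error text reported in this thread.
  void wait() const {
    if (isDiscarded())
      throw std::runtime_error(
          "wait method cannot be used for a discarded event.");
    // A real event would block on handle_ here; the toy does nothing.
  }

private:
  NativeHandle handle_ = 0;
  EventState state_ = EventState::Pending;
};

// Assumed behaviour, inferred only from the helper's name: use the native
// handle if one was produced, otherwise mark the event as discarded so that
// later code never treats it as a waitable event.
inline void setEventHandleOrDiscard(ToyEvent &ev, const NativeHandle *native) {
  if (native)
    ev.setHandle(*native);
  else
    ev.markDiscarded();
  assert(ev.isDiscarded() == (native == nullptr));
}
```

In this sketch the point of routing both cases through one helper is that an event never stays in the `Pending` state after an enqueue: it either has a handle or is explicitly discarded.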
|
@steffenlarsen: Thanks again for the fix! Do you know if this is going into the next patch release of oneAPI 2025.0 (productized version, not open-source one), or will it wait for oneAPI 2025.1? Need to know which versions to warn about :) |
Sadly it won't make 2025.0, but I will do what I can to make sure it gets into the following minor release. |
This commit fixes an issue where memory operations enqueued through the enqueue free functions would not correctly mark the resulting events as discarded, breaking in-order barrier assumptions.
Fixes #15606.
Co-authored-by: Andrey Alekseenko [email protected]