Increase refcount to ze_loader/CUDA libraries when Level Zero/CUDA providers are used #1086
I know this is probably not easy, but could we add the simplest test for this case?
Yeah, I am working on it; that's why this PR is a draft.
@vinser52 @bratpiorka @pbalcer This PR fixes intel/llvm#16944
Ok, new tests have been added. Actually, I split the Without my changes the
So this PR is ready for review.
@lukaszstolarczuk @bratpiorka @pbalcer Hanging on Windows CUDA CI builds ...
Description
Prior to this PR, the Level Zero provider did not dlopen("libze_loader.so") at all. We assumed that UMF's client, which uses the Level Zero provider, had already loaded libze_loader.so into the process. But what if there are two clients in the process:
1. The first client loads libze_loader.so into the process and uses the Level Zero provider. In that case, UMF initializes the Level Zero symbols via dlsym. Then the first client destroys the Level Zero memory provider and unloads libze_loader.so.
2. The second client loads libze_loader.so into the process and uses the Level Zero provider. Our current implementation does not detect this situation: the Level Zero symbols are still considered initialized, although they were resolved against the library instance that has since been unloaded, so dlsym must be done again.

This PR fixes that. UMF now acquires its own handle to libze_loader.so via dlopen with the RTLD_NOLOAD flag, which tells dlopen not to load the library and to succeed only if the library is already loaded. This increases the reference count of libze_loader.so, so the library is not unloaded when the first client calls dlclose.

This PR should fix #926.
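The RTLD_NOLOAD technique described above can be sketched as follows. This is a minimal, hypothetical illustration, not UMF's actual implementation: the lib_ref_t type and the lib_ref_acquire/lib_ref_sym/lib_ref_release helpers are invented for this example. The sketch takes an extra reference on a library only if a client has already loaded it, resolves symbols through that reference, and drops the reference at teardown. Note that RTLD_NOLOAD is a non-POSIX extension (supported by glibc), and on glibc older than 2.34 the program must be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for our own reference to a shared library
 * (illustrative only; not UMF's actual data structure). */
typedef struct lib_ref {
    void *handle; /* NULL when no reference is held */
} lib_ref_t;

/* Take an extra reference on `name`, but only if it is already loaded.
 * RTLD_NOLOAD makes dlopen() fail instead of loading the library, so the
 * library is never pulled into the process on the client's behalf.
 * On success the loader's reference count for the library is incremented. */
static int lib_ref_acquire(lib_ref_t *ref, const char *name) {
    assert(ref != NULL && name != NULL);
    ref->handle = dlopen(name, RTLD_NOW | RTLD_NOLOAD);
    if (ref->handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "%s is not loaded: %s\n", name,
                err != NULL ? err : "unknown error");
        return -1;
    }
    return 0;
}

/* Resolve a symbol through our own handle. The address stays usable while
 * the reference is held, even if the client dlclose()s its own handle. */
static void *lib_ref_sym(const lib_ref_t *ref, const char *sym) {
    return ref->handle != NULL ? dlsym(ref->handle, sym) : NULL;
}

/* Drop our reference; the library can be unloaded only after every
 * dlopen() reference to it has been closed. */
static void lib_ref_release(lib_ref_t *ref) {
    if (ref->handle != NULL) {
        dlclose(ref->handle);
        ref->handle = NULL;
    }
}
```

In this sketch, a provider would call lib_ref_acquire(&ref, "libze_loader.so") at initialization and lib_ref_release(&ref) at finalization. Because the loader cannot unload the library while this extra reference is held, the first client's dlclose no longer invalidates symbols that were resolved through lib_ref_sym.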
Fixes: intel/llvm#16944 (confirmed by @ldorau)
Checklist