From cba2b47552031d640d3f22c678807a117e0300f8 Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Fri, 21 Feb 2025 11:53:37 -0800 Subject: [PATCH 01/21] fix for windows shutdown, minus some reviewer feedback. See sycl/doc/design/GlobalObjectsInRuntime.md for a full technical description --- sycl/doc/design/GlobalObjectsInRuntime.md | 73 ++++++++++++++++-- sycl/source/detail/global_handler.cpp | 83 ++++++++++++++++----- sycl/source/detail/global_handler.hpp | 3 +- sycl/source/detail/host_task.hpp | 8 ++ sycl/source/detail/platform_impl.cpp | 2 +- sycl/source/detail/queue_impl.hpp | 10 ++- sycl/source/detail/scheduler/scheduler.cpp | 34 +++++++-- sycl/source/detail/thread_pool.hpp | 2 + sycl/unittests/CMakeLists.txt | 1 - sycl/unittests/buffer/BufferReleaseBase.hpp | 1 - sycl/unittests/windows/CMakeLists.txt | 4 - sycl/unittests/windows/dllmain.cpp | 8 ++ 12 files changed, 187 insertions(+), 42 deletions(-) delete mode 100644 sycl/unittests/windows/CMakeLists.txt diff --git a/sycl/doc/design/GlobalObjectsInRuntime.md b/sycl/doc/design/GlobalObjectsInRuntime.md index b56dd7767d108..5606adf754d7d 100644 --- a/sycl/doc/design/GlobalObjectsInRuntime.md +++ b/sycl/doc/design/GlobalObjectsInRuntime.md @@ -55,9 +55,65 @@ Deinitialization is platform-specific. Upon application shutdown, the DPC++ runtime frees memory pointed by `GlobalHandler` global pointer, which triggers destruction of nested `std::unique_ptr`s. +### Shutdown Tasks and Challenges + +As the user's app ends, SYCL's primary goal is to release any UR adapters that +have been gotten, and teardown the plugins/adapters themselves. Additionally, +we need to stop deferring any new buffer releases and clean up any memory +whose release was deferred. + +To this end, the shutdown occurs in two phases: early and late. The purpose +for eary shutdown is primarily to stop any further deferring of memory release. +This is because the deferred memory release is based on threads and on Windows +the threads will be abandoned. So as soon as possible we want to stop deferring +memory and try to let go any that has been deferred. The purpose for late +shutdown is to hold onto the handles and adapters longer than the user's +application. We don't want to initiate late shutdown until after all the users +static and thread local vars have been destroyed, in case those destructors are +calling SYCL. + +In the early shutdown we stop deferring, tell the scheduler to prepare for release, and +try releasing the memory that has been deferred so far. Following this, if +the user has any global or static handles to sycl objects, they'll be destroyed. +Finally, the late shutdown routine is called the last of the UR handles and +adapters are let go, as is the GlobalHandler itself. + + +#### Threads +The deferred memory marshalling is built on a thread pool, but there is a +challenge here in that on Windows, once the end of the users main() is reached +and their app is shutting down, the Windows OS will abandon all remaining +in-flight threads. These threads can be .join() but they simply return instantly, +the threads are not completed. Further any thread specific variables +(or thread_local static vars) will NOT have their destructors called. Note +that the standard while-loop-over-condition-var pattern will cause a hang - +we cannot "wait" on abandoned threads. +On Windows, short of adding some user called API to signal this, there is +no way to detect or avoid this. None of the "end-of-library" lifecycle events +occurs before the threads are abandoned. ( not std::atexit(), not globals or +static, or static thread_local var destruction, not DllMain(DLL_PROCESS_DETACH) ) +This means that on Windows, once we arrive at shutdown_early we cannot wait on +host events or the thread pool. + +For the deferred memory itself, there is no issue here. The Windows OS will +reclaim the memory for us. The issue of which we must be wary is placing UR +handles (and similar) in host threads. The RAII mechanism of unique and +shared pointers will not work in any thread that is abandoned on Windows. + +One last note about threads. It is entirely the OS's discretion when to +start or schedule a thread. If the main process is very busy then it is +possible that threads the SYCL library creates (host_tasks/thread_pool) +won't even be started until AFTER the host application main() function is done. +This is not a normal occurrence, but it can happen if there is no call to queue.wait() + + ### Linux -On Linux DPC++ runtime uses `__attribute__((destructor))` property with low +On Linux, the "eary_shutdown()" is begun by the destruction of a static +StaticVarShutdownHandler object, which is initialized by +platform::get_platforms(). + +late_shutdown() timing uses `__attribute__((destructor))` property with low priority value 110. This approach does not guarantee, that `GlobalHandler` destructor is the last thing to run, as user code may contain a similar function with the same priority value. At the same time, users may specify priorities @@ -72,10 +128,14 @@ times, the memory leak may impact code performance. ### Windows -To identify shutdown moment on Windows, DPC++ runtime uses default `DllMain` -function with `DLL_PROCESS_DETACH` reason. This guarantees, that global objects -deinitialization happens right before `sycl.dll` is unloaded from process -address space. +Differing from Linux, on Windows the "early_shutdown()" is begun by +DllMain(PROCESS_DETACH), unless statically linked. + +The "late_shutdown()" is begun by the destruction of a +static StaticVarShutdownHandler object, which is initialized by +platform::get_platforms(). ( On linux, this is when we do "early_shutdown()". +Go figure.) This is as late as we can manage, but it is later than any user +application global, static, or thread_local variable destruction. ### Recommendations for DPC++ runtime developers @@ -109,8 +169,7 @@ for (adapter in initializedAdapters) { urLoaderTearDown(); ``` -Which in turn is called by either `shutdown_late()` or `shutdown_win()` -depending on platform. +Which in turn is called by `shutdown_late()`. ![](images/adapter-lifetime.jpg) diff --git a/sycl/source/detail/global_handler.cpp b/sycl/source/detail/global_handler.cpp index 5669fbdaacc50..176b74a08548d 100644 --- a/sycl/source/detail/global_handler.cpp +++ b/sycl/source/detail/global_handler.cpp @@ -37,9 +37,11 @@ using LockGuard = std::lock_guard; SpinLock GlobalHandler::MSyclGlobalHandlerProtector{}; // forward decl -void shutdown_win(); // TODO: win variant will go away soon void shutdown_early(); void shutdown_late(); +#ifdef _WIN32 +BOOL isLinkedStatically(); +#endif // Utility class to track references on object. // Used for GlobalHandler now and created as thread_local object on the first @@ -237,24 +239,37 @@ void GlobalHandler::releaseDefaultContexts() { MPlatformToDefaultContextCache.Inst.reset(nullptr); } -struct EarlyShutdownHandler { - ~EarlyShutdownHandler() { +// Shutdown is split into two parts. shutdown_early() stops any more +// objects from being deferred and takes an initial pass at freeing them. +// shutdown_late() finishes and releases the adapters and the GlobalHandler. +// For Windows, early shutdown is typically called from DllMain, +// and late shutdown is here. +// For Linux, early shutdown is here, and late shutdown is called from +// a low priority destructor. +struct StaticVarShutdownHandler { + + ~StaticVarShutdownHandler() { try { #ifdef _WIN32 - // on Windows we keep to the existing shutdown procedure - GlobalHandler::instance().releaseDefaultContexts(); + // If statically linked, DllMain will not be called. So we do its work + // here. + if (isLinkedStatically()) { + shutdown_early(); + } + + shutdown_late(); #else shutdown_early(); #endif } catch (std::exception &e) { - __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~EarlyShutdownHandler", - e); + __SYCL_REPORT_EXCEPTION_TO_STREAM( + "exception in ~StaticVarShutdownHandler", e); } } }; -void GlobalHandler::registerEarlyShutdownHandler() { - static EarlyShutdownHandler handler{}; +void GlobalHandler::registerStaticVarShutdownHandler() { + static StaticVarShutdownHandler handler{}; } bool GlobalHandler::isOkToDefer() const { return OkToDefer; } @@ -287,10 +302,10 @@ void GlobalHandler::prepareSchedulerToRelease(bool Blocking) { #ifndef _WIN32 if (Blocking) drainThreadPool(); +#endif if (MScheduler.Inst) MScheduler.Inst->releaseResources(Blocking ? BlockingT::BLOCKING : BlockingT::NON_BLOCKING); -#endif } void GlobalHandler::drainThreadPool() { @@ -316,6 +331,12 @@ void shutdown_early() { if (!Handler) return; +#if defined(XPTI_ENABLE_INSTRUMENTATION) && defined(_WIN32) + if (xptiTraceEnabled()) + return; // When doing xpti tracing, we can't safely shutdown on Win. + // TODO: figure out why XPTI prevents release. +#endif + // Now that we are shutting down, we will no longer defer MemObj releases. Handler->endDeferredRelease(); @@ -337,6 +358,12 @@ void shutdown_late() { if (!Handler) return; +#if defined(XPTI_ENABLE_INSTRUMENTATION) && defined(_WIN32) + if (xptiTraceEnabled()) + return; // When doing xpti tracing, we can't safely shutdown on Win. + // TODO: figure out why XPTI prevents release. +#endif + // First, release resources, that may access adapters. Handler->MPlatformCache.Inst.reset(nullptr); Handler->MScheduler.Inst.reset(nullptr); @@ -374,23 +401,18 @@ extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, if (PrintUrTrace) std::cout << "---> DLL_PROCESS_DETACH syclx.dll\n" << std::endl; -#ifdef XPTI_ENABLE_INSTRUMENTATION - if (xptiTraceEnabled()) - return TRUE; // When doing xpti tracing, we can't safely call shutdown. - // TODO: figure out what XPTI is doing that prevents - // release. -#endif - try { - shutdown_win(); + shutdown_early(); } catch (std::exception &e) { - __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in shutdown_win", e); + std::cout << "exception in DLL_PROCESS_DETACH" << e.what() << std::endl; return FALSE; } + break; case DLL_PROCESS_ATTACH: if (PrintUrTrace) std::cout << "---> DLL_PROCESS_ATTACH syclx.dll\n" << std::endl; + break; case DLL_THREAD_ATTACH: break; @@ -399,6 +421,29 @@ extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, } return TRUE; // Successful DLL_PROCESS_ATTACH. } +BOOL isLinkedStatically() { + // If the exePath is the same as the dllPath, + // or if the module handle for DllMain is not retrievable, + // then we are linked statically + // Otherwise we are dynamically linked or loaded. + HMODULE hModule = nullptr; + auto LpModuleAddr = reinterpret_cast(&DllMain); + if (GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, LpModuleAddr, + &hModule)) { + char dllPath[MAX_PATH]; + if (GetModuleFileNameA(hModule, dllPath, MAX_PATH)) { + char exePath[MAX_PATH]; + if (GetModuleFileNameA(NULL, exePath, MAX_PATH)) { + if (std::string(dllPath) == std::string(exePath)) { + return true; + } + } + } + } else { + return true; + } + return false; +} #else // Setting low priority on destructor ensures it runs after all other global // destructors. Priorities 0-100 are reserved by the compiler. The priority diff --git a/sycl/source/detail/global_handler.hpp b/sycl/source/detail/global_handler.hpp index 4b834927e3832..71e28eaf8e60b 100644 --- a/sycl/source/detail/global_handler.hpp +++ b/sycl/source/detail/global_handler.hpp @@ -73,7 +73,7 @@ class GlobalHandler { XPTIRegistry &getXPTIRegistry(); ThreadPool &getHostTaskThreadPool(); - static void registerEarlyShutdownHandler(); + static void registerStaticVarShutdownHandler(); bool isOkToDefer() const; void endDeferredRelease(); @@ -95,7 +95,6 @@ class GlobalHandler { bool OkToDefer = true; - friend void shutdown_win(); friend void shutdown_early(); friend void shutdown_late(); friend class ObjectUsageCounter; diff --git a/sycl/source/detail/host_task.hpp b/sycl/source/detail/host_task.hpp index 5f7ae11c6a0e4..65061c2f83e42 100644 --- a/sycl/source/detail/host_task.hpp +++ b/sycl/source/detail/host_task.hpp @@ -33,6 +33,10 @@ class HostTask { bool isInteropTask() const { return !!MInteropTask; } void call(HostProfilingInfo *HPI) { + if (!GlobalHandler::instance().isOkToDefer()) { + return; + } + if (HPI) HPI->start(); MHostTask(); @@ -41,6 +45,10 @@ class HostTask { } void call(HostProfilingInfo *HPI, interop_handle handle) { + if (!GlobalHandler::instance().isOkToDefer()) { + return; + } + if (HPI) HPI->start(); MInteropTask(handle); diff --git a/sycl/source/detail/platform_impl.cpp b/sycl/source/detail/platform_impl.cpp index b27b95c1f1938..2d454bb09e7c2 100644 --- a/sycl/source/detail/platform_impl.cpp +++ b/sycl/source/detail/platform_impl.cpp @@ -166,7 +166,7 @@ std::vector platform_impl::get_platforms() { // This initializes a function-local variable whose destructor is invoked as // the SYCL shared library is first being unloaded. - GlobalHandler::registerEarlyShutdownHandler(); + GlobalHandler::registerStaticVarShutdownHandler(); return Platforms; } diff --git a/sycl/source/detail/queue_impl.hpp b/sycl/source/detail/queue_impl.hpp index bf18e97c50fca..7bb22bc8296f8 100644 --- a/sycl/source/detail/queue_impl.hpp +++ b/sycl/source/detail/queue_impl.hpp @@ -264,7 +264,15 @@ class queue_impl { destructorNotification(); #endif throw_asynchronous(); - getAdapter()->call(MQueues[0]); + auto status = + getAdapter()->call_nocheck(MQueues[0]); + // if loader is already closed, it'll return a not-initialized status + // which the UR should convert to SUCCESS code. But that isn't always + // working on Windows. This is a temporary workaround until that is fixed. + if (status != UR_RESULT_SUCCESS && + status != UR_RESULT_ERROR_UNINITIALIZED) { + __SYCL_CHECK_UR_CODE_NO_EXC(status); + } } catch (std::exception &e) { __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~queue_impl", e); } diff --git a/sycl/source/detail/scheduler/scheduler.cpp b/sycl/source/detail/scheduler/scheduler.cpp index efbbb52acab73..f863badc62c57 100644 --- a/sycl/source/detail/scheduler/scheduler.cpp +++ b/sycl/source/detail/scheduler/scheduler.cpp @@ -271,7 +271,20 @@ bool Scheduler::removeMemoryObject(detail::SYCLMemObjI *MemObj, // No operations were performed on the mem object return true; - { +#ifdef _WIN32 + // If we are shutting down on Windows it may not be + // safe to wait on host threads, as the OS may + // abandon them. But no worries, the memory WILL be reclaimed. + bool allowWait = + MemObj->hasUserDataPtr() || GlobalHandler::instance().isOkToDefer(); + if (!allowWait) { + StrictLock = false; + } +#else + bool allowWait = true; +#endif + + if (allowWait) { // This only needs a shared mutex as it only involves enqueueing and // awaiting for events ReadLockT Lock = StrictLock ? ReadLockT(MGraphLock) @@ -281,10 +294,20 @@ bool Scheduler::removeMemoryObject(detail::SYCLMemObjI *MemObj, waitForRecordToFinish(Record, Lock); } { + // If allowWait is false, it means the application is shutting down. + // On Windows we can't safely wait on threads, because they have likely been + // abandoned. So we will try to get the lock. If we can, great, we'll remove + // the record. But if we can't, we just skip. The OS will reclaim the + // memory. WriteLockT Lock = StrictLock ? acquireWriteLock() : WriteLockT(MGraphLock, std::try_to_lock); - if (!Lock.owns_lock()) - return false; + if (!Lock.owns_lock()) { + + if (allowWait) + return false; // Record was not removed, the caller may try again. + else + return true; // skip. + } MGraphBuilder.decrementLeafCountersForRecord(Record); MGraphBuilder.cleanupCommandsForRecord(Record); MGraphBuilder.removeRecordForMemObj(MemObj); @@ -567,11 +590,10 @@ void Scheduler::cleanupAuxiliaryResources(BlockingT Blocking) { std::unique_lock Lock{MAuxiliaryResourcesMutex}; for (auto It = MAuxiliaryResources.begin(); It != MAuxiliaryResources.end();) { - const EventImplPtr &Event = It->first; if (Blocking == BlockingT::BLOCKING) { - Event->waitInternal(); + It->first->waitInternal(); It = MAuxiliaryResources.erase(It); - } else if (Event->isCompleted()) + } else if (It->first->isCompleted()) It = MAuxiliaryResources.erase(It); else ++It; diff --git a/sycl/source/detail/thread_pool.hpp b/sycl/source/detail/thread_pool.hpp index 50240e0a98b06..e9d441d6d27d1 100644 --- a/sycl/source/detail/thread_pool.hpp +++ b/sycl/source/detail/thread_pool.hpp @@ -75,7 +75,9 @@ class ThreadPool { ~ThreadPool() { try { +#ifndef _WIN32 finishAndWait(); +#endif } catch (std::exception &e) { __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~ThreadPool", e); } diff --git a/sycl/unittests/CMakeLists.txt b/sycl/unittests/CMakeLists.txt index 8831426784de2..b099538ac93e5 100644 --- a/sycl/unittests/CMakeLists.txt +++ b/sycl/unittests/CMakeLists.txt @@ -39,7 +39,6 @@ add_subdirectory(pipes) add_subdirectory(program_manager) add_subdirectory(assert) add_subdirectory(Extensions) -add_subdirectory(windows) add_subdirectory(event) add_subdirectory(buffer) add_subdirectory(context_device) diff --git a/sycl/unittests/buffer/BufferReleaseBase.hpp b/sycl/unittests/buffer/BufferReleaseBase.hpp index a4982af3b581f..322b85ffe469c 100644 --- a/sycl/unittests/buffer/BufferReleaseBase.hpp +++ b/sycl/unittests/buffer/BufferReleaseBase.hpp @@ -16,7 +16,6 @@ #include #include -#include #include #include diff --git a/sycl/unittests/windows/CMakeLists.txt b/sycl/unittests/windows/CMakeLists.txt deleted file mode 100644 index 6143d5de55045..0000000000000 --- a/sycl/unittests/windows/CMakeLists.txt +++ /dev/null @@ -1,4 +0,0 @@ -add_sycl_unittest(WindowsDllMainTest OBJECT - dllmain.cpp -) - diff --git a/sycl/unittests/windows/dllmain.cpp b/sycl/unittests/windows/dllmain.cpp index 79c41981f426b..40a933551330b 100644 --- a/sycl/unittests/windows/dllmain.cpp +++ b/sycl/unittests/windows/dllmain.cpp @@ -10,6 +10,9 @@ * This test calls DllMain on Windows. This means, the process performs actions * which are required for library unload. That said, the test requires to be a * distinct binary executable. + * Do NOT add any other test cases to this file. + * Do NOT attempt to move its one test into any other file, because the + * release of the global handler that it causes would interfere with others. */ #include @@ -39,6 +42,11 @@ ur_result_t redefinedAdapterRelease(void *) { TEST(Windows, DllMainCall) { #ifdef _WIN32 sycl::unittest::UrMock<> Mock; +<<<<<<< HEAD +======= + Mock.releaseSyclObjsOnDestruction = false; + +>>>>>>> e85ac9ba6f45 (clang-formation) sycl::platform Plt = sycl::platform(); mock::getCallbacks().set_before_callback("urAdapterRelease", &redefinedAdapterRelease); From 9130945d1e3003d89ff90ff6b1f604b52c1dedd8 Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Fri, 21 Feb 2025 12:32:23 -0800 Subject: [PATCH 02/21] resolve cherry-pick --- sycl/doc/design/GlobalObjectsInRuntime.md | 4 +- sycl/source/detail/global_handler.cpp | 31 +++------- sycl/source/detail/host_task.hpp | 1 + sycl/source/detail/queue_impl.hpp | 4 +- sycl/unittests/windows/dllmain.cpp | 70 ----------------------- 5 files changed, 13 insertions(+), 97 deletions(-) delete mode 100644 sycl/unittests/windows/dllmain.cpp diff --git a/sycl/doc/design/GlobalObjectsInRuntime.md b/sycl/doc/design/GlobalObjectsInRuntime.md index 5606adf754d7d..d3c66a4f6ec0e 100644 --- a/sycl/doc/design/GlobalObjectsInRuntime.md +++ b/sycl/doc/design/GlobalObjectsInRuntime.md @@ -63,7 +63,7 @@ we need to stop deferring any new buffer releases and clean up any memory whose release was deferred. To this end, the shutdown occurs in two phases: early and late. The purpose -for eary shutdown is primarily to stop any further deferring of memory release. +for early shutdown is primarily to stop any further deferring of memory release. This is because the deferred memory release is based on threads and on Windows the threads will be abandoned. So as soon as possible we want to stop deferring memory and try to let go any that has been deferred. The purpose for late @@ -109,7 +109,7 @@ This is not a normal occurrence, but it can happen if there is no call to queue. ### Linux -On Linux, the "eary_shutdown()" is begun by the destruction of a static +On Linux, the "early_shutdown()" is begun by the destruction of a static StaticVarShutdownHandler object, which is initialized by platform::get_platforms(). diff --git a/sycl/source/detail/global_handler.cpp b/sycl/source/detail/global_handler.cpp index 176b74a08548d..8233198970e09 100644 --- a/sycl/source/detail/global_handler.cpp +++ b/sycl/source/detail/global_handler.cpp @@ -62,10 +62,6 @@ class ObjectUsageCounter { LockGuard Guard(GlobalHandler::MSyclGlobalHandlerProtector); MCounter--; - GlobalHandler *RTGlobalObjHandler = GlobalHandler::getInstancePtr(); - if (RTGlobalObjHandler) { - RTGlobalObjHandler->prepareSchedulerToRelease(!MCounter); - } } catch (std::exception &e) { __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~ObjectUsageCounter", e); } @@ -313,18 +309,6 @@ void GlobalHandler::drainThreadPool() { MHostTaskThreadPool.Inst->drain(); } -#ifdef _WIN32 -// because of something not-yet-understood on Windows -// threads may be shutdown once the end of main() is reached -// making an orderly shutdown difficult. Fortunately, Windows -// itself is very aggressive about reclaiming memory. Thus, -// we focus solely on unloading the adapters, so as to not -// accidentally retain device handles. etc -void shutdown_win() { - GlobalHandler *&Handler = GlobalHandler::getInstancePtr(); - Handler->unloadAdapters(); -} -#else void shutdown_early() { const LockGuard Lock{GlobalHandler::MSyclGlobalHandlerProtector}; GlobalHandler *&Handler = GlobalHandler::getInstancePtr(); @@ -380,7 +364,6 @@ void shutdown_late() { delete Handler; Handler = nullptr; } -#endif #ifdef _WIN32 extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, @@ -404,7 +387,7 @@ extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, try { shutdown_early(); } catch (std::exception &e) { - std::cout << "exception in DLL_PROCESS_DETACH" << e.what() << std::endl; + __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in DLL_PROCESS_DETACH", e); return FALSE; } @@ -428,21 +411,21 @@ BOOL isLinkedStatically() { // Otherwise we are dynamically linked or loaded. HMODULE hModule = nullptr; auto LpModuleAddr = reinterpret_cast(&DllMain); - if (GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, LpModuleAddr, - &hModule)) { + if (!GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, LpModuleAddr, + &hModule)) { + return true; // not retrievable, therefore statically linked + } else { char dllPath[MAX_PATH]; if (GetModuleFileNameA(hModule, dllPath, MAX_PATH)) { char exePath[MAX_PATH]; if (GetModuleFileNameA(NULL, exePath, MAX_PATH)) { if (std::string(dllPath) == std::string(exePath)) { - return true; + return true; // paths identical, therefore statically linked } } } - } else { - return true; } - return false; + return false; // Otherwise dynamically linked or loaded } #else // Setting low priority on destructor ensures it runs after all other global diff --git a/sycl/source/detail/host_task.hpp b/sycl/source/detail/host_task.hpp index 65061c2f83e42..f7e3feff8d0ef 100644 --- a/sycl/source/detail/host_task.hpp +++ b/sycl/source/detail/host_task.hpp @@ -13,6 +13,7 @@ #pragma once #include +#include #include #include #include diff --git a/sycl/source/detail/queue_impl.hpp b/sycl/source/detail/queue_impl.hpp index 7bb22bc8296f8..290661e93b5b6 100644 --- a/sycl/source/detail/queue_impl.hpp +++ b/sycl/source/detail/queue_impl.hpp @@ -266,9 +266,11 @@ class queue_impl { throw_asynchronous(); auto status = getAdapter()->call_nocheck(MQueues[0]); - // if loader is already closed, it'll return a not-initialized status + // If loader is already closed, it'll return a not-initialized status // which the UR should convert to SUCCESS code. But that isn't always // working on Windows. This is a temporary workaround until that is fixed. + // TODO: Remove this workaround when UR is fixed, and restore + // ->call<>() instead of ->call_nocheck<>() above. if (status != UR_RESULT_SUCCESS && status != UR_RESULT_ERROR_UNINITIALIZED) { __SYCL_CHECK_UR_CODE_NO_EXC(status); diff --git a/sycl/unittests/windows/dllmain.cpp b/sycl/unittests/windows/dllmain.cpp deleted file mode 100644 index 40a933551330b..0000000000000 --- a/sycl/unittests/windows/dllmain.cpp +++ /dev/null @@ -1,70 +0,0 @@ -//==----- dllmain.cpp --- verify behaviour of lib on process termination ---==// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -/* - * This test calls DllMain on Windows. This means, the process performs actions - * which are required for library unload. That said, the test requires to be a - * distinct binary executable. - * Do NOT add any other test cases to this file. - * Do NOT attempt to move its one test into any other file, because the - * release of the global handler that it causes would interfere with others. - */ - -#include -#include - -#include - -#ifdef _WIN32 -#include - -extern "C" BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, - LPVOID lpReserved); - -static std::atomic TearDownCalls{0}; - -// Before the port this was an override for LoaderTearDown, UR's mock -// functionality can't override loader functions but AdapterRelease is called -// in the runtime in the same place as LoaderTearDown -ur_result_t redefinedAdapterRelease(void *) { - fprintf(stderr, "intercepted tear down\n"); - ++TearDownCalls; - - return UR_RESULT_SUCCESS; -} -#endif - -TEST(Windows, DllMainCall) { -#ifdef _WIN32 - sycl::unittest::UrMock<> Mock; -<<<<<<< HEAD -======= - Mock.releaseSyclObjsOnDestruction = false; - ->>>>>>> e85ac9ba6f45 (clang-formation) - sycl::platform Plt = sycl::platform(); - mock::getCallbacks().set_before_callback("urAdapterRelease", - &redefinedAdapterRelease); - - // Teardown calls are only expected on sycl.dll library unload, not when - // process gets terminated. - // The first call to DllMain is to simulate library unload. The second one - // is to simulate process termination - fprintf(stderr, "Call DllMain for the first time\n"); - DllMain((HINSTANCE)0, DLL_PROCESS_DETACH, (LPVOID)NULL); - - int TearDownCallsDone = TearDownCalls.load(); - - EXPECT_NE(TearDownCallsDone, 0); - - fprintf(stderr, "Call DllMain for the second time\n"); - DllMain((HINSTANCE)0, DLL_PROCESS_DETACH, (LPVOID)0x01); - - EXPECT_EQ(TearDownCalls.load(), TearDownCallsDone); -#endif -} From fc42629032dbec4d8dc5faa693fb1098487f3fec Mon Sep 17 00:00:00 2001 From: "Perkins, Chris" Date: Mon, 3 Mar 2025 14:02:39 -0800 Subject: [PATCH 03/21] fix for windows. Should probabaly be removed from Linux too, more testing needed --- sycl/source/detail/program_manager/program_manager.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sycl/source/detail/program_manager/program_manager.cpp b/sycl/source/detail/program_manager/program_manager.cpp index f2e98c8b68219..c40eed1f3df5b 100644 --- a/sycl/source/detail/program_manager/program_manager.cpp +++ b/sycl/source/detail/program_manager/program_manager.cpp @@ -3636,7 +3636,9 @@ extern "C" void __sycl_register_lib(sycl_device_binaries desc) { // Executed as a part of current module's (.exe, .dll) static initialization extern "C" void __sycl_unregister_lib(sycl_device_binaries desc) { // Partial cleanup is not necessary at shutdown +#ifndef _WIN32 if (!sycl::detail::GlobalHandler::instance().isOkToDefer()) return; sycl::detail::ProgramManager::getInstance().removeImages(desc); +#endif } From 6da51cd3a9bcb7bb479d50701a786094a5a07c3d Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Wed, 2 Apr 2025 14:09:14 -0700 Subject: [PATCH 04/21] [UR][L0] Fix L0 teardown checks for stability Signed-off-by: Neil R. Spruit --- unified-runtime/cmake/FetchLevelZero.cmake | 4 +- .../source/adapters/level_zero/common.hpp | 65 +++---------------- .../source/adapters/level_zero/context.cpp | 30 +++++++-- .../source/adapters/level_zero/event.cpp | 5 +- .../source/adapters/level_zero/kernel.cpp | 5 +- .../source/adapters/level_zero/memory.cpp | 5 +- .../source/adapters/level_zero/queue.cpp | 10 ++- .../source/adapters/level_zero/sampler.cpp | 5 +- .../source/adapters/level_zero/v2/common.hpp | 5 +- 9 files changed, 62 insertions(+), 72 deletions(-) diff --git a/unified-runtime/cmake/FetchLevelZero.cmake b/unified-runtime/cmake/FetchLevelZero.cmake index c4d0c954278d2..0824059533e1e 100644 --- a/unified-runtime/cmake/FetchLevelZero.cmake +++ b/unified-runtime/cmake/FetchLevelZero.cmake @@ -40,10 +40,10 @@ if (NOT DEFINED LEVEL_ZERO_LIBRARY OR NOT DEFINED LEVEL_ZERO_INCLUDE_DIR) set(BUILD_STATIC ON) if (UR_LEVEL_ZERO_LOADER_REPO STREQUAL "") - set(UR_LEVEL_ZERO_LOADER_REPO "https://github.com/oneapi-src/level-zero.git") + set(UR_LEVEL_ZERO_LOADER_REPO "https://github.com/nrspruit/level-zero.git") endif() if (UR_LEVEL_ZERO_LOADER_TAG STREQUAL "") - set(UR_LEVEL_ZERO_LOADER_TAG v1.21.1) + set(UR_LEVEL_ZERO_LOADER_TAG cd83892e09c339b1688de3aa67cd902fb277b297) endif() # Disable due to a bug https://github.com/oneapi-src/level-zero/issues/104 diff --git a/unified-runtime/source/adapters/level_zero/common.hpp b/unified-runtime/source/adapters/level_zero/common.hpp index 11b87d9ed276b..96e8765ed845e 100644 --- a/unified-runtime/source/adapters/level_zero/common.hpp +++ b/unified-runtime/source/adapters/level_zero/common.hpp @@ -27,6 +27,7 @@ #include #include +#include #include #include @@ -38,65 +39,15 @@ struct _ur_platform_handle_t; [[maybe_unused]] static bool checkL0LoaderTeardown() { - bool loaderStable = true; -#ifdef _WIN32 - uint32_t ZeDriverCount = 0; - HMODULE zeLoader = LoadLibrary("ze_loader.dll"); - if (zeLoader) { - typedef ze_result_t (*zeDriverGet_t)(uint32_t *, ze_driver_handle_t *); - zeDriverGet_t zeDriverGetLoader = - (zeDriverGet_t)GetProcAddress(zeLoader, "zeDriverGet"); - if (zeDriverGetLoader) { - ze_result_t result = zeDriverGetLoader(&ZeDriverCount, nullptr); - logger::debug( - "ZE ---> checkL0LoaderTeardown result = {} driver count = {}", result, - ZeDriverCount); - if (result != ZE_RESULT_SUCCESS || ZeDriverCount == 0) { - loaderStable = false; - } - } else { - logger::debug("ZE ---> checkL0LoaderTeardown: Failed to get address of " - "zeDriverGet"); - loaderStable = false; - } - FreeLibrary(zeLoader); - } else { - logger::debug( - "ZE ---> checkL0LoaderTeardown: Failed to load ze_loader.dll"); - loaderStable = false; - } -#else - uint32_t ZeDriverCount = 0; - void *zeLoader = dlopen("libze_loader.so.1", RTLD_LAZY); - if (zeLoader) { - typedef ze_result_t (*zeDriverGet_t)(uint32_t *, ze_driver_handle_t *); - zeDriverGet_t zeDriverGetLoader = - (zeDriverGet_t)dlsym(zeLoader, "zeDriverGet"); - if (zeDriverGetLoader) { - ze_result_t result = zeDriverGetLoader(&ZeDriverCount, nullptr); - logger::debug( - "ZE ---> checkL0LoaderTeardown result = {} driver count = {}", result, - ZeDriverCount); - if (result != ZE_RESULT_SUCCESS || ZeDriverCount == 0) { - loaderStable = false; - } - } else { - logger::debug("ZE ---> checkL0LoaderTeardown: Failed to get address of " - "zeDriverGet"); - loaderStable = false; + try { + if (!zelCheckIsLoaderInTearDown()) { + logger::debug("ZE ---> checkL0LoaderTeardown: Loader is not in teardown"); + return true; } - dlclose(zeLoader); - } else { - logger::debug( - "ZE ---> checkL0LoaderTeardown: Failed to load libze_loader.so.1"); - loaderStable = false; - } -#endif - if (!loaderStable) { - logger::debug( - "ZE ---> checkL0LoaderTeardown: Loader is not stable, returning false"); + } catch (...) { } - return loaderStable; + logger::debug("ZE ---> checkL0LoaderTeardown: Loader is in teardown or is unstable"); + return false; } // Controls UR L0 calls tracing. diff --git a/unified-runtime/source/adapters/level_zero/context.cpp b/unified-runtime/source/adapters/level_zero/context.cpp index de1ca00f9f8fe..b47bdb17e9598 100644 --- a/unified-runtime/source/adapters/level_zero/context.cpp +++ b/unified-runtime/source/adapters/level_zero/context.cpp @@ -285,8 +285,11 @@ ur_result_t ContextReleaseHelper(ur_context_handle_t Context) { if (DestroyZeContext) { auto ZeResult = ZE_CALL_NOCHECK(zeContextDestroy, (DestroyZeContext)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } return Result; @@ -311,8 +314,11 @@ ur_result_t ur_context_handle_t_::finalize() { (Event->IsInteropNativeHandle && checkL0LoaderTeardown())) { auto ZeResult = ZE_CALL_NOCHECK(zeEventDestroy, (Event->ZeEvent)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } Event->ZeEvent = nullptr; delete Event; @@ -326,8 +332,11 @@ ur_result_t ur_context_handle_t_::finalize() { for (auto &ZePool : ZePoolCache) { auto ZeResult = ZE_CALL_NOCHECK(zeEventPoolDestroy, (ZePool)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } ZePoolCache.clear(); } @@ -336,8 +345,11 @@ ur_result_t ur_context_handle_t_::finalize() { // Destroy the command list used for initializations auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandListInit)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } std::scoped_lock Lock(ZeCommandListCacheMutex); for (auto &List : ZeComputeCommandListCache) { @@ -346,8 +358,11 @@ ur_result_t ur_context_handle_t_::finalize() { if (ZeCommandList) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandList)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } } } @@ -357,8 +372,11 @@ ur_result_t ur_context_handle_t_::finalize() { if (ZeCommandList) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandList)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } } } diff --git a/unified-runtime/source/adapters/level_zero/event.cpp b/unified-runtime/source/adapters/level_zero/event.cpp index 11c0502a55939..2c1c0b4d7a811 100644 --- a/unified-runtime/source/adapters/level_zero/event.cpp +++ b/unified-runtime/source/adapters/level_zero/event.cpp @@ -1125,8 +1125,11 @@ ur_result_t urEventReleaseInternal(ur_event_handle_t Event) { (Event->IsInteropNativeHandle && checkL0LoaderTeardown())) { auto ZeResult = ZE_CALL_NOCHECK(zeEventDestroy, (Event->ZeEvent)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } Event->ZeEvent = nullptr; auto Context = Event->Context; diff --git a/unified-runtime/source/adapters/level_zero/kernel.cpp b/unified-runtime/source/adapters/level_zero/kernel.cpp index d9dd4cc38ad0a..02121d4f458b7 100644 --- a/unified-runtime/source/adapters/level_zero/kernel.cpp +++ b/unified-runtime/source/adapters/level_zero/kernel.cpp @@ -944,8 +944,11 @@ ur_result_t urKernelRelease( (Kernel->IsInteropNativeHandle && checkL0LoaderTeardown())) { auto ZeResult = ZE_CALL_NOCHECK(zeKernelDestroy, (ZeKernel)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } } } diff --git a/unified-runtime/source/adapters/level_zero/memory.cpp b/unified-runtime/source/adapters/level_zero/memory.cpp index 1a348c8ef9b6e..98124870c70fd 100644 --- a/unified-runtime/source/adapters/level_zero/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/memory.cpp @@ -1668,8 +1668,11 @@ ur_result_t urMemRelease( auto ZeResult = ZE_CALL_NOCHECK( zeImageDestroy, (ur_cast(ZeHandleImage))); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } } delete Image; diff --git a/unified-runtime/source/adapters/level_zero/queue.cpp b/unified-runtime/source/adapters/level_zero/queue.cpp index d4cea7faa9d6e..3d6bf7adab26b 100644 --- a/unified-runtime/source/adapters/level_zero/queue.cpp +++ b/unified-runtime/source/adapters/level_zero/queue.cpp @@ -651,8 +651,11 @@ ur_result_t urQueueRelease( if (Queue->Healthy && it->second.ZeFence != nullptr) { auto ZeResult = ZE_CALL_NOCHECK(zeFenceDestroy, (it->second.ZeFence)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } if (Queue->UsingImmCmdLists && Queue->OwnZeCommandQueue) { std::scoped_lock Lock( @@ -1609,8 +1612,11 @@ ur_result_t urQueueReleaseInternal(ur_queue_handle_t Queue) { (Queue->IsInteropNativeHandle && checkL0LoaderTeardown())) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandQueueDestroy, (ZeQueue)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } } } diff --git a/unified-runtime/source/adapters/level_zero/sampler.cpp b/unified-runtime/source/adapters/level_zero/sampler.cpp index f239dfed5ce49..2dba86012c038 100644 --- a/unified-runtime/source/adapters/level_zero/sampler.cpp +++ b/unified-runtime/source/adapters/level_zero/sampler.cpp @@ -131,8 +131,11 @@ ur_result_t urSamplerRelease( auto ZeResult = ZE_CALL_NOCHECK(zeSamplerDestroy, (Sampler->ZeSampler)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && ZeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } delete Sampler; return UR_RESULT_SUCCESS; diff --git a/unified-runtime/source/adapters/level_zero/v2/common.hpp b/unified-runtime/source/adapters/level_zero/v2/common.hpp index 504a7d6c618ed..dbadb2884c021 100644 --- a/unified-runtime/source/adapters/level_zero/v2/common.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/common.hpp @@ -83,8 +83,11 @@ struct ze_handle_wrapper { (ownZeHandle && IsInteropNativeHandle && checkL0LoaderTeardown())) { auto zeResult = destroy(handle); // Gracefully handle the case that L0 was already unloaded. - if (zeResult && zeResult != ZE_RESULT_ERROR_UNINITIALIZED) + if (zeResult && (zeResult != ZE_RESULT_ERROR_UNINITIALIZED || zeResult != ZE_RESULT_ERROR_UNKNOWN)) throw ze2urResult(zeResult); + if ( zeResult == ZE_RESULT_ERROR_UNKNOWN) { + zeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } handle = nullptr; From cad11ab72f040229bdb7b54a1fcbfae70747e4d4 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Wed, 2 Apr 2025 15:40:33 -0700 Subject: [PATCH 05/21] [UR][L0] Verify the Loader is stable before cleanup of all handles Signed-off-by: Neil R. Spruit --- .../adapters/level_zero/command_buffer.cpp | 10 +++-- .../source/adapters/level_zero/context.cpp | 41 ++++++++++--------- .../source/adapters/level_zero/event.cpp | 5 +-- .../source/adapters/level_zero/image.cpp | 2 +- .../source/adapters/level_zero/kernel.cpp | 3 +- .../source/adapters/level_zero/memory.cpp | 3 +- .../adapters/level_zero/physical_mem.cpp | 6 ++- .../source/adapters/level_zero/program.cpp | 4 +- .../source/adapters/level_zero/queue.cpp | 7 ++-- .../source/adapters/level_zero/sampler.cpp | 14 ++++--- .../source/adapters/level_zero/usm.cpp | 3 +- .../source/adapters/level_zero/v2/common.hpp | 3 +- .../source/adapters/level_zero/v2/memory.cpp | 6 +-- .../v2/queue_immediate_in_order.cpp | 2 +- 14 files changed, 54 insertions(+), 55 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/command_buffer.cpp b/unified-runtime/source/adapters/level_zero/command_buffer.cpp index 45ab74c52a969..8ebc0d1661611 100644 --- a/unified-runtime/source/adapters/level_zero/command_buffer.cpp +++ b/unified-runtime/source/adapters/level_zero/command_buffer.cpp @@ -445,16 +445,16 @@ void ur_exp_command_buffer_handle_t_::cleanupCommandBufferResources() { // Release the memory allocated to the CommandList stored in the // command_buffer - if (ZeComputeCommandList) { + if (ZeComputeCommandList && checkL0LoaderTeardown()) { ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeComputeCommandList)); } - if (useCopyEngine() && ZeCopyCommandList) { + if (useCopyEngine() && ZeCopyCommandList && checkL0LoaderTeardown()) { ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCopyCommandList)); } // Release the memory allocated to the CommandListResetEvents stored in the // command_buffer - if (ZeCommandListResetEvents) { + if (ZeCommandListResetEvents && checkL0LoaderTeardown()) { ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandListResetEvents)); } @@ -502,7 +502,9 @@ void ur_exp_command_buffer_handle_t_::cleanupCommandBufferResources() { // Release fences allocated to command-buffer for (auto &ZeFencePair : ZeFencesMap) { auto &ZeFence = ZeFencePair.second; - ZE_CALL_NOCHECK(zeFenceDestroy, (ZeFence)); + if (checkL0LoaderTeardown()) { + ZE_CALL_NOCHECK(zeFenceDestroy, (ZeFence)); + } } auto ReleaseIndirectMem = [](ur_kernel_handle_t Kernel) { diff --git a/unified-runtime/source/adapters/level_zero/context.cpp b/unified-runtime/source/adapters/level_zero/context.cpp index b47bdb17e9598..9ecc360cc278f 100644 --- a/unified-runtime/source/adapters/level_zero/context.cpp +++ b/unified-runtime/source/adapters/level_zero/context.cpp @@ -264,9 +264,7 @@ ur_result_t ContextReleaseHelper(ur_context_handle_t Context) { Contexts.erase(It); } ze_context_handle_t DestroyZeContext = - ((Context->OwnNativeHandle && !Context->IsInteropNativeHandle) || - (Context->OwnNativeHandle && Context->IsInteropNativeHandle && - checkL0LoaderTeardown())) + (Context->OwnNativeHandle && checkL0LoaderTeardown()) ? Context->ZeContext : nullptr; @@ -310,8 +308,7 @@ ur_result_t ur_context_handle_t_::finalize() { std::scoped_lock Lock(EventCacheMutex); for (auto &EventCache : EventCaches) { for (auto &Event : EventCache) { - if (!Event->IsInteropNativeHandle || - (Event->IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeEventDestroy, (Event->ZeEvent)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) @@ -330,32 +327,36 @@ ur_result_t ur_context_handle_t_::finalize() { std::scoped_lock Lock(ZeEventPoolCacheMutex); for (auto &ZePoolCache : ZeEventPoolCache) { for (auto &ZePool : ZePoolCache) { - auto ZeResult = ZE_CALL_NOCHECK(zeEventPoolDestroy, (ZePool)); - // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) - return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { - ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + if (checkL0LoaderTeardown()) { + auto ZeResult = ZE_CALL_NOCHECK(zeEventPoolDestroy, (ZePool)); + // Gracefully handle the case that L0 was already unloaded. + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } } ZePoolCache.clear(); } } - // Destroy the command list used for initializations - auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandListInit)); - // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) - return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { - ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + if (checkL0LoaderTeardown()) { + // Destroy the command list used for initializations + auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandListInit)); + // Gracefully handle the case that L0 was already unloaded. + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } std::scoped_lock Lock(ZeCommandListCacheMutex); for (auto &List : ZeComputeCommandListCache) { for (auto &Item : List.second) { ze_command_list_handle_t ZeCommandList = Item.first; - if (ZeCommandList) { + if (ZeCommandList && checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandList)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) @@ -369,7 +370,7 @@ ur_result_t ur_context_handle_t_::finalize() { for (auto &List : ZeCopyCommandListCache) { for (auto &Item : List.second) { ze_command_list_handle_t ZeCommandList = Item.first; - if (ZeCommandList) { + if (ZeCommandList && checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandList)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) diff --git a/unified-runtime/source/adapters/level_zero/event.cpp b/unified-runtime/source/adapters/level_zero/event.cpp index 2c1c0b4d7a811..1e2708721682c 100644 --- a/unified-runtime/source/adapters/level_zero/event.cpp +++ b/unified-runtime/source/adapters/level_zero/event.cpp @@ -1090,7 +1090,7 @@ ur_result_t ur_event_handle_t_::getOrCreateHostVisibleEvent( * leaks or resource mismanagement. */ ur_event_handle_t_::~ur_event_handle_t_() { - if (this->ZeEvent && this->Completed) { + if (this->ZeEvent && this->Completed && checkL0LoaderTeardown()) { if (this->UrQueue && !this->UrQueue->isDiscardEvents()) ZE_CALL_NOCHECK(zeEventDestroy, (this->ZeEvent)); } @@ -1121,8 +1121,7 @@ ur_result_t urEventReleaseInternal(ur_event_handle_t Event) { } if (Event->OwnNativeHandle) { if (DisableEventsCaching) { - if (!Event->IsInteropNativeHandle || - (Event->IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeEventDestroy, (Event->ZeEvent)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) diff --git a/unified-runtime/source/adapters/level_zero/image.cpp b/unified-runtime/source/adapters/level_zero/image.cpp index 24b2a8cfca758..dc361cd501373 100644 --- a/unified-runtime/source/adapters/level_zero/image.cpp +++ b/unified-runtime/source/adapters/level_zero/image.cpp @@ -312,7 +312,7 @@ ur_result_t urBindlessImagesUnsampledImageHandleDestroyExp( auto item = hDevice->ZeOffsetToImageHandleMap.find(hImage); - if (item != hDevice->ZeOffsetToImageHandleMap.end()) { + if (item != hDevice->ZeOffsetToImageHandleMap.end() && checkL0LoaderTeardown()) { ZE2UR_CALL(zeImageDestroy, (item->second)); hDevice->ZeOffsetToImageHandleMap.erase(item); } else { diff --git a/unified-runtime/source/adapters/level_zero/kernel.cpp b/unified-runtime/source/adapters/level_zero/kernel.cpp index 02121d4f458b7..34d4116111796 100644 --- a/unified-runtime/source/adapters/level_zero/kernel.cpp +++ b/unified-runtime/source/adapters/level_zero/kernel.cpp @@ -940,8 +940,7 @@ ur_result_t urKernelRelease( auto KernelProgram = Kernel->Program; if (Kernel->OwnNativeHandle) { for (auto &ZeKernel : Kernel->ZeKernels) { - if (!Kernel->IsInteropNativeHandle || - (Kernel->IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeKernelDestroy, (ZeKernel)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) diff --git a/unified-runtime/source/adapters/level_zero/memory.cpp b/unified-runtime/source/adapters/level_zero/memory.cpp index 98124870c70fd..008cd8bc16dd2 100644 --- a/unified-runtime/source/adapters/level_zero/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/memory.cpp @@ -1663,8 +1663,7 @@ ur_result_t urMemRelease( if (Image->OwnNativeHandle) { UR_CALL(Mem->getZeHandle(ZeHandleImage, ur_mem_handle_t_::write_only, nullptr, nullptr, 0u)); - if (!Image->IsInteropNativeHandle || - (Image->IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK( zeImageDestroy, (ur_cast(ZeHandleImage))); // Gracefully handle the case that L0 was already unloaded. diff --git a/unified-runtime/source/adapters/level_zero/physical_mem.cpp b/unified-runtime/source/adapters/level_zero/physical_mem.cpp index 836f574800c1e..5d4d0acce0eb3 100644 --- a/unified-runtime/source/adapters/level_zero/physical_mem.cpp +++ b/unified-runtime/source/adapters/level_zero/physical_mem.cpp @@ -50,8 +50,10 @@ ur_result_t urPhysicalMemRelease(ur_physical_mem_handle_t hPhysicalMem) { if (!hPhysicalMem->RefCount.decrementAndTest()) return UR_RESULT_SUCCESS; - ZE2UR_CALL(zePhysicalMemDestroy, (hPhysicalMem->Context->getZeHandle(), - hPhysicalMem->ZePhysicalMem)); + if (checkL0LoaderTeardown()) { + ZE2UR_CALL(zePhysicalMemDestroy, (hPhysicalMem->Context->getZeHandle(), + hPhysicalMem->ZePhysicalMem)); + } delete hPhysicalMem; return UR_RESULT_SUCCESS; diff --git a/unified-runtime/source/adapters/level_zero/program.cpp b/unified-runtime/source/adapters/level_zero/program.cpp index 199ecad21af9b..921aa8e961838 100644 --- a/unified-runtime/source/adapters/level_zero/program.cpp +++ b/unified-runtime/source/adapters/level_zero/program.cpp @@ -1078,7 +1078,7 @@ void ur_program_handle_t_::ur_release_program_resources(bool deletion) { } if (!resourcesReleased) { for (auto &[ZeDevice, DeviceData] : this->DeviceDataMap) { - if (DeviceData.ZeBuildLog) + if (DeviceData.ZeBuildLog && checkL0LoaderTeardown()) ZE_CALL_NOCHECK(zeModuleBuildLogDestroy, (DeviceData.ZeBuildLog)); } // interop api @@ -1087,7 +1087,7 @@ void ur_program_handle_t_::ur_release_program_resources(bool deletion) { } for (auto &[ZeDevice, DeviceData] : this->DeviceDataMap) - if (DeviceData.ZeModule) + if (DeviceData.ZeModule && checkL0LoaderTeardown()) ZE_CALL_NOCHECK(zeModuleDestroy, (DeviceData.ZeModule)); this->DeviceDataMap.clear(); diff --git a/unified-runtime/source/adapters/level_zero/queue.cpp b/unified-runtime/source/adapters/level_zero/queue.cpp index 3d6bf7adab26b..95699c01a1b82 100644 --- a/unified-runtime/source/adapters/level_zero/queue.cpp +++ b/unified-runtime/source/adapters/level_zero/queue.cpp @@ -648,7 +648,7 @@ ur_result_t urQueueRelease( // runtime. Destroy only if a queue is healthy. Destroying a fence may // cause a hang otherwise. // If the fence is a nullptr we are using immediate commandlists. - if (Queue->Healthy && it->second.ZeFence != nullptr) { + if (Queue->Healthy && it->second.ZeFence != nullptr && checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeFenceDestroy, (it->second.ZeFence)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) @@ -679,7 +679,7 @@ ur_result_t urQueueRelease( // A non-reusable comamnd list that came from a make_queue call is // destroyed since it cannot be recycled. ze_command_list_handle_t ZeCommandList = it->first; - if (ZeCommandList) { + if (ZeCommandList && checkL0LoaderTeardown()) { ZE2UR_CALL(zeCommandListDestroy, (ZeCommandList)); } } @@ -1608,8 +1608,7 @@ ur_result_t urQueueReleaseInternal(ur_queue_handle_t Queue) { for (auto &QueueGroup : QueueMap) for (auto &ZeQueue : QueueGroup.second.ZeQueues) if (ZeQueue) { - if (!Queue->IsInteropNativeHandle || - (Queue->IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandQueueDestroy, (ZeQueue)); // Gracefully handle the case that L0 was already unloaded. if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) diff --git a/unified-runtime/source/adapters/level_zero/sampler.cpp b/unified-runtime/source/adapters/level_zero/sampler.cpp index 2dba86012c038..0686192b5f76a 100644 --- a/unified-runtime/source/adapters/level_zero/sampler.cpp +++ b/unified-runtime/source/adapters/level_zero/sampler.cpp @@ -129,12 +129,14 @@ ur_result_t urSamplerRelease( if (!Sampler->RefCount.decrementAndTest()) return UR_RESULT_SUCCESS; - auto ZeResult = ZE_CALL_NOCHECK(zeSamplerDestroy, (Sampler->ZeSampler)); - // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) - return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { - ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + if (checkL0LoaderTeardown()) { + auto ZeResult = ZE_CALL_NOCHECK(zeSamplerDestroy, (Sampler->ZeSampler)); + // Gracefully handle the case that L0 was already unloaded. + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + return ze2urResult(ZeResult); + if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; + } } delete Sampler; diff --git a/unified-runtime/source/adapters/level_zero/usm.cpp b/unified-runtime/source/adapters/level_zero/usm.cpp index c3cb6385f3233..15b67f8ce8255 100644 --- a/unified-runtime/source/adapters/level_zero/usm.cpp +++ b/unified-runtime/source/adapters/level_zero/usm.cpp @@ -683,8 +683,7 @@ ur_result_t UR_APICALL urUSMPoolTrimToExp(ur_context_handle_t, static ur_result_t USMFreeImpl(ur_context_handle_t Context, void *Ptr) { ur_result_t Res = UR_RESULT_SUCCESS; - if (!Context->IsInteropNativeHandle || - (Context->IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeMemFree, (Context->ZeContext, Ptr)); // Handle When the driver is already released if (ZeResult == ZE_RESULT_ERROR_UNINITIALIZED) { diff --git a/unified-runtime/source/adapters/level_zero/v2/common.hpp b/unified-runtime/source/adapters/level_zero/v2/common.hpp index dbadb2884c021..775965af8047e 100644 --- a/unified-runtime/source/adapters/level_zero/v2/common.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/common.hpp @@ -79,8 +79,7 @@ struct ze_handle_wrapper { return; } - if ((ownZeHandle && !IsInteropNativeHandle) || - (ownZeHandle && IsInteropNativeHandle && checkL0LoaderTeardown())) { + if (ownZeHandle && checkL0LoaderTeardown()) { auto zeResult = destroy(handle); // Gracefully handle the case that L0 was already unloaded. if (zeResult && (zeResult != ZE_RESULT_ERROR_UNINITIALIZED || zeResult != ZE_RESULT_ERROR_UNKNOWN)) diff --git a/unified-runtime/source/adapters/level_zero/v2/memory.cpp b/unified-runtime/source/adapters/level_zero/v2/memory.cpp index 63e29df5462eb..9447d93f55331 100644 --- a/unified-runtime/source/adapters/level_zero/v2/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/memory.cpp @@ -112,8 +112,7 @@ ur_integrated_buffer_handle_t::ur_integrated_buffer_handle_t( this->IsInteropNativeHandle = interopNativeHandle; this->ptr = usm_unique_ptr_t(hostPtr, [hContext, ownHostPtr, this](void *ptr) { - if (!ownHostPtr || - (this->IsInteropNativeHandle && !checkL0LoaderTeardown())) { + if (!ownHostPtr || !checkL0LoaderTeardown()) { return; } ZE_CALL_NOCHECK(zeMemFree, (hContext->getZeHandle(), ptr)); @@ -237,8 +236,7 @@ ur_discrete_buffer_handle_t::ur_discrete_buffer_handle_t( this->IsInteropNativeHandle = interopNativeHandle; deviceAllocations[hDevice->Id.value()] = usm_unique_ptr_t( devicePtr, [this, hContext = this->hContext, ownZePtr](void *ptr) { - if (!ownZePtr || - (this->IsInteropNativeHandle && !checkL0LoaderTeardown())) { + if (!ownZePtr || !checkL0LoaderTeardown()) { return; } ZE_CALL_NOCHECK(zeMemFree, (hContext->getZeHandle(), ptr)); diff --git a/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp b/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp index 7646b48bfc6dc..0360293b7804b 100644 --- a/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp @@ -87,7 +87,7 @@ ur_queue_immediate_in_order_t::ur_queue_immediate_in_order_t( [ownZeQueue, interopNativeHandle](ze_command_list_handle_t hZeCommandList) { if (ownZeQueue) { - if (!interopNativeHandle) { + if (checkL0LoaderTeardown()) { ZE_CALL_NOCHECK(zeCommandListDestroy, (hZeCommandList)); } } From aec3fd573333470626933c2379de854fd8657500 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Wed, 2 Apr 2025 16:15:39 -0700 Subject: [PATCH 06/21] Fixed formatting Signed-off-by: Neil R. Spruit --- .../source/adapters/level_zero/common.hpp | 5 +-- .../source/adapters/level_zero/context.cpp | 35 +++++++++++-------- .../source/adapters/level_zero/event.cpp | 5 +-- .../source/adapters/level_zero/image.cpp | 3 +- .../source/adapters/level_zero/kernel.cpp | 5 +-- .../source/adapters/level_zero/memory.cpp | 5 +-- .../source/adapters/level_zero/program.cpp | 8 ++--- .../source/adapters/level_zero/queue.cpp | 13 ++++--- .../source/adapters/level_zero/sampler.cpp | 5 +-- .../source/adapters/level_zero/v2/common.hpp | 5 +-- 10 files changed, 52 insertions(+), 37 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/common.hpp b/unified-runtime/source/adapters/level_zero/common.hpp index 96e8765ed845e..4e0af9f8b4852 100644 --- a/unified-runtime/source/adapters/level_zero/common.hpp +++ b/unified-runtime/source/adapters/level_zero/common.hpp @@ -25,9 +25,9 @@ #include #endif +#include #include #include -#include #include #include @@ -46,7 +46,8 @@ struct _ur_platform_handle_t; } } catch (...) { } - logger::debug("ZE ---> checkL0LoaderTeardown: Loader is in teardown or is unstable"); + logger::debug( + "ZE ---> checkL0LoaderTeardown: Loader is in teardown or is unstable"); return false; } diff --git a/unified-runtime/source/adapters/level_zero/context.cpp b/unified-runtime/source/adapters/level_zero/context.cpp index 9ecc360cc278f..dfd0f4a9d8a4a 100644 --- a/unified-runtime/source/adapters/level_zero/context.cpp +++ b/unified-runtime/source/adapters/level_zero/context.cpp @@ -264,9 +264,8 @@ ur_result_t ContextReleaseHelper(ur_context_handle_t Context) { Contexts.erase(It); } ze_context_handle_t DestroyZeContext = - (Context->OwnNativeHandle && checkL0LoaderTeardown()) - ? Context->ZeContext - : nullptr; + (Context->OwnNativeHandle && checkL0LoaderTeardown()) ? Context->ZeContext + : nullptr; // Clean up any live memory associated with Context ur_result_t Result = Context->finalize(); @@ -283,9 +282,10 @@ ur_result_t ContextReleaseHelper(ur_context_handle_t Context) { if (DestroyZeContext) { auto ZeResult = ZE_CALL_NOCHECK(zeContextDestroy, (DestroyZeContext)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } @@ -311,9 +311,10 @@ ur_result_t ur_context_handle_t_::finalize() { if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeEventDestroy, (Event->ZeEvent)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } @@ -330,9 +331,10 @@ ur_result_t ur_context_handle_t_::finalize() { if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeEventPoolDestroy, (ZePool)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } @@ -345,9 +347,10 @@ ur_result_t ur_context_handle_t_::finalize() { // Destroy the command list used for initializations auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandListInit)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } @@ -359,9 +362,10 @@ ur_result_t ur_context_handle_t_::finalize() { if (ZeCommandList && checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandList)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } @@ -373,9 +377,10 @@ ur_result_t ur_context_handle_t_::finalize() { if (ZeCommandList && checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandListDestroy, (ZeCommandList)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } diff --git a/unified-runtime/source/adapters/level_zero/event.cpp b/unified-runtime/source/adapters/level_zero/event.cpp index 1e2708721682c..be61cc914845a 100644 --- a/unified-runtime/source/adapters/level_zero/event.cpp +++ b/unified-runtime/source/adapters/level_zero/event.cpp @@ -1124,9 +1124,10 @@ ur_result_t urEventReleaseInternal(ur_event_handle_t Event) { if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeEventDestroy, (Event->ZeEvent)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } diff --git a/unified-runtime/source/adapters/level_zero/image.cpp b/unified-runtime/source/adapters/level_zero/image.cpp index dc361cd501373..66c599dc20b24 100644 --- a/unified-runtime/source/adapters/level_zero/image.cpp +++ b/unified-runtime/source/adapters/level_zero/image.cpp @@ -312,7 +312,8 @@ ur_result_t urBindlessImagesUnsampledImageHandleDestroyExp( auto item = hDevice->ZeOffsetToImageHandleMap.find(hImage); - if (item != hDevice->ZeOffsetToImageHandleMap.end() && checkL0LoaderTeardown()) { + if (item != hDevice->ZeOffsetToImageHandleMap.end() && + checkL0LoaderTeardown()) { ZE2UR_CALL(zeImageDestroy, (item->second)); hDevice->ZeOffsetToImageHandleMap.erase(item); } else { diff --git a/unified-runtime/source/adapters/level_zero/kernel.cpp b/unified-runtime/source/adapters/level_zero/kernel.cpp index 34d4116111796..217a1915c2337 100644 --- a/unified-runtime/source/adapters/level_zero/kernel.cpp +++ b/unified-runtime/source/adapters/level_zero/kernel.cpp @@ -943,9 +943,10 @@ ur_result_t urKernelRelease( if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeKernelDestroy, (ZeKernel)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } diff --git a/unified-runtime/source/adapters/level_zero/memory.cpp b/unified-runtime/source/adapters/level_zero/memory.cpp index 008cd8bc16dd2..14f0010b6cf33 100644 --- a/unified-runtime/source/adapters/level_zero/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/memory.cpp @@ -1667,9 +1667,10 @@ ur_result_t urMemRelease( auto ZeResult = ZE_CALL_NOCHECK( zeImageDestroy, (ur_cast(ZeHandleImage))); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } diff --git a/unified-runtime/source/adapters/level_zero/program.cpp b/unified-runtime/source/adapters/level_zero/program.cpp index 921aa8e961838..fff3789536bda 100644 --- a/unified-runtime/source/adapters/level_zero/program.cpp +++ b/unified-runtime/source/adapters/level_zero/program.cpp @@ -1037,15 +1037,15 @@ ur_program_handle_t_::ur_program_handle_t_(ur_context_handle_t Context) ur_program_handle_t_::ur_program_handle_t_(state, ur_context_handle_t Context, ze_module_handle_t InteropZeModule) : Context{Context}, NativeProperties{nullptr}, OwnZeModule{true}, - AssociatedDevices({Context->getDevices()[0]}), - InteropZeModule{InteropZeModule} {} + AssociatedDevices({Context->getDevices()[0]}), InteropZeModule{ + InteropZeModule} {} ur_program_handle_t_::ur_program_handle_t_(state, ur_context_handle_t Context, ze_module_handle_t InteropZeModule, bool OwnZeModule) : Context{Context}, NativeProperties{nullptr}, OwnZeModule{OwnZeModule}, - AssociatedDevices({Context->getDevices()[0]}), - InteropZeModule{InteropZeModule} { + AssociatedDevices({Context->getDevices()[0]}), InteropZeModule{ + InteropZeModule} { // TODO: Currently it is not possible to understand the device associated // with provided ZeModule. So we can't set the state on that device to Exe. } diff --git a/unified-runtime/source/adapters/level_zero/queue.cpp b/unified-runtime/source/adapters/level_zero/queue.cpp index 95699c01a1b82..9db0829477195 100644 --- a/unified-runtime/source/adapters/level_zero/queue.cpp +++ b/unified-runtime/source/adapters/level_zero/queue.cpp @@ -648,12 +648,14 @@ ur_result_t urQueueRelease( // runtime. Destroy only if a queue is healthy. Destroying a fence may // cause a hang otherwise. // If the fence is a nullptr we are using immediate commandlists. - if (Queue->Healthy && it->second.ZeFence != nullptr && checkL0LoaderTeardown()) { + if (Queue->Healthy && it->second.ZeFence != nullptr && + checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeFenceDestroy, (it->second.ZeFence)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } @@ -1611,9 +1613,10 @@ ur_result_t urQueueReleaseInternal(ur_queue_handle_t Queue) { if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeCommandQueueDestroy, (ZeQueue)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } diff --git a/unified-runtime/source/adapters/level_zero/sampler.cpp b/unified-runtime/source/adapters/level_zero/sampler.cpp index 0686192b5f76a..59672ebc00bda 100644 --- a/unified-runtime/source/adapters/level_zero/sampler.cpp +++ b/unified-runtime/source/adapters/level_zero/sampler.cpp @@ -132,9 +132,10 @@ ur_result_t urSamplerRelease( if (checkL0LoaderTeardown()) { auto ZeResult = ZE_CALL_NOCHECK(zeSamplerDestroy, (Sampler->ZeSampler)); // Gracefully handle the case that L0 was already unloaded. - if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || ZeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (ZeResult && (ZeResult != ZE_RESULT_ERROR_UNINITIALIZED || + ZeResult != ZE_RESULT_ERROR_UNKNOWN)) return ze2urResult(ZeResult); - if ( ZeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (ZeResult == ZE_RESULT_ERROR_UNKNOWN) { ZeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } diff --git a/unified-runtime/source/adapters/level_zero/v2/common.hpp b/unified-runtime/source/adapters/level_zero/v2/common.hpp index 775965af8047e..2458c7bbda22f 100644 --- a/unified-runtime/source/adapters/level_zero/v2/common.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/common.hpp @@ -82,9 +82,10 @@ struct ze_handle_wrapper { if (ownZeHandle && checkL0LoaderTeardown()) { auto zeResult = destroy(handle); // Gracefully handle the case that L0 was already unloaded. - if (zeResult && (zeResult != ZE_RESULT_ERROR_UNINITIALIZED || zeResult != ZE_RESULT_ERROR_UNKNOWN)) + if (zeResult && (zeResult != ZE_RESULT_ERROR_UNINITIALIZED || + zeResult != ZE_RESULT_ERROR_UNKNOWN)) throw ze2urResult(zeResult); - if ( zeResult == ZE_RESULT_ERROR_UNKNOWN) { + if (zeResult == ZE_RESULT_ERROR_UNKNOWN) { zeResult = ZE_RESULT_ERROR_UNINITIALIZED; } } From 10d2b783c7488eb79ef95205b276eee935441954 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Wed, 2 Apr 2025 16:56:09 -0700 Subject: [PATCH 07/21] Remove uneeded print Signed-off-by: Neil R. Spruit --- unified-runtime/source/adapters/level_zero/common.hpp | 1 - 1 file changed, 1 deletion(-) diff --git a/unified-runtime/source/adapters/level_zero/common.hpp b/unified-runtime/source/adapters/level_zero/common.hpp index 4e0af9f8b4852..a64f605d3525c 100644 --- a/unified-runtime/source/adapters/level_zero/common.hpp +++ b/unified-runtime/source/adapters/level_zero/common.hpp @@ -41,7 +41,6 @@ struct _ur_platform_handle_t; [[maybe_unused]] static bool checkL0LoaderTeardown() { try { if (!zelCheckIsLoaderInTearDown()) { - logger::debug("ZE ---> checkL0LoaderTeardown: Loader is not in teardown"); return true; } } catch (...) { From c33beab55a06b09f6528f3290796ce09c247fd2d Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Wed, 2 Apr 2025 17:00:02 -0700 Subject: [PATCH 08/21] Use updated L0 static loader Signed-off-by: Neil R. Spruit --- unified-runtime/cmake/FetchLevelZero.cmake | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/unified-runtime/cmake/FetchLevelZero.cmake b/unified-runtime/cmake/FetchLevelZero.cmake index 0824059533e1e..76541abcebfca 100644 --- a/unified-runtime/cmake/FetchLevelZero.cmake +++ b/unified-runtime/cmake/FetchLevelZero.cmake @@ -43,7 +43,7 @@ if (NOT DEFINED LEVEL_ZERO_LIBRARY OR NOT DEFINED LEVEL_ZERO_INCLUDE_DIR) set(UR_LEVEL_ZERO_LOADER_REPO "https://github.com/nrspruit/level-zero.git") endif() if (UR_LEVEL_ZERO_LOADER_TAG STREQUAL "") - set(UR_LEVEL_ZERO_LOADER_TAG cd83892e09c339b1688de3aa67cd902fb277b297) + set(UR_LEVEL_ZERO_LOADER_TAG 85e97d4589824e0a23b6f3ee6aaedab721ae373f) endif() # Disable due to a bug https://github.com/oneapi-src/level-zero/issues/104 From 6880bf5d3558848e83dc8b0b989c666133c2b1bc Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Thu, 3 Apr 2025 10:58:33 -0700 Subject: [PATCH 09/21] Fix formatting issues Signed-off-by: Neil R. Spruit --- unified-runtime/source/adapters/level_zero/program.cpp | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/program.cpp b/unified-runtime/source/adapters/level_zero/program.cpp index fff3789536bda..921aa8e961838 100644 --- a/unified-runtime/source/adapters/level_zero/program.cpp +++ b/unified-runtime/source/adapters/level_zero/program.cpp @@ -1037,15 +1037,15 @@ ur_program_handle_t_::ur_program_handle_t_(ur_context_handle_t Context) ur_program_handle_t_::ur_program_handle_t_(state, ur_context_handle_t Context, ze_module_handle_t InteropZeModule) : Context{Context}, NativeProperties{nullptr}, OwnZeModule{true}, - AssociatedDevices({Context->getDevices()[0]}), InteropZeModule{ - InteropZeModule} {} + AssociatedDevices({Context->getDevices()[0]}), + InteropZeModule{InteropZeModule} {} ur_program_handle_t_::ur_program_handle_t_(state, ur_context_handle_t Context, ze_module_handle_t InteropZeModule, bool OwnZeModule) : Context{Context}, NativeProperties{nullptr}, OwnZeModule{OwnZeModule}, - AssociatedDevices({Context->getDevices()[0]}), InteropZeModule{ - InteropZeModule} { + AssociatedDevices({Context->getDevices()[0]}), + InteropZeModule{InteropZeModule} { // TODO: Currently it is not possible to understand the device associated // with provided ZeModule. So we can't set the state on that device to Exe. } From 39d914e5ac66a817be8adacb008ffc2f03ebe8b3 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Thu, 3 Apr 2025 11:20:33 -0700 Subject: [PATCH 10/21] Ensure bindless image is removed even if destroy fails Signed-off-by: Neil R. Spruit --- unified-runtime/source/adapters/level_zero/image.cpp | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/image.cpp b/unified-runtime/source/adapters/level_zero/image.cpp index 66c599dc20b24..c493069ed4205 100644 --- a/unified-runtime/source/adapters/level_zero/image.cpp +++ b/unified-runtime/source/adapters/level_zero/image.cpp @@ -312,9 +312,10 @@ ur_result_t urBindlessImagesUnsampledImageHandleDestroyExp( auto item = hDevice->ZeOffsetToImageHandleMap.find(hImage); - if (item != hDevice->ZeOffsetToImageHandleMap.end() && - checkL0LoaderTeardown()) { - ZE2UR_CALL(zeImageDestroy, (item->second)); + if (item != hDevice->ZeOffsetToImageHandleMap.end()) { + if (checkL0LoaderTeardown()) { + ZE2UR_CALL(zeImageDestroy, (item->second)); + } hDevice->ZeOffsetToImageHandleMap.erase(item); } else { return UR_RESULT_ERROR_INVALID_NULL_HANDLE; From 0ab2423d826f2da6cc77a570eb2c64569adbba96 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Thu, 3 Apr 2025 12:06:42 -0700 Subject: [PATCH 11/21] Use the offical commit of the Loader Signed-off-by: Neil R. Spruit --- unified-runtime/cmake/FetchLevelZero.cmake | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/unified-runtime/cmake/FetchLevelZero.cmake b/unified-runtime/cmake/FetchLevelZero.cmake index 76541abcebfca..db0ff95e1ed29 100644 --- a/unified-runtime/cmake/FetchLevelZero.cmake +++ b/unified-runtime/cmake/FetchLevelZero.cmake @@ -40,10 +40,10 @@ if (NOT DEFINED LEVEL_ZERO_LIBRARY OR NOT DEFINED LEVEL_ZERO_INCLUDE_DIR) set(BUILD_STATIC ON) if (UR_LEVEL_ZERO_LOADER_REPO STREQUAL "") - set(UR_LEVEL_ZERO_LOADER_REPO "https://github.com/nrspruit/level-zero.git") + set(UR_LEVEL_ZERO_LOADER_REPO "https://github.com/oneapi-src/level-zero.git") endif() if (UR_LEVEL_ZERO_LOADER_TAG STREQUAL "") - set(UR_LEVEL_ZERO_LOADER_TAG 85e97d4589824e0a23b6f3ee6aaedab721ae373f) + set(UR_LEVEL_ZERO_LOADER_TAG ecfe375b30cc04265b20ac1b7996a85d0910f3ed) endif() # Disable due to a bug https://github.com/oneapi-src/level-zero/issues/104 From c414b964c992e260244940557ca5418990eefebc Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Thu, 3 Apr 2025 13:21:22 -0700 Subject: [PATCH 12/21] Remove interop handle tracking support Signed-off-by: Neil R. Spruit --- .../source/adapters/level_zero/v2/memory.cpp | 24 +++++++------------ .../source/adapters/level_zero/v2/memory.hpp | 9 +++---- .../adapters/level_zero/v2/queue_create.cpp | 2 +- .../adapters/level_zero/v2/queue_handle.hpp | 2 -- .../v2/queue_immediate_in_order.cpp | 6 ++--- .../v2/queue_immediate_in_order.hpp | 2 +- 6 files changed, 16 insertions(+), 29 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/v2/memory.cpp b/unified-runtime/source/adapters/level_zero/v2/memory.cpp index 9447d93f55331..cfee4803ba9c7 100644 --- a/unified-runtime/source/adapters/level_zero/v2/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/memory.cpp @@ -107,11 +107,10 @@ ur_integrated_buffer_handle_t::ur_integrated_buffer_handle_t( ur_integrated_buffer_handle_t::ur_integrated_buffer_handle_t( ur_context_handle_t hContext, void *hostPtr, size_t size, - device_access_mode_t accessMode, bool ownHostPtr, bool interopNativeHandle) + device_access_mode_t accessMode, bool ownHostPtr) : ur_mem_buffer_t(hContext, size, accessMode) { - this->IsInteropNativeHandle = interopNativeHandle; this->ptr = - usm_unique_ptr_t(hostPtr, [hContext, ownHostPtr, this](void *ptr) { + usm_unique_ptr_t(hostPtr, [hContext, ownHostPtr](void *ptr) { if (!ownHostPtr || !checkL0LoaderTeardown()) { return; } @@ -222,7 +221,7 @@ ur_discrete_buffer_handle_t::ur_discrete_buffer_handle_t( ur_discrete_buffer_handle_t::ur_discrete_buffer_handle_t( ur_context_handle_t hContext, ur_device_handle_t hDevice, void *devicePtr, size_t size, device_access_mode_t accessMode, void *writeBackMemory, - bool ownZePtr, bool interopNativeHandle) + bool ownZePtr) : ur_mem_buffer_t(hContext, size, accessMode), deviceAllocations(hContext->getPlatform()->getNumDevices()), activeAllocationDevice(hDevice), writeBackPtr(writeBackMemory), @@ -233,9 +232,8 @@ ur_discrete_buffer_handle_t::ur_discrete_buffer_handle_t( devicePtr = allocateOnDevice(hDevice, size); } else { assert(hDevice); - this->IsInteropNativeHandle = interopNativeHandle; deviceAllocations[hDevice->Id.value()] = usm_unique_ptr_t( - devicePtr, [this, hContext = this->hContext, ownZePtr](void *ptr) { + devicePtr, [hContext = this->hContext, ownZePtr](void *ptr) { if (!ownZePtr || !checkL0LoaderTeardown()) { return; } @@ -466,10 +464,8 @@ ur_mem_image_t::ur_mem_image_t(ur_context_handle_t hContext, ur_mem_image_t::ur_mem_image_t(ur_context_handle_t hContext, const ur_image_format_t *pImageFormat, const ur_image_desc_t *pImageDesc, - ze_image_handle_t zeImage, bool ownZeImage, - bool interopNativeHandle) + ze_image_handle_t zeImage, bool ownZeImage) : hContext(hContext), zeImage(zeImage, ownZeImage) { - this->IsInteropNativeHandle = interopNativeHandle; UR_CALL_THROWS(ur2zeImageDesc(pImageFormat, pImageDesc, zeImageDesc)); } @@ -611,7 +607,7 @@ ur_result_t urMemBufferCreateWithNativeHandle( if (useHostBuffer(hContext) && memoryAttrs.type == ZE_MEMORY_TYPE_HOST) { *phMem = ur_mem_handle_t_::create( - hContext, ptr, size, accessMode, ownNativeHandle, true); + hContext, ptr, size, accessMode, ownNativeHandle); // if useHostBuffer(hContext) is true but the allocation is on device, we'll // treat it as discrete memory } else if (memoryAttrs.type == ZE_MEMORY_TYPE_SHARED) { @@ -623,14 +619,12 @@ ur_result_t urMemBufferCreateWithNativeHandle( // For host allocation, we need to copy the data to a device buffer // and then copy it back on release *phMem = ur_mem_handle_t_::create( - hContext, hDevice, nullptr, size, accessMode, ptr, ownNativeHandle, - true); + hContext, hDevice, nullptr, size, accessMode, ptr, ownNativeHandle); } else { // For device allocation, we can use it directly assert(hDevice); *phMem = ur_mem_handle_t_::create( - hContext, hDevice, ptr, size, accessMode, nullptr, ownNativeHandle, - true); + hContext, hDevice, ptr, size, accessMode, nullptr, ownNativeHandle); } } @@ -740,7 +734,7 @@ ur_result_t urMemImageCreateWithNativeHandle( bool ownNativeHandle = pProperties ? pProperties->isNativeHandleOwned : false; *phMem = ur_mem_handle_t_::create( - hContext, pImageFormat, pImageDesc, zeImage, ownNativeHandle, true); + hContext, pImageFormat, pImageDesc, zeImage, ownNativeHandle); return UR_RESULT_SUCCESS; } catch (...) { return exceptionToResult(std::current_exception()); diff --git a/unified-runtime/source/adapters/level_zero/v2/memory.hpp b/unified-runtime/source/adapters/level_zero/v2/memory.hpp index 28f78bdc15915..f616b7c67f953 100644 --- a/unified-runtime/source/adapters/level_zero/v2/memory.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/memory.hpp @@ -20,8 +20,6 @@ using usm_unique_ptr_t = std::unique_ptr>; struct ur_mem_buffer_t : _ur_object { - // Indicates if this object is an interop handle. - bool IsInteropNativeHandle = false; enum class device_access_mode_t { read_write, read_only, write_only }; @@ -85,7 +83,7 @@ struct ur_integrated_buffer_handle_t : ur_mem_buffer_t { ur_integrated_buffer_handle_t(ur_context_handle_t hContext, void *hostPtr, size_t size, device_access_mode_t accesMode, - bool ownHostPtr, bool interopNativeHandle); + bool ownHostPtr); void * getDevicePtr(ur_device_handle_t, device_access_mode_t, size_t offset, @@ -125,8 +123,7 @@ struct ur_discrete_buffer_handle_t : ur_mem_buffer_t { ur_discrete_buffer_handle_t(ur_context_handle_t hContext, ur_device_handle_t hDevice, void *devicePtr, size_t size, device_access_mode_t accesMode, - void *writeBackMemory, bool ownDevicePtr, - bool interopNativeHandle); + void *writeBackMemory, bool ownDevicePtr); void * getDevicePtr(ur_device_handle_t, device_access_mode_t, size_t offset, @@ -206,7 +203,7 @@ struct ur_mem_image_t : _ur_object { const ur_image_desc_t *pImageDesc, void *pHost); ur_mem_image_t(ur_context_handle_t, const ur_image_format_t *pImageFormat, const ur_image_desc_t *pImageDesc, ze_image_handle_t zeImage, - bool ownZeImage, bool interopNativeHandle); + bool ownZeImage); ze_image_handle_t getZeImage() const { return zeImage.get(); } diff --git a/unified-runtime/source/adapters/level_zero/v2/queue_create.cpp b/unified-runtime/source/adapters/level_zero/v2/queue_create.cpp index 9f45404b7e39d..7e2de8b1b647b 100644 --- a/unified-runtime/source/adapters/level_zero/v2/queue_create.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/queue_create.cpp @@ -59,7 +59,7 @@ ur_result_t urQueueCreateWithNativeHandle( } *phQueue = ur_queue_handle_t_::create( - hContext, hDevice, hNativeQueue, flags, ownNativeHandle, true); + hContext, hDevice, hNativeQueue, flags, ownNativeHandle); return UR_RESULT_SUCCESS; } catch (...) { diff --git a/unified-runtime/source/adapters/level_zero/v2/queue_handle.hpp b/unified-runtime/source/adapters/level_zero/v2/queue_handle.hpp index 26b58c9c6c4cd..a17d304ea5cfc 100644 --- a/unified-runtime/source/adapters/level_zero/v2/queue_handle.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/queue_handle.hpp @@ -49,6 +49,4 @@ struct ur_queue_handle_t_ { }, queue_data); } - // Indicates if this object is an interop handle. - bool IsInteropNativeHandle = false; }; diff --git a/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp b/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp index 0360293b7804b..07eb5381a48c6 100644 --- a/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.cpp @@ -77,15 +77,13 @@ ur_queue_immediate_in_order_t::ur_queue_immediate_in_order_t( ur_queue_immediate_in_order_t::ur_queue_immediate_in_order_t( ur_context_handle_t hContext, ur_device_handle_t hDevice, - ur_native_handle_t hNativeHandle, ur_queue_flags_t flags, bool ownZeQueue, - bool interopNativeHandle) + ur_native_handle_t hNativeHandle, ur_queue_flags_t flags, bool ownZeQueue) : hContext(hContext), hDevice(hDevice), flags(flags), commandListManager( hContext, hDevice, raii::command_list_unique_handle( reinterpret_cast(hNativeHandle), - [ownZeQueue, - interopNativeHandle](ze_command_list_handle_t hZeCommandList) { + [ownZeQueue](ze_command_list_handle_t hZeCommandList) { if (ownZeQueue) { if (checkL0LoaderTeardown()) { ZE_CALL_NOCHECK(zeCommandListDestroy, (hZeCommandList)); diff --git a/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.hpp b/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.hpp index 75fd7aba89b2f..7ddd96fd9ff91 100644 --- a/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/queue_immediate_in_order.hpp @@ -72,7 +72,7 @@ struct ur_queue_immediate_in_order_t : _ur_object, public ur_queue_t_ { const ur_queue_properties_t *); ur_queue_immediate_in_order_t(ur_context_handle_t, ur_device_handle_t, ur_native_handle_t, ur_queue_flags_t, - bool ownZeQueue, bool interopNativeHandle); + bool ownZeQueue); ~ur_queue_immediate_in_order_t(); From 78cf7a9ae993fe30f44edf1d552180544c229fa1 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Thu, 3 Apr 2025 13:57:09 -0700 Subject: [PATCH 13/21] Fix formatting Signed-off-by: Neil R. Spruit --- .../source/adapters/level_zero/v2/memory.cpp | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/v2/memory.cpp b/unified-runtime/source/adapters/level_zero/v2/memory.cpp index cfee4803ba9c7..d2025da38b62c 100644 --- a/unified-runtime/source/adapters/level_zero/v2/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/memory.cpp @@ -109,13 +109,12 @@ ur_integrated_buffer_handle_t::ur_integrated_buffer_handle_t( ur_context_handle_t hContext, void *hostPtr, size_t size, device_access_mode_t accessMode, bool ownHostPtr) : ur_mem_buffer_t(hContext, size, accessMode) { - this->ptr = - usm_unique_ptr_t(hostPtr, [hContext, ownHostPtr](void *ptr) { - if (!ownHostPtr || !checkL0LoaderTeardown()) { - return; - } - ZE_CALL_NOCHECK(zeMemFree, (hContext->getZeHandle(), ptr)); - }); + this->ptr = usm_unique_ptr_t(hostPtr, [hContext, ownHostPtr](void *ptr) { + if (!ownHostPtr || !checkL0LoaderTeardown()) { + return; + } + ZE_CALL_NOCHECK(zeMemFree, (hContext->getZeHandle(), ptr)); + }); } void *ur_integrated_buffer_handle_t::getDevicePtr( From ddae783f751161cdc5edae592ae52b5265c87396 Mon Sep 17 00:00:00 2001 From: "Neil R. Spruit" Date: Thu, 3 Apr 2025 14:08:11 -0700 Subject: [PATCH 14/21] Remove remaining use of isInteropHandle Signed-off-by: Neil R. Spruit --- unified-runtime/source/adapters/level_zero/common.hpp | 3 --- unified-runtime/source/adapters/level_zero/context.cpp | 1 - unified-runtime/source/adapters/level_zero/device.cpp | 1 - unified-runtime/source/adapters/level_zero/event.cpp | 1 - unified-runtime/source/adapters/level_zero/kernel.cpp | 1 - unified-runtime/source/adapters/level_zero/memory.cpp | 2 -- unified-runtime/source/adapters/level_zero/program.cpp | 1 - unified-runtime/source/adapters/level_zero/queue.cpp | 1 - unified-runtime/source/adapters/level_zero/v2/common.hpp | 1 - unified-runtime/source/adapters/level_zero/v2/context.cpp | 1 - unified-runtime/source/adapters/level_zero/v2/event.cpp | 1 - unified-runtime/source/adapters/level_zero/v2/kernel.cpp | 1 - 12 files changed, 15 deletions(-) diff --git a/unified-runtime/source/adapters/level_zero/common.hpp b/unified-runtime/source/adapters/level_zero/common.hpp index a64f605d3525c..592631901fd24 100644 --- a/unified-runtime/source/adapters/level_zero/common.hpp +++ b/unified-runtime/source/adapters/level_zero/common.hpp @@ -280,9 +280,6 @@ struct _ur_object { // Indicates if we own the native handle or it came from interop that // asked to not transfer the ownership to SYCL RT. bool OwnNativeHandle = false; - - // Indicates if this object is an interop handle. - bool IsInteropNativeHandle = false; }; // Record for a memory allocation. This structure is used to keep information diff --git a/unified-runtime/source/adapters/level_zero/context.cpp b/unified-runtime/source/adapters/level_zero/context.cpp index dfd0f4a9d8a4a..6cbd9a0709cdc 100644 --- a/unified-runtime/source/adapters/level_zero/context.cpp +++ b/unified-runtime/source/adapters/level_zero/context.cpp @@ -152,7 +152,6 @@ ur_result_t urContextCreateWithNativeHandle( ur_context_handle_t_ *UrContext = new ur_context_handle_t_( ZeContext, NumDevices, Devices, OwnNativeHandle); UrContext->initialize(); - UrContext->IsInteropNativeHandle = true; *Context = reinterpret_cast(UrContext); } catch (const std::bad_alloc &) { return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY; diff --git a/unified-runtime/source/adapters/level_zero/device.cpp b/unified-runtime/source/adapters/level_zero/device.cpp index 00a5cc4b2d82a..73d8e6e32eda7 100644 --- a/unified-runtime/source/adapters/level_zero/device.cpp +++ b/unified-runtime/source/adapters/level_zero/device.cpp @@ -1536,7 +1536,6 @@ ur_result_t urDeviceCreateWithNativeHandle( if (Dev == nullptr) return UR_RESULT_ERROR_INVALID_VALUE; - Dev->IsInteropNativeHandle = true; *Device = Dev; return UR_RESULT_SUCCESS; } diff --git a/unified-runtime/source/adapters/level_zero/event.cpp b/unified-runtime/source/adapters/level_zero/event.cpp index be61cc914845a..977d0b28a9b3f 100644 --- a/unified-runtime/source/adapters/level_zero/event.cpp +++ b/unified-runtime/source/adapters/level_zero/event.cpp @@ -1001,7 +1001,6 @@ ur_result_t urEventCreateWithNativeHandle( UREvent->CleanedUp = true; *Event = reinterpret_cast(UREvent); - UREvent->IsInteropNativeHandle = true; return UR_RESULT_SUCCESS; } diff --git a/unified-runtime/source/adapters/level_zero/kernel.cpp b/unified-runtime/source/adapters/level_zero/kernel.cpp index 217a1915c2337..3a54f4576f822 100644 --- a/unified-runtime/source/adapters/level_zero/kernel.cpp +++ b/unified-runtime/source/adapters/level_zero/kernel.cpp @@ -1160,7 +1160,6 @@ ur_result_t urKernelCreateWithNativeHandle( } Kernel->Program = Program; - Kernel->IsInteropNativeHandle = true; UR_CALL(Kernel->initialize()); diff --git a/unified-runtime/source/adapters/level_zero/memory.cpp b/unified-runtime/source/adapters/level_zero/memory.cpp index 14f0010b6cf33..cb9e90ceb1509 100644 --- a/unified-runtime/source/adapters/level_zero/memory.cpp +++ b/unified-runtime/source/adapters/level_zero/memory.cpp @@ -1563,7 +1563,6 @@ ur_result_t urMemImageCreateWithNativeHandle( auto OwnNativeHandle = Properties ? Properties->isNativeHandleOwned : false; UR_CALL(createUrMemFromZeImage(Context, ZeHImage, OwnNativeHandle, ZeImageDesc, Mem)); - (*Mem)->IsInteropNativeHandle = true; return UR_RESULT_SUCCESS; } @@ -1779,7 +1778,6 @@ ur_result_t urMemBufferCreateWithNativeHandle( Buffer = new _ur_buffer(Context, Size, Device, ur_cast(NativeMem), OwnNativeHandle); *Mem = reinterpret_cast(Buffer); - (*Mem)->IsInteropNativeHandle = true; } catch (const std::bad_alloc &) { return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY; } catch (...) { diff --git a/unified-runtime/source/adapters/level_zero/program.cpp b/unified-runtime/source/adapters/level_zero/program.cpp index 921aa8e961838..c2bae5d971d9a 100644 --- a/unified-runtime/source/adapters/level_zero/program.cpp +++ b/unified-runtime/source/adapters/level_zero/program.cpp @@ -966,7 +966,6 @@ ur_result_t urProgramCreateWithNativeHandle( ur_program_handle_t_::Exe, Context, ZeModule, Properties ? Properties->isNativeHandleOwned : false); *Program = reinterpret_cast(UrProgram); - (*Program)->IsInteropNativeHandle = true; } catch (const std::bad_alloc &) { return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY; } catch (...) { diff --git a/unified-runtime/source/adapters/level_zero/queue.cpp b/unified-runtime/source/adapters/level_zero/queue.cpp index 9db0829477195..8ae288340918e 100644 --- a/unified-runtime/source/adapters/level_zero/queue.cpp +++ b/unified-runtime/source/adapters/level_zero/queue.cpp @@ -805,7 +805,6 @@ ur_result_t urQueueCreateWithNativeHandle( ur_queue_handle_t_ *Queue = new ur_queue_handle_t_( ComputeQueues, CopyQueues, Context, UrDevice, OwnNativeHandle, Flags); *RetQueue = reinterpret_cast(Queue); - (*RetQueue)->IsInteropNativeHandle = true; } catch (const std::bad_alloc &) { return UR_RESULT_ERROR_OUT_OF_RESOURCES; } catch (...) { diff --git a/unified-runtime/source/adapters/level_zero/v2/common.hpp b/unified-runtime/source/adapters/level_zero/v2/common.hpp index 2458c7bbda22f..23a2495d19b83 100644 --- a/unified-runtime/source/adapters/level_zero/v2/common.hpp +++ b/unified-runtime/source/adapters/level_zero/v2/common.hpp @@ -106,7 +106,6 @@ struct ze_handle_wrapper { private: ZeHandleT handle; bool ownZeHandle; - bool IsInteropNativeHandle = false; }; using ze_kernel_handle_t = HANDLE_WRAPPER_TYPE(::ze_kernel_handle_t, diff --git a/unified-runtime/source/adapters/level_zero/v2/context.cpp b/unified-runtime/source/adapters/level_zero/v2/context.cpp index 5fe913d878201..33b7f437215b0 100644 --- a/unified-runtime/source/adapters/level_zero/v2/context.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/context.cpp @@ -152,7 +152,6 @@ ur_result_t urContextCreateWithNativeHandle( *phContext = new ur_context_handle_t_(zeContext, numDevices, phDevices, ownZeHandle); - (*phContext)->IsInteropNativeHandle = true; return UR_RESULT_SUCCESS; } catch (...) { return exceptionToResult(std::current_exception()); diff --git a/unified-runtime/source/adapters/level_zero/v2/event.cpp b/unified-runtime/source/adapters/level_zero/v2/event.cpp index 68c0d439dce7d..a26fc1a2a45dd 100644 --- a/unified-runtime/source/adapters/level_zero/v2/event.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/event.cpp @@ -398,7 +398,6 @@ urEventCreateWithNativeHandle(ur_native_handle_t hNativeEvent, ZE2UR_CALL(zeEventHostSignal, ((*phEvent)->getZeEvent())); } else { *phEvent = new ur_event_handle_t_(hContext, hNativeEvent, pProperties); - (*phEvent)->IsInteropNativeHandle = true; } return UR_RESULT_SUCCESS; } catch (...) { diff --git a/unified-runtime/source/adapters/level_zero/v2/kernel.cpp b/unified-runtime/source/adapters/level_zero/v2/kernel.cpp index 5f8bad0acdbc8..6d475552279d8 100644 --- a/unified-runtime/source/adapters/level_zero/v2/kernel.cpp +++ b/unified-runtime/source/adapters/level_zero/v2/kernel.cpp @@ -361,7 +361,6 @@ urKernelCreateWithNativeHandle(ur_native_handle_t hNativeKernel, *phKernel = new ur_kernel_handle_t_(hNativeKernel, hProgram, hContext, pProperties); - (*phKernel)->IsInteropNativeHandle = true; return UR_RESULT_SUCCESS; } catch (...) { return exceptionToResult(std::current_exception()); From 40d6a35fa841a73500dfe7b0bbd82efbb85aaeea Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Wed, 9 Apr 2025 11:49:29 -0700 Subject: [PATCH 15/21] test fix and expand usage of checkL0LoaderTeardown to event status retrieval --- .../Regression/static-buffer-dtor.cpp | 20 +++++++++++++++---- .../source/adapters/level_zero/event.cpp | 2 +- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/sycl/test-e2e/Regression/static-buffer-dtor.cpp b/sycl/test-e2e/Regression/static-buffer-dtor.cpp index 8cc1ddce43501..b338a26ca3826 100644 --- a/sycl/test-e2e/Regression/static-buffer-dtor.cpp +++ b/sycl/test-e2e/Regression/static-buffer-dtor.cpp @@ -27,16 +27,28 @@ #include int main() { - uint8_t *h_A = (uint8_t *)malloc(256); + static sycl::buffer bufs[2] = {sycl::range<1>(256), sycl::range<1>(256)}; sycl::queue q; q.submit([&](sycl::handler &cgh) { - cgh.copy(h_A, bufs[0].get_access(cgh)); + auto acc = bufs[0].get_access(cgh); + cgh.single_task([=]() { + for (int i = 0; i < 256; i++) { + acc[i] = 24; + } + }); }); + q.submit([&](sycl::handler &cgh) { - cgh.copy(h_A, bufs[1].get_access(cgh)); + auto acc = bufs[1].get_access(cgh); + cgh.single_task([=]() { + for (int i = 0; i < 256; i++) { + acc[i] = 25; + } + }); }); - free(h_A); + + //no q.wait() return 0; } diff --git a/unified-runtime/source/adapters/level_zero/event.cpp b/unified-runtime/source/adapters/level_zero/event.cpp index 977d0b28a9b3f..262d1aeb32ea0 100644 --- a/unified-runtime/source/adapters/level_zero/event.cpp +++ b/unified-runtime/source/adapters/level_zero/event.cpp @@ -509,7 +509,7 @@ ur_result_t urEventGetInfo( auto HostVisibleEvent = Event->HostVisibleEvent; if (Event->Completed) { Result = UR_EVENT_STATUS_COMPLETE; - } else if (HostVisibleEvent) { + } else if (HostVisibleEvent && checkL0LoaderTeardown()) { ze_result_t ZeResult; ZeResult = ZE_CALL_NOCHECK(zeEventQueryStatus, (HostVisibleEvent->ZeEvent)); From 55c980c4949fd2ac548527352cc052c4fb138c0b Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Wed, 9 Apr 2025 15:24:18 -0700 Subject: [PATCH 16/21] clang-format always has the last say --- sycl/test-e2e/Regression/static-buffer-dtor.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/test-e2e/Regression/static-buffer-dtor.cpp b/sycl/test-e2e/Regression/static-buffer-dtor.cpp index b338a26ca3826..ee8453b264089 100644 --- a/sycl/test-e2e/Regression/static-buffer-dtor.cpp +++ b/sycl/test-e2e/Regression/static-buffer-dtor.cpp @@ -49,6 +49,6 @@ int main() { }); }); - //no q.wait() + // no q.wait() return 0; } From c78aaa6ee34b556d7183b4668c8d1ece9089329f Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Wed, 9 Apr 2025 18:52:39 -0700 Subject: [PATCH 17/21] urQueueRelease has too many unqualified calls to ZE_CALL, both directly and via other calls (Queue->synchronize(), et al). Putting the loader teardown check seems to fix the crash. Once the loader is properly disposed of late, these checks should not cause us to deviate from the normal execution path. --- unified-runtime/source/adapters/level_zero/queue.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/unified-runtime/source/adapters/level_zero/queue.cpp b/unified-runtime/source/adapters/level_zero/queue.cpp index 6cedd3e5bfbcd..cc30653322802 100644 --- a/unified-runtime/source/adapters/level_zero/queue.cpp +++ b/unified-runtime/source/adapters/level_zero/queue.cpp @@ -599,6 +599,9 @@ ur_result_t urQueueRetain( ur_result_t urQueueRelease( /// [in] handle of the queue object to release ur_queue_handle_t Queue) { + if (!checkL0LoaderTeardown()) + return UR_RESULT_SUCCESS; + std::vector EventListToCleanup; { std::scoped_lock Lock(Queue->Mutex); From 9607f1915f861a2bbffd0361bafa5e83653b208e Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Thu, 10 Apr 2025 10:09:25 -0700 Subject: [PATCH 18/21] bumping L0 tag as instructed by Neil --- unified-runtime/cmake/FetchLevelZero.cmake | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/unified-runtime/cmake/FetchLevelZero.cmake b/unified-runtime/cmake/FetchLevelZero.cmake index bfb3f41ebc253..6c2cee33135c5 100644 --- a/unified-runtime/cmake/FetchLevelZero.cmake +++ b/unified-runtime/cmake/FetchLevelZero.cmake @@ -47,7 +47,7 @@ if (NOT DEFINED LEVEL_ZERO_LIBRARY OR NOT DEFINED LEVEL_ZERO_INCLUDE_DIR) set(UR_LEVEL_ZERO_LOADER_REPO "https://github.com/oneapi-src/level-zero.git") endif() if (UR_LEVEL_ZERO_LOADER_TAG STREQUAL "") - set(UR_LEVEL_ZERO_LOADER_TAG ecfe375b30cc04265b20ac1b7996a85d0910f3ed) + set(UR_LEVEL_ZERO_LOADER_TAG a510259fb7490ab35841c0ed72986b01464b502c) endif() # Disable due to a bug https://github.com/oneapi-src/level-zero/issues/104 From 2784062d926ef4d72caeabc3623779c8373a8f06 Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Mon, 14 Apr 2025 14:02:05 -0700 Subject: [PATCH 19/21] fix for last error (hopefully) --- unified-runtime/source/adapters/opencl/event.cpp | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/unified-runtime/source/adapters/opencl/event.cpp b/unified-runtime/source/adapters/opencl/event.cpp index 1717d75dfceda..afd90355f7f20 100644 --- a/unified-runtime/source/adapters/opencl/event.cpp +++ b/unified-runtime/source/adapters/opencl/event.cpp @@ -181,7 +181,10 @@ UR_APIEXPORT ur_result_t UR_APICALL urEventGetInfo(ur_event_handle_t hEvent, size_t CheckPropSize = 0; cl_int RetErr = clGetEventInfo(hEvent->CLEvent, CLEventInfo, propSize, pPropValue, &CheckPropSize); - if (pPropValue && CheckPropSize != propSize) { + if (pPropValue && CheckPropSize != propSize && + propName != UR_EVENT_INFO_COMMAND_EXECUTION_STATUS) { + // Opencl:cpu may (incorrectly) return 0 for propSize when checking + // execution status when statu is CL_COMPLETE. return UR_RESULT_ERROR_INVALID_SIZE; } CL_RETURN_ON_FAILURE(RetErr); From ad92047ba7b6524ff29a4c5167c67d6ebe66629a Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Tue, 15 Apr 2025 16:50:32 -0700 Subject: [PATCH 20/21] simplify --- sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp b/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp index ab9059ce98976..a051225a6af20 100644 --- a/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp +++ b/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp @@ -1,5 +1,5 @@ // RUN: %{build} -D__SYCL_INTERNAL_API -o %t.out -// RUN: %{run-unfiltered-devices} %t.out +// RUN: %{run} %t.out //==-------- queue_old_interop.cpp - SYCL queue OpenCL interop test --------==// // From 9cdfdd209f960fa485930cb9f2d2f02a044cb0bd Mon Sep 17 00:00:00 2001 From: Chris Perkins Date: Tue, 22 Apr 2025 11:27:32 -0700 Subject: [PATCH 21/21] reviewer feedback --- sycl/doc/design/GlobalObjectsInRuntime.md | 46 +++++++++++------------ sycl/source/detail/global_handler.cpp | 15 ++++---- 2 files changed, 30 insertions(+), 31 deletions(-) diff --git a/sycl/doc/design/GlobalObjectsInRuntime.md b/sycl/doc/design/GlobalObjectsInRuntime.md index d3c66a4f6ec0e..1dff42e323939 100644 --- a/sycl/doc/design/GlobalObjectsInRuntime.md +++ b/sycl/doc/design/GlobalObjectsInRuntime.md @@ -57,10 +57,10 @@ destruction of nested `std::unique_ptr`s. ### Shutdown Tasks and Challenges -As the user's app ends, SYCL's primary goal is to release any UR adapters that -have been gotten, and teardown the plugins/adapters themselves. Additionally, -we need to stop deferring any new buffer releases and clean up any memory -whose release was deferred. +As the user's app ends, the SYCL runtime is responsible for releasing any UR +adapters that have been gotten, and teardown the plugins/adapters themselves. +Additionally, we need to stop deferring any new buffer releases and clean up +any memory whose release was deferred. To this end, the shutdown occurs in two phases: early and late. The purpose for early shutdown is primarily to stop any further deferring of memory release. @@ -81,18 +81,18 @@ adapters are let go, as is the GlobalHandler itself. #### Threads The deferred memory marshalling is built on a thread pool, but there is a -challenge here in that on Windows, once the end of the users main() is reached +challenge here in that on Windows, once the end of the users `main()` is reached and their app is shutting down, the Windows OS will abandon all remaining -in-flight threads. These threads can be .join() but they simply return instantly, +in-flight threads. These threads can be `.join()` but they simply return instantly, the threads are not completed. Further any thread specific variables -(or thread_local static vars) will NOT have their destructors called. Note +(or `thread_local static` vars) will NOT have their destructors called. Note that the standard while-loop-over-condition-var pattern will cause a hang - we cannot "wait" on abandoned threads. On Windows, short of adding some user called API to signal this, there is no way to detect or avoid this. None of the "end-of-library" lifecycle events -occurs before the threads are abandoned. ( not std::atexit(), not globals or -static, or static thread_local var destruction, not DllMain(DLL_PROCESS_DETACH) ) -This means that on Windows, once we arrive at shutdown_early we cannot wait on +occurs before the threads are abandoned. ( not `std::atexit()`, not globals or +`static`, or `static thread_local` var destruction, not `DllMain(DLL_PROCESS_DETACH)` ) +This means that on Windows, once we arrive at `shutdown_early()` we cannot wait on host events or the thread pool. For the deferred memory itself, there is no issue here. The Windows OS will @@ -102,18 +102,18 @@ shared pointers will not work in any thread that is abandoned on Windows. One last note about threads. It is entirely the OS's discretion when to start or schedule a thread. If the main process is very busy then it is -possible that threads the SYCL library creates (host_tasks/thread_pool) -won't even be started until AFTER the host application main() function is done. -This is not a normal occurrence, but it can happen if there is no call to queue.wait() +possible that threads the SYCL library creates (`host_tasks`/`thread_pool`) +won't even be started until AFTER the host application `main()` function is done. +This is not a normal occurrence, but it can happen if there is no call to `queue.wait()` ### Linux -On Linux, the "early_shutdown()" is begun by the destruction of a static -StaticVarShutdownHandler object, which is initialized by -platform::get_platforms(). +On Linux, the `early_shutdown()` is begun by the destruction of a static +`StaticVarShutdownHandler` object, which is initialized by +`platform::get_platforms()`. -late_shutdown() timing uses `__attribute__((destructor))` property with low +`late_shutdown()` timing uses `__attribute__((destructor))` property with low priority value 110. This approach does not guarantee, that `GlobalHandler` destructor is the last thing to run, as user code may contain a similar function with the same priority value. At the same time, users may specify priorities @@ -128,14 +128,14 @@ times, the memory leak may impact code performance. ### Windows -Differing from Linux, on Windows the "early_shutdown()" is begun by -DllMain(PROCESS_DETACH), unless statically linked. +Differing from Linux, on Windows the `early_shutdown()` is begun by +`DllMain(PROCESS_DETACH)`, unless statically linked. -The "late_shutdown()" is begun by the destruction of a -static StaticVarShutdownHandler object, which is initialized by -platform::get_platforms(). ( On linux, this is when we do "early_shutdown()". +The `late_shutdown()` is begun by the destruction of a +static `StaticVarShutdownHandler` object, which is initialized by +`platform::get_platforms()`. ( On linux, this is when we do `early_shutdown()`. Go figure.) This is as late as we can manage, but it is later than any user -application global, static, or thread_local variable destruction. +application global, `static`, or `thread_local` variable destruction. ### Recommendations for DPC++ runtime developers diff --git a/sycl/source/detail/global_handler.cpp b/sycl/source/detail/global_handler.cpp index f1cfe7af2cfca..b0be1bcf4db5f 100644 --- a/sycl/source/detail/global_handler.cpp +++ b/sycl/source/detail/global_handler.cpp @@ -431,14 +431,13 @@ BOOL isLinkedStatically() { if (!GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, LpModuleAddr, &hModule)) { return true; // not retrievable, therefore statically linked - } else { - char dllPath[MAX_PATH]; - if (GetModuleFileNameA(hModule, dllPath, MAX_PATH)) { - char exePath[MAX_PATH]; - if (GetModuleFileNameA(NULL, exePath, MAX_PATH)) { - if (std::string(dllPath) == std::string(exePath)) { - return true; // paths identical, therefore statically linked - } + } + char dllPath[MAX_PATH]; + if (GetModuleFileNameA(hModule, dllPath, MAX_PATH)) { + char exePath[MAX_PATH]; + if (GetModuleFileNameA(NULL, exePath, MAX_PATH)) { + if (std::string(dllPath) == std::string(exePath)) { + return true; // paths identical, therefore statically linked } } }