Commit 51c8fc4

[rocBLAS] solution library per gfx (#4781)
### Motivation

On systems with multiple GPUs of different architectures (e.g. gfx90a and gfx942), rocBLAS previously used a single, process-wide Tensile solution library. That library was effectively chosen by whichever device was used first, so devices of other architectures could get the wrong solution library, and therefore kernels built for the wrong architecture. This leads to wrong results or failures (see [#3413](#3413)). This PR addresses the problem by tying the solution library to each GPU architecture and caching it in the adapter array, so each device uses the correct library for its gfx.

### What was changed

- **Library per architecture:** Replaced the single `m_library` with `m_libraryMap`, keyed by architecture name (e.g. `gfx1030`, `gfx906`). Each gfx gets its own Tensile solution library, so multi-GPU systems with mixed architectures can each use the right one.
- **Caching in the adapter array:** Each adapter (one per device) now stores the library for that device's architecture (`adapter_s.library`), so we don't look up the library by arch on every call, and the device is unambiguous whenever the adapter is used.
- **Correct device in one initialization path:** In `initialize(adapter, deviceId)`, the architecture is now derived with `rocblas_internal_get_arch_name(deviceId)` instead of from the current HIP device, so the right library is loaded and cached for that device. **Note:** The current device in a thread must still be set correctly elsewhere; this change fixes only the call path that receives an explicit device index. Other paths (e.g. HIP queries) still rely on the current device, and further testing (e.g. pre_load of non-gfx files) may follow.
- **Lazy loading and paths:** Refactored to remove duplication: the base path logic moved to `determine_tensile_base_path()`, `getLazyLoadingArch()` takes an arch string, and lazy-load futures are per arch (`ftr_lib_map`, keyed by processor name). Device-property and lazy-init maps are populated for all devices up front to support heterogeneous systems.
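The per-arch lazy loading described above can be illustrated with standard C++ alone. This is a minimal sketch under stated assumptions, not the rocBLAS implementation: `SolutionLibrary` and `PerArchLibraryCache` are hypothetical names, and the loader only records the arch name instead of reading Tensile files from the base path. It shows the pattern of keeping one `std::shared_future` per architecture key, so the first request for an arch starts the load and later requests for the same arch (from any device) reuse it.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a Tensile solution library; the real type differs.
struct SolutionLibrary
{
    std::string arch; // architecture this library was loaded for
};

// Sketch: one lazily loaded library per architecture, keyed by arch name
// (e.g. "gfx90a", "gfx942"). The first request for an arch starts the load;
// later requests for the same arch share the same future.
class PerArchLibraryCache
{
public:
    using LibraryPtr = std::shared_ptr<const SolutionLibrary>;

    LibraryPtr get(const std::string& arch)
    {
        assert(!arch.empty()); // expects a base arch name such as "gfx942"

        std::shared_future<LibraryPtr> ftr;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            auto it = m_futures.find(arch);
            if(it == m_futures.end())
            {
                // Assumption: a real loader would read arch-specific library
                // files; this one only records the arch name.
                ftr = std::async(std::launch::async,
                                 [arch]() -> LibraryPtr {
                                     return std::make_shared<SolutionLibrary>(
                                         SolutionLibrary{arch});
                                 })
                          .share();
                m_futures.emplace(arch, ftr);
                ++m_loads;
            }
            else
            {
                ftr = it->second;
            }
        }
        // Wait outside the lock so lookups for other arches are not blocked.
        return ftr.get();
    }

    // Number of distinct architectures for which a load was started.
    int loadCount() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_loads;
    }

private:
    mutable std::mutex                                     m_mutex;
    std::map<std::string, std::shared_future<LibraryPtr>> m_futures;
    int                                                    m_loads = 0;
};
```

Holding the mutex only while finding or inserting the future, and calling `get()` after releasing it, means a slow load for one architecture does not stall requests for another.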
### Motivation (original bullets, slightly expanded)

- Revise the solution system so the library is **per gfx** (not global).
- Cache that library **in the adapter array** after initialization.
- Remove duplicate code and simplify the structure before and as part of this revision.

---

*Description expanded to clarify the problem (why) and the approach (what). Caveats added per author feedback so the fix scope is accurate. -Tony*
1 parent f8153c6 commit 51c8fc4

3 files changed (+210 additions, -170 deletions)

projects/rocblas/library/src/include/utility.hpp

Lines changed: 3 additions & 0 deletions
```diff
@@ -800,6 +800,9 @@ bool rocblas_internal_tensile_supports_ldc_ne_ldd(rocblas_handle handle);
 // We assume true if the value is between 942 to 1000
 ROCBLAS_INTERNAL_EXPORT bool rocblas_internal_tensile_supports_xdl_math_op(rocblas_math_mode mode);
 
+// for internal use
+std::string rocblas_internal_get_arch_name(int device);
+
 // for internal use during testing, fetch arch name
 ROCBLAS_INTERNAL_EXPORT std::string rocblas_internal_get_arch_name();
 
```
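The new `rocblas_internal_get_arch_name(int device)` overload lets initialization resolve the architecture for an explicit device index rather than the current HIP device. A minimal sketch of how a per-device adapter array might use such a lookup to cache one shared library per architecture is given here; `Library`, `Adapter`, and `AdapterArray` are hypothetical, and the constructor's list of arch names stands in for the HIP device query.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded solution library.
struct Library
{
    std::string arch;
};

// Hypothetical per-device adapter that caches its device's library.
struct Adapter
{
    int                            deviceId = -1;
    std::shared_ptr<const Library> library; // set by initialize()
};

// Sketch of an adapter array with one library per architecture.
// archByDevice[i] plays the role of rocblas_internal_get_arch_name(i).
class AdapterArray
{
public:
    explicit AdapterArray(std::vector<std::string> archByDevice)
        : m_archByDevice(std::move(archByDevice))
        , m_adapters(m_archByDevice.size())
    {
    }

    // Initialize the adapter for an explicit device index. The arch comes from
    // deviceId rather than a "current device", and the library is cached in
    // the adapter so later calls skip the per-arch map lookup.
    const Adapter& initialize(int deviceId)
    {
        assert(deviceId >= 0 && static_cast<std::size_t>(deviceId) < m_adapters.size());
        Adapter& adapter = m_adapters[deviceId];
        if(!adapter.library)
        {
            const std::string&              arch = m_archByDevice[deviceId];
            std::shared_ptr<const Library>& lib  = m_libraryMap[arch];
            if(!lib)
                lib = std::make_shared<Library>(Library{arch}); // first device of this arch
            adapter.deviceId = deviceId;
            adapter.library  = lib;
        }
        return adapter;
    }

    // Number of distinct architectures with a loaded library.
    std::size_t libraryCount() const { return m_libraryMap.size(); }

private:
    std::vector<std::string>                              m_archByDevice;
    std::vector<Adapter>                                  m_adapters;
    std::map<std::string, std::shared_ptr<const Library>> m_libraryMap;
};
```

With devices `{gfx90a, gfx942, gfx90a}`, devices 0 and 2 end up sharing one library object while device 1 gets its own, which is the mixed-architecture case the PR targets.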
projects/rocblas/library/src/rocblas_auxiliary.cpp

Lines changed: 8 additions & 3 deletions
```diff
@@ -893,14 +893,19 @@ bool rocblas_internal_tensile_supports_xdl_math_op(rocblas_math_mode mode)
     return (deviceString.find("gfx942") != std::string::npos);
 }
 
+std::string rocblas_internal_get_arch_name(int deviceId)
+{
+    hipDeviceProp_t deviceProperties;
+    PRINT_IF_HIP_ERROR(hipGetDeviceProperties(&deviceProperties, deviceId));
+    return ArchName<hipDeviceProp_t>{}(deviceProperties); // strips : and later
+}
+
 // exported. Get architecture name
 std::string rocblas_internal_get_arch_name()
 {
     int deviceId;
     PRINT_IF_HIP_ERROR(hipGetDevice(&deviceId));
-    hipDeviceProp_t deviceProperties;
-    PRINT_IF_HIP_ERROR(hipGetDeviceProperties(&deviceProperties, deviceId));
-    return ArchName<hipDeviceProp_t>{}(deviceProperties);
+    return rocblas_internal_get_arch_name(deviceId);
 }
 
 // exported. Get xnack mode
```
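The `// strips : and later` comment refers to reducing a full `gcnArchName`, which can carry target features (e.g. `gfx90a:sramecc+:xnack-`), to the base architecture name used as the library key. The sketch mirrors that normalization and the new two-overload structure, with a hypothetical device table in place of HIP; `strip_arch_features`, `get_arch_name`, `kGcnArchNames`, and `g_currentDevice` are illustrative names, not rocBLAS APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device table standing in for hipDeviceProp_t::gcnArchName,
// which can include target features after the base architecture.
static const std::vector<std::string> kGcnArchNames = {"gfx90a:sramecc+:xnack-",
                                                       "gfx942:sramecc+:xnack-"};

// Hypothetical stand-in for the device returned by hipGetDevice().
static int g_currentDevice = 0;

// Keep only the base architecture: everything from the first ':' on is dropped.
// If there is no ':', find() returns npos and substr() keeps the whole string.
std::string strip_arch_features(const std::string& gcnArchName)
{
    return gcnArchName.substr(0, gcnArchName.find(':'));
}

// Device-indexed overload: resolves the arch for an explicit device.
std::string get_arch_name(int deviceId)
{
    assert(deviceId >= 0 && static_cast<std::size_t>(deviceId) < kGcnArchNames.size());
    return strip_arch_features(kGcnArchNames[deviceId]);
}

// No-argument overload: reads the current device and delegates, mirroring the
// refactored rocblas_internal_get_arch_name() in the diff.
std::string get_arch_name()
{
    return get_arch_name(g_currentDevice);
}
```

Keeping the device query in the device-indexed overload and having the no-argument version delegate to it, as the diff does, means both paths apply the same normalization.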
