Commit 51c8fc4
[rocBLAS] solution library per gfx (#4781)
### Motivation
On systems with multiple GPUs of different architectures (e.g. gfx90a
and gfx942), rocBLAS previously used a single, process-wide Tensile
solution library. The library was effectively chosen for whichever
device was used first, so other devices could get the wrong solution
library and wrong kernels for their architecture. That leads to wrong
results or failures (see
[#3413](#3413)).
This PR addresses that by tying the solution library to each GPU
architecture and caching it in the adapter array, so each device can use
the correct library for its gfx.
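
The idea can be sketched without HIP or Tensile. The following is a minimal illustration, not the rocBLAS implementation: `SolutionLibrary`, `Adapter`, `LibraryHost`, and `get_adapter` are hypothetical names, and the architecture of each device is assumed to be known up front as a plain string. The sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Tensile solution library; it only records its arch.
struct SolutionLibrary
{
    std::string arch;
};

// Hypothetical per-device adapter that caches the library for its device.
struct Adapter
{
    std::shared_ptr<SolutionLibrary> library;
};

class LibraryHost
{
public:
    // device_archs[i] is the architecture name of device i (assumed known up front).
    explicit LibraryHost(std::vector<std::string> device_archs)
        : m_device_archs(std::move(device_archs))
        , m_adapters(m_device_archs.size())
    {
    }

    // Return the adapter for an explicit device index. On first use, the
    // library for that device's architecture is created (or reused if another
    // device of the same architecture already created it) and cached in the adapter.
    Adapter& get_adapter(std::size_t device_id)
    {
        Adapter& adapter = m_adapters.at(device_id);
        if(!adapter.library)
        {
            const std::string& arch = m_device_archs.at(device_id);
            std::shared_ptr<SolutionLibrary>& slot = m_libraries_by_arch[arch];
            if(!slot)
                slot = std::make_shared<SolutionLibrary>(SolutionLibrary{arch});
            adapter.library = slot;
        }
        return adapter;
    }

    // Number of distinct architectures for which a library has been created.
    std::size_t loaded_library_count() const
    {
        return m_libraries_by_arch.size();
    }

private:
    std::vector<std::string>                                m_device_archs;
    std::vector<Adapter>                                    m_adapters;
    std::map<std::string, std::shared_ptr<SolutionLibrary>> m_libraries_by_arch;
};
```

In this sketch, two devices of the same architecture share one library object, while a device of a different architecture gets its own. The lookup by architecture name happens only on the first call per device.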
### What was changed
- **Library per architecture:** Replaced the single `m_library` with
`m_libraryMap` keyed by architecture name (e.g. `gfx1030`, `gfx906`).
Each gfx gets its own Tensile solution library, so every device in a
mixed-architecture multi-GPU system can use the right one.
- **Caching in the adapter array:** Each adapter (per device) now stores
the library for that device's architecture (`adapter_s.library`), so we
don't look up by arch on every call and the device is unambiguous when
the adapter is used.
- **Correct device in one initialization path:** In `initialize(adapter,
deviceId)`, the architecture is now derived with
`rocblas_internal_get_arch_name(deviceId)` instead of the current HIP
device, so the right library is loaded and cached for that device.
**Note:** The current device in a thread still must be set correctly
elsewhere; this change fixes one call path (the one that receives an
explicit device index). Other paths still rely on the current device
(e.g. a HIP query), and further testing (e.g. pre_load of non-gfx files)
may follow.
- **Lazy loading and paths:** Refactored to remove duplication: base
path logic moved to `determine_tensile_base_path()`,
`getLazyLoadingArch()` takes an arch string, and lazy-load futures are
per-arch (`ftr_lib_map` keyed by processor name). Device-property and
lazy-init maps are populated for all devices up front to support
heterogeneous systems.
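
As a rough illustration of per-architecture lazy-load futures, the sketch below keeps one `std::shared_future` per processor name. It is written under assumptions and does not reproduce the rocBLAS code: `LoadedLibrary`, `load_library_for`, and `LazyLibraryCache` are hypothetical, and the loader is a placeholder for whatever actually reads the library files.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <mutex>
#include <string>

// Hypothetical payload produced by loading a library for one architecture.
struct LoadedLibrary
{
    std::string arch;
};

// Hypothetical loader; real code would read files for `arch` from disk.
inline LoadedLibrary load_library_for(const std::string& arch)
{
    return LoadedLibrary{arch};
}

class LazyLibraryCache
{
public:
    // Return the library for `arch`. The first request for an architecture
    // registers a deferred load; the load itself runs when the future is
    // first waited on, and later requests reuse the stored result.
    const LoadedLibrary& get(const std::string& arch)
    {
        std::shared_future<LoadedLibrary> ftr;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            auto it = m_futures.find(arch);
            if(it == m_futures.end())
            {
                auto pending = std::async(std::launch::deferred, load_library_for, arch);
                it = m_futures.emplace(arch, pending.share()).first;
            }
            ftr = it->second; // copy shares the same underlying state
        }
        // The returned reference stays valid while this cache is alive,
        // because the map keeps the shared state of the future alive.
        return ftr.get();
    }

    // Number of architectures with a registered (pending or finished) load.
    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_futures.size();
    }

private:
    mutable std::mutex                                       m_mutex;
    std::map<std::string, std::shared_future<LoadedLibrary>> m_futures;
};
```

Keying the futures by architecture rather than keeping a single future means a load for one gfx never stands in for another. The mutex here only protects the map; the load runs outside the lock.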
### Motivation (original bullets, slightly expanded)
- Revise the solution system so the library is **per gfx** (not global).
- Cache that library **in the adapter array** after initialization.
- Remove duplicate code and simplify the structure before and as part of
this revision.
---
*Description expanded to clarify the problem (why) and the approach
(what). Caveats added per author feedback so the fix scope is
accurate. (Tony)*

1 parent f8153c6 commit 51c8fc4
File tree: 3 files changed, +210 -170 lines changed
- projects/rocblas/library/src
- include