Commit 6112357
TL/CUDA: guard team creation when device info is incomplete

ucc_tl_cuda_team_topo_create relies on per-rank GPU device information
(PCI IDs, NVLink matrices) that is populated only when every rank has
at least one visible GPU. Without a check, the topo init code
dereferenced uninitialised or invalid device info, causing silent
failures or incorrect topology matrices.

Add a ucc_topo_has_device_info() guard before the topo_create call so
that TL/CUDA reports UCC_ERR_NOT_SUPPORTED when device info is missing
for any rank, allowing a graceful fallback to another TL.

1 parent a1e8344 commit 6112357
1 file changed, 8 additions, 0 deletions

| Original file lines | Diff lines | Change |
|---|---|---|
| 341-343 | 341-343 | unchanged context |
| | 344-351 | 8 lines added |
| 344-346 | 352-354 | unchanged context |
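
The guard described in the commit message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: every `sketch_*` type and function is a hypothetical stand-in for the real UCC definitions, whose signatures are not shown on this page. It assumes the check means "every rank reported device info".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real UCC types or status codes. */
typedef enum {
    SKETCH_OK                = 0,
    SKETCH_ERR_NOT_SUPPORTED = -1
} sketch_status_t;

typedef struct {
    bool has_device_info; /* false when the rank has no visible GPU */
} sketch_rank_info_t;

typedef struct {
    const sketch_rank_info_t *ranks;
    size_t                    n_ranks;
} sketch_topo_t;

/* Assumed semantics: true only if every rank published device info. */
static bool sketch_topo_has_device_info(const sketch_topo_t *topo)
{
    assert(topo != NULL);
    for (size_t i = 0; i < topo->n_ranks; i++) {
        if (!topo->ranks[i].has_device_info) {
            return false;
        }
    }
    return true;
}

/* Placeholder for the topology construction that needs complete info. */
static sketch_status_t sketch_team_topo_create(const sketch_topo_t *topo)
{
    (void)topo;
    return SKETCH_OK;
}

/* Guard first, then build the topology. */
static sketch_status_t sketch_team_create(const sketch_topo_t *topo)
{
    if (!sketch_topo_has_device_info(topo)) {
        /* Incomplete device info: decline so the caller can pick another TL. */
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    return sketch_team_topo_create(topo);
}
```

Returning a "not supported" status instead of proceeding lets the caller treat the case as a normal capability mismatch rather than a crash or a silently wrong topology.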