Commit 6bf49d9

[hip] Re-enable peering for compatible devices. (#20481)
#20114 disabled all peering any time we only have a single physical device backing a logical device. This broke our current sharded models: #20409.

Our current sharded models are not set up this way: they have multiple logical devices backed by multiple physical devices, so every hip_device has only a single physical device, and peering is never enabled at all (see #19555 for the original change to allow this).

This change enables peering between each device and every other device it is compatible with, which should fix our sharded models. However, we should at some point update our sharding method to correctly use single logical devices backed by multiple physical devices.

Signed-off-by: Andrew Woloszyn <[email protected]>
1 parent d06effa commit 6bf49d9

runtime/src/iree/hal/drivers/hip/hip_device.c

Lines changed: 2 additions & 4 deletions
@@ -432,9 +432,7 @@ static iree_status_t iree_hal_hip_device_enable_peering(
       return IREE_HIP_RESULT_TO_STATUS(symbols, hip_error);
     }
     if (canAccessPeer != 1) {
-      return iree_make_status(IREE_STATUS_PERMISSION_DENIED,
-                              "device %d is not able to access peer %d",
-                              device_id, j);
+      continue;
     }
 
     hip_error = symbols->hipDeviceEnablePeerAccess(j, 0);
@@ -504,7 +502,7 @@ iree_status_t iree_hal_hip_device_create(
   }
 
   // If there are multiple devices, enable peering between them all.
-  if (iree_status_is_ok(status) && device_count > 1) {
+  if (iree_status_is_ok(status)) {
     status = iree_hal_hip_device_enable_peering(symbols, device_id);
   }
 }
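
The behavioral change in the first hunk can be illustrated with a small standalone sketch. It does not use the HIP runtime or IREE: the compatibility table `kSketchCanAccess` stands in for the result of the `canAccessPeer` query, and `sketch_enable_peering`, `g_sketch_peer_enabled`, and `SKETCH_DEVICE_COUNT` are hypothetical names introduced only for this example. It also assumes a device is never peered with itself, which the diff does not show. The point is only that an incompatible peer is now skipped instead of failing the whole call.

```c
#include <stdbool.h>

#define SKETCH_DEVICE_COUNT 4

// Hypothetical stand-in for the peer-access query: entry [i][j] is true when
// device i can access device j. Device 2 is incompatible with every peer.
static const bool kSketchCanAccess[SKETCH_DEVICE_COUNT][SKETCH_DEVICE_COUNT] = {
    {false, true, false, true},
    {true, false, false, true},
    {false, false, false, false},
    {true, true, false, false},
};

// Records which (device, peer) pairs had peering enabled by the sketch.
static bool g_sketch_peer_enabled[SKETCH_DEVICE_COUNT][SKETCH_DEVICE_COUNT];

// Enables peering from device_id to every compatible peer and returns the
// number of peers enabled. Incompatible peers are skipped with `continue`,
// mirroring the new behavior; before the change this case was an error.
static int sketch_enable_peering(int device_id) {
  int enabled = 0;
  for (int j = 0; j < SKETCH_DEVICE_COUNT; ++j) {
    if (j == device_id) continue;  // Assumption: no self-peering.
    if (!kSketchCanAccess[device_id][j]) {
      continue;  // Not compatible: skip this peer rather than fail.
    }
    g_sketch_peer_enabled[device_id][j] = true;
    ++enabled;
  }
  return enabled;
}
```

With this table, calling `sketch_enable_peering(0)` enables peers 1 and 3 and skips peer 2, while `sketch_enable_peering(2)` enables nothing and still succeeds. Under the old behavior both calls would have stopped at the first incompatible peer with a permission-denied status.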
