Commit eacda0d
Add iree_hal_device_group_t to own device topology lifecycle (iree-org#23576)
The topology matrix from iree-org#23573 needs an owner with a clear lifetime
contract. Today devices are passed to the HAL module as a flat array —
there is no object that retains the devices, owns the topology, and
guarantees the topology pointer remains valid for the duration of
execution. Every device holds a raw pointer to the topology it was
assigned, so if the topology is freed while devices are still alive,
those pointers dangle.

`iree_hal_device_group_t` is that owner. It takes already-created
devices, builds the immutable topology matrix from their capabilities,
pushes topology info into each device, and retains all of them. The
group's lifetime brackets the devices': whoever holds the devices
long-term (the HAL module, the CTS harness, a Python session) retains
the group, and the group retains the devices, so the topology pointer in
each device is guaranteed valid.
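
The ownership chain described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: every `sketch_*` name is hypothetical and stands in for the real IREE types, the fixed capacity of 4 is arbitrary, and the topology is reduced to a single field. It shows the group retaining its devices and owning the topology that each device points at through a raw, non-owning pointer.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_DEVICES 4  // arbitrary capacity chosen for the sketch

// Hypothetical stand-in for the topology matrix.
typedef struct sketch_topology_t {
  int device_count;
} sketch_topology_t;

// Hypothetical device: reference counted, with a raw topology pointer.
typedef struct sketch_device_t {
  int ref_count;
  const sketch_topology_t* topology;  // non-owning; valid while the group lives
} sketch_device_t;

// Hypothetical group: owns the topology and retains every device.
typedef struct sketch_group_t {
  int ref_count;
  sketch_topology_t topology;
  int device_count;
  sketch_device_t* devices[SKETCH_MAX_DEVICES];
} sketch_group_t;

int sketch_live_devices = 0;  // instrumentation for the sketch only

sketch_device_t* sketch_device_create(void) {
  sketch_device_t* device = calloc(1, sizeof(*device));
  if (!device) return NULL;
  device->ref_count = 1;
  ++sketch_live_devices;
  return device;
}

void sketch_device_retain(sketch_device_t* device) {
  assert(device && device->ref_count > 0);
  ++device->ref_count;
}

void sketch_device_release(sketch_device_t* device) {
  if (device && --device->ref_count == 0) {
    --sketch_live_devices;
    free(device);
  }
}

// Retains each device and points it at the group-owned topology.
sketch_group_t* sketch_group_create(sketch_device_t** devices, int count) {
  if (count < 1 || count > SKETCH_MAX_DEVICES) return NULL;
  sketch_group_t* group = calloc(1, sizeof(*group));
  if (!group) return NULL;
  group->ref_count = 1;
  group->topology.device_count = count;
  group->device_count = count;
  for (int i = 0; i < count; ++i) {
    sketch_device_retain(devices[i]);
    devices[i]->topology = &group->topology;
    group->devices[i] = devices[i];
  }
  return group;
}

// Releases the devices, then frees the topology along with the group.
void sketch_group_release(sketch_group_t* group) {
  if (!group || --group->ref_count != 0) return;
  for (int i = 0; i < group->device_count; ++i) {
    sketch_device_release(group->devices[i]);
  }
  free(group);
}
```

In this model a caller can drop its own device reference right after creating the group. The device stays alive, and its topology pointer stays valid, for as long as the group is retained.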
### Creation API
The builder pattern matches the topology builder it wraps:
stack-allocate, add devices, finalize. Finalize is a consuming operation
— it queries capabilities from all devices, computes edge descriptors,
calls driver-specific refinement for same-driver pairs, builds the
topology matrix, assigns topology info into each device via the new
vtable method, and produces the immutable group. The builder is zeroed
after finalize (whether it succeeds or fails) and cannot be reused.

For the common single-device case (7 of 9 callers),
`iree_hal_device_group_create_from_device` wraps the builder sequence
into a one-liner.
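
As a rough illustration of the builder contract, the sketch below models stack allocation, adding devices, and a consuming finalize. It is not the IREE API: all `sketch_*` names, the `bool` return values, the capacity of 4, and writing the group into caller-provided storage are assumptions made to keep the example self-contained. Capability queries and edge computation are omitted; finalize only records each device's index.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_DEVICES 4  // arbitrary capacity chosen for the sketch

// Hypothetical device that only remembers its index in the topology.
typedef struct sketch_device_t {
  int topology_index;
} sketch_device_t;

typedef struct sketch_builder_t {
  int count;
  sketch_device_t* devices[SKETCH_MAX_DEVICES];
} sketch_builder_t;

typedef struct sketch_group_t {
  int count;
  sketch_device_t* devices[SKETCH_MAX_DEVICES];
} sketch_group_t;

void sketch_builder_initialize(sketch_builder_t* builder) {
  memset(builder, 0, sizeof(*builder));
}

// Rejects additions past capacity and duplicate devices.
bool sketch_builder_add_device(sketch_builder_t* builder,
                               sketch_device_t* device) {
  assert(builder && device);
  if (builder->count >= SKETCH_MAX_DEVICES) return false;
  for (int i = 0; i < builder->count; ++i) {
    if (builder->devices[i] == device) return false;
  }
  builder->devices[builder->count++] = device;
  return true;
}

// Consumes the builder: it is zeroed whether or not finalize succeeds.
bool sketch_builder_finalize(sketch_builder_t* builder,
                             sketch_group_t* out_group) {
  bool ok = builder->count > 0;  // an empty build is an error
  if (ok) {
    out_group->count = builder->count;
    for (int i = 0; i < builder->count; ++i) {
      out_group->devices[i] = builder->devices[i];
      builder->devices[i]->topology_index = i;
    }
  }
  memset(builder, 0, sizeof(*builder));
  return ok;
}

// Single-device convenience wrapper around the builder sequence.
bool sketch_group_create_from_device(sketch_device_t* device,
                                     sketch_group_t* out_group) {
  sketch_builder_t builder;
  sketch_builder_initialize(&builder);
  if (!sketch_builder_add_device(&builder, device)) return false;
  return sketch_builder_finalize(&builder, out_group);
}
```

Zeroing the builder on both paths means a second finalize on the same builder fails as an empty build, so a consumed builder cannot accidentally produce a second group.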
### `assign_topology_info` vtable method
Devices need to receive their topology info after the matrix is built —
the topology doesn't exist yet when the device is created, and the
device's index in the matrix isn't known until group creation. This is a
new vtable method on `iree_hal_device_t` that the group calls during
finalize. All existing driver implementations (local-sync, local-task,
CUDA, HIP, Vulkan, Metal, AMDGPU, null) store the info into their device
struct. The method is called exactly once per device, during group
creation.
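
The dispatch pattern can be sketched as follows. The real method's signature and the layout of the topology info are not shown in this message, so everything here is an assumption: the `sketch_*` types, the `void` return type, and the two-field info struct are hypothetical. The sketch shows a driver storing the info into its device struct and a group-side loop that calls the method once per device with that device's matrix index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical payload: which topology, and where this device sits in it.
typedef struct sketch_topology_info_t {
  const void* topology;
  uint32_t topology_index;
} sketch_topology_info_t;

typedef struct sketch_device_t sketch_device_t;

// Hypothetical device vtable with only the new method.
typedef struct sketch_device_vtable_t {
  void (*assign_topology_info)(sketch_device_t* device,
                               const sketch_topology_info_t* info);
} sketch_device_vtable_t;

struct sketch_device_t {
  const sketch_device_vtable_t* vtable;
};

// Hypothetical driver device embedding the base as its first member.
typedef struct sketch_null_device_t {
  sketch_device_t base;  // first member, so the downcast below is valid
  sketch_topology_info_t topology_info;
  int assigned;
} sketch_null_device_t;

// Driver implementation: store the info into the device struct.
void sketch_null_device_assign_topology_info(
    sketch_device_t* base, const sketch_topology_info_t* info) {
  sketch_null_device_t* device = (sketch_null_device_t*)base;
  assert(!device->assigned);  // expected exactly once per device
  device->topology_info = *info;
  device->assigned = 1;
}

const sketch_device_vtable_t sketch_null_device_vtable = {
    sketch_null_device_assign_topology_info,
};

void sketch_null_device_initialize(sketch_null_device_t* device) {
  device->base.vtable = &sketch_null_device_vtable;
  device->topology_info.topology = NULL;
  device->topology_info.topology_index = 0;
  device->assigned = 0;
}

// Group-side loop: each device receives its index in the matrix.
void sketch_assign_all(const void* topology, sketch_device_t** devices,
                       uint32_t count) {
  for (uint32_t i = 0; i < count; ++i) {
    sketch_topology_info_t info = {topology, i};
    devices[i]->vtable->assign_topology_info(devices[i], &info);
  }
}
```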
### HAL module integration
`iree_hal_module_create` now takes an `iree_hal_device_group_t*` instead
of `(device_count, devices[])`. The module retains the group and
delegates all device access through `iree_hal_device_group_device_count`
/ `iree_hal_device_group_device_at`. This eliminates the flexible array
member from `iree_hal_module_t`, simplifies allocation (fixed-size
struct instead of variable-size), and makes the lifetime contract
explicit: the module holds the group, the group holds the devices and
topology.

All callers are updated: CLI tooling, the high-level runtime session,
TFLite bindings, Python bindings, PJRT, ConstEval, simple_embedding
samples, check_test, and the CTS test harness.
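
The structural change to the module can be pictured with the following sketch. The struct layouts are assumptions, not the actual `iree_hal_module_t` definition, and all `sketch_*` names are hypothetical. The sketch contrasts a variable-size struct ending in a flexible array member with a fixed-size struct that holds a group pointer and reaches devices through count/at accessors.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_device_t {
  int id;
} sketch_device_t;

// Before (assumed shape): the allocation size depends on the device count.
typedef struct sketch_old_module_t {
  size_t device_count;
  sketch_device_t* devices[];  // flexible array member
} sketch_old_module_t;

size_t sketch_old_module_size(size_t device_count) {
  return sizeof(sketch_old_module_t) +
         device_count * sizeof(sketch_device_t*);
}

// Hypothetical group exposing count/at accessors.
typedef struct sketch_group_t {
  size_t device_count;
  sketch_device_t* devices[4];
} sketch_group_t;

size_t sketch_group_device_count(const sketch_group_t* group) {
  return group->device_count;
}

sketch_device_t* sketch_group_device_at(const sketch_group_t* group,
                                        size_t index) {
  return index < group->device_count ? group->devices[index] : NULL;
}

// After (assumed shape): fixed size; all device access goes via the group.
typedef struct sketch_new_module_t {
  sketch_group_t* group;
} sketch_new_module_t;

size_t sketch_module_device_count(const sketch_new_module_t* module) {
  assert(module && module->group);
  return sketch_group_device_count(module->group);
}

sketch_device_t* sketch_module_device_at(const sketch_new_module_t* module,
                                         size_t index) {
  assert(module && module->group);
  return sketch_group_device_at(module->group, index);
}
```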
### Testing
A mock device (`hal/testing/mock_device`) provides controllable
capabilities for testing topology construction without requiring real
hardware. The device group tests exercise builder validation (empty
builds, duplicate devices, capacity limits), single-device and
multi-device group creation, topology correctness (self-edges,
cross-device edges with expected interop modes), the convenience
function, and lifetime ordering (group outlives devices). The CTS test
harness creates a device group in `SetUpTestSuite` so every CTS test
runs with topology info assigned.
### Where this is going
The device group is the scheduling domain for the causal execution
system. When the AMDGPU driver gets its frontier-integrated semaphores
and queue operations, the device group's topology matrix is what tells
the scheduler whether a semaphore can be waited on natively or needs
handle import, whether a buffer can be read directly or needs DMA
transfer, and what the relative cost is. The group also becomes the
natural attachment point for collective channel creation and
multi-device resource pools.
---------
Co-authored-by: Claude <noreply@anthropic.com>

1 parent a2c8b6b commit eacda0d
File tree
33 files changed, +1646 -122 lines changed
- compiler/src/iree/compiler/ConstEval
- integrations/pjrt/src/iree_pjrt/common
- runtime
  - bindings
    - python
    - tflite
  - src/iree
    - hal
      - cts
      - drivers
        - amdgpu
        - cuda
        - hip
        - local_sync
        - local_task
        - metal
        - null
        - vulkan
      - testing
    - modules
      - check
      - hal
    - runtime
    - tooling
- samples
  - external_transients
  - simple_embedding