Description
Documenting this because I just ran into this:
When iron.fill is used for a large linear transfer whose type is described as a multi-dimensional tensor, e.g. (M, K), the lowered code ends up using the data layout transform dimensions. For example:
ty = np.ndarray[(M, K,), dtype_in]
with rt.sequence(ty) as (B):
    rt.fill(fifo.prod(), B)
produces:
aie.dma_bd(%arg1 : memref<32x2048xbf16>, 0, 65536, [<size = 1, stride = 0>, <size = 1, stride = 0>, <size = 32, stride = 2048>, <size = 2048, stride = 1>]) {burst_length = 0 : i32}
// (note that the two lowest data layout transformation dimensions are used.)
When one of the dimensions is too large (e.g., K > 1024), this leads to the following error:
error: "-":29:9: 'aie.dma_bd' op Size 0 exceeds the [0:1023] range.
The hardware can operate in a "layout transformation" mode or a "linear" mode. (For the relevant special handling of linear transfers, grep for isLinearTransfer in the codebase.) In linear mode (my own name for it; I'm not sure it has an official one), the data layout transform dimensions are ignored, so we can do larger, longer transfers without running into the above error.
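To make the notion of a "linear" transfer concrete, here is a minimal sketch in plain Python of how one could tell whether a sizes/strides pattern just walks memory contiguously. The function name `is_contiguous` is hypothetical, and this is not the codebase's isLinearTransfer logic, only an illustration of the idea under the assumption that sizes and strides are listed from the outermost to the innermost dimension, as in the aie.dma_bd above.

```python
def is_contiguous(sizes, strides):
    """Return True if the pattern visits consecutive elements, i.e. it is
    equivalent to one linear transfer of prod(sizes) elements.

    Assumption: sizes/strides are ordered outermost to innermost.
    """
    expected_stride = 1
    # Walk from the innermost dimension outwards.
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size == 1:
            continue  # a dimension of size 1 never advances, so its stride is irrelevant
        if stride != expected_stride:
            return False
        expected_stride *= size
    return True


# The pattern from the dma_bd in this issue is contiguous:
print(is_contiguous([1, 1, 32, 2048], [0, 0, 2048, 1]))  # True
# A pattern that skips every other row is not:
print(is_contiguous([1, 1, 32, 2048], [0, 0, 4096, 1]))  # False
```

A pattern that passes this check carries no real layout transformation, so in principle it could be issued as a plain linear transfer instead.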
Currently, there are two workarounds:
- Change the type to a single linear buffer; in the example above, use ty = np.ndarray[(M*K,), dtype_in] rather than ty = np.ndarray[(M, K,), dtype_in].
- Use a simple linear TensorAccessPattern with the fill:
  B_tap = TensorAccessPattern(tensor_dims=(M,K,), offset=0, sizes=[1, 1, 1, num_batches * K], strides=[0, 0, 0, 1])
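The second workaround can be sketched in plain Python, without any IRON imports. The helpers `row_major_pattern`, `linear_pattern` and `offsets` are hypothetical names introduced for illustration. The sketch assumes four dimensions (as in the aie.dma_bd above) and a transfer of the whole tensor, and it shows that the collapsed pattern touches the same elements in the same order as the per-dimension one.

```python
from itertools import product
from math import prod


def row_major_pattern(shape, rank=4):
    """Per-dimension sizes/strides for `shape`, padded on the left to `rank`."""
    sizes = list(shape)
    strides = [prod(shape[i + 1:]) for i in range(len(shape))]
    pad = rank - len(shape)
    return [1] * pad + sizes, [0] * pad + strides


def linear_pattern(shape, rank=4):
    """The same transfer collapsed into a single innermost dimension."""
    return [1] * (rank - 1) + [prod(shape)], [0] * (rank - 1) + [1]


def offsets(sizes, strides):
    """Element offsets visited by a pattern, in visiting order."""
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in sizes))]


M, K = 32, 2048
print(row_major_pattern((M, K)))  # ([1, 1, 32, 2048], [0, 0, 2048, 1])
print(linear_pattern((M, K)))     # ([1, 1, 1, 65536], [0, 0, 0, 1])

# Both patterns visit the same offsets (checked on a small shape).
assert offsets(*row_major_pattern((3, 4))) == offsets(*linear_pattern((3, 4)))
```

The sizes and strides returned by `linear_pattern` have the same form as those passed to TensorAccessPattern in the second workaround.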
I think it would be a usability improvement if for simple linear transfers (without any TAPs given), the second workaround was implemented as the default lowering.
cc @hunhoffe