Problem
FPU binary ops (add_tiles, mul_tiles, sub_tiles) accumulate into their output DST register: result = old_DST_value + computed_value. If two FPU binary ops share the same DST output index, the second reads the first's residual and produces a corrupted result.
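The corruption can be reproduced with a toy model. This is only a sketch of the arithmetic described above, not the tt-metal API: a tile is reduced to a single float, DST is a short list of slots assumed to start cleared, and `fpu_binary_accumulate` is a hypothetical helper.

```python
# Toy model of the behaviour described above; not the tt-metal API.
# A "tile" is reduced to a single float and DST to a small list of slots.

def fpu_binary_accumulate(dst, index, a, b, op):
    """Model an FPU binary op that accumulates into dst[index]."""
    computed = op(a, b)
    dst[index] = dst[index] + computed  # result = old_DST_value + computed_value
    return dst[index]

dst = [0.0] * 4  # assumption: DST slots start cleared

# First op writes slot 0: 0 + (2 + 3) = 5
first = fpu_binary_accumulate(dst, 0, 2.0, 3.0, lambda x, y: x + y)

# Second op reuses slot 0: 5 + (4 * 10) = 45, but the intended result is 40
second_shared = fpu_binary_accumulate(dst, 0, 4.0, 10.0, lambda x, y: x * y)

# The same op on an unused slot gives the intended result: 0 + (4 * 10) = 40
second_distinct = fpu_binary_accumulate(dst, 1, 4.0, 10.0, lambda x, y: x * y)

print(first, second_shared, second_distinct)  # 5.0 45.0 40.0
```

In this model the second op is only correct when it lands in a slot that no earlier FPU binary op has written.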
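Currently TTLAssignDST works around this by extending FPU binary result intervals so the linear scan allocator assigns distinct DST registers. This wastes DST capacity: registers that could otherwise be reused stay reserved for longer than the values in them are actually live.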
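A minimal sketch of why interval extension costs registers, using a simplified linear scan. It does not reproduce the TTLAssignDST code: the extension policy (stretching every FPU binary result to the latest FPU end point) and the names `linear_scan` and `extend_fpu_intervals` are assumptions made for illustration.

```python
def linear_scan(intervals):
    """Assign each (name, start, end) interval the lowest free register.

    A register becomes free once the interval holding it has ended
    (its end is before the start of the new interval).
    Returns {name: register}.
    """
    active = []      # (end, register) pairs currently holding a register
    assignment = {}
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts.
        active = [(e, r) for e, r in active if e >= start]
        used = {r for _, r in active}
        reg = 0
        while reg in used:
            reg += 1
        active.append((end, reg))
        assignment[name] = reg
    return assignment


def extend_fpu_intervals(intervals, fpu_names):
    """Stretch every FPU binary result interval to the latest FPU end point."""
    last_end = max(end for name, _, end in intervals if name in fpu_names)
    return [
        (name, start, last_end if name in fpu_names else end)
        for name, start, end in intervals
    ]


# Three FPU binary results with short, non-overlapping lifetimes.
intervals = [("add", 0, 1), ("mul", 2, 3), ("sub", 4, 5)]
fpu = {"add", "mul", "sub"}

plain = linear_scan(intervals)
extended = linear_scan(extend_fpu_intervals(intervals, fpu))

print(plain)     # {'add': 0, 'mul': 0, 'sub': 0}
print(extended)  # {'add': 0, 'mul': 1, 'sub': 2}
```

Without extension the three results share one register, which is exactly the unsafe case under accumulate semantics. With extension they are kept apart, at the cost of three registers instead of one.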
Proposed Fix
Pass acc_to_dest=false to add_tiles_init/sub_tiles_init/mul_tiles_init in tt-mlir's TTKernel dialect (there is an existing FIXME in TTKernelOps.td). With explicit overwrite mode, DST reuse between FPU binary ops would be safe and the interval extension workaround in TTLAssignDST could be removed.
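The intended effect of the flag can be sketched with a toy model. `ToyFpu`, `binary_init`, and `binary_op` are hypothetical stand-ins, not the TTKernel or tt-metal API, and the model assumes that `acc_to_dest=false` makes the op overwrite its DST slot rather than add to it.

```python
class ToyFpu:
    """Toy stand-in for the FPU; names and behaviour are illustrative only."""

    def __init__(self, num_dst=4):
        self.dst = [0.0] * num_dst
        # Assumption: default models the accumulate behaviour reported in the issue.
        self.acc_to_dest = True

    def binary_init(self, acc_to_dest):
        # Hypothetical counterpart of add_tiles_init / sub_tiles_init / mul_tiles_init.
        self.acc_to_dest = acc_to_dest

    def binary_op(self, index, a, b, op):
        computed = op(a, b)
        if self.acc_to_dest:
            self.dst[index] += computed   # accumulate into the old DST value
        else:
            self.dst[index] = computed    # explicit overwrite
        return self.dst[index]


def run_two_ops_on_one_slot(acc_to_dest):
    fpu = ToyFpu()
    fpu.binary_init(acc_to_dest)
    fpu.binary_op(0, 2.0, 3.0, lambda x, y: x + y)          # add result in slot 0
    return fpu.binary_op(0, 4.0, 10.0, lambda x, y: x * y)  # mul reuses slot 0


accumulated = run_two_ops_on_one_slot(acc_to_dest=True)
overwritten = run_two_ops_on_one_slot(acc_to_dest=False)
print(accumulated, overwritten)  # 45.0 40.0
```

In overwrite mode the second op returns the intended 40.0 even though it reuses slot 0, which is the property that would make the interval extension unnecessary.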
References
- Workaround code: lib/Dialect/TTL/Transforms/TTLAssignDST.cpp (see "Prevent DST register reuse between FPU binary ops" section)
- tt-mlir FIXME: TTKernelOps.td (acc_to_dest parameter)