Support acc_to_dest=false for FPU binary ops to avoid wasting DST capacity #343

@brnorris03

Problem

FPU binary ops (add_tiles, mul_tiles, sub_tiles) accumulate into their output DST register: result = old_DST_value + computed_value. If two FPU binary ops share the same DST output index, the second reads the first's residual and produces a corrupted result.
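To make the failure mode concrete, here is a minimal sketch of the accumulate behavior as described above. It is a toy Python model, not the real kernel API: the 4-slot `dst` list and the `add_tiles_accumulating` helper are hypothetical names introduced only for illustration, and scalars stand in for tiles.

```python
# Toy model of the accumulate behavior described in this issue; not the real API.
dst = [0.0] * 4  # hypothetical 4-slot DST register file, initially cleared


def add_tiles_accumulating(a, b, dst_index):
    """Model of: result = old_DST_value + computed_value."""
    dst[dst_index] = dst[dst_index] + (a + b)
    return dst[dst_index]


# Two unrelated adds that were (incorrectly) assigned the same DST index.
first = add_tiles_accumulating(1.0, 2.0, dst_index=0)
second = add_tiles_accumulating(10.0, 20.0, dst_index=0)

print(first)   # 3.0, correct because the slot started at zero
print(second)  # 33.0, the intended 30.0 plus the 3.0 residual from the first op
```

The second result is off by exactly the first op's output, which is the corruption the issue describes.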

Currently TTLAssignDST works around this by extending the live intervals of FPU binary results so that the linear scan allocator assigns them distinct DST registers. This wastes DST capacity, because registers that could otherwise be reused stay reserved.
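The cost of the workaround can be illustrated with a toy linear scan allocator. This is a sketch under stated assumptions, not the TTLAssignDST implementation: the `linear_scan` function, the op names, and the live ranges are hypothetical, and the workaround is modeled simply as stretching every FPU binary result interval to the end of the region.

```python
def linear_scan(intervals):
    """Assign each (start, end) interval the lowest free register index.

    A register becomes free once the interval holding it has ended,
    i.e. its end is strictly before the start of the new interval.
    """
    assignment = {}
    active = []  # (end, register) pairs that still occupy a register
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts.
        active = [(e, r) for e, r in active if e >= start]
        used = {r for _, r in active}
        reg = next(i for i in range(len(intervals)) if i not in used)
        assignment[name] = reg
        active.append((end, reg))
    return assignment


# Hypothetical live ranges for three FPU binary results.
natural = {"add0": (0, 1), "mul0": (2, 3), "sub0": (4, 5)}
# Assumed model of the workaround: every result stays live until position 5.
extended = {name: (start, 5) for name, (start, _) in natural.items()}

regs_natural = len(set(linear_scan(natural).values()))
regs_extended = len(set(linear_scan(extended).values()))

print(regs_natural)   # 1, the three results could share one register
print(regs_extended)  # 3, the extended intervals overlap and force distinct registers
```

In this toy case the extension triples the register demand, even though the natural live ranges never overlap.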

Proposed Fix

Pass acc_to_dest=false to add_tiles_init/sub_tiles_init/mul_tiles_init in tt-mlir's TTKernel dialect (there is an existing FIXME in TTKernelOps.td). With explicit overwrite mode, DST reuse between FPU binary ops would be safe and the interval extension workaround in TTLAssignDST could be removed.
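The effect of the proposed flag can be sketched with a small Python model. Everything here is an assumption made for illustration: `ToyDst` and `run` are hypothetical, scalars stand in for tiles, and the flag is passed per call for brevity, whereas the proposal passes it to the `*_init` functions.

```python
class ToyDst:
    """Toy DST register file; acc_to_dest only mirrors the flag name from this issue."""

    def __init__(self, size=4):
        self.slots = [0.0] * size

    def mul_tiles(self, a, b, index, acc_to_dest):
        computed = a * b
        if acc_to_dest:
            # Accumulate mode: the old slot value leaks into the result.
            self.slots[index] += computed
        else:
            # Overwrite mode: the old slot value is discarded.
            self.slots[index] = computed
        return self.slots[index]


def run(acc_to_dest):
    """Run two unrelated multiplies that reuse DST index 0; return the second result."""
    dst = ToyDst()
    dst.mul_tiles(2.0, 3.0, index=0, acc_to_dest=acc_to_dest)
    return dst.mul_tiles(4.0, 5.0, index=0, acc_to_dest=acc_to_dest)


print(run(acc_to_dest=True))   # 26.0, corrupted by the 6.0 residual
print(run(acc_to_dest=False))  # 20.0, reuse of the slot is safe
```

With overwrite semantics the second op's result no longer depends on what the slot held before, which is why the interval extension would become unnecessary.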

References

  • Workaround code: lib/Dialect/TTL/Transforms/TTLAssignDST.cpp (see "Prevent DST register reuse between FPU binary ops" section)
  • tt-mlir FIXME: TTKernelOps.td (acc_to_dest parameter)
