Make isExpensiveLoadOrStore consider block pointer loads and stores #2581

@etiotto

Description

The isExpensiveLoadOrStore function (third_party/intel/lib/TritonIntelGPUTransforms/Utility.cpp) does not account for block pointers, so it always returns false for load and store operations that use a block pointer.
As a result, the RemoveLayoutConversion pass never treats loads that use block pointers as anchor operations.

This PR changes isExpensiveLoadOrStore so that block pointer loads are properly recognized. The RemoveLayoutConversion pass can then treat those loads as anchor operations and preserve their layout.
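
To make the behavior change concrete, here is a minimal toy model in Python. It is not the C++ code in Utility.cpp: the `PtrKind` enum, the `LoadOp` class, the `EXPENSIVE_THRESHOLD` value, and both predicate functions are hypothetical stand-ins, and the real cost heuristic is not described in this issue. The sketch only shows how a predicate that ignores block pointers keeps such loads from ever becoming anchors.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PtrKind(Enum):
    TENSOR_OF_PTRS = auto()  # load through a tensor of pointers
    BLOCK_PTR = auto()       # load through a block pointer


@dataclass
class LoadOp:
    ptr_kind: PtrKind
    num_elements: int


# Hypothetical size threshold, chosen only for illustration.
EXPENSIVE_THRESHOLD = 32


def is_expensive_before(op: LoadOp) -> bool:
    """Toy version of the old behavior: block pointers are not considered."""
    if op.ptr_kind is not PtrKind.TENSOR_OF_PTRS:
        return False  # every block pointer load falls through here
    return op.num_elements >= EXPENSIVE_THRESHOLD


def is_expensive_after(op: LoadOp) -> bool:
    """Toy version of the new behavior: both pointer kinds are considered."""
    return op.num_elements >= EXPENSIVE_THRESHOLD


def is_anchor(op: LoadOp, is_expensive) -> bool:
    """In this toy model, a load is an anchor exactly when it is expensive."""
    return is_expensive(op)


block_load = LoadOp(PtrKind.BLOCK_PTR, num_elements=256)
print(is_anchor(block_load, is_expensive_before))
print(is_anchor(block_load, is_expensive_after))
```

With the old predicate the block pointer load is never an anchor, regardless of its size; with the new one it is judged by the same criterion as any other load.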

Because RemoveLayoutConversion is invoked at several points in the optimization pipeline, the change in third_party/intel/lib/TritonIntelGPUTransforms/Utility.cpp alone causes performance degradation in a couple of GEMM-like benchmarks, specifically when operand A of tl.dot is transposed and when the input of tl.dot is first fed into an exponential.

These two performance degradations have been fixed by enhancing the MaterializeBlockPointer and MatmulLoopPipeline optimizations so that they can retrieve the dot layout of a block pointer load transitively from its users (in those benchmarks, the blocked layout of block pointer loads is transitively converted to a dot layout).
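
The idea of retrieving a layout "transitively from its users" can be sketched as a walk over the use chain. The Python below is an illustrative toy, not the code in MaterializeBlockPointer or MatmulLoopPipeline: the `Op` class, the layout strings, and `find_dot_layout` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Op:
    name: str
    result_layout: str                      # toy layout label, e.g. "blocked"
    users: List["Op"] = field(default_factory=list)


def find_dot_layout(op: Op, seen: Optional[Set[int]] = None) -> Optional[str]:
    """Depth-first search through the users of `op` for a dot operand layout.

    Returns the first layout label starting with "dot_op" that is reachable
    through the use chain, or None if no user produces such a layout.
    """
    seen = set() if seen is None else seen
    if id(op) in seen:                      # guard against revisiting an op
        return None
    seen.add(id(op))
    for user in op.users:
        if user.result_layout.startswith("dot_op"):
            return user.result_layout
        found = find_dot_layout(user, seen)
        if found is not None:
            return found
    return None


# Toy chain mirroring the "exponential before tl.dot" case:
# load (blocked) -> exp (blocked) -> layout conversion (dot operand) -> dot
dot = Op("dot", "result")
convert = Op("convert_layout", "dot_op(opIdx=0)", [dot])
exp = Op("exp", "blocked", [convert])
load = Op("load", "blocked", [exp])

print(find_dot_layout(load))
```

In this toy chain the load's direct user still has a blocked layout, so a check of immediate users alone would miss the dot layout; following users of users finds it.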
