Commit cb1ed56

Move TritonGPURemoveLayoutConversions pass after MaterializeBlockPointer pass (#4085)
This PR moves `TritonGPURemoveLayoutConversions` pass from before `MaterializeBlockPointer` pass to after. Function `isExpensiveLoadOrStore`, used by `TritonGPURemoveLayoutConversions` pass, depends on `BlockIOAttr` which is added by `MaterializeBlockPointer`.

If there is no `TritonGPURemoveLayoutConversions` pass between `MaterializeBlockPointer` pass and `TritonGPUPipeline` pass, then the `PrefetchOp` created by `TritonGPUPipeline` would have blocked layout encoding, which is not supported by `PrefetchOp` lowering, i.e., no prefetch intrinsic is added at the end.

With this PR, the GEMM tensor of pointer geomean performance improves from 17 to 33 TFlops.

| B | M | N | K | Ratio |
| -- | -- | -- | -- | -- |
| 1 | 1024 | 1024 | 1024 | 2.175573 |
| 1 | 2048 | 2048 | 2048 | 2.311426 |
| 1 | 4096 | 4096 | 4096 | 2.204696 |
| 1 | 8192 | 8192 | 8192 | 1.944378 |
| 1 | 1 | 13824 | 5120 | 1.519084 |
| 1 | 4 | 12288 | 4096 | 1.328391 |
| 1 | 512 | 8192 | 8192 | 2.258902 |
| 1 | 512 | 8192 | 32768 | 2.488874 |
| 1 | 512 | 32768 | 8192 | 2.496153 |
| 1 | 1024 | 8192 | 16384 | 2.445463 |
| 1 | 1024 | 8192 | 28672 | 2.47155 |
| 1 | 3072 | 3072 | 4096 | 2.130415 |
| 1 | 4096 | 8192 | 16384 | 2.347657 |
| 1 | 8192 | 1024 | 16384 | 2.309172 |
| 1 | 8192 | 4096 | 16384 | 2.008043 |
| 1 | 16384 | 1024 | 8192 | 2.293465 |
| 1 | 16384 | 4096 | 8192 | 2.056846 |
| 1 | 16384 | 8192 | 1024 | 1.719477 |
| 1 | 16384 | 8192 | 4096 | 1.950505 |
| 4 | 32768 | 128 | 4096 | 2.112922 |
| 4 | 32768 | 4096 | 128 | 1.308416 |
| 32 | 4096 | 128 | 4096 | 2.495404 |
| 4096 | 8 | 128 | 16384 | 0.950184 |
| 4096 | 8 | 16384 | 128 | 1.08093 |
| | | | geomean | 1.953952 |

Signed-off-by: Whitney Tsang <[email protected]>
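As a sanity check on the reported geomean, the per-shape ratios from the table can be re-aggregated with a few lines of Python. This is only a sketch: the `ratios` list is copied from the table above, and the helper name `geomean` is illustrative, not part of the benchmark scripts.

```python
import math

# Speedup ratios copied from the table in the commit message (24 GEMM shapes).
ratios = [
    2.175573, 2.311426, 2.204696, 1.944378, 1.519084, 1.328391,
    2.258902, 2.488874, 2.496153, 2.445463, 2.47155, 2.130415,
    2.347657, 2.309172, 2.008043, 2.293465, 2.056846, 1.719477,
    1.950505, 2.112922, 1.308416, 2.495404, 0.950184, 1.08093,
]


def geomean(values):
    """Geometric mean: exponential of the arithmetic mean of the logs."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


result = geomean(ratios)
print(f"{len(ratios)} shapes, geomean speedup = {result:.6f}")
```

The result should agree with the 1.953952 in the last table row to within rounding. Only one shape (B=4096, M=8, N=128, K=16384) has a ratio below 1.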
1 parent 96f96e9 commit cb1ed56

1 file changed

+1
-1
lines changed

third_party/intel/backend/compiler.py

Lines changed: 1 addition & 1 deletion
@@ -308,8 +308,8 @@ def make_ttgir(mod, metadata, opt, properties):
 intel.passes.ttgpuir.add_remove_layout_conversions(pm)
 
 intel.passes.ttgpuir.add_accelerate_matmul(pm)
-intel.passes.ttgpuir.add_remove_layout_conversions(pm)
 intel.passes.ttgpuir.add_materialize_block_pointer(pm)
+intel.passes.ttgpuir.add_remove_layout_conversions(pm)
 intel.passes.ttgpuir.add_pipeline(pm, opt.num_stages, XPUBackend.get_split_barrier_scope(opt))
 
 passes.ttgpuir.add_fuse_nested_loops(pm)
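
The ordering dependency behind this one-line move can be illustrated with a toy model. Everything below is a hypothetical stand-in written for illustration: the `Load` class, the three functions, and the `"block_io"` / `"non_blocked"` strings are assumptions that only mimic the behaviour described in the commit message, and none of it is the Triton or Intel backend API.

```python
from dataclasses import dataclass, field


@dataclass
class Load:
    """Toy stand-in for a load op: a layout name plus a set of attributes."""
    layout: str = "blocked"
    attrs: set = field(default_factory=set)


def materialize_block_pointer(load):
    # Stand-in for MaterializeBlockPointer: tags the load with a block-IO attribute.
    load.attrs.add("block_io")


def remove_layout_conversions(load):
    # Stand-in for TritonGPURemoveLayoutConversions: in this toy model, only a
    # load already tagged as block IO is treated as expensive and moved off the
    # blocked layout (assumed simplification of isExpensiveLoadOrStore).
    if "block_io" in load.attrs:
        load.layout = "non_blocked"


def pipeline_emits_prefetch(load):
    # Stand-in for TritonGPUPipeline plus PrefetchOp lowering: a prefetch on a
    # blocked layout is assumed to produce no intrinsic.
    return load.layout != "blocked"


def run(pass_order):
    load = Load()
    for apply_pass in pass_order:
        apply_pass(load)
    return pipeline_emits_prefetch(load)


old_order = [remove_layout_conversions, materialize_block_pointer]
new_order = [materialize_block_pointer, remove_layout_conversions]

print("old order emits prefetch:", run(old_order))
print("new order emits prefetch:", run(new_order))
```

In this model the old order leaves the load on the blocked layout, because the attribute is attached only after the layout pass has run, so no prefetch is emitted. The new order attaches the attribute first, and the prefetch survives.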
