Commit 0ba3707

[XPU] Conditionally add -tritonintelgpu-optimize-reduction-locality to pipeline (#2553)
Add the `-tritonintelgpu-optimize-reduction-locality` pass to the pipeline if the `TRITON_INTEL_OPTIMIZE_REDUCTION_LOCALITY` environment variable is set to 1. As shown in #2266, this pass gives quite promising results, although there is still room for improvement. Conditionally enabling it will greatly help performance investigation.

Signed-off-by: victor-eds <[email protected]>
1 parent 3a7e32b commit 0ba3707

2 files changed, 4 additions and 0 deletions

third_party/intel/backend/compiler.py

Lines changed: 2 additions & 0 deletions
@@ -245,6 +245,8 @@ def make_ttgir(mod, metadata, opt, properties):
     passes.common.add_cse(pm)
     passes.ttgpuir.add_prefetch(pm)
     passes.ttgpuir.add_optimize_dot_operands(pm, True)
+    if os.getenv("TRITON_INTEL_OPTIMIZE_REDUCTION_LOCALITY", "0") == "1":
+        intel.passes.ttgpuir.add_optimize_reduction_locality(pm)
     intel.passes.ttgpuir.add_remove_layout_conversions(pm)
     intel.passes.ttgpuir.add_reduce_data_duplication(pm)
     passes.ttgpuir.add_reorder_instructions(pm)
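
The gating in this hunk depends on a string comparison: `os.getenv` returns a `str` (or the supplied default), so the flag has to be compared against `"1"`, not the integer `1`. The following is a minimal sketch of that pattern, not the backend's actual code. `env_flag_enabled` and `select_passes` are hypothetical helpers introduced only for illustration, and the pass names are plain strings rather than calls on a real pass manager.

```python
import os

FLAG = "TRITON_INTEL_OPTIMIZE_REDUCTION_LOCALITY"


def env_flag_enabled(name, environ=None):
    """Hypothetical helper: True only when the variable is exactly the string "1"."""
    env = os.environ if environ is None else environ
    # Environment values are always strings, so compare against "1", never 1.
    return env.get(name, "0") == "1"


def select_passes(environ=None):
    """Return pass names in the order the hunk adds them (names only, no real pass manager)."""
    names = ["add_optimize_dot_operands"]
    if env_flag_enabled(FLAG, environ):
        # Optional pass, inserted before layout-conversion removal as in the diff.
        names.append("add_optimize_reduction_locality")
    names.append("add_remove_layout_conversions")
    return names
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the sketch easy to exercise without mutating the process environment. With this check, any value other than `"1"` (including `"true"`) leaves the pass disabled.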

third_party/intel/triton_xpu.cc

Lines changed: 2 additions & 0 deletions
@@ -99,6 +99,8 @@ void init_triton_intel_passes_ttgpuir(py::module &&m) {
                      gpu::intel::createTritonIntelGPUReduceDataDuplication);
   ADD_PASS_WRAPPER_0("add_materialize_block_pointer",
                      gpu::intel::createTritonIntelGPUMaterializeBlockPointer);
+  ADD_PASS_WRAPPER_0("add_optimize_reduction_locality",
+                     gpu::intel::createTritonIntelGPUOptimizeReductionLocality);
 }
 
 void init_triton_intel(py::module &&m) {
