Commit ee78046

[XPU][OptRed] Revamp -tritonintelgpu-optimize-reduction-locality (#2800)

Original implementation had two critical issues:

- *Functional*: It did not preserve register order, so it was computing a different reduction.
- *Performance*: When converting back to the original tensor type, it did: `reshape(convert_layout(res))`. That means the `reshape` operation served as an anchor and the suboptimal slice layout was propagated.

This was fixed as follows:

- Keep register order.
- Do `convert_layout(reshape(res))` when converting back to the original type, thus propagating the more optimal layout.

See implementation for further details.

Signed-off-by: victor-eds <[email protected]>
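The functional issue can be illustrated with a toy model. The sketch below is not the pass's implementation: it treats a tensor as a flat row-major list, and every name in it (`reduce_rows`, `two_step_reduce`, the transposed reordering) is a hypothetical stand-in. It only shows that splitting a reduction into two steps gives the same result when element order is kept, and a different result when the elements are reordered before the split.

```python
def reduce_rows(flat, rows, cols):
    """Reference: sum each row of a row-major rows x cols tensor."""
    return [sum(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


def two_step_reduce(flat, rows, cols, inner):
    """Sum each row in two steps: within chunks of `inner`, then across chunks."""
    assert cols % inner == 0
    # Step 1: partial sums over consecutive chunks of `inner` elements.
    partial = [sum(flat[i:i + inner]) for i in range(0, rows * cols, inner)]
    # Step 2: combine the partial sums that belong to the same row.
    chunks_per_row = cols // inner
    return [
        sum(partial[r * chunks_per_row:(r + 1) * chunks_per_row])
        for r in range(rows)
    ]


data = list(range(16))  # a 4x4 tensor in row-major order
expected = reduce_rows(data, 4, 4)

# Element order kept: the two-step reduction matches the reference.
order_kept = two_step_reduce(data, 4, 4, 2)

# Hypothetical reordering (a transpose) applied before the split:
# each "row" now mixes elements from different original rows.
scrambled = [data[c * 4 + r] for r in range(4) for c in range(4)]
order_lost = two_step_reduce(scrambled, 4, 4, 2)

print(expected, order_kept, order_lost)
```

In this model the reordered input still reduces without error, which is why such a bug is a silent miscompute rather than a crash.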

1 parent da569f1, commit ee78046

3 files changed: +444, -438 lines

test/TritonIntelGPU
- optimize-reduction.mlir
third_party/intel
- include/Dialect/TritonIntelGPU/Transforms
  - Passes.td
- lib/TritonIntelGPUTransforms
  - OptimizeReductionLocality.cpp
