[FXML-5890] Order tiling worklist #532

cferry-AMD · 2025-04-25T09:21:21Z

This PR makes it possible to change the order in which producers are tiled and fused into the loop.
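
Below is a minimal standalone C++ sketch of the idea. It assumes a simplified model in which fusion candidates are plain op names rather than `tensor.extract_slice` ops. The worklist driver takes a pluggable insertion function, so the same loop yields different fusion orders. `Candidate`, `InsertFn`, `insertBack` and `fusionOrder` are hypothetical names, not the API added by this PR.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fusion candidate: `producer` is the op to fuse into the loop,
// `consumer` is the op whose operand it feeds. Real candidates would be
// tensor.extract_slice ops; plain names keep the sketch self-contained.
struct Candidate {
  std::string producer;
  std::string consumer;
};

using Worklist = std::deque<Candidate>;

// Pluggable policy deciding where a newly discovered candidate is inserted.
using InsertFn = std::function<void(Worklist &, Candidate)>;

// Default policy: append at the back, giving breadth-first (FIFO) order.
void insertBack(Worklist &wl, Candidate c) { wl.push_back(std::move(c)); }

// Pops candidates from the front, records them as fused, then hands the
// fused op's own producers to `insert`. `producersOf` lists the ops feeding
// an op's operands, in operand order (assumed acyclic, as SSA producers are).
std::vector<std::string> fusionOrder(
    const std::string &root,
    const std::function<std::vector<std::string>(const std::string &)> &producersOf,
    const InsertFn &insert = insertBack) {
  Worklist wl;
  for (const std::string &p : producersOf(root))
    insert(wl, Candidate{p, root});
  std::vector<std::string> fused;
  while (!wl.empty()) {
    Candidate c = std::move(wl.front());
    wl.pop_front();
    fused.push_back(c.producer);
    for (const std::string &p : producersOf(c.producer))
      insert(wl, Candidate{p, c.producer});
  }
  return fused;
}
```

With the default back-insertion and the `ceil`/`negf`/`powf` example discussed in this thread (`powf` consumes a `ceil` and a `negf`, and the `negf` consumes the same `ceil`), the fused order comes out as ceil, negf, ceil, i.e. breadth-first. Passing a policy that inserts at the front instead gives negf, ceil, ceil.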


cferry-AMD · 2025-04-30T08:53:08Z

I tried to use the order of the ops in the resulting loop body as the reference, instead of counting remarks, but it turns out it does not match the fusion order:

    within split at mlir/test/Dialect/Linalg/tile-sort.mlir:1 offset :13:8: remark: Fused op in position 0
      %0 = linalg.ceil ins(%arg: tensor<256xf32>) outs(%empty: tensor<256xf32>) -> tensor<256xf32>
           ^
    within split at mlir/test/Dialect/Linalg/tile-sort.mlir:1 offset :13:8: note: see current operation: %1 = linalg.ceil ins(%arg0 : tensor<256xf32>) outs(%0 : tensor<256xf32>) -> tensor<256xf32>
    within split at mlir/test/Dialect/Linalg/tile-sort.mlir:1 offset :18:8: remark: Fused op in position 1
      %1 = linalg.negf ins(%0 : tensor<256xf32>) outs(%empty1: tensor<256xf32>) -> tensor<256xf32>
           ^
    within split at mlir/test/Dialect/Linalg/tile-sort.mlir:1 offset :18:8: note: see current operation: %3 = linalg.negf ins(%1 : tensor<256xf32>) outs(%2 : tensor<256xf32>) -> tensor<256xf32>
    within split at mlir/test/Dialect/Linalg/tile-sort.mlir:1 offset :13:8: remark: Fused op in position 2
      %0 = linalg.ceil ins(%arg: tensor<256xf32>) outs(%empty: tensor<256xf32>) -> tensor<256xf32>
           ^
    within split at mlir/test/Dialect/Linalg/tile-sort.mlir:1 offset :13:8: note: see current operation: %1 = linalg.ceil ins(%arg0 : tensor<256xf32>) outs(%0 : tensor<256xf32>) -> tensor<256xf32>

and the output is:

    %4 = scf.for %arg1 = %c0 to %c256 step %c32 iter_args(%arg2 = %3) -> (tensor<256xf32>) {
      [...]
      %6 = linalg.ceil ins(%extracted_slice : tensor<32xf32>) outs(%extracted_slice_1 : tensor<32xf32>) -> tensor<32xf32>
      [...]
      %8 = linalg.ceil ins(%extracted_slice_2 : tensor<32xf32>) outs(%extracted_slice_4 : tensor<32xf32>) -> tensor<32xf32>
      [...]
      %10 = linalg.negf ins(%8 : tensor<32xf32>) outs(%extracted_slice_6 : tensor<32xf32>) -> tensor<32xf32>
      [...]
      %12 = linalg.powf {tile} ins(%6, %10 : tensor<32xf32>, tensor<32xf32>) outs(%extracted_slice_8 : tensor<32xf32>) -> tensor<32xf32>
      %inserted_slice = tensor.insert_slice %12 into %arg2[%arg1] [32] [1] : tensor<32xf32> into tensor<256xf32>
      scf.yield %inserted_slice : tensor<256xf32>
    }

So the IR order is ceil, ceil, negf, powf rather than the fusion order ceil, negf, ceil, powf, which means I have to rely on the remarks for now. I don't really like abusing remarks for that, though.
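
One plausible reading of the mismatch, stated as an assumption rather than something confirmed here: each fused producer is created immediately before the consumer whose slice it replaces, so a later fusion can land textually above an earlier one. The toy model below (hypothetical `emittedOrder`, with `ceil.0`/`ceil.1` naming the two fused copies of the same `linalg.ceil`) replays the three fusions from the remarks and reproduces the ceil, ceil, negf, powf layout.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Models the loop body as an ordered list of op names, starting with the
// tiled root op. Each fusion step (producer, consumer) inserts the tiled
// producer right before its consumer; this placement rule is an assumption.
std::vector<std::string> emittedOrder(
    const std::string &root,
    const std::vector<std::pair<std::string, std::string>> &fusedInOrder) {
  std::vector<std::string> body{root};
  for (const auto &[producer, consumer] : fusedInOrder) {
    auto it = std::find(body.begin(), body.end(), consumer);
    assert(it != body.end() && "consumer must already be in the loop body");
    body.insert(it, producer);
  }
  return body;
}
```

Replaying the fusion order `{ceil.0 -> powf, negf -> powf, ceil.1 -> negf}` yields `ceil.0, ceil.1, negf, powf`, which lines up with `%6`, `%8`, `%10` and `%12` in the output above.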

mgehre-amd · 2025-04-30T09:21:17Z

As the output is only for testing, I suggest using a pass option and printing to llvm::errs().

cferry-AMD · 2025-04-30T09:42:07Z

Instead of writing to the error output, I went for an attribute. I think it's a bit neater than emitting a remark that then has to be caught again, and it is unlikely to be used in production contexts.

I'm all for a pass option; unfortunately, TileUsingInterface isn't a pass but rather a standalone function that passes are expected to call. What do you think of using an extra debug_worklist attribute on the transform-dialect test op for tiling, as is currently done? I'm not a big fan of it myself and am open to anything less invasive.

mgehre-amd · 2025-04-30T09:58:13Z

A few existing tests use -debug-only=name together with llvm::dbgs() and FileCheck. Maybe we should follow that approach?


mgehre-amd · 2025-05-05T11:34:23Z

Can you add a test case that uses setWorklistInsertFn to show that we can affect the order?
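
One order-changing policy such a test could plug in is a priority-sorted insert, sketched here in standalone C++. The thread does not show the signature `setWorklistInsertFn` actually accepts, so the deque-of-candidates interface, `PrioritizedCandidate` and `insertByPriority` are assumptions; the integer priority is only loosely inspired by the `tiling_priority` named in the commit log.

```cpp
#include <algorithm>
#include <deque>
#include <string>
#include <utility>

// Hypothetical candidate carrying an integer priority; a higher value means
// the candidate is fused earlier. Real priority semantics are not shown here.
struct PrioritizedCandidate {
  std::string producer;
  int priority;
};

// Custom insert function keeping the worklist sorted by descending priority.
// upper_bound places the new candidate after existing ones of equal priority,
// so ties keep their FIFO (discovery) order.
void insertByPriority(std::deque<PrioritizedCandidate> &wl,
                      PrioritizedCandidate c) {
  auto pos = std::upper_bound(
      wl.begin(), wl.end(), c,
      [](const PrioritizedCandidate &a, const PrioritizedCandidate &b) {
        return a.priority > b.priority;
      });
  wl.insert(pos, std::move(c));
}
```

With plain FIFO insertion, `ceil.0` would be processed first; with this policy the higher-priority `negf` jumps ahead, while the two equal-priority `ceil` candidates keep their discovery order.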

Co-authored-by: Matthias Gehre <[email protected]>

cferry-AMD added 5 commits April 25, 2025 03:15

WIP: worklist traversal

1c64247

Merge remote-tracking branch 'origin/feature/fused-ops' into corentin.tiling_order

7dcfd68

Add a test

d20bb09

Make insert_worklist function generic

998575f

Change wording: tiling -> fusion

5bf0b8b

cferry-AMD commented Apr 29, 2025

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (outdated, resolved)

cferry-AMD requested review from josel-amd and mgehre-amd April 29, 2025 13:10

cferry-AMD marked this pull request as ready for review April 29, 2025 13:10

cferry-AMD commented Apr 29, 2025

mlir/test/Dialect/Linalg/tile-sort.mlir (outdated, resolved)

cferry-AMD commented Apr 29, 2025

mlir/test/Dialect/Linalg/tile-sort.mlir (outdated, resolved)

josel-amd reviewed Apr 30, 2025

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (outdated, resolved)

Remove remark, replace by attribute -> more compact test

a6fd34e

Use llvm::dbgs() + FileCheck

9af785f

cferry-AMD commented Apr 30, 2025

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (outdated, resolved)

cferry-AMD commented Apr 30, 2025

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (outdated, resolved)

cferry-AMD commented Apr 30, 2025

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (outdated, resolved)

cferry-AMD commented Apr 30, 2025

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (outdated, resolved)

Apply suggestions from code review

5402418

josel-amd reviewed May 2, 2025

mlir/test/Dialect/Linalg/tile-sort.mlir (outdated, resolved)

josel-amd approved these changes May 2, 2025

cferry-AMD added 2 commits May 5, 2025 05:03

Merge remote-tracking branch 'origin/feature/fused-ops' into corentin.tiling_order

9a5f9c6

Remove reverse worklist test

f1b7413

cferry-AMD commented May 5, 2025

mlir/test/lib/Interfaces/TilingInterface/TestTilingInterfaceTransformOps.td (outdated, resolved)

Apply suggestions from code review

37a8962

mgehre-amd reviewed May 5, 2025

mlir/test/Dialect/Linalg/tile-sort.mlir (outdated, resolved)

cferry-AMD added 2 commits May 5, 2025 06:52

Add op with tiling_priority

efe04e4

Add test swapping powf operands

01b0345

cferry-AMD requested a review from mgehre-amd May 5, 2025 12:55

Description did not save

fd7c6b9

mgehre-amd reviewed May 5, 2025

mlir/test/Dialect/Linalg/tile-sort.mlir (outdated, resolved)

revise test comments

e064275

mgehre-amd reviewed May 5, 2025

mlir/test/Dialect/Linalg/tile-sort.mlir (outdated, resolved)

mgehre-amd approved these changes May 5, 2025

Typo in test

0ccb35d

Co-authored-by: Matthias Gehre <[email protected]>

cferry-AMD enabled auto-merge (squash) May 6, 2025 05:55

cferry-AMD merged commit acc5603 into feature/fused-ops May 6, 2025
4 checks passed

cferry-AMD deleted the corentin.tiling_order branch May 6, 2025 06:05

cferry-AMD mentioned this pull request May 6, 2025

tile_sort.mlir: Use --mlir-disable-threading to make test deterministic #538

Merged
