Skip to content

Conversation

@SwapnilGhanshyala
Copy link

Check if the forOps have any possible side-effects and handle them during MDG formation. This commit checks if forOp contains a memoryEffect operation, which is a possible side-effects and thus, fusion is not possible. In this commit we modifify the dependency graph initialization to add dependency edges between nodes with side-effecting ops and any other node that may be affected by it (basically node that is accessing any memref).

Also, modified test loop-fusion-4.mlir to add a test case.

Issue: maximal loop fusion was not handling side-effecting ops like function calls in the loops.
Original Pipeline: Loops with possible side effects are classified as sequential loops. If the sibling was sequential, it led to a nesting of the loops instead of fusion. In standard fusion (non-maximal), the strategy would have been rejected based on performance heuristics, and another iteration of bounds calculation would start. However, in maximal-fusion, this strategy would not be rejected and hence loops with possible side-effects would be fused.
Solution: MDG initialization needed modification to handle nodes with memoryEffect ops. Edges are now added between nodes with side-effects and nodes that might get impacted.

INPUT:

  func.func @testing2(%arg0: memref<10xf32>) {
    affine.for %i = 0 to 10 {
      %0 = affine.load %arg0[%i] : memref<10xf32>
      llvm.call @g():()->()
    }
    affine.for %j = 0 to 10 {
      %0 = affine.load %arg0[%j] : memref<10xf32>
      llvm.call @f():()->()
    }
    return
  }
  llvm.func external @f()->()
  llvm.func external @g()->()
}

COMMAND: mlir-opt -pass-pipeline='builtin.module(func.func(affine-loop-fusion{fusion-maximal}))'

PREVIOUS OUTPUT:

  func.func @testing2(%arg0: memref<10xf32>) {
    affine.for %arg1 = 0 to 10 {
      affine.for %arg2 = 0 to 10 {
        %1 = affine.load %arg0[%arg2] : memref<10xf32>
        llvm.call @g() : () -> ()
      }
      %0 = affine.load %arg0[%arg1] : memref<10xf32>
      llvm.call @f() : () -> ()
    }
    return
  }
  llvm.func @f()
  llvm.func @g()
}

NEW OUTPUT:

  func.func @testing2(%arg0: memref<10xf32>) {
    affine.for %arg1 = 0 to 10 {
      %0 = affine.load %arg0[%arg1] : memref<10xf32>
      llvm.call @g() : () -> ()
    }
    affine.for %arg1 = 0 to 10 {
      %0 = affine.load %arg0[%arg1] : memref<10xf32>
      llvm.call @f() : () -> ()
    }
    return
  }
  llvm.func @f()
  llvm.func @g()
}

@SwapnilGhanshyala SwapnilGhanshyala self-assigned this Jan 23, 2024
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the LoopNestStateCollector structure to hold memoryEffectOps.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the Node struct of MDG to List memoryEffectOps.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the op is not memoryEffectFree, LoopStateCollector should add this op to the list of MemoryEffectOps

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the above filter enough? Should it be else if (!mlir::isMemoryEffectFree(op) && !isa<mlir::memref::AllocOp, mlir::memref::DeallocOp>(op)) , instead of the above.

if only check !mlir::ismemoryEffectFree(op), it conservatively includes alloc and dealloc ops as well. Because of this inclusion code like below fail to fuse successfully.

module {
  func.func @testing2(%arg0: memref<10xf32>, %arg2 : memref <10xf32>) {
    %cst = arith.constant 1.000000e+00 : f32
    %cst_0 = arith.constant 2.000000e+00 : f32
    %1 = memref.alloc() : memref<12x12xf32>
    affine.for %arg1 = 0 to 10 { 
      %0 = affine.load %arg0[%arg1] : memref<10xf32>
    }
    affine.for %arg1 = 0 to 10 {
      %0 = affine.load %arg0[%arg1] : memref<10xf32>
      memref.dealloc %1  : memref<12x12xf32>
    }
    return
  }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memref.dealloc and memref.alloc can be better handled in the MDG construction by tracking the memref they access similarly to load and store ops.

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bondhugula , current commit conservatively treats alloc and deallocate as having unknown side-effects (as stated in the above comments)
Will fix the above issue by modifying MDG init to handle alloc/dealloc better and update pull request with it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This vector holds the for-nodes with potential memory effects in the order of traversal

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

affine.for nodes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a memory reference is accessed for the first time, all the for-nodes with possible memory effects that precede it are added to its access list.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the current for-node exhibits memory effects, it is appended to the sequential list of 'for-nodes' that have memory effects.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An edge is introduced between the srcNode and dstNode if either has a memory effect or a store op.
The current algorithm tracks memoryEffectOps which are within a loop body. Top-level ops with memoryEffect are handled separately for fusion validity checks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add code comments as suitable instead of github comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small test demonstrating the handling of side-effects when fusing loops with fusion-maximal

@SwapnilGhanshyala
Copy link
Author

Issues addressed here:

  1. Added code comments

@SwapnilGhanshyala
Copy link
Author

Hi @bondhugula , a gentle reminder for the code review.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing code comment here on what memory effect ops are. Load and store also have memory effects. Add doc comment for this variable.

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments have to end with a full stop. LLVM style.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm::is_contained

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need of parens.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are combining ops with all memory effects but you need those with write effects below. See below.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you respond to this comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we are considering side-effects as unknown entities that could be potentially read or write ops. However, if we consider side-effects with only write effect. we might lose out on potential WAR dependency in the MDGs.

See comment response #13 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable name will be inaccurate now. dstMayHaveStore.

Moreover, you could have only kept write memory effects here.

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering out alloc and dealloc ops as well. Handling of memref:: alloc/dealloc ops needs to be addressed in a separate as a separate issue to make this commit less complicated.

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example of an existing issue in handling alloc. The following loop does not merge if %alloc is memref.alloc.

  func.func @should_not_fuse_loops_with_side_effects() {
    %alloc = memref.alloc() : memref<10xf32>
    affine.for %arg0 = 0 to 10 {
      %0 = affine.load %alloc[%arg0] : memref<10xf32>
    }
    affine.for %arg0 = 0 to 10 {
      %0 = affine.load %alloc[%arg0] : memref<10xf32>
    }
    return
  }

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check on the destination cannot be just for writes. Otherwise, it will miss out on handling cases where srcNode has load op and dstNode has both load and a memoryEffectOp(which has unknown side-effects).

The following sibling loops below will merge, whereas they shouldn't, if we remove the !dstNode->memEffects.empty() check:

func.func @should_not_fuse_loops_with_side_effects(%arg0:memref<10xf32>) {
    affine.for %i = 0 to 10 {
      %0 = affine.load %arg0[%i] : memref<10xf32>
    }
    affine.for %j = 0 to 10 {
      %0 = affine.load %arg0[%j] : memref<10xf32>
     "foo"():()->()
    }
    return
  }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also changed variable names to a better representation dstMayHaveMemEffOrStore

Copy link
Author

@SwapnilGhanshyala SwapnilGhanshyala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bondhugula , I have addressed the previous comments and added some new ones.

@SwapnilGhanshyala
Copy link
Author

Hi @bondhugula, a gentle reminder for review.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't hardcode alloc, dealloc; instead check for Alloc and Free effects.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected in the latest commit.

Check if the forOps have any possible side-effects and handle them during MDG formation. This commit checks if forOp contains a memoryEffect operation, which is a possible side-effects and thus, fusion is not possible. In this commit we modifify the dependency graph initialization to add dependency edges between nodes with side-effecting ops and any other node that may be affected by it (basically node that is accessing any memref).

Also, modified test loop-fusion-4.mlir to add a test case.
else if (isa<AffineReadOpInterface>(op))
loadOpInsts.push_back(op);
else if (isa<AffineWriteOpInterface>(op))
storeOpInsts.push_back(op);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering out ops with alloc and free effects, reason in #13 (comment)

@SwapnilGhanshyala
Copy link
Author

Hi @bondhugula , gentle reminder for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants