-
Notifications
You must be signed in to change notification settings - Fork 1
[MLIR][AFFINE] Handling possible side-effects during fusion #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: side-effect-dev
Are you sure you want to change the base?
[MLIR][AFFINE] Handling possible side-effects during fusion #13
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the LoopNestStateCollector structure to hold memoryEffectOps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified the Node struct of MDG to List memoryEffectOps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the op is not memoryEffectFree, LoopStateCollector should add this op to the list of MemoryEffectOps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the above filter enough? Should it be else if (!mlir::isMemoryEffectFree(op) && !isa<mlir::memref::AllocOp, mlir::memref::DeallocOp>(op)) , instead of the above.
if only check !mlir::ismemoryEffectFree(op), it conservatively includes alloc and dealloc ops as well. Because of this inclusion code like below fail to fuse successfully.
module {
func.func @testing2(%arg0: memref<10xf32>, %arg2 : memref <10xf32>) {
%cst = arith.constant 1.000000e+00 : f32
%cst_0 = arith.constant 2.000000e+00 : f32
%1 = memref.alloc() : memref<12x12xf32>
affine.for %arg1 = 0 to 10 {
%0 = affine.load %arg0[%arg1] : memref<10xf32>
}
affine.for %arg1 = 0 to 10 {
%0 = affine.load %arg0[%arg1] : memref<10xf32>
memref.dealloc %1 : memref<12x12xf32>
}
return
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memref.dealloc and memref.alloc can be better handled in the MDG construction by tracking the memref they access similarly to load and store ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bondhugula , current commit conservatively treats alloc and deallocate as having unknown side-effects (as stated in the above comments)
Will fix the above issue by modifying MDG init to handle alloc/dealloc better and update pull request with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This vector holds the for-nodes with potential memory effects in the order of traversal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
affine.for nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a memory reference is accessed for the first time, all the for-nodes with possible memory effects that precede it are added to its access list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the current for-node exhibits memory effects, it is appended to the sequential list of 'for-nodes' that have memory effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An edge is introduced between the srcNode and dstNode if either has a memory effect or a store op.
The current algorithm tracks memoryEffectOps which are within a loop body. Top-level ops with memoryEffect are handled separately for fusion validity checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add code comments as suitable instead of github comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small test demonstrating the handling of side-effects when fusing loops with fusion-maximal
|
Issues addressed here:
|
|
Hi @bondhugula , a gentle reminder for the code review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing code comment here on what memory effect ops are. Load and store also have memory effects. Add doc comment for this variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All comments have to end with a full stop. LLVM style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::is_contained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need of parens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are combining ops with all memory effects but you need those with write effects below. See below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you respond to this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we are considering side-effects as unknown entities that could be potentially read or write ops. However, if we consider side-effects with only write effect. we might lose out on potential WAR dependency in the MDGs.
See comment response #13 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable name will be inaccurate now. dstMayHaveStore.
Moreover, you could have only kept write memory effects here.
2d756d0 to
2f33c01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering out alloc and dealloc ops as well. Handling of memref:: alloc/dealloc ops needs to be addressed in a separate as a separate issue to make this commit less complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example of an existing issue in handling alloc. The following loop does not merge if %alloc is memref.alloc.
func.func @should_not_fuse_loops_with_side_effects() {
%alloc = memref.alloc() : memref<10xf32>
affine.for %arg0 = 0 to 10 {
%0 = affine.load %alloc[%arg0] : memref<10xf32>
}
affine.for %arg0 = 0 to 10 {
%0 = affine.load %alloc[%arg0] : memref<10xf32>
}
return
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check on the destination cannot be just for writes. Otherwise, it will miss out on handling cases where srcNode has load op and dstNode has both load and a memoryEffectOp(which has unknown side-effects).
The following sibling loops below will merge, whereas they shouldn't, if we remove the !dstNode->memEffects.empty() check:
func.func @should_not_fuse_loops_with_side_effects(%arg0:memref<10xf32>) {
affine.for %i = 0 to 10 {
%0 = affine.load %arg0[%i] : memref<10xf32>
}
affine.for %j = 0 to 10 {
%0 = affine.load %arg0[%j] : memref<10xf32>
"foo"():()->()
}
return
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also changed variable names to a better representation dstMayHaveMemEffOrStore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bondhugula , I have addressed the previous comments and added some new ones.
|
Hi @bondhugula, a gentle reminder for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't hardcode alloc, dealloc; instead check for Alloc and Free effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected in the latest commit.
Check if the forOps have any possible side-effects and handle them during MDG formation. This commit checks if forOp contains a memoryEffect operation, which is a possible side-effects and thus, fusion is not possible. In this commit we modifify the dependency graph initialization to add dependency edges between nodes with side-effecting ops and any other node that may be affected by it (basically node that is accessing any memref). Also, modified test loop-fusion-4.mlir to add a test case.
2f33c01 to
c74638d
Compare
| else if (isa<AffineReadOpInterface>(op)) | ||
| loadOpInsts.push_back(op); | ||
| else if (isa<AffineWriteOpInterface>(op)) | ||
| storeOpInsts.push_back(op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering out ops with alloc and free effects, reason in #13 (comment)
|
Hi @bondhugula , gentle reminder for review. |
Check if the forOps have any possible side-effects and handle them during MDG formation. This commit checks if forOp contains a memoryEffect operation, which is a possible side-effects and thus, fusion is not possible. In this commit we modifify the dependency graph initialization to add dependency edges between nodes with side-effecting ops and any other node that may be affected by it (basically node that is accessing any memref).
Also, modified test loop-fusion-4.mlir to add a test case.
Issue: maximal loop fusion was not handling side-effecting ops like function calls in the loops.
Original Pipeline: Loops with possible side effects are classified as sequential loops. If the sibling was sequential, it led to a nesting of the loops instead of fusion. In standard fusion (non-maximal), the strategy would have been rejected based on performance heuristics, and another iteration of bounds calculation would start. However, in maximal-fusion, this strategy would not be rejected and hence loops with possible side-effects would be fused.
Solution: MDG initialization needed modification to handle nodes with memoryEffect ops. Edges are now added between nodes with side-effects and nodes that might get impacted.
INPUT:
COMMAND: mlir-opt -pass-pipeline='builtin.module(func.func(affine-loop-fusion{fusion-maximal}))'
PREVIOUS OUTPUT:
NEW OUTPUT: