-
Notifications
You must be signed in to change notification settings - Fork 831
[CodeGen] Improve scf.for bufferization and make hoisting allocation work #23318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[CodeGen] Improve scf.for bufferization and make hoisting allocation work #23318
Conversation
|
I don't expect this impacting the performance. It could enable some failing tests though. |
…work The revision enables `allowReturnAllocsFromLoops` in bufferization, which matches the upstream behavior. Otherwise, it can trigger an error like: ``` error: Yield operand #1 is not equivalent to the corresponding iter bbArg ``` In this context, a `memref.alloca` can be created inside the loop and the dynamic size can be queried from iter_arg. The ValueBoundsConstraintSet check does not support the analysis, because the runtime dimension values can still differ. E.g., ```mlir %result = scf.for ... iter_args(%iter = %init) -> (memref<?xf32>) { %new_buf = memref.alloca(%some_other_size) : memref<?xf32> scf.yield %new_buf : memref<?xf32> // same type, different runtime size } ``` It is weird, but it is allowed. Thus, we need to handle such case in `hoistOneStaticallyBoundAllocation`. The revision verifies the dimension is preserved, via: 1. The yield operand (after walking through cast/subview) is the iter_arg. 2. The yield operand traces to an alloca whose shape matches the iter_arg and whose dynamic size at `dimIndex` is `memref.dim` of the iter_arg. 3. The yield operand is a scf.for result whose init arg is the iter_arg and the inner loop also preserves the dimension (recursive). Signed-off-by: hanhanW <[email protected]>
739f413 to
02e5678
Compare
|
@hanhanW do you know what's up with the linux_x64_bazel tests' compilation failure? |
It is just missing a dep in BUILD.bazel. |
MaheshRavishankar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I think this is going the opposite of what the end state should be. I am not sure we want to support cases where we end up with local allocas. It almost always indicates something off in my view.
I thought we allow small local allocas as long as they are statically bounded, which is already happening for years? |
The revision enables
allowReturnAllocsFromLoopsin bufferization, which matches the upstream behavior. Otherwise, it can trigger an error like:In this context, a
memref.allocacan be created inside the loop and the dynamic size can be queried from iter_arg. The ValueBoundsConstraintSet check does not support the analysis, because the runtime dimension values can still differ. E.g.,It is weird, but it is allowed. Thus, we need to handle such case in
hoistOneStaticallyBoundAllocation.The revision verifies the dimension is preserved, via:
dimIndexismemref.dimof the iter_arg.Fixes #16956
ci-extra: test_torch