@hanhanW hanhanW commented Jan 29, 2026

This revision enables `allowReturnAllocsFromLoops` in bufferization, which matches the upstream behavior. Without it, bufferization can fail with an error like:

```
error: Yield operand #1 is not equivalent to the corresponding iter bbArg
```

In this context, a `memref.alloca` can be created inside the loop, and its dynamic size can be queried from the iter_arg. The `ValueBoundsConstraintSet` check cannot handle this analysis on its own, because the yielded value and the iter_arg can have the same type while their runtime dimension values still differ. For example:

```mlir
%result = scf.for ... iter_args(%iter = %init) -> (memref<?xf32>) {
  %new_buf = memref.alloca(%some_other_size) : memref<?xf32>
  scf.yield %new_buf : memref<?xf32>  // same type, different runtime size
}
```

It is weird, but it is allowed. Thus, we need to handle such a case in `hoistOneStaticallyBoundAllocation`.

The revision verifies that the dimension is preserved by checking that one of the following holds:

  1. The yield operand (after walking through cast/subview) is the iter_arg.
  2. The yield operand traces to an alloca whose shape matches the iter_arg and whose dynamic size at `dimIndex` is the `memref.dim` of the iter_arg.
  3. The yield operand is an `scf.for` result whose init arg is the iter_arg, and the inner loop also preserves the dimension (checked recursively).
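
To make the three cases concrete, here is a minimal sketch on a toy value graph. It is not the code in this PR and does not use the MLIR API: `Kind`, `Value`, `Loop`, and `preservesDim` are hypothetical names invented for illustration, each loop is assumed to carry a single iter_arg, and walking through a subview is assumed not to change the dimension in question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for MLIR values. These types are hypothetical and exist only
// for this sketch; they are not MLIR or IREE classes.
enum class Kind { IterArg, Cast, Subview, Alloca, ForResult, Other };

struct Loop;

struct Value {
  Kind kind = Kind::Other;
  std::vector<int64_t> shape;        // -1 marks a dynamic dimension
  const Value *source = nullptr;     // Cast/Subview: the viewed value
  std::vector<const Value *> dimOf;  // Alloca: dimOf[i] is the value whose
                                     // dim i supplies the size of dim i
  const Loop *loop = nullptr;        // ForResult: the loop that produced it
};

struct Loop {
  const Value *init = nullptr;     // value passed as the init arg
  const Value *iterArg = nullptr;  // block argument inside the loop body
  const Value *yielded = nullptr;  // operand of the terminating yield
};

// Returns true if dimension `dimIndex` of the yielded value is known to match
// the same dimension of the loop's iter_arg, per the three cases listed above.
bool preservesDim(const Loop &loop, std::size_t dimIndex) {
  assert(loop.iterArg && "loop must have an iter_arg");
  // Walk through cast/subview chains to the underlying value.
  const Value *v = loop.yielded;
  while (v && (v->kind == Kind::Cast || v->kind == Kind::Subview))
    v = v->source;
  if (!v)
    return false;
  // Case 1: the yield operand is the iter_arg itself.
  if (v == loop.iterArg)
    return true;
  // Case 2: an alloca with the same shape whose size at dimIndex is taken
  // from the iter_arg.
  if (v->kind == Kind::Alloca)
    return v->shape == loop.iterArg->shape && dimIndex < v->dimOf.size() &&
           v->dimOf[dimIndex] == loop.iterArg;
  // Case 3: the result of an inner loop that is initialized with the iter_arg
  // and that itself preserves the dimension.
  if (v->kind == Kind::ForResult && v->loop)
    return v->loop->init == loop.iterArg && preservesDim(*v->loop, dimIndex);
  return false;
}
```

In this model, the `%new_buf` example from the description corresponds to an `Alloca` whose `dimOf` entry does not point at the iter_arg, so the check returns false and the dimension is not treated as preserved.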

Fixes #16956

ci-extra: test_torch

hanhanW commented Jan 29, 2026

I don't expect this to impact performance. It could enable some currently failing tests, though.

@hanhanW hanhanW force-pushed the users/hanhanW/improve-bufferization-issue-16956 branch from 739f413 to 02e5678 on January 29, 2026 01:55

@amd-eochoalo

@hanhanW do you know what's up with the linux_x64_bazel tests' compilation failure?

hanhanW commented Jan 29, 2026

> @hanhanW do you know what's up with the linux_x64_bazel tests' compilation failure?

It is just missing a dep in BUILD.bazel.


@MaheshRavishankar MaheshRavishankar left a comment

Hmmm, I think this is going in the opposite direction of what the end state should be. I am not sure we want to support cases where we end up with local allocas. In my view, it almost always indicates that something is off.

hanhanW commented Jan 30, 2026

> Hmmm, I think this is going in the opposite direction of what the end state should be. I am not sure we want to support cases where we end up with local allocas. In my view, it almost always indicates that something is off.

I thought we allow small local allocas as long as they are statically bounded, which has already been happening for years?

Successfully merging this pull request may close these issues.

Missing support in scf.for bufferization
