[CIR][ThroughMLIR] Improving eraseIfSafe and canonical ForOp lowering (#1660)
Fixes #1405
## Improving eraseIfSafe:
As far as I understand it, eraseIfSafe should intuitively check whether all
memref load/store ops that obtain their offsets from the
memref.reinterpret_cast ops in the eraseList have already been created. If so,
the operations in the eraseList are erased; otherwise they are kept until all
cir.load/store ops relying on them are lowered.
One challenge here is that we can't actually do this by checking the uses of
the memref.reinterpret_cast operations, as their results aren't used by the
created memref load/store ops (the base alloca result found via
findBaseAndIndices is used instead). Because of this, the base alloca result is
passed as the newAddr Value to eraseIfSafe in the
[cir.load](https://github.com/llvm/clangir/blob/6e5fa09550c98f84d017873ed3e5667fd5fd909c/clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp#L236C5-L242C6)/[cir.store](https://github.com/llvm/clangir/blob/6e5fa09550c98f84d017873ed3e5667fd5fd909c/clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp#L266C1-L271C6)
lowerings.
Currently, the eraseIfSafe function counts all memref.load/store ops
that use this base address:
https://github.com/llvm/clangir/blob/6e5fa09550c98f84d017873ed3e5667fd5fd909c/clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp#L215-L218
The problem is that this also counts all the other memref.load/store ops
that store to / load from the base address but don't use the
memref.reinterpret_cast ops to obtain their offsets. Because of this, the
lowering fails if multiple stores/loads to/from the same array are performed
in the original C code, as in the example of issue #1405. Since we
erroneously count all the other memref.load/store ops, newUsedNum is (for the
later stores) larger than oldUsedNum (which only counts the uses of the
cir.ptr_stride op), and the memref.reinterpret_cast ops are never removed.
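To make the failure mode concrete, here is a small C snippet of the kind that trips the current counting (an illustrative sketch in the spirit of issue #1405, not necessarily its exact reproducer):
```c
int arr[3];
arr[1] = 1; /* first store: the use counts match, so its reinterpret_cast ops
               can be erased                                                  */
arr[2] = 2; /* later store: newUsedNum also counts the memref.store created
               for arr[1], so it exceeds oldUsedNum and the casts stay behind */
```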
This PR contains a first attempt to fix this (i.e. only count the
memref.load/store ops that obtain their offsets from the
memref.reinterpret_cast ops in the eraseList). I only count a
memref.load/store op if its first offset value corresponds to the offset
value in the last memref.reinterpret_cast.
Limitations of this PR:
This fixes the indirect lowering of the example in issue #1405 and also
works for other tests I made where multiple stores/loads to/from the same
array are performed, but it assumes two things to be the case:
1. The cir.const used as the stride in the cir.ptr_stride op is not reused in
other cir.ptr_stride ops
2. Only the last cir.ptr_stride can have multiple uses (for multidimensional
arrays)

Both of these assumptions seem to hold for the C code I tested (for the
translation of accesses to C/C++ arrays to cir ops), but the eraseIfSafe
function might need to be changed/further improved in the future to support
cases where those assumptions fail.
For example, if an optimization is run on cir that reuses cir.const ops with
the same value for different cir.ptr_stride ops, the indirect lowering would
still fail. The same holds if a subarray of a multidimensional array is
accessed, e.g.
```c
int arr[3][4];
int *row = arr[1];
```
(Note: I'm pretty sure it isn't sufficient to just extend the function to
check whether every offset value corresponds to the offset value in every
memref.reinterpret_cast; we would probably need to check separately for each
memref.reinterpret_cast whether it can be removed, instead of removing all or
none of the ops in the eraseList.)
## Improving the lowering of canonical ForOps:
While debugging issue #1405 I noticed a few things that I think could be
improved in the canonical ForOp lowering:
1. There is one edge case where the forOp should not be marked as
canonical, in my opinion:
```c
int i;
for (i = 0; i < 100; i++);
i += 10;
```
(With the current lowering this for is marked canonical, but since i is
replaced by the induction variable of the scf.for op and the actual memory
representing i is never updated, i has a wrong value after the for (it should
be 110 here). This is avoided when we lower this for as a non-canonical for.)
2. I think we can directly replace the loads of the cir IV with the
scf IV instead of creating the dummy arith.add IV, 0 op (I think this might
be a relic from a previous MLIR version where replaceOp only worked
with operations, not values). This makes the IR more readable and easier
to understand. If I'm missing something here and the arith.add IV, 0 has
a purpose I'm not seeing, let me know.
3. When implementing the change in 1, we know that in a canonical for
the induction variable is defined inside the for and is only valid
there. Because of this, and since we replace the loads of the cir IV with
the scf IV, we can remove the unnecessary alloca and store op created
for the cir IV.

(These changes only show up in a non-optimized binary and aren't relevant
when running clang with optimizations, but I still think they improve the
readability and understandability of the core IR.)
## Always running the SCFPreparePass cir pass when lowering throughMLIR:
I also noticed that we currently only run the SCFPreparePass
when we are printing the result of the cir to core dialect translation:
https://github.com/llvm/clangir/blob/6e5fa09550c98f84d017873ed3e5667fd5fd909c/clang/lib/CIR/CodeGen/CIRPasses.cpp#L84-L85
Because of this, compiling to an object file (or LLVM IR) with the
indirect lowering path fails if the code contains a canonical for. I
suggest always running this pass when we're going through MLIR.
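As an illustrative sketch (assuming a loop of this shape is detected as canonical), compiling something like the following to an object file or LLVM IR via the indirect lowering currently fails without the pass:
```c
void foo(void) {
  int i;
  /* canonical for: without SCFPreparePass, the indirect lowering path
     cannot handle this loop when emitting an object file or LLVM IR */
  for (i = 0; i < 100; i++)
    ;
}
```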
## Passing through is_nontemporal in load/store lowerings:
Since the corresponding memref ops also have this attribute, it's basically
just a matter of passing through a boolean (and doesn't need any special
handling, I think). Even though there is probably no practical application
right now, I think this might avoid bugs/confusion in the future. If there is
any reason not to do this, let me know.
I also added a new test case for arrays, adjusted the canonical forOp
test to reflect these changes, combined the non-canonical forOp tests into
one file, and added a test case for the edge case described above.
(Note: if I find the time, I will try to run the SingleSource test suite
with the throughMLIR lowering in the next week to get a better idea of
where we are with this pipeline. In general I agree with everything
discussed in issue #1219, but I think we can probably already add more
support for arrays (and maybe pointers) with the existing mlir core
constructs.)