-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[flang] Do not re-localize loop ivs when nested inside block
s
#153350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[flang] Do not re-localize loop ivs when nested inside block
s
#153350
Conversation
Consider the following example: ```fortran implicit none integer :: i, j do concurrent (i=1:10) local(j) block do j=1,20 end do end block end do ``` Without the fix introduced in this PR, the compiler would "re-localize" the `j` variable inside the `fir.do_concurrent` loop: ``` fir.do_concurrent { %7 = fir.alloca i32 {bindc_name = "j"} %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) ... fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) { %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) ... %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) { fir.store %arg3 to %8#0 : !fir.ref<i32> ... } } } ``` This happened because we did a shallow look-up of `j` and since the loop is nested inside a `block`, the look-up failed and we re-created a local allocation for `j` inside the parent `fir.do_concurrent` loop. This means that we ended up not using the actual localized symbol which is passed as a region argument to the `fir.do_concurrent.loop` op. In case of `j`, we do not need to do a shallow look-up. The shallow look-up is only needed if a symbol is an OpenMP private one or an iteration variable of a `do concurrent` loop. Neither of which applies to `j`. With the fix, `j` is properly resolved to the `local` region argument: ``` fir.do_concurrent { ... fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) { ... %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) ... %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) { fir.store %arg3 to %10#0 : !fir.ref<i32> ... } } } ```
@llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesConsider the following example: implicit none
integer :: i, j
do concurrent (i=1:10) local(j)
block
do j=1,20
end do
end block
end do Without the fix introduced in this PR, the compiler would "re-localize" the fir.do_concurrent {
%7 = fir.alloca i32 {bindc_name = "j"}
%8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
...
fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@<!-- -->_QFloop_in_nested_blockEj_private_i32 %4#<!-- -->0 -> %arg1 : !fir.ref<i32>) {
%12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
...
%17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) {
fir.store %arg3 to %8#<!-- -->0 : !fir.ref<i32>
...
}
}
} This happened because we did a shallow look-up of In case of With the fix, fir.do_concurrent {
...
fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@<!-- -->_QFloop_in_nested_blockEj_private_i32 %4#<!-- -->0 -> %arg1 : !fir.ref<i32>) {
...
%10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
...
%15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) {
fir.store %arg3 to %10#<!-- -->0 : !fir.ref<i32>
...
}
}
} Full diff: https://github.com/llvm/llvm-project/pull/153350.diff 2 Files Affected:
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index d16488d444546..881ff630dfa1f 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1400,21 +1400,23 @@ class FirConverter : public Fortran::lower::AbstractConverter {
mlir::Value genLoopVariableAddress(mlir::Location loc,
const Fortran::semantics::Symbol &sym,
bool isUnordered) {
- if (isUnordered || sym.has<Fortran::semantics::HostAssocDetails>() ||
- sym.has<Fortran::semantics::UseDetails>()) {
- if (!shallowLookupSymbol(sym) &&
- !GetSymbolDSA(sym).test(
- Fortran::semantics::Symbol::Flag::OmpShared)) {
- // Do concurrent loop variables are not mapped yet since they are local
- // to the Do concurrent scope (same for OpenMP loops).
- mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
- builder->setInsertionPointToStart(builder->getAllocaBlock());
- mlir::Type tempTy = genType(sym);
- mlir::Value temp =
- builder->createTemporaryAlloc(loc, tempTy, toStringRef(sym.name()));
- bindIfNewSymbol(sym, temp);
- builder->restoreInsertionPoint(insPt);
- }
+ if (!shallowLookupSymbol(sym) &&
+ (isUnordered ||
+ GetSymbolDSA(sym).test(Fortran::semantics::Symbol::Flag::OmpPrivate) ||
+ GetSymbolDSA(sym).test(
+ Fortran::semantics::Symbol::Flag::OmpFirstPrivate) ||
+ GetSymbolDSA(sym).test(
+ Fortran::semantics::Symbol::Flag::OmpLastPrivate) ||
+ GetSymbolDSA(sym).test(Fortran::semantics::Symbol::Flag::OmpLinear))) {
+ // Do concurrent loop variables are not mapped yet since they are
+ // local to the Do concurrent scope (same for OpenMP loops).
+ mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
+ builder->setInsertionPointToStart(builder->getAllocaBlock());
+ mlir::Type tempTy = genType(sym);
+ mlir::Value temp =
+ builder->createTemporaryAlloc(loc, tempTy, toStringRef(sym.name()));
+ bindIfNewSymbol(sym, temp);
+ builder->restoreInsertionPoint(insPt);
}
auto entry = lookupSymbol(sym);
(void)entry;
diff --git a/flang/test/Lower/do_concurrent_loop_in_nested_block.f90 b/flang/test/Lower/do_concurrent_loop_in_nested_block.f90
new file mode 100644
index 0000000000000..8c4f5048e87fa
--- /dev/null
+++ b/flang/test/Lower/do_concurrent_loop_in_nested_block.f90
@@ -0,0 +1,26 @@
+! RUN: %flang_fc1 -emit-hlfir -mmlir --enable-delayed-privatization-staging=true -o - %s | FileCheck %s
+
+subroutine loop_in_nested_block
+ implicit none
+ integer :: i, j
+
+ do concurrent (i=1:10) local(j)
+ block
+ do j=1,20
+ end do
+ end block
+ end do
+end subroutine
+
+! CHECK-LABEL: func.func @_QPloop_in_nested_block() {
+! CHECK: %[[OUTER_J_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Ej"}
+! CHECK: fir.do_concurrent {
+! CHECK: fir.do_concurrent.loop {{.*}} local(@{{.*}} %[[OUTER_J_DECL]]#0 -> %[[LOCAL_J_ARG:.*]] : !fir.ref<i32>) {
+! CHECK: %[[LOCAL_J_DECL:.*]]:2 = hlfir.declare %[[LOCAL_J_ARG]]
+! CHECK: fir.do_loop {{.*}} iter_args(%[[NESTED_LOOP_ARG:.*]] = {{.*}}) {
+! CHECK: fir.store %[[NESTED_LOOP_ARG]] to %[[LOCAL_J_DECL]]#0
+! CHECK: }
+! CHECK: }
+! CHECK: }
+! CHECK: }
+
|
Consider the following example:
Without the fix introduced in this PR, the compiler would "re-localize" the
j
variable inside thefir.do_concurrent
loop:This happened because we did a shallow look-up of
j
and since the loop is nested inside ablock
, the look-up failed and we re-created a local allocation forj
inside the parentfir.do_concurrent
loop. This means that we ended up not using the actual localized symbol which is passed as a region argument to thefir.do_concurrent.loop
op.In case of
j
, we do not need to do a shallow look-up. The shallow look-up is only needed if a symbol is an OpenMP private one or an iteration variable of ado concurrent
loop. Neither of which applies toj
.With the fix,
j
is properly resolved to thelocal
region argument: