Skip to content

[flang] Do not re-localize loop ivs when nested inside blocks #153350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Aug 13, 2025

Consider the following example:

  implicit none
  integer :: i, j

  do concurrent (i=1:10) local(j)
    block
      do j=1,20
      end do
    end block
  end do

Without the fix introduced in this PR, the compiler would "re-localize" the j variable inside the fir.do_concurrent loop:

    fir.do_concurrent {
      %7 = fir.alloca i32 {bindc_name = "j"}
      %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) {
          fir.store %arg3 to %8#0 : !fir.ref<i32>
          ...
        }
      }
    }

This happened because we did a shallow look-up of j and since the loop is nested inside a block, the look-up failed and we re-created a local allocation for j inside the parent fir.do_concurrent loop. This means that we ended up not using the actual localized symbol which is passed as a region argument to the fir.do_concurrent.loop op.

In case of j, we do not need to do a shallow look-up. The shallow look-up is only needed if a symbol is an OpenMP private one or an iteration variable of a do concurrent loop. Neither of which applies to j.

With the fix, j is properly resolved to the local region argument:

    fir.do_concurrent {
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        ...
        %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) {
          fir.store %arg3 to %10#0 : !fir.ref<i32>
          ...
        }
      }
    }

Consider the following example:
```fortran
  implicit none
  integer :: i, j

  do concurrent (i=1:10) local(j)
    block
      do j=1,20
      end do
    end block
  end do
```

Without the fix introduced in this PR, the compiler would "re-localize"
the `j` variable inside the `fir.do_concurrent` loop:
```
    fir.do_concurrent {
      %7 = fir.alloca i32 {bindc_name = "j"}
      %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) {
          fir.store %arg3 to %8#0 : !fir.ref<i32>
          ...
        }
      }
    }
```

This happened because we did a shallow look-up of `j` and since the loop
is nested inside a `block`, the look-up failed and we re-created a local
allocation for `j` inside the parent `fir.do_concurrent` loop. This
means that we ended up not using the actual localized symbol which is
passed as  a region argument to the `fir.do_concurrent.loop` op.

In case of `j`, we do not need to do a shallow look-up. The shallow
look-up is only needed if a symbol is an OpenMP private one or an
iteration variable of a `do concurrent` loop. Neither of which applies
to `j`.

With the fix, `j` is properly resolved to the `local` region argument:
```
    fir.do_concurrent {
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        ...
        %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) {
          fir.store %arg3 to %10#0 : !fir.ref<i32>
          ...
        }
      }
    }
```
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Aug 13, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 13, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Consider the following example:

  implicit none
  integer :: i, j

  do concurrent (i=1:10) local(j)
    block
      do j=1,20
      end do
    end block
  end do

Without the fix introduced in this PR, the compiler would "re-localize" the j variable inside the fir.do_concurrent loop:

    fir.do_concurrent {
      %7 = fir.alloca i32 {bindc_name = "j"}
      %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref&lt;i32&gt;) -&gt; (!fir.ref&lt;i32&gt;, !fir.ref&lt;i32&gt;)
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@<!-- -->_QFloop_in_nested_blockEj_private_i32 %4#<!-- -->0 -&gt; %arg1 : !fir.ref&lt;i32&gt;) {
        %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref&lt;i32&gt;) -&gt; (!fir.ref&lt;i32&gt;, !fir.ref&lt;i32&gt;)
        ...
        %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -&gt; (index, i32) {
          fir.store %arg3 to %8#<!-- -->0 : !fir.ref&lt;i32&gt;
          ...
        }
      }
    }

This happened because we did a shallow look-up of j and since the loop is nested inside a block, the look-up failed and we re-created a local allocation for j inside the parent fir.do_concurrent loop. This means that we ended up not using the actual localized symbol which is passed as a region argument to the fir.do_concurrent.loop op.

In case of j, we do not need to do a shallow look-up. The shallow look-up is only needed if a symbol is an OpenMP private one or an iteration variable of a do concurrent loop. Neither of which applies to j.

With the fix, j is properly resolved to the local region argument:

    fir.do_concurrent {
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@<!-- -->_QFloop_in_nested_blockEj_private_i32 %4#<!-- -->0 -&gt; %arg1 : !fir.ref&lt;i32&gt;) {
        ...
        %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref&lt;i32&gt;) -&gt; (!fir.ref&lt;i32&gt;, !fir.ref&lt;i32&gt;)
        ...
        %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -&gt; (index, i32) {
          fir.store %arg3 to %10#<!-- -->0 : !fir.ref&lt;i32&gt;
          ...
        }
      }
    }

Full diff: https://github.com/llvm/llvm-project/pull/153350.diff

2 Files Affected:

  • (modified) flang/lib/Lower/Bridge.cpp (+17-15)
  • (added) flang/test/Lower/do_concurrent_loop_in_nested_block.f90 (+26)
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index d16488d444546..881ff630dfa1f 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1400,21 +1400,23 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   mlir::Value genLoopVariableAddress(mlir::Location loc,
                                      const Fortran::semantics::Symbol &sym,
                                      bool isUnordered) {
-    if (isUnordered || sym.has<Fortran::semantics::HostAssocDetails>() ||
-        sym.has<Fortran::semantics::UseDetails>()) {
-      if (!shallowLookupSymbol(sym) &&
-          !GetSymbolDSA(sym).test(
-              Fortran::semantics::Symbol::Flag::OmpShared)) {
-        // Do concurrent loop variables are not mapped yet since they are local
-        // to the Do concurrent scope (same for OpenMP loops).
-        mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
-        builder->setInsertionPointToStart(builder->getAllocaBlock());
-        mlir::Type tempTy = genType(sym);
-        mlir::Value temp =
-            builder->createTemporaryAlloc(loc, tempTy, toStringRef(sym.name()));
-        bindIfNewSymbol(sym, temp);
-        builder->restoreInsertionPoint(insPt);
-      }
+    if (!shallowLookupSymbol(sym) &&
+        (isUnordered ||
+         GetSymbolDSA(sym).test(Fortran::semantics::Symbol::Flag::OmpPrivate) ||
+         GetSymbolDSA(sym).test(
+             Fortran::semantics::Symbol::Flag::OmpFirstPrivate) ||
+         GetSymbolDSA(sym).test(
+             Fortran::semantics::Symbol::Flag::OmpLastPrivate) ||
+         GetSymbolDSA(sym).test(Fortran::semantics::Symbol::Flag::OmpLinear))) {
+      // Do concurrent loop variables are not mapped yet since they are
+      // local to the Do concurrent scope (same for OpenMP loops).
+      mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
+      builder->setInsertionPointToStart(builder->getAllocaBlock());
+      mlir::Type tempTy = genType(sym);
+      mlir::Value temp =
+          builder->createTemporaryAlloc(loc, tempTy, toStringRef(sym.name()));
+      bindIfNewSymbol(sym, temp);
+      builder->restoreInsertionPoint(insPt);
     }
     auto entry = lookupSymbol(sym);
     (void)entry;
diff --git a/flang/test/Lower/do_concurrent_loop_in_nested_block.f90 b/flang/test/Lower/do_concurrent_loop_in_nested_block.f90
new file mode 100644
index 0000000000000..8c4f5048e87fa
--- /dev/null
+++ b/flang/test/Lower/do_concurrent_loop_in_nested_block.f90
@@ -0,0 +1,26 @@
+! RUN: %flang_fc1 -emit-hlfir -mmlir --enable-delayed-privatization-staging=true -o - %s | FileCheck %s
+
+subroutine loop_in_nested_block
+  implicit none
+  integer :: i, j
+
+  do concurrent (i=1:10) local(j)
+    block
+      do j=1,20
+      end do
+    end block
+  end do
+end subroutine
+
+! CHECK-LABEL: func.func @_QPloop_in_nested_block() {
+! CHECK:         %[[OUTER_J_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Ej"}
+! CHECK:         fir.do_concurrent {
+! CHECK:           fir.do_concurrent.loop {{.*}} local(@{{.*}} %[[OUTER_J_DECL]]#0 -> %[[LOCAL_J_ARG:.*]] : !fir.ref<i32>) {
+! CHECK:             %[[LOCAL_J_DECL:.*]]:2 = hlfir.declare %[[LOCAL_J_ARG]]
+! CHECK:             fir.do_loop {{.*}} iter_args(%[[NESTED_LOOP_ARG:.*]] = {{.*}}) {
+! CHECK:               fir.store %[[NESTED_LOOP_ARG]] to %[[LOCAL_J_DECL]]#0
+! CHECK:             }
+! CHECK:           }
+! CHECK:         }
+! CHECK:       }
+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants