Skip to content

Conversation

@ergawy
Copy link
Member

@ergawy ergawy commented Aug 13, 2025

Consider the following example:

  implicit none
  integer :: i, j

  do concurrent (i=1:10) local(j)
    block
      do j=1,20
      end do
    end block
  end do

Without the fix introduced in this PR, the compiler would "re-localize" the j variable inside the fir.do_concurrent loop:

    fir.do_concurrent {
      %7 = fir.alloca i32 {bindc_name = "j"}
      %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) {
          fir.store %arg3 to %8#0 : !fir.ref<i32>
          ...
        }
      }
    }

This happened because we did a shallow look-up of j and since the loop is nested inside a block, the look-up failed and we re-created a local allocation for j inside the parent fir.do_concurrent loop. This means that we ended up not using the actual localized symbol which is passed as a region argument to the fir.do_concurrent.loop op.

In case of j, we do not need to do a shallow look-up. The shallow look-up is only needed if a symbol is an OpenMP private one or an iteration variable of a do concurrent loop. Neither of which applies to j.

With the fix, j is properly resolved to the local region argument:

    fir.do_concurrent {
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        ...
        %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) {
          fir.store %arg3 to %10#0 : !fir.ref<i32>
          ...
        }
      }
    }

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Aug 13, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 13, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Consider the following example:

  implicit none
  integer :: i, j

  do concurrent (i=1:10) local(j)
    block
      do j=1,20
      end do
    end block
  end do

Without the fix introduced in this PR, the compiler would "re-localize" the j variable inside the fir.do_concurrent loop:

    fir.do_concurrent {
      %7 = fir.alloca i32 {bindc_name = "j"}
      %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref&lt;i32&gt;) -&gt; (!fir.ref&lt;i32&gt;, !fir.ref&lt;i32&gt;)
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@<!-- -->_QFloop_in_nested_blockEj_private_i32 %4#<!-- -->0 -&gt; %arg1 : !fir.ref&lt;i32&gt;) {
        %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref&lt;i32&gt;) -&gt; (!fir.ref&lt;i32&gt;, !fir.ref&lt;i32&gt;)
        ...
        %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -&gt; (index, i32) {
          fir.store %arg3 to %8#<!-- -->0 : !fir.ref&lt;i32&gt;
          ...
        }
      }
    }

This happened because we did a shallow look-up of j and since the loop is nested inside a block, the look-up failed and we re-created a local allocation for j inside the parent fir.do_concurrent loop. This means that we ended up not using the actual localized symbol which is passed as a region argument to the fir.do_concurrent.loop op.

In case of j, we do not need to do a shallow look-up. The shallow look-up is only needed if a symbol is an OpenMP private one or an iteration variable of a do concurrent loop. Neither of which applies to j.

With the fix, j is properly resolved to the local region argument:

    fir.do_concurrent {
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@<!-- -->_QFloop_in_nested_blockEj_private_i32 %4#<!-- -->0 -&gt; %arg1 : !fir.ref&lt;i32&gt;) {
        ...
        %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref&lt;i32&gt;) -&gt; (!fir.ref&lt;i32&gt;, !fir.ref&lt;i32&gt;)
        ...
        %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -&gt; (index, i32) {
          fir.store %arg3 to %10#<!-- -->0 : !fir.ref&lt;i32&gt;
          ...
        }
      }
    }

Full diff: https://github.com/llvm/llvm-project/pull/153350.diff

2 Files Affected:

  • (modified) flang/lib/Lower/Bridge.cpp (+17-15)
  • (added) flang/test/Lower/do_concurrent_loop_in_nested_block.f90 (+26)
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index d16488d444546..881ff630dfa1f 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -1400,21 +1400,23 @@ class FirConverter : public Fortran::lower::AbstractConverter {
   mlir::Value genLoopVariableAddress(mlir::Location loc,
                                      const Fortran::semantics::Symbol &sym,
                                      bool isUnordered) {
-    if (isUnordered || sym.has<Fortran::semantics::HostAssocDetails>() ||
-        sym.has<Fortran::semantics::UseDetails>()) {
-      if (!shallowLookupSymbol(sym) &&
-          !GetSymbolDSA(sym).test(
-              Fortran::semantics::Symbol::Flag::OmpShared)) {
-        // Do concurrent loop variables are not mapped yet since they are local
-        // to the Do concurrent scope (same for OpenMP loops).
-        mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
-        builder->setInsertionPointToStart(builder->getAllocaBlock());
-        mlir::Type tempTy = genType(sym);
-        mlir::Value temp =
-            builder->createTemporaryAlloc(loc, tempTy, toStringRef(sym.name()));
-        bindIfNewSymbol(sym, temp);
-        builder->restoreInsertionPoint(insPt);
-      }
+    if (!shallowLookupSymbol(sym) &&
+        (isUnordered ||
+         GetSymbolDSA(sym).test(Fortran::semantics::Symbol::Flag::OmpPrivate) ||
+         GetSymbolDSA(sym).test(
+             Fortran::semantics::Symbol::Flag::OmpFirstPrivate) ||
+         GetSymbolDSA(sym).test(
+             Fortran::semantics::Symbol::Flag::OmpLastPrivate) ||
+         GetSymbolDSA(sym).test(Fortran::semantics::Symbol::Flag::OmpLinear))) {
+      // Do concurrent loop variables are not mapped yet since they are
+      // local to the Do concurrent scope (same for OpenMP loops).
+      mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
+      builder->setInsertionPointToStart(builder->getAllocaBlock());
+      mlir::Type tempTy = genType(sym);
+      mlir::Value temp =
+          builder->createTemporaryAlloc(loc, tempTy, toStringRef(sym.name()));
+      bindIfNewSymbol(sym, temp);
+      builder->restoreInsertionPoint(insPt);
     }
     auto entry = lookupSymbol(sym);
     (void)entry;
diff --git a/flang/test/Lower/do_concurrent_loop_in_nested_block.f90 b/flang/test/Lower/do_concurrent_loop_in_nested_block.f90
new file mode 100644
index 0000000000000..8c4f5048e87fa
--- /dev/null
+++ b/flang/test/Lower/do_concurrent_loop_in_nested_block.f90
@@ -0,0 +1,26 @@
+! RUN: %flang_fc1 -emit-hlfir -mmlir --enable-delayed-privatization-staging=true -o - %s | FileCheck %s
+
+subroutine loop_in_nested_block
+  implicit none
+  integer :: i, j
+
+  do concurrent (i=1:10) local(j)
+    block
+      do j=1,20
+      end do
+    end block
+  end do
+end subroutine
+
+! CHECK-LABEL: func.func @_QPloop_in_nested_block() {
+! CHECK:         %[[OUTER_J_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Ej"}
+! CHECK:         fir.do_concurrent {
+! CHECK:           fir.do_concurrent.loop {{.*}} local(@{{.*}} %[[OUTER_J_DECL]]#0 -> %[[LOCAL_J_ARG:.*]] : !fir.ref<i32>) {
+! CHECK:             %[[LOCAL_J_DECL:.*]]:2 = hlfir.declare %[[LOCAL_J_ARG]]
+! CHECK:             fir.do_loop {{.*}} iter_args(%[[NESTED_LOOP_ARG:.*]] = {{.*}}) {
+! CHECK:               fir.store %[[NESTED_LOOP_ARG]] to %[[LOCAL_J_DECL]]#0
+! CHECK:             }
+! CHECK:           }
+! CHECK:         }
+! CHECK:       }
+

Consider the following example:
```fortran
  implicit none
  integer :: i, j

  do concurrent (i=1:10) local(j)
    block
      do j=1,20
      end do
    end block
  end do
```

Without the fix introduced in this PR, the compiler would "re-localize"
the `j` variable inside the `fir.do_concurrent` loop:
```
    fir.do_concurrent {
      %7 = fir.alloca i32 {bindc_name = "j"}
      %8:2 = hlfir.declare %7 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        %12:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %17:2 = fir.do_loop %arg2 = %14 to %15 step %c1_1 iter_args(%arg3 = %16) -> (index, i32) {
          fir.store %arg3 to %8#0 : !fir.ref<i32>
          ...
        }
      }
    }
```

This happened because we did a shallow look-up of `j` and since the loop
is nested inside a `block`, the look-up failed and we re-created a local
allocation for `j` inside the parent `fir.do_concurrent` loop. This
means that we ended up not using the actual localized symbol which is
passed as  a region argument to the `fir.do_concurrent.loop` op.

In case of `j`, we do not need to do a shallow look-up. The shallow
look-up is only needed if a symbol is an OpenMP private one or an
iteration variable of a `do concurrent` loop. Neither of which applies
to `j`.

With the fix, `j` is properly resolved to the `local` region argument:
```
    fir.do_concurrent {
      ...
      fir.do_concurrent.loop (%arg0) = (%5) to (%6) step (%c1) local(@_QFloop_in_nested_blockEj_private_i32 %4#0 -> %arg1 : !fir.ref<i32>) {
        ...
        %10:2 = hlfir.declare %arg1 {uniq_name = "_QFloop_in_nested_blockEj"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
        ...
        %15:2 = fir.do_loop %arg2 = %12 to %13 step %c1_1 iter_args(%arg3 = %14) -> (index, i32) {
          fir.store %arg3 to %10#0 : !fir.ref<i32>
          ...
        }
      }
    }
```
@ergawy ergawy force-pushed the users/ergawy/do_not_reprivatize_loop_vars_inside_blocks branch from be6a433 to bba50a4 Compare August 14, 2025 04:25
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix

@ergawy ergawy merged commit b9e33fd into main Aug 15, 2025
9 checks passed
@ergawy ergawy deleted the users/ergawy/do_not_reprivatize_loop_vars_inside_blocks branch August 15, 2025 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants