Skip to content

Conversation

@ergawy
Copy link
Member

@ergawy ergawy commented Feb 4, 2025

Fixes a bug in updating lastprivate variables. Previously, we only iterated over the symbols collected from lastprivate clauses. This meants that for pre-determined symbols, we did not implement the update correctly (e.g. for loop iteration variables of simd constructs).

Fixes a bug in updating `lastprivate` variables. Previously, we only
iterated over the symbols collected from `lastprivate` clauses. This
meants that for pre-determined symbols, we did not implement the update
correctly (e.g. for loop iteration variables of `simd` constructs).
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Feb 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 4, 2025

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Fixes a bug in updating lastprivate variables. Previously, we only iterated over the symbols collected from lastprivate clauses. This meants that for pre-determined symbols, we did not implement the update correctly (e.g. for loop iteration variables of simd constructs).


Full diff: https://github.com/llvm/llvm-project/pull/125628.diff

2 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/DataSharingProcessor.cpp (+79-73)
  • (added) flang/test/Lower/OpenMP/lastprivate-simd.f90 (+60)
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index 36a8efd43f8ce8..55f543ca38178d 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -207,6 +207,9 @@ void DataSharingProcessor::collectSymbolsForPrivatization() {
     }
   }
 
+  // TODO For common blocks, add the underlying objects within the block. Doing
+  // so, we won't need to explicitely handle block objects (or forget to do
+  // so).
   for (auto *sym : explicitlyPrivatizedSymbols)
     allPrivatizedSymbols.insert(sym);
 }
@@ -235,82 +238,85 @@ void DataSharingProcessor::insertLastPrivateCompare(mlir::Operation *op) {
   if (auto wrapper = mlir::dyn_cast<mlir::omp::LoopWrapperInterface>(op))
     loopOp = mlir::cast<mlir::omp::LoopNestOp>(wrapper.getWrappedLoop());
 
-  bool cmpCreated = false;
   mlir::OpBuilder::InsertionGuard guard(firOpBuilder);
-  for (const omp::Clause &clause : clauses) {
-    if (clause.id != llvm::omp::OMPC_lastprivate)
-      continue;
-    if (mlir::isa<mlir::omp::WsloopOp>(op) ||
-        mlir::isa<mlir::omp::SimdOp>(op)) {
-      // Update the original variable just before exiting the worksharing
-      // loop. Conversion as follows:
-      //
-      // omp.wsloop / omp.simd {    omp.wsloop / omp.simd {
-      //   omp.loop_nest {            omp.loop_nest {
-      //     ...                        ...
-      //     store          ===>        store
-      //     omp.yield                  %v = arith.addi %iv, %step
-      //   }                            %cmp = %step < 0 ? %v < %ub : %v > %ub
-      // }                              fir.if %cmp {
-      //                                  fir.store %v to %loopIV
-      //                                  ^%lpv_update_blk:
-      //                                }
-      //                                omp.yield
-      //                              }
-      //                            }
-
-      // Only generate the compare once in presence of multiple LastPrivate
-      // clauses.
-      if (cmpCreated)
-        continue;
-      cmpCreated = true;
-
-      mlir::Location loc = loopOp.getLoc();
-      mlir::Operation *lastOper = loopOp.getRegion().back().getTerminator();
-      firOpBuilder.setInsertionPoint(lastOper);
-
-      mlir::Value cmpOp;
-      llvm::SmallVector<mlir::Value> vs;
-      vs.reserve(loopOp.getIVs().size());
-      for (auto [iv, ub, step] :
-           llvm::zip_equal(loopOp.getIVs(), loopOp.getLoopUpperBounds(),
-                           loopOp.getLoopSteps())) {
-        // v = iv + step
-        // cmp = step < 0 ? v < ub : v > ub
-        mlir::Value v = firOpBuilder.create<mlir::arith::AddIOp>(loc, iv, step);
-        vs.push_back(v);
-        mlir::Value zero =
-            firOpBuilder.createIntegerConstant(loc, step.getType(), 0);
-        mlir::Value negativeStep = firOpBuilder.create<mlir::arith::CmpIOp>(
-            loc, mlir::arith::CmpIPredicate::slt, step, zero);
-        mlir::Value vLT = firOpBuilder.create<mlir::arith::CmpIOp>(
-            loc, mlir::arith::CmpIPredicate::slt, v, ub);
-        mlir::Value vGT = firOpBuilder.create<mlir::arith::CmpIOp>(
-            loc, mlir::arith::CmpIPredicate::sgt, v, ub);
-        mlir::Value icmpOp = firOpBuilder.create<mlir::arith::SelectOp>(
-            loc, negativeStep, vLT, vGT);
-
-        if (cmpOp) {
-          cmpOp = firOpBuilder.create<mlir::arith::AndIOp>(loc, cmpOp, icmpOp);
-        } else {
-          cmpOp = icmpOp;
-        }
-      }
+  bool hasLastPrivate = [&]() {
+    for (const semantics::Symbol *sym : allPrivatizedSymbols) {
+      if (const auto *commonDet =
+              sym->detailsIf<semantics::CommonBlockDetails>()) {
+        for (const auto &mem : commonDet->objects())
+          if (mem->test(semantics::Symbol::Flag::OmpLastPrivate))
+            return true;
+      } else if (sym->test(semantics::Symbol::Flag::OmpLastPrivate))
+        return true;
+    }
 
-      auto ifOp = firOpBuilder.create<fir::IfOp>(loc, cmpOp, /*else*/ false);
-      firOpBuilder.setInsertionPointToStart(&ifOp.getThenRegion().front());
-      for (auto [v, loopIV] : llvm::zip_equal(vs, loopIVs)) {
-        assert(loopIV && "loopIV was not set");
-        firOpBuilder.createStoreWithConvert(loc, v, loopIV);
-      }
-      lastPrivIP = firOpBuilder.saveInsertionPoint();
-    } else if (mlir::isa<mlir::omp::SectionsOp>(op)) {
-      // Already handled by genOMP()
-    } else {
-      TODO(converter.getCurrentLocation(),
-           "lastprivate clause in constructs other than "
-           "simd/worksharing-loop");
+    return false;
+  }();
+
+  if (!hasLastPrivate)
+    return;
+
+  if (mlir::isa<mlir::omp::WsloopOp>(op) || mlir::isa<mlir::omp::SimdOp>(op)) {
+    // Update the original variable just before exiting the worksharing
+    // loop. Conversion as follows:
+    //
+    // omp.wsloop / omp.simd {    omp.wsloop / omp.simd {
+    //   omp.loop_nest {            omp.loop_nest {
+    //     ...                        ...
+    //     store          ===>        store
+    //     omp.yield                  %v = arith.addi %iv, %step
+    //   }                            %cmp = %step < 0 ? %v < %ub : %v > %ub
+    // }                              fir.if %cmp {
+    //                                  fir.store %v to %loopIV
+    //                                  ^%lpv_update_blk:
+    //                                }
+    //                                omp.yield
+    //                              }
+    //                            }
+    mlir::Location loc = loopOp.getLoc();
+    mlir::Operation *lastOper = loopOp.getRegion().back().getTerminator();
+    firOpBuilder.setInsertionPoint(lastOper);
+
+    mlir::Value cmpOp;
+    llvm::SmallVector<mlir::Value> vs;
+    vs.reserve(loopOp.getIVs().size());
+    for (auto [iv, ub, step] :
+         llvm::zip_equal(loopOp.getIVs(), loopOp.getLoopUpperBounds(),
+                         loopOp.getLoopSteps())) {
+      // v = iv + step
+      // cmp = step < 0 ? v < ub : v > ub
+      mlir::Value v = firOpBuilder.create<mlir::arith::AddIOp>(loc, iv, step);
+      vs.push_back(v);
+      mlir::Value zero =
+          firOpBuilder.createIntegerConstant(loc, step.getType(), 0);
+      mlir::Value negativeStep = firOpBuilder.create<mlir::arith::CmpIOp>(
+          loc, mlir::arith::CmpIPredicate::slt, step, zero);
+      mlir::Value vLT = firOpBuilder.create<mlir::arith::CmpIOp>(
+          loc, mlir::arith::CmpIPredicate::slt, v, ub);
+      mlir::Value vGT = firOpBuilder.create<mlir::arith::CmpIOp>(
+          loc, mlir::arith::CmpIPredicate::sgt, v, ub);
+      mlir::Value icmpOp = firOpBuilder.create<mlir::arith::SelectOp>(
+          loc, negativeStep, vLT, vGT);
+
+      if (cmpOp)
+        cmpOp = firOpBuilder.create<mlir::arith::AndIOp>(loc, cmpOp, icmpOp);
+      else
+        cmpOp = icmpOp;
     }
+
+    auto ifOp = firOpBuilder.create<fir::IfOp>(loc, cmpOp, /*else*/ false);
+    firOpBuilder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+    for (auto [v, loopIV] : llvm::zip_equal(vs, loopIVs)) {
+      assert(loopIV && "loopIV was not set");
+      firOpBuilder.createStoreWithConvert(loc, v, loopIV);
+    }
+    lastPrivIP = firOpBuilder.saveInsertionPoint();
+  } else if (mlir::isa<mlir::omp::SectionsOp>(op)) {
+    // Already handled by genOMP()
+  } else {
+    TODO(converter.getCurrentLocation(),
+         "lastprivate clause in constructs other than "
+         "simd/worksharing-loop");
   }
 }
 
diff --git a/flang/test/Lower/OpenMP/lastprivate-simd.f90 b/flang/test/Lower/OpenMP/lastprivate-simd.f90
new file mode 100644
index 00000000000000..75f5ebfe42f39e
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-simd.f90
@@ -0,0 +1,60 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
+
+subroutine simd_ivs
+  implicit none
+  integer :: ido1 = 1
+  integer :: ido2 = 2
+  integer :: ido3 = 3
+  integer :: ido4 = 4
+
+  !$omp parallel
+  !$omp simd collapse(3)
+  do ido1 = 1, 10
+    do ido2 = 1, 10
+      do ido3 = 1, 10
+        do ido4 = 1, 10
+        end do
+      end do
+    end do
+  end do
+  !$omp end simd
+  !$omp end parallel
+end subroutine
+
+! CHECK: func.func @_QPsimd_ivs() {
+! CHECK:   %[[IDO1_HOST_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Eido1"}
+! CHECK:   %[[IDO2_HOST_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Eido2"}
+! CHECK:   %[[IDO3_HOST_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Eido3"}
+
+! CHECK:   omp.parallel {
+! CHECK:     omp.simd private(
+! CHECK-SAME:  @{{.*}}do1_private{{.*}} %[[IDO1_HOST_DECL]]#0 -> %[[IDO1_PRIV_ARG:[^[:space:]]*]],
+! CHECK-SAME:  @{{.*}}do2_private{{.*}} %[[IDO2_HOST_DECL]]#0 -> %[[IDO2_PRIV_ARG:[^[:space:]]*]],
+! CHECK-SAME:  @{{.*}}do3_private{{.*}} %[[IDO3_HOST_DECL]]#0 -> %[[IDO3_PRIV_ARG:[^[:space:]]*]]
+! CHECK-SAME:  : {{.*}}) {
+
+! CHECK:       omp.loop_nest (%[[IV1:.*]], %[[IV2:.*]], %[[IV3:.*]]) : {{.*}} {
+! CHECK:         %[[IDO1_PRIV_DECL:.*]]:2 = hlfir.declare %[[IDO1_PRIV_ARG]] {uniq_name = "{{.*}}Eido1"}
+! CHECK:         %[[IDO2_PRIV_DECL:.*]]:2 = hlfir.declare %[[IDO2_PRIV_ARG]] {uniq_name = "{{.*}}Eido2"}
+! CHECK:         %[[IDO3_PRIV_DECL:.*]]:2 = hlfir.declare %[[IDO3_PRIV_ARG]] {uniq_name = "{{.*}}Eido3"}
+
+! CHECK:         fir.if %33 {
+! CHECK:           fir.store %{{.*}} to %[[IDO1_PRIV_DECL]]#1
+! CHECK:           fir.store %{{.*}} to %[[IDO2_PRIV_DECL]]#1
+! CHECK:           fir.store %{{.*}} to %[[IDO3_PRIV_DECL]]#1
+! CHECK:           %[[IDO1_VAL:.*]] = fir.load %[[IDO1_PRIV_DECL]]#0
+! CHECK:           hlfir.assign %[[IDO1_VAL]] to %[[IDO1_HOST_DECL]]#0
+! CHECK:           %[[IDO2_VAL:.*]] = fir.load %[[IDO2_PRIV_DECL]]#0
+! CHECK:           hlfir.assign %[[IDO2_VAL]] to %[[IDO2_HOST_DECL]]#0
+! CHECK:           %[[IDO3_VAL:.*]] = fir.load %[[IDO3_PRIV_DECL]]#0
+! CHECK:           hlfir.assign %[[IDO3_VAL]] to %[[IDO3_HOST_DECL]]#0
+! CHECK:         }
+! CHECK-NEXT:    omp.yield
+! CHECK:       }
+
+! CHECK:     }
+
+! CHECK:     omp.terminator
+! CHECK:   }
+
+! CHECK: }

Comment on lines +210 to +212
// TODO For common blocks, add the underlying objects within the block. Doing
// so, we won't need to explicitely handle block objects (or forget to do
// so).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a more general version of #125504?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope so, yes. However, sections is handled specially for some reason that I didn't look into yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, there are 2 things that complicate the handling of sections:

  1. omp.sections can only have omp.section operations. So, privatizations must be performed before omp.sections.
  2. lastprivate code must be inserted at the end of the last omp.section.

I guess this could be moved to DataSharingProcessor, but some special handling for sections would still be needed.

Also, we need to be careful when inserting barriers. As not all threads execute each section, inserting a barrier in a section can hang the program.

Comment on lines -262 to -266
// Only generate the compare once in presence of multiple LastPrivate
// clauses.
if (cmpCreated)
continue;
cmpCreated = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove this part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we don't loop to find at least one lastprivate symbol anymore. If hasLastPrivate == true, we create the if op and exit.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the bugfix

Copy link
Contributor

@luporl luporl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

Comment on lines +210 to +212
// TODO For common blocks, add the underlying objects within the block. Doing
// so, we won't need to explicitely handle block objects (or forget to do
// so).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, there are 2 things that complicate the handling of sections:

  1. omp.sections can only have omp.section operations. So, privatizations must be performed before omp.sections.
  2. lastprivate code must be inserted at the end of the last omp.section.

I guess this could be moved to DataSharingProcessor, but some special handling for sections would still be needed.

Also, we need to be careful when inserting barriers. As not all threads execute each section, inserting a barrier in a section can hang the program.

@ergawy ergawy merged commit 25f29ee into llvm:main Feb 4, 2025
12 checks passed
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 7, 2025
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…llvm#125628)

Fixes a bug in updating `lastprivate` variables. Previously, we only
iterated over the symbols collected from `lastprivate` clauses. This
meants that for pre-determined symbols, we did not implement the update
correctly (e.g. for loop iteration variables of `simd` constructs).
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 7, 2025
…llvm#125628)

Fixes a bug in updating `lastprivate` variables. Previously, we only
iterated over the symbols collected from `lastprivate` clauses. This
meants that for pre-determined symbols, we did not implement the update
correctly (e.g. for loop iteration variables of `simd` constructs).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants