[flang][OpenMP] Fix reduction init region block management #122079

ergawy · 2025-01-08T09:23:14Z

Replaces #121886
Fixes #120254 (hopefully 🤞)

Problem

Consider the following example:

program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program

The HLFIR+OMP IR for this example looks like this:

  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }

The problem addressed by this PR involves the alloca in the omp.parallel region together with the related reduction clause on the omp.wsloop op. When we try to translate the reduction from MLIR to LLVM, we have to choose an alloca insertion point. This happens in convertOmpWsloop; at entry to that function, the LLVM module looks like this:

define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8

Now, when we choose an alloca insertion point for the reduction, the chosen block is omp.par.entry (without the changes in this PR). The problem is that the allocation needed for the reduction has to reference the %5 SSA value. As a result, allocations inserted in omp.par.entry reference an allocation in a later block, omp.par.region1, which causes the "Instruction does not dominate all uses!" error.
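To make the failure mode concrete, here is a minimal sketch of the dominance rule on a toy model. It does not use the LLVM API: ToyFunction, defDominatesUse and makeExample are hypothetical names introduced only for illustration, and the model assumes a straight-line chain of blocks (as in the module above), where a block dominates exactly the blocks that come at or after it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not the LLVM API: a function is a straight-line chain of basic
// blocks, so block A dominates block B exactly when A comes no later than B.
struct ToyFunction {
  std::vector<std::string> blocks; // block names in control-flow order

  // Returns the position of `name` in the chain, or -1 if it is unknown.
  int indexOf(const std::string &name) const {
    for (std::size_t i = 0; i < blocks.size(); ++i)
      if (blocks[i] == name)
        return static_cast<int>(i);
    return -1;
  }

  // A use placed in `useBlock` is valid only if the definition's block
  // dominates it. Ordering inside a single block is ignored in this sketch.
  bool defDominatesUse(const std::string &defBlock,
                       const std::string &useBlock) const {
    int def = indexOf(defBlock);
    int use = indexOf(useBlock);
    return def >= 0 && use >= 0 && def <= use;
  }
};

// Hypothetical helper: the block chain from the LLVM module shown above.
inline ToyFunction makeExample() {
  return ToyFunction{{"entry", "omp.par.entry", "omp.par.region",
                      "omp.par.region1"}};
}
```

In this model, makeExample().defDominatesUse("omp.par.region1", "omp.par.entry") is false, which corresponds to the verifier error: %5 is defined in omp.par.region1, so nothing placed in omp.par.entry may use it.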

Possible solution - take 2:

This PR contains a more localized solution than #121886. It makes sure that, on entry to initReductionVars, the IR builder is at a point where we can start inserting the initialization region; to make things cleaner, we still split the builder insertion point into a dedicated omp.reduction.init block. This way we avoid splitting after the latest allocation block, which is what was causing the issue.
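One detail of the patch is that initReductionVars can no longer assume the current block already has a terminator, so it inserts before the terminator when there is one and at the end of the block otherwise. The sketch below shows that selection on a toy block type. It is an illustration only, not the LLVM API: ToyBlock, pickInsertIndex and insertInit are hypothetical names, and the model treats any instruction whose text starts with "br" as the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block (not llvm::BasicBlock): a list of
// instruction strings. Assumption: only "br ..." counts as a terminator.
struct ToyBlock {
  std::vector<std::string> insts;

  bool hasTerminator() const {
    // rfind(prefix, 0) == 0 is a starts-with check.
    return !insts.empty() && insts.back().rfind("br", 0) == 0;
  }
};

// Picks the index at which new instructions should be inserted: just before
// the terminator if one exists, otherwise at the end of the block.
inline std::size_t pickInsertIndex(const ToyBlock &block) {
  if (block.hasTerminator())
    return block.insts.size() - 1;
  return block.insts.size();
}

// Inserts `inst` at the chosen point so the terminator, if any, stays last.
inline void insertInit(ToyBlock &block, const std::string &inst) {
  auto offset = static_cast<std::ptrdiff_t>(pickInsertIndex(block));
  block.insts.insert(block.insts.begin() + offset, inst);
}
```

Calling insertInit on an empty block simply appends, while calling it on a block that ends in a branch keeps the branch as the last instruction, which is the property the real code needs to keep the block well-formed.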

llvmbot · 2025-01-08T09:23:48Z

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/122079.diff

8 Files Affected:

(modified) flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 (+5-5)
(modified) flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 (+2-2)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+12-7)
(modified) mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir (+10-10)
(modified) mlir/test/Target/LLVMIR/openmp-private.mlir (+2)
(modified) mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir (+12-9)
(modified) mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir (+4-4)
(modified) mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir (+11-9)

diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont13:                                ; preds = %omp.private.copy16
 ! CHECK-NEXT:    %{{.*}} = phi ptr
+! CHECK-NEXT:    br label %omp.par.region
+
+! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont13
 ! CHECK-NEXT:    br label %omp.reduction.init
 
-! CHECK:       omp.reduction.init:                               ; preds = %omp.region.cont13
+! CHECK:       omp.reduction.init:                               ; preds = %omp.par.region
 !                [deffered stores for results of reduction alloc regions]
 ! CHECK:         br label %[[VAL_96:.*]]
 
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont21:                                ; preds = %omp.reduction.neutral25
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.par.region
-
-! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont21
 ! CHECK-NEXT:    br label %omp.par.region27
 
-! CHECK:       omp.par.region27:                                 ; preds = %omp.par.region
+! CHECK:       omp.par.region27:                                 ; preds = %omp.region.cont21
 !                [call SUM runtime function]
 !                [if (sum(a) == 1)]
 ! CHECK:         br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
 !CHECK:  %[[F_priv:.*]] = alloca ptr
 !CHECK:  %[[I_priv:.*]] = alloca i32
 
+!CHECK: omp.par.region:
+
 !CHECK: omp.reduction.init:
 !CHECK:  store ptr %{{.*}}, ptr %[[F_priv]]
 !CHECK:  store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
 !CHECK:  br label %[[MALLOC_BB:.*]]
 
 !CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
-  builder.SetInsertPoint(latestAllocaBlock->getTerminator());
   llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
   auto allocaIP = llvm::IRBuilderBase::InsertPoint(
       latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     }
   }
 
-  builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+  if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+    builder.SetInsertPoint(initBlock);
+  else
+    builder.SetInsertPoint(initBlock->getTerminator());
 
   // store result of the alloc region to the allocated pointer to the real
   // reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     assert(phis.size() == 1 && "expected one value to be yielded from the "
                                "reduction neutral element declaration region");
 
-    builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+    if (builder.GetInsertBlock()->empty() ||
+        builder.GetInsertBlock()->getTerminator() == nullptr)
+      builder.SetInsertPoint(builder.GetInsertBlock());
+    else
+      builder.SetInsertPoint(
+          builder.GetInsertBlock()->getTerminator());
 
     if (isByRef[i]) {
       if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
   SmallVector<DeferredStore> deferredStores;
 
   if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
       return llvm::make_error<PreviouslyReportedError>();
 
     assert(afterAllocas.get()->getSinglePredecessor());
+    builder.restoreIP(codeGenIP);
+
     if (failed(
             initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
                               afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
         moduleTranslation, allocaIP);
 
     // ParallelOp has only one region associated with it.
-    builder.restoreIP(codeGenIP);
     llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
         opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
     if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         %[[VAL_20:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK:         br label %[[VAL_23:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_24:.*]]
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.reduction.neutral:                            ; preds = %[[VAL_23]]
+// CHECK:         br label %[[VAL_23:omp.par.region]]
+// CHECK:       [[VAL_23]]:                                   ; preds = %[[PAR_ENTRY]]
+// CHECK:         br label %[[VAL_42:.*]]
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK:       [[VAL_25]]:                            ; preds = %[[RED_INIT]]
 // CHECK:         %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
 // CHECK:         %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
 // CHECK:         br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_38:.*]]
 // CHECK:       omp.reduction.neutral8:                           ; preds = %[[VAL_36]], %[[VAL_37]]
 // CHECK:         br label %[[VAL_39:.*]]
-// CHECK:       omp.region.cont4:                                 ; preds = %[[VAL_38]]
+// CHECK:       [[VAL_39]]:                                 ; preds = %[[VAL_38]]
 // CHECK:         %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
 // CHECK:         store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
 // CHECK:         br label %[[VAL_41:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_39]]
-// CHECK:         br label %[[VAL_42:.*]]
-// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_41]]
+// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_39]]
 // CHECK:         br label %[[VAL_43:.*]]
-// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_42]]
+// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_41]]
 // CHECK:         %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
 // CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
 // CHECK:         %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
 // CHECK-DAG:     %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 
 // CHECK:         omp.par.region:
+// CHECK:           br label %omp.reduction.init
+// CHECK:         omp.reduction.init:
 // CHECK:           br label %[[PAR_REG_BEG:.*]]
 // CHECK:         [[PAR_REG_BEG]]:
 // CHECK-NEXT:      %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 }
 
 // CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_6:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_7:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_14:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_15:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
-// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK:         br label %[[VAL_17:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_15]]
+
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_18:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_17]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_15]]
 // CHECK:         %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_18]]
+
+// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
+// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK:         br label %[[VAL_17:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_22]]
 // CHECK:         store i32 0, ptr %[[VAL_7]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_8]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
 // CHECK:         %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
 // CHECK:         br label %[[VAL_28:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
 // CHECK:         br label %[[VAL_32:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_28]]
 // CHECK:         %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_23:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK:         br label %[[VAL_25:.*]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[INIT_LABEL:.*]]
 // CHECK: [[INIT_LABEL]]:
 // CHECK:         %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
 // CHECK:         store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
 // CHECK:         %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
 // CHECK:         store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_26:.*]]
 // CHECK:         br label %[[VAL_27:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_25]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[INIT_LABEL]]
 // CHECK:         br label %[[VAL_28:.*]]
 // CHECK:       omp.region.cont:                                  ; preds = %[[VAL_27]]
 // CHECK:         %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 }
 
 // CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
 // CHECK:         %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_20:.*]] = alloca float, align 4
 // CHECK:         %[[VAL_21:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_23:.*]]
-// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK:         br label %[[VAL_24:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_22]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_24]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_22]]
 // CHECK:         br label %[[VAL_26:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_25]]
+
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK:         br label %[[VAL_24:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[RED_INIT]]
 // CHECK:         store i32 0, ptr %[[VAL_13]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_14]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
 // CHECK:         %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
 // CHECK:         br label %[[VAL_32:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
 // CHECK:         br label %[[VAL_36:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_32]]
 // CHECK:         %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]

llvmbot · 2025-01-08T09:23:48Z

@llvm/pr-subscribers-mlir-llvm

Author: Kareem Ergawy (ergawy)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/122079.diff

8 Files Affected:

(modified) flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 (+5-5)
(modified) flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 (+2-2)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+12-7)
(modified) mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir (+10-10)
(modified) mlir/test/Target/LLVMIR/openmp-private.mlir (+2)
(modified) mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir (+12-9)
(modified) mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir (+4-4)
(modified) mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir (+11-9)

diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont13:                                ; preds = %omp.private.copy16
 ! CHECK-NEXT:    %{{.*}} = phi ptr
+! CHECK-NEXT:    br label %omp.par.region
+
+! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont13
 ! CHECK-NEXT:    br label %omp.reduction.init
 
-! CHECK:       omp.reduction.init:                               ; preds = %omp.region.cont13
+! CHECK:       omp.reduction.init:                               ; preds = %omp.par.region
 !                [deffered stores for results of reduction alloc regions]
 ! CHECK:         br label %[[VAL_96:.*]]
 
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont21:                                ; preds = %omp.reduction.neutral25
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.par.region
-
-! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont21
 ! CHECK-NEXT:    br label %omp.par.region27
 
-! CHECK:       omp.par.region27:                                 ; preds = %omp.par.region
+! CHECK:       omp.par.region27:                                 ; preds = %omp.region.cont21
 !                [call SUM runtime function]
 !                [if (sum(a) == 1)]
 ! CHECK:         br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
 !CHECK:  %[[F_priv:.*]] = alloca ptr
 !CHECK:  %[[I_priv:.*]] = alloca i32
 
+!CHECK: omp.par.region:
+
 !CHECK: omp.reduction.init:
 !CHECK:  store ptr %{{.*}}, ptr %[[F_priv]]
 !CHECK:  store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
 !CHECK:  br label %[[MALLOC_BB:.*]]
 
 !CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
-  builder.SetInsertPoint(latestAllocaBlock->getTerminator());
   llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
   auto allocaIP = llvm::IRBuilderBase::InsertPoint(
       latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     }
   }
 
-  builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+  if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+    builder.SetInsertPoint(initBlock);
+  else
+    builder.SetInsertPoint(initBlock->getTerminator());
 
   // store result of the alloc region to the allocated pointer to the real
   // reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     assert(phis.size() == 1 && "expected one value to be yielded from the "
                                "reduction neutral element declaration region");
 
-    builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+    if (builder.GetInsertBlock()->empty() ||
+        builder.GetInsertBlock()->getTerminator() == nullptr)
+      builder.SetInsertPoint(builder.GetInsertBlock());
+    else
+      builder.SetInsertPoint(
+          builder.GetInsertBlock()->getTerminator());
 
     if (isByRef[i]) {
       if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
   SmallVector<DeferredStore> deferredStores;
 
   if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
       return llvm::make_error<PreviouslyReportedError>();
 
     assert(afterAllocas.get()->getSinglePredecessor());
+    builder.restoreIP(codeGenIP);
+
     if (failed(
             initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
                               afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
         moduleTranslation, allocaIP);
 
     // ParallelOp has only one region associated with it.
-    builder.restoreIP(codeGenIP);
     llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
         opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
     if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         %[[VAL_20:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK:         br label %[[VAL_23:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_24:.*]]
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.reduction.neutral:                            ; preds = %[[VAL_23]]
+// CHECK:         br label %[[VAL_23:omp.par.region]]
+// CHECK:       [[VAL_23]]:                                   ; preds = %[[PAR_ENTRY]]
+// CHECK:         br label %[[VAL_42:.*]]
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK:       [[VAL_25]]:                            ; preds = %[[RED_INIT]]
 // CHECK:         %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
 // CHECK:         %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
 // CHECK:         br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_38:.*]]
 // CHECK:       omp.reduction.neutral8:                           ; preds = %[[VAL_36]], %[[VAL_37]]
 // CHECK:         br label %[[VAL_39:.*]]
-// CHECK:       omp.region.cont4:                                 ; preds = %[[VAL_38]]
+// CHECK:       [[VAL_39]]:                                 ; preds = %[[VAL_38]]
 // CHECK:         %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
 // CHECK:         store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
 // CHECK:         br label %[[VAL_41:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_39]]
-// CHECK:         br label %[[VAL_42:.*]]
-// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_41]]
+// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_39]]
 // CHECK:         br label %[[VAL_43:.*]]
-// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_42]]
+// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_41]]
 // CHECK:         %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
 // CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
 // CHECK:         %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
 // CHECK-DAG:     %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 
 // CHECK:         omp.par.region:
+// CHECK:           br label %omp.reduction.init
+// CHECK:         omp.reduction.init:
 // CHECK:           br label %[[PAR_REG_BEG:.*]]
 // CHECK:         [[PAR_REG_BEG]]:
 // CHECK-NEXT:      %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 }
 
 // CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_6:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_7:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_14:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_15:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
-// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK:         br label %[[VAL_17:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_15]]
+
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_18:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_17]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_15]]
 // CHECK:         %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_18]]
+
+// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
+// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK:         br label %[[VAL_17:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_22]]
 // CHECK:         store i32 0, ptr %[[VAL_7]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_8]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
 // CHECK:         %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
 // CHECK:         br label %[[VAL_28:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
 // CHECK:         br label %[[VAL_32:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_28]]
 // CHECK:         %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_23:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK:         br label %[[VAL_25:.*]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[INIT_LABEL:.*]]
 // CHECK: [[INIT_LABEL]]:
 // CHECK:         %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
 // CHECK:         store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
 // CHECK:         %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
 // CHECK:         store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_26:.*]]
 // CHECK:         br label %[[VAL_27:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_25]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[INIT_LABEL]]
 // CHECK:         br label %[[VAL_28:.*]]
 // CHECK:       omp.region.cont:                                  ; preds = %[[VAL_27]]
 // CHECK:         %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 }
 
 // CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
 // CHECK:         %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_20:.*]] = alloca float, align 4
 // CHECK:         %[[VAL_21:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_23:.*]]
-// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK:         br label %[[VAL_24:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_22]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_24]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_22]]
 // CHECK:         br label %[[VAL_26:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_25]]
+
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK:         br label %[[VAL_24:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[RED_INIT]]
 // CHECK:         store i32 0, ptr %[[VAL_13]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_14]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
 // CHECK:         %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
 // CHECK:         br label %[[VAL_32:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
 // CHECK:         br label %[[VAL_36:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_32]]
 // CHECK:         %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]

github-actions · 2025-01-08T09:26:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

tblah

I much prefer this solution. Nice work! Just a minor comment about documentation.

This passes my downstream tests and the gfortran test suite.

tblah · 2025-01-08T09:54:41Z

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp


-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
-  builder.SetInsertPoint(latestAllocaBlock->getTerminator());


What worries me about removing this is that initReductionVars now depends on what the current insertion point inside the builder is.

It isn't obvious to me what the right insertion point for the builder should be when calling this function. Please could you document what assumptions initReductionVars is now making.

Added some docs to explain the pre- and post-conditions of the functions. Let me know if something needs to be clarified further.
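To make the concern above concrete, here is a toy model of what a guard buys the caller. This is only a sketch, not LLVM code: `ToyBuilder`, `ToyGuard`, `emitInitGuarded`, `emitInitUnguarded` and `blockAfterCall` are hypothetical stand-ins for `llvm::IRBuilderBase`, `llvm::IRBuilderBase::InsertPointGuard` and a helper such as `initReductionVars`. It assumes an insertion point can be reduced to a block name.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for an IR builder: the "insertion point" is just a block name.
struct ToyBuilder {
  std::string insertBlock;
  void setInsertPoint(std::string block) {
    assert(!block.empty() && "insertion block must be named");
    insertBlock = std::move(block);
  }
};

// Toy RAII guard: remembers the insertion point and restores it on scope exit.
class ToyGuard {
public:
  explicit ToyGuard(ToyBuilder &b) : builder(b), saved(b.insertBlock) {}
  ~ToyGuard() { builder.insertBlock = saved; }
  ToyGuard(const ToyGuard &) = delete;
  ToyGuard &operator=(const ToyGuard &) = delete;

private:
  ToyBuilder &builder;
  std::string saved;
};

// Helper that hides its insertion-point changes from the caller.
void emitInitGuarded(ToyBuilder &builder) {
  ToyGuard guard(builder);
  builder.setInsertPoint("omp.reduction.init");
}

// Helper whose final insertion point becomes part of its contract.
void emitInitUnguarded(ToyBuilder &builder) {
  builder.setInsertPoint("omp.reduction.init");
}

// Returns the block the caller ends up in after calling one of the helpers.
std::string blockAfterCall(bool guarded) {
  ToyBuilder builder;
  builder.setInsertPoint("omp.par.region");
  if (guarded)
    emitInitGuarded(builder);
  else
    emitInitUnguarded(builder);
  return builder.insertBlock;
}
```

In this model, the guarded helper leaves the caller in `omp.par.region`, while the unguarded one leaves it wherever the helper stopped. That is why the pre- and post-conditions need to be written down once the guard is gone.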

Possible solution - take 2:

This PR contains a more localized solution than llvm#121886. It makes sure that, on entry to `initReductionVars`, the IR builder is at a point where we can start inserting the initialization region; to make things cleaner, we still split the builder insertion point to a dedicated `omp.reduction.init` block. This way we avoid splitting after the latest allocation block, which is what was causing the issue.
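The effect of where the split happens can be sketched with a simplified model. This is not the translation code: the CFG is assumed to be a straight-line chain of blocks, so dominance reduces to "comes earlier in the chain", and `Chain`, `splitAfter`, `definedBeforeUse` and the two `initSeesBox...` functions are hypothetical helpers. The block names come from the IR in the description, where `%5` is defined in `omp.par.region1`; that the builder's current point sits in that block is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy CFG: a straight-line chain of blocks. Real LLVM dominance is more
// general; here "A dominates B" just means "A comes before B".
using Chain = std::vector<std::string>;

// Insert newBlock right after `after`, mimicking a split at the end of that
// block. Returns the updated chain.
Chain splitAfter(Chain chain, const std::string &after,
                 const std::string &newBlock) {
  auto it = std::find(chain.begin(), chain.end(), after);
  assert(it != chain.end() && "block to split after must exist");
  chain.insert(it + 1, newBlock);
  return chain;
}

// True if `def` appears strictly before `use` in the chain.
bool definedBeforeUse(const Chain &chain, const std::string &def,
                      const std::string &use) {
  auto d = std::find(chain.begin(), chain.end(), def);
  auto u = std::find(chain.begin(), chain.end(), use);
  return d != chain.end() && u != chain.end() && d < u;
}

// Blocks as they appear in the description; %5 lives in omp.par.region1.
const Chain kBlocks = {"omp.par.entry", "omp.par.region", "omp.par.region1"};

// Old behavior in this model: split after the latest alloca block.
bool initSeesBoxWhenSplitAfterAllocaBlock() {
  Chain c = splitAfter(kBlocks, "omp.par.entry", "omp.reduction.init");
  return definedBeforeUse(c, "omp.par.region1", "omp.reduction.init");
}

// New behavior in this model: split at the builder's current codegen point.
bool initSeesBoxWhenSplitAtCodegenPoint() {
  Chain c = splitAfter(kBlocks, "omp.par.region1", "omp.reduction.init");
  return definedBeforeUse(c, "omp.par.region1", "omp.reduction.init");
}
```

In the first case the init block lands ahead of the block that defines `%5`, so a use there could not be dominated by its definition. In the second case the init block comes after it.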

tblah

Thanks for the fix!

jsjodin

LGTM

As a side note I have been wondering if it would make sense to enhance the builder (probably create a subclass) that can keep track of both the alloca and codegen IPs. This is not the first time that restoring the codegen IP isn't done correctly, and improving the builder could potentially be used to check that allocations are done in the right place.
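A rough sketch of what such a builder could look like, purely for discussion. Nothing here is an existing LLVM or OpenMPIRBuilder API: `DualIPBuilder`, its methods and `demoDualIP` are hypothetical, and the class only records where each instruction would go instead of building IR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model, not an LLVM API: it records where each instruction would go
// instead of building IR.
class DualIPBuilder {
public:
  using Entry = std::pair<std::string, std::string>; // (block, instruction)

  DualIPBuilder(std::string allocaIP, std::string codegenIP)
      : allocaBlock(std::move(allocaIP)), codegenBlock(std::move(codegenIP)) {
    assert(!allocaBlock.empty() && !codegenBlock.empty());
  }

  // Moving the codegen point never disturbs the alloca point.
  void setCodegenBlock(std::string block) { codegenBlock = std::move(block); }

  // Allocations always land in the alloca block.
  void createAlloca(const std::string &name) {
    emitted.emplace_back(allocaBlock, "alloca " + name);
  }

  // Any other instruction lands at the current codegen point.
  void createInst(const std::string &text) {
    emitted.emplace_back(codegenBlock, text);
  }

  const std::vector<Entry> &log() const { return emitted; }

private:
  std::string allocaBlock;
  std::string codegenBlock;
  std::vector<Entry> emitted;
};

// Usage: the codegen point moves, the alloca still goes to the alloca block.
std::vector<DualIPBuilder::Entry> demoDualIP() {
  DualIPBuilder builder("omp.par.entry", "omp.par.region");
  builder.setCodegenBlock("omp.reduction.init");
  builder.createAlloca("red.tmp");
  builder.createInst("store");
  return builder.log();
}
```

Keeping the two points in one object means callers cannot forget to restore one of them, and the recorded placements give a natural hook for a check that allocations ended up where expected.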

llvmbot added mlir:llvm mlir flang Flang issues not falling into any other category mlir:openmp flang:fir-hlfir flang:openmp labels Jan 8, 2025

ergawy requested a review from tblah January 8, 2025 09:25

ergawy force-pushed the fix_alloca_block_selection_take_2 branch from 8be8cec to 2ada277 Compare January 8, 2025 09:27

ergawy mentioned this pull request Jan 8, 2025

[flang][OpenMP] Allow saving first block of an OMP region for allocas #121886

Closed

ergawy requested review from jsjodin, jsonn, luporl and skatrak and removed request for jsonn January 8, 2025 09:32

tblah reviewed Jan 8, 2025

ergawy force-pushed the fix_alloca_block_selection_take_2 branch from 2ada277 to 25ba9fb Compare January 8, 2025 11:08

ergawy force-pushed the fix_alloca_block_selection_take_2 branch from 25ba9fb to 6c170e3 Compare January 8, 2025 11:13

tblah approved these changes Jan 8, 2025

jsjodin approved these changes Jan 9, 2025

ergawy merged commit 6f9e688 into llvm:main Jan 9, 2025
8 checks passed
