Skip to content

Conversation

@ergawy
Copy link
Member

@ergawy ergawy commented Jan 8, 2025

Replaces #121886
Fixes #120254 (hopefully 🤞)

Problem

Consider the following example:

program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program

The HLFIR+OMP IR for this example looks like this:

  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }

The problem addressed by this PR is related to: the alloca in the omp.parallel region + the related reduction clause on the omp.wsloop op. When we try translate the reduction from MLIR to LLVM, we have to choose an alloca insertion point. This happens in convertOmpWsloop where at entry to that function, this is what the LLVM module looks like:

define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8

Now, when we choose an alloca insertion point for the reduction, this is the chosen block omp.par.entry (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the %5 SSA value. This results in inserting allocations in omp.par.entry that reference allocations in a later block omp.par.region1 which causes the Instruction does not dominate all uses! error.

Possible solution - take 2:

This PR contains a more localized solution than #121886. It makes sure that on entry to initReductionVars, the IR builder is at a point where we can starting inserting initialization region; to make things cleaner, we still split the builder insertion point to a dedicated omp.reduction.init. This way we avoid splitting after the latest allocation block; which is what causing the issue.

@llvmbot llvmbot added mlir:llvm mlir flang Flang issues not falling into any other category mlir:openmp flang:fir-hlfir flang:openmp labels Jan 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Problem

Consider the following example:

program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program

The HLFIR+OMP IR for this example looks like this:

  func.func @<!-- -->_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#<!-- -->0(%3) : (!fir.ref&lt;!fir.array&lt;1xf32&gt;&gt;, !fir.shape&lt;1&gt;) -&gt; !fir.box&lt;!fir.array&lt;1xf32&gt;&gt;
      %6 = fir.alloca !fir.box&lt;!fir.array&lt;1xf32&gt;&gt;
      ...
      omp.wsloop private(@<!-- -->_QFEi_private_ref_i32 %1#<!-- -->0 -&gt; %arg0 : !fir.ref&lt;i32&gt;) reduction(byref @<!-- -->add_reduction_byref_box_1xf32 %6 -&gt; %arg1 : !fir.ref&lt;!fir.box&lt;!fir.array&lt;1xf32&gt;&gt;&gt;) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }

The problem addressed by this PR is related to: the alloca in the omp.parallel region + the related reduction clause on the omp.wsloop op. When we try translate the reduction from MLIR to LLVM, we have to choose an alloca insertion point. This happens in convertOmpWsloop where at entry to that function, this is what the LLVM module looks like:

define void @<!-- -->_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @<!-- -->__kmpc_global_thread_num(ptr @<!-- -->1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8

Now, when we choose an alloca insertion point for the reduction, this is the chosen block omp.par.entry (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the %5 SSA value. This results in inserting allocations in omp.par.entry that reference allocations in a later block omp.par.region1 which causes the Instruction does not dominate all uses! error.

Possible solution - take 2:

This PR contains a more localized solution than #121886. It makes sure that on entry to initReductionVars, the IR builder is at a point where we can starting inserting initialization region; to make things cleaner, we still split the builder insertion point to a dedicated omp.reduction.init. This way we avoid splitting after the latest allocation block; which is what causing the issue.


Full diff: https://github.com/llvm/llvm-project/pull/122079.diff

8 Files Affected:

  • (modified) flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 (+5-5)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 (+2-2)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+12-7)
  • (modified) mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir (+10-10)
  • (modified) mlir/test/Target/LLVMIR/openmp-private.mlir (+2)
  • (modified) mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir (+12-9)
  • (modified) mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir (+4-4)
  • (modified) mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir (+11-9)
diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont13:                                ; preds = %omp.private.copy16
 ! CHECK-NEXT:    %{{.*}} = phi ptr
+! CHECK-NEXT:    br label %omp.par.region
+
+! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont13
 ! CHECK-NEXT:    br label %omp.reduction.init
 
-! CHECK:       omp.reduction.init:                               ; preds = %omp.region.cont13
+! CHECK:       omp.reduction.init:                               ; preds = %omp.par.region
 !                [deffered stores for results of reduction alloc regions]
 ! CHECK:         br label %[[VAL_96:.*]]
 
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont21:                                ; preds = %omp.reduction.neutral25
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.par.region
-
-! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont21
 ! CHECK-NEXT:    br label %omp.par.region27
 
-! CHECK:       omp.par.region27:                                 ; preds = %omp.par.region
+! CHECK:       omp.par.region27:                                 ; preds = %omp.region.cont21
 !                [call SUM runtime function]
 !                [if (sum(a) == 1)]
 ! CHECK:         br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
 !CHECK:  %[[F_priv:.*]] = alloca ptr
 !CHECK:  %[[I_priv:.*]] = alloca i32
 
+!CHECK: omp.par.region:
+
 !CHECK: omp.reduction.init:
 !CHECK:  store ptr %{{.*}}, ptr %[[F_priv]]
 !CHECK:  store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
 !CHECK:  br label %[[MALLOC_BB:.*]]
 
 !CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
-  builder.SetInsertPoint(latestAllocaBlock->getTerminator());
   llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
   auto allocaIP = llvm::IRBuilderBase::InsertPoint(
       latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     }
   }
 
-  builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+  if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+    builder.SetInsertPoint(initBlock);
+  else
+    builder.SetInsertPoint(initBlock->getTerminator());
 
   // store result of the alloc region to the allocated pointer to the real
   // reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     assert(phis.size() == 1 && "expected one value to be yielded from the "
                                "reduction neutral element declaration region");
 
-    builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+    if (builder.GetInsertBlock()->empty() ||
+        builder.GetInsertBlock()->getTerminator() == nullptr)
+      builder.SetInsertPoint(builder.GetInsertBlock());
+    else
+      builder.SetInsertPoint(
+          builder.GetInsertBlock()->getTerminator());
 
     if (isByRef[i]) {
       if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
   SmallVector<DeferredStore> deferredStores;
 
   if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
       return llvm::make_error<PreviouslyReportedError>();
 
     assert(afterAllocas.get()->getSinglePredecessor());
+    builder.restoreIP(codeGenIP);
+
     if (failed(
             initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
                               afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
         moduleTranslation, allocaIP);
 
     // ParallelOp has only one region associated with it.
-    builder.restoreIP(codeGenIP);
     llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
         opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
     if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         %[[VAL_20:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK:         br label %[[VAL_23:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_24:.*]]
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.reduction.neutral:                            ; preds = %[[VAL_23]]
+// CHECK:         br label %[[VAL_23:omp.par.region]]
+// CHECK:       [[VAL_23]]:                                   ; preds = %[[PAR_ENTRY]]
+// CHECK:         br label %[[VAL_42:.*]]
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK:       [[VAL_25]]:                            ; preds = %[[RED_INIT]]
 // CHECK:         %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
 // CHECK:         %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
 // CHECK:         br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_38:.*]]
 // CHECK:       omp.reduction.neutral8:                           ; preds = %[[VAL_36]], %[[VAL_37]]
 // CHECK:         br label %[[VAL_39:.*]]
-// CHECK:       omp.region.cont4:                                 ; preds = %[[VAL_38]]
+// CHECK:       [[VAL_39]]:                                 ; preds = %[[VAL_38]]
 // CHECK:         %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
 // CHECK:         store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
 // CHECK:         br label %[[VAL_41:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_39]]
-// CHECK:         br label %[[VAL_42:.*]]
-// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_41]]
+// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_39]]
 // CHECK:         br label %[[VAL_43:.*]]
-// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_42]]
+// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_41]]
 // CHECK:         %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
 // CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
 // CHECK:         %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
 // CHECK-DAG:     %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 
 // CHECK:         omp.par.region:
+// CHECK:           br label %omp.reduction.init
+// CHECK:         omp.reduction.init:
 // CHECK:           br label %[[PAR_REG_BEG:.*]]
 // CHECK:         [[PAR_REG_BEG]]:
 // CHECK-NEXT:      %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 }
 
 // CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_6:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_7:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_14:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_15:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
-// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK:         br label %[[VAL_17:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_15]]
+
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_18:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_17]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_15]]
 // CHECK:         %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_18]]
+
+// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
+// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK:         br label %[[VAL_17:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_22]]
 // CHECK:         store i32 0, ptr %[[VAL_7]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_8]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
 // CHECK:         %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
 // CHECK:         br label %[[VAL_28:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
 // CHECK:         br label %[[VAL_32:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_28]]
 // CHECK:         %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_23:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK:         br label %[[VAL_25:.*]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[INIT_LABEL:.*]]
 // CHECK: [[INIT_LABEL]]:
 // CHECK:         %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
 // CHECK:         store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
 // CHECK:         %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
 // CHECK:         store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_26:.*]]
 // CHECK:         br label %[[VAL_27:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_25]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[INIT_LABEL]]
 // CHECK:         br label %[[VAL_28:.*]]
 // CHECK:       omp.region.cont:                                  ; preds = %[[VAL_27]]
 // CHECK:         %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 }
 
 // CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
 // CHECK:         %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_20:.*]] = alloca float, align 4
 // CHECK:         %[[VAL_21:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_23:.*]]
-// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK:         br label %[[VAL_24:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_22]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_24]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_22]]
 // CHECK:         br label %[[VAL_26:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_25]]
+
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK:         br label %[[VAL_24:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[RED_INIT]]
 // CHECK:         store i32 0, ptr %[[VAL_13]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_14]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
 // CHECK:         %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
 // CHECK:         br label %[[VAL_32:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
 // CHECK:         br label %[[VAL_36:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_32]]
 // CHECK:         %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]

@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-mlir-llvm

Author: Kareem Ergawy (ergawy)

Changes

Problem

Consider the following example:

program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program

The HLFIR+OMP IR for this example looks like this:

  func.func @<!-- -->_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#<!-- -->0(%3) : (!fir.ref&lt;!fir.array&lt;1xf32&gt;&gt;, !fir.shape&lt;1&gt;) -&gt; !fir.box&lt;!fir.array&lt;1xf32&gt;&gt;
      %6 = fir.alloca !fir.box&lt;!fir.array&lt;1xf32&gt;&gt;
      ...
      omp.wsloop private(@<!-- -->_QFEi_private_ref_i32 %1#<!-- -->0 -&gt; %arg0 : !fir.ref&lt;i32&gt;) reduction(byref @<!-- -->add_reduction_byref_box_1xf32 %6 -&gt; %arg1 : !fir.ref&lt;!fir.box&lt;!fir.array&lt;1xf32&gt;&gt;&gt;) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }

The problem addressed by this PR is related to: the alloca in the omp.parallel region + the related reduction clause on the omp.wsloop op. When we try translate the reduction from MLIR to LLVM, we have to choose an alloca insertion point. This happens in convertOmpWsloop where at entry to that function, this is what the LLVM module looks like:

define void @<!-- -->_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @<!-- -->__kmpc_global_thread_num(ptr @<!-- -->1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8

Now, when we choose an alloca insertion point for the reduction, this is the chosen block omp.par.entry (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the %5 SSA value. This results in inserting allocations in omp.par.entry that reference allocations in a later block omp.par.region1 which causes the Instruction does not dominate all uses! error.

Possible solution - take 2:

This PR contains a more localized solution than #121886. It makes sure that on entry to initReductionVars, the IR builder is at a point where we can starting inserting initialization region; to make things cleaner, we still split the builder insertion point to a dedicated omp.reduction.init. This way we avoid splitting after the latest allocation block; which is what causing the issue.


Full diff: https://github.com/llvm/llvm-project/pull/122079.diff

8 Files Affected:

  • (modified) flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 (+5-5)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 (+2-2)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+12-7)
  • (modified) mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir (+10-10)
  • (modified) mlir/test/Target/LLVMIR/openmp-private.mlir (+2)
  • (modified) mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir (+12-9)
  • (modified) mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir (+4-4)
  • (modified) mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir (+11-9)
diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont13:                                ; preds = %omp.private.copy16
 ! CHECK-NEXT:    %{{.*}} = phi ptr
+! CHECK-NEXT:    br label %omp.par.region
+
+! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont13
 ! CHECK-NEXT:    br label %omp.reduction.init
 
-! CHECK:       omp.reduction.init:                               ; preds = %omp.region.cont13
+! CHECK:       omp.reduction.init:                               ; preds = %omp.par.region
 !                [deffered stores for results of reduction alloc regions]
 ! CHECK:         br label %[[VAL_96:.*]]
 
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
 
 ! CHECK:       omp.region.cont21:                                ; preds = %omp.reduction.neutral25
 ! CHECK-NEXT:    %{{.*}} = phi ptr
-! CHECK-NEXT:    br label %omp.par.region
-
-! CHECK:       omp.par.region:                                   ; preds = %omp.region.cont21
 ! CHECK-NEXT:    br label %omp.par.region27
 
-! CHECK:       omp.par.region27:                                 ; preds = %omp.par.region
+! CHECK:       omp.par.region27:                                 ; preds = %omp.region.cont21
 !                [call SUM runtime function]
 !                [if (sum(a) == 1)]
 ! CHECK:         br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
 !CHECK:  %[[F_priv:.*]] = alloca ptr
 !CHECK:  %[[I_priv:.*]] = alloca i32
 
+!CHECK: omp.par.region:
+
 !CHECK: omp.reduction.init:
 !CHECK:  store ptr %{{.*}}, ptr %[[F_priv]]
 !CHECK:  store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
 !CHECK:  br label %[[MALLOC_BB:.*]]
 
 !CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
-  builder.SetInsertPoint(latestAllocaBlock->getTerminator());
   llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
   auto allocaIP = llvm::IRBuilderBase::InsertPoint(
       latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     }
   }
 
-  builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+  if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+    builder.SetInsertPoint(initBlock);
+  else
+    builder.SetInsertPoint(initBlock->getTerminator());
 
   // store result of the alloc region to the allocated pointer to the real
   // reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
     assert(phis.size() == 1 && "expected one value to be yielded from the "
                                "reduction neutral element declaration region");
 
-    builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+    if (builder.GetInsertBlock()->empty() ||
+        builder.GetInsertBlock()->getTerminator() == nullptr)
+      builder.SetInsertPoint(builder.GetInsertBlock());
+    else
+      builder.SetInsertPoint(
+          builder.GetInsertBlock()->getTerminator());
 
     if (isByRef[i]) {
       if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
   if (op.getNumReductionVars() == 0)
     return success();
 
-  llvm::IRBuilderBase::InsertPointGuard guard(builder);
   SmallVector<DeferredStore> deferredStores;
 
   if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
       return llvm::make_error<PreviouslyReportedError>();
 
     assert(afterAllocas.get()->getSinglePredecessor());
+    builder.restoreIP(codeGenIP);
+
     if (failed(
             initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
                               afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
         moduleTranslation, allocaIP);
 
     // ParallelOp has only one region associated with it.
-    builder.restoreIP(codeGenIP);
     llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
         opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
     if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         %[[VAL_20:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK:         br label %[[VAL_23:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_24:.*]]
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.reduction.neutral:                            ; preds = %[[VAL_23]]
+// CHECK:         br label %[[VAL_23:omp.par.region]]
+// CHECK:       [[VAL_23]]:                                   ; preds = %[[PAR_ENTRY]]
+// CHECK:         br label %[[VAL_42:.*]]
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK:       [[VAL_25]]:                            ; preds = %[[RED_INIT]]
 // CHECK:         %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
 // CHECK:         %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
 // CHECK:         br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
 // CHECK:         br label %[[VAL_38:.*]]
 // CHECK:       omp.reduction.neutral8:                           ; preds = %[[VAL_36]], %[[VAL_37]]
 // CHECK:         br label %[[VAL_39:.*]]
-// CHECK:       omp.region.cont4:                                 ; preds = %[[VAL_38]]
+// CHECK:       [[VAL_39]]:                                 ; preds = %[[VAL_38]]
 // CHECK:         %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
 // CHECK:         store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
 // CHECK:         br label %[[VAL_41:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_39]]
-// CHECK:         br label %[[VAL_42:.*]]
-// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_41]]
+// CHECK:       omp.par.region10:                                 ; preds = %[[VAL_39]]
 // CHECK:         br label %[[VAL_43:.*]]
-// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_42]]
+// CHECK:       omp.region.cont9:                                 ; preds = %[[VAL_41]]
 // CHECK:         %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
 // CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
 // CHECK:         %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
 // CHECK-DAG:     %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 
 // CHECK:         omp.par.region:
+// CHECK:           br label %omp.reduction.init
+// CHECK:         omp.reduction.init:
 // CHECK:           br label %[[PAR_REG_BEG:.*]]
 // CHECK:         [[PAR_REG_BEG]]:
 // CHECK-NEXT:      %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 }
 
 // CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_6:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_7:.*]] = alloca i32, align 4
 // CHECK:         %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_14:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_15:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
-// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK:         br label %[[VAL_17:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_15]]
+
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_18:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_17]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_15]]
 // CHECK:         %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_18]]
+
+// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_16:.*]]
+// CHECK:         store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK:         br label %[[VAL_17:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_22]]
 // CHECK:         store i32 0, ptr %[[VAL_7]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_8]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
 // CHECK:         %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
 // CHECK:         %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
 // CHECK:         br label %[[VAL_28:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK:         %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
 // CHECK:         br label %[[VAL_32:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_28]]
 // CHECK:         %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
 // CHECK:         br label %[[VAL_10:.*]]
 // CHECK:       omp.par.exit.split:                               ; preds = %[[VAL_9]]
 // CHECK:         ret void
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
 // CHECK:         %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
 // CHECK:         %[[VAL_21:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_23:.*]] = alloca ptr, align 8
 // CHECK:         %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK:         br label %[[VAL_25:.*]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[INIT_LABEL:.*]]
 // CHECK: [[INIT_LABEL]]:
 // CHECK:         %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
 // CHECK:         store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
 // CHECK:         %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
 // CHECK:         store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_26:.*]]
 // CHECK:         br label %[[VAL_27:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_25]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[INIT_LABEL]]
 // CHECK:         br label %[[VAL_28:.*]]
 // CHECK:       omp.region.cont:                                  ; preds = %[[VAL_27]]
 // CHECK:         %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 }
 
 // CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK:       omp.par.entry:
+// CHECK:       [[PAR_ENTRY:omp.par.entry]]:
 // CHECK:         %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
 // CHECK:         %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
 // CHECK:         %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_20:.*]] = alloca float, align 4
 // CHECK:         %[[VAL_21:.*]] = alloca [1 x ptr], align 8
 // CHECK:         br label %[[VAL_22:.*]]
-// CHECK:       omp.reduction.init:                               ; preds = %[[VAL_23:.*]]
-// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK:         br label %[[VAL_24:.*]]
-// CHECK:       omp.par.region:                                   ; preds = %[[VAL_22]]
+// CHECK:       omp.par.region:                                   ; preds = %[[PAR_ENTRY]]
 // CHECK:         br label %[[VAL_25:.*]]
-// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_24]]
+// CHECK:       omp.par.region1:                                  ; preds = %[[VAL_22]]
 // CHECK:         br label %[[VAL_26:.*]]
-// CHECK:       omp_section_loop.preheader:                       ; preds = %[[VAL_25]]
+
+// CHECK:       [[RED_INIT:omp.reduction.init]]:
+// CHECK:         store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK:         br label %[[VAL_24:.*]]
+
+// CHECK:       omp_section_loop.preheader:                       ; preds = %[[RED_INIT]]
 // CHECK:         store i32 0, ptr %[[VAL_13]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_14]], align 4
 // CHECK:         store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
 // CHECK:         %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
 // CHECK:         %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
 // CHECK:         br label %[[VAL_32:.*]]
-// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK:       omp_section_loop.header:                          ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK:         %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
 // CHECK:         br label %[[VAL_36:.*]]
 // CHECK:       omp_section_loop.cond:                            ; preds = %[[VAL_32]]
 // CHECK:         %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]

@ergawy ergawy requested a review from tblah January 8, 2025 09:25
@github-actions
Copy link

github-actions bot commented Jan 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy force-pushed the fix_alloca_block_selection_take_2 branch from 8be8cec to 2ada277 Compare January 8, 2025 09:27
@ergawy ergawy requested review from jsjodin, jsonn, luporl and skatrak and removed request for jsonn January 8, 2025 09:32
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I much prefer this solution. Nice work! Just a minor comment about documentation.

This passes my downstream tests and the gfortran test suite.


llvm::IRBuilderBase::InsertPointGuard guard(builder);

builder.SetInsertPoint(latestAllocaBlock->getTerminator());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What worries me about removing this is that now the function is that now initReductionVars depends on what the current insertion point inside of the builder is.

It isn't obvious to me what the right insertion point for the builder should be when calling this function. Please could you document what assumptions initReductionVars is now making.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some docs to explain the pre- and post-conditions of the functions. Let me know if something needs to be clarified further.

@ergawy ergawy force-pushed the fix_alloca_block_selection_take_2 branch from 2ada277 to 25ba9fb Compare January 8, 2025 11:08
Problem

Consider the following example:
```fortran
program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program
```

The HLFIR+OMP IR for this example looks like this:
```mlir
  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
```

The problem addressed by this PR is related to: the `alloca` in the `omp.parallel` region + the related `reduction` clause on the `omp.wsloop` op. When we try translate the reduction from MLIR to LLVM, we have to choose an `alloca` insertion point. This happens in `convertOmpWsloop` where at entry to that function, this is what the LLVM module looks like:

```llvm
define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```

Now, when we choose an `alloca` insertion point for the reduction, this is the chosen block `omp.par.entry` (without the changes in this PR). The problem is that the allocation needed for the reduction needs to reference the `%5` SSA value. This results in inserting allocations in `omp.par.entry` that reference allocations in a later block `omp.par.region1` which causes the `Instruction does not dominate all uses!` error.

Possible solution - take 2:

This PR contains a more localized solution than llvm#121886. It makes sure that on entry to `initReductionVars`, the IR builder is at a point where we can starting inserting initialization region; to make things cleaner, we still split the builder insertion point to a dedicated `omp.reduction.init`. This way we avoid splitting after the latest allocation block; which is what causing the issue.
@ergawy ergawy force-pushed the fix_alloca_block_selection_take_2 branch from 25ba9fb to 6c170e3 Compare January 8, 2025 11:13
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Copy link
Contributor

@jsjodin jsjodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

As a side note I have been wondering if it would make sense to enhance the builder (probably create a subclass) that can keep track of both the alloca and codegen IPs. This is not the first time that restoring the codegen IP isn't done correctly, and improving the builder could potentially be used to check that allocations are done in the right place.

@ergawy ergawy merged commit 6f9e688 into llvm:main Jan 9, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"Instruction does not dominate" error with multi-dimensional reductions

4 participants