[flang][OpenMP] Fix reduction init region block management #122079
@llvm/pr-subscribers-mlir-openmp @llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Problem

Consider the following example:

    program test
      real :: x(1)
      integer :: i
      !$omp parallel do reduction(+:x)
        do i = 1,1
          x = 1
        end do
      !$omp end parallel do
    end program

The HLFIR+OMP IR for this example looks like this:

    func.func @_QQmain() {
      ...
      omp.parallel {
        %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
        %6 = fir.alloca !fir.box<!fir.array<1xf32>>
        ...
        omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
          omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
            ...
            omp.yield
          }
        }
        omp.terminator
      }
      return
    }

The problem addressed by this PR is related to: the

    define void @_QQmain() {
      %tid.addr = alloca i32, align 4
      ...
    entry:
      %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
      br label %omp.par.entry
    omp.par.entry:
      %tid.addr.local = alloca i32, align 4
      ...
      br label %omp.par.region
    omp.par.region:
      br label %omp.par.region1
    omp.par.region1:
      ...
      %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8

Now, when we choose an

Possible solution - take 2:

This PR contains a more localized solution than #121886. It makes sure that on entry to

Full diff: https://github.com/llvm/llvm-project/pull/122079.diff

8 Files Affected:
diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
! CHECK: omp.region.cont13: ; preds = %omp.private.copy16
! CHECK-NEXT: %{{.*}} = phi ptr
+! CHECK-NEXT: br label %omp.par.region
+
+! CHECK: omp.par.region: ; preds = %omp.region.cont13
! CHECK-NEXT: br label %omp.reduction.init
-! CHECK: omp.reduction.init: ; preds = %omp.region.cont13
+! CHECK: omp.reduction.init: ; preds = %omp.par.region
! [deffered stores for results of reduction alloc regions]
! CHECK: br label %[[VAL_96:.*]]
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
! CHECK: omp.region.cont21: ; preds = %omp.reduction.neutral25
! CHECK-NEXT: %{{.*}} = phi ptr
-! CHECK-NEXT: br label %omp.par.region
-
-! CHECK: omp.par.region: ; preds = %omp.region.cont21
! CHECK-NEXT: br label %omp.par.region27
-! CHECK: omp.par.region27: ; preds = %omp.par.region
+! CHECK: omp.par.region27: ; preds = %omp.region.cont21
! [call SUM runtime function]
! [if (sum(a) == 1)]
! CHECK: br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
!CHECK: %[[F_priv:.*]] = alloca ptr
!CHECK: %[[I_priv:.*]] = alloca i32
+!CHECK: omp.par.region:
+
!CHECK: omp.reduction.init:
!CHECK: store ptr %{{.*}}, ptr %[[F_priv]]
!CHECK: store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
!CHECK: br label %[[MALLOC_BB:.*]]
!CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
if (op.getNumReductionVars() == 0)
return success();
- llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
- builder.SetInsertPoint(latestAllocaBlock->getTerminator());
llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
auto allocaIP = llvm::IRBuilderBase::InsertPoint(
latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
}
}
- builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+ if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+ builder.SetInsertPoint(initBlock);
+ else
+ builder.SetInsertPoint(initBlock->getTerminator());
// store result of the alloc region to the allocated pointer to the real
// reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
assert(phis.size() == 1 && "expected one value to be yielded from the "
"reduction neutral element declaration region");
- builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+ if (builder.GetInsertBlock()->empty() ||
+ builder.GetInsertBlock()->getTerminator() == nullptr)
+ builder.SetInsertPoint(builder.GetInsertBlock());
+ else
+ builder.SetInsertPoint(
+ builder.GetInsertBlock()->getTerminator());
if (isByRef[i]) {
if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
if (op.getNumReductionVars() == 0)
return success();
- llvm::IRBuilderBase::InsertPointGuard guard(builder);
SmallVector<DeferredStore> deferredStores;
if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
return llvm::make_error<PreviouslyReportedError>();
assert(afterAllocas.get()->getSinglePredecessor());
+ builder.restoreIP(codeGenIP);
+
if (failed(
initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
moduleTranslation, allocaIP);
// ParallelOp has only one region associated with it.
- builder.restoreIP(codeGenIP);
llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
// CHECK: br label %[[VAL_10:.*]]
// CHECK: omp.par.exit.split: ; preds = %[[VAL_9]]
// CHECK: ret void
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
// CHECK: %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
// CHECK: %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
// CHECK: %[[VAL_20:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_21:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK: br label %[[VAL_23:.*]]
-// CHECK: omp.reduction.init: ; preds = %[[VAL_24:.*]]
-// CHECK: br label %[[VAL_25:.*]]
-// CHECK: omp.reduction.neutral: ; preds = %[[VAL_23]]
+// CHECK: br label %[[VAL_23:omp.par.region]]
+// CHECK: [[VAL_23]]: ; preds = %[[PAR_ENTRY]]
+// CHECK: br label %[[VAL_42:.*]]
+// CHECK: [[RED_INIT:omp.reduction.init]]:
+// CHECK: br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK: [[VAL_25]]: ; preds = %[[RED_INIT]]
// CHECK: %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
// CHECK: %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
// CHECK: br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
// CHECK: br label %[[VAL_38:.*]]
// CHECK: omp.reduction.neutral8: ; preds = %[[VAL_36]], %[[VAL_37]]
// CHECK: br label %[[VAL_39:.*]]
-// CHECK: omp.region.cont4: ; preds = %[[VAL_38]]
+// CHECK: [[VAL_39]]: ; preds = %[[VAL_38]]
// CHECK: %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
// CHECK: store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
// CHECK: br label %[[VAL_41:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_39]]
-// CHECK: br label %[[VAL_42:.*]]
-// CHECK: omp.par.region10: ; preds = %[[VAL_41]]
+// CHECK: omp.par.region10: ; preds = %[[VAL_39]]
// CHECK: br label %[[VAL_43:.*]]
-// CHECK: omp.region.cont9: ; preds = %[[VAL_42]]
+// CHECK: omp.region.cont9: ; preds = %[[VAL_41]]
// CHECK: %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
// CHECK: store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
// CHECK: %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
// CHECK-DAG: %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
// CHECK: omp.par.region:
+// CHECK: br label %omp.reduction.init
+// CHECK: omp.reduction.init:
// CHECK: br label %[[PAR_REG_BEG:.*]]
// CHECK: [[PAR_REG_BEG]]:
// CHECK-NEXT: %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
}
// CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_6:.*]] = alloca i32, align 4
// CHECK: %[[VAL_7:.*]] = alloca i32, align 4
// CHECK: %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
// CHECK: %[[VAL_21:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_14:.*]] = alloca [1 x ptr], align 8
// CHECK: br label %[[VAL_15:.*]]
-// CHECK: omp.reduction.init: ; preds = %[[VAL_16:.*]]
-// CHECK: store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK: br label %[[VAL_17:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_15]]
+
+// CHECK: omp.par.region: ; preds = %[[PAR_ENTRY]]
// CHECK: br label %[[VAL_18:.*]]
-// CHECK: omp.par.region1: ; preds = %[[VAL_17]]
+// CHECK: omp.par.region1: ; preds = %[[VAL_15]]
// CHECK: %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
// CHECK: br label %[[VAL_22:.*]]
-// CHECK: omp_section_loop.preheader: ; preds = %[[VAL_18]]
+
+// CHECK: omp.reduction.init: ; preds = %[[VAL_16:.*]]
+// CHECK: store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK: br label %[[VAL_17:.*]]
+
+// CHECK: omp_section_loop.preheader: ; preds = %[[VAL_22]]
// CHECK: store i32 0, ptr %[[VAL_7]], align 4
// CHECK: store i32 1, ptr %[[VAL_8]], align 4
// CHECK: store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
// CHECK: %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
// CHECK: %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
// CHECK: br label %[[VAL_28:.*]]
-// CHECK: omp_section_loop.header: ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK: %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK: omp_section_loop.header: ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK: %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
// CHECK: br label %[[VAL_32:.*]]
// CHECK: omp_section_loop.cond: ; preds = %[[VAL_28]]
// CHECK: %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
// CHECK: br label %[[VAL_10:.*]]
// CHECK: omp.par.exit.split: ; preds = %[[VAL_9]]
// CHECK: ret void
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
// CHECK: %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
// CHECK: %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
// CHECK: %[[VAL_21:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_23:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK: br label %[[VAL_25:.*]]
+// CHECK: omp.par.region: ; preds = %[[PAR_ENTRY]]
// CHECK: br label %[[INIT_LABEL:.*]]
// CHECK: [[INIT_LABEL]]:
// CHECK: %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
// CHECK: store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
// CHECK: %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
// CHECK: store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK: br label %[[VAL_25:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_26:.*]]
// CHECK: br label %[[VAL_27:.*]]
-// CHECK: omp.par.region1: ; preds = %[[VAL_25]]
+// CHECK: omp.par.region1: ; preds = %[[INIT_LABEL]]
// CHECK: br label %[[VAL_28:.*]]
// CHECK: omp.region.cont: ; preds = %[[VAL_27]]
// CHECK: %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
}
// CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
// CHECK: %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
// CHECK: %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
// CHECK: %[[VAL_20:.*]] = alloca float, align 4
// CHECK: %[[VAL_21:.*]] = alloca [1 x ptr], align 8
// CHECK: br label %[[VAL_22:.*]]
-// CHECK: omp.reduction.init: ; preds = %[[VAL_23:.*]]
-// CHECK: store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK: br label %[[VAL_24:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_22]]
+// CHECK: omp.par.region: ; preds = %[[PAR_ENTRY]]
// CHECK: br label %[[VAL_25:.*]]
-// CHECK: omp.par.region1: ; preds = %[[VAL_24]]
+// CHECK: omp.par.region1: ; preds = %[[VAL_22]]
// CHECK: br label %[[VAL_26:.*]]
-// CHECK: omp_section_loop.preheader: ; preds = %[[VAL_25]]
+
+// CHECK: [[RED_INIT:omp.reduction.init]]:
+// CHECK: store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK: br label %[[VAL_24:.*]]
+
+// CHECK: omp_section_loop.preheader: ; preds = %[[RED_INIT]]
// CHECK: store i32 0, ptr %[[VAL_13]], align 4
// CHECK: store i32 1, ptr %[[VAL_14]], align 4
// CHECK: store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
// CHECK: %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
// CHECK: %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
// CHECK: br label %[[VAL_32:.*]]
-// CHECK: omp_section_loop.header: ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK: %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK: omp_section_loop.header: ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK: %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
// CHECK: br label %[[VAL_36:.*]]
// CHECK: omp_section_loop.cond: ; preds = %[[VAL_32]]
// CHECK: %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]
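
The two `if`/`else` additions to `initReductionVars` in the diff above apply the same rule: if the target block is empty or has no terminator yet, insert at the end of the block; otherwise insert just before the terminator. The following is a minimal, self-contained sketch of that rule only. `ToyBlock`, `ToyBuilder`, and `setInsertPointBeforeTerminator` are hypothetical stand-ins invented for illustration; they are not LLVM classes and this is not the code in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block (hypothetical, not llvm::BasicBlock).
struct ToyBlock {
  std::vector<std::string> insts;
  bool hasTerminator = false; // true when the last entry is a branch/return
};

// Toy stand-in for an IR builder (hypothetical, not llvm::IRBuilderBase).
struct ToyBuilder {
  ToyBlock *block = nullptr;
  std::size_t pos = 0; // new instructions go in front of index `pos`

  void insert(const std::string &inst) {
    assert(block && "insertion point must be set first");
    block->insts.insert(
        block->insts.begin() + static_cast<std::ptrdiff_t>(pos), inst);
    ++pos; // keep inserting after what was just added
  }
};

// Same decision as the if/else pairs in the diff: append when the block has
// no terminator, otherwise insert right before the terminator.
void setInsertPointBeforeTerminator(ToyBuilder &builder, ToyBlock &bb) {
  builder.block = &bb;
  if (bb.insts.empty() || !bb.hasTerminator)
    builder.pos = bb.insts.size();     // end of the block
  else
    builder.pos = bb.insts.size() - 1; // just before the terminator
}
```

With a block holding `load` followed by a terminating `br`, inserting a `store` through this helper yields `load`, `store`, `br`; with an empty block, the `store` simply becomes the first instruction. Since the same pattern appears twice in the patch, a small helper of this shape might be one way to avoid the repetition, though that is a suggestion rather than something the PR does.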
|
|
@llvm/pr-subscribers-mlir-llvm Author: Kareem Ergawy (ergawy) ChangesProblemConsider the following example: program test
real :: x(1)
integer :: i
!$omp parallel do reduction(+:x)
do i = 1,1
x = 1
end do
!$omp end parallel do
end programThe HLFIR+OMP IR for this example looks like this: func.func @<!-- -->_QQmain() {
...
omp.parallel {
%5 = fir.embox %4#<!-- -->0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
%6 = fir.alloca !fir.box<!fir.array<1xf32>>
...
omp.wsloop private(@<!-- -->_QFEi_private_ref_i32 %1#<!-- -->0 -> %arg0 : !fir.ref<i32>) reduction(byref @<!-- -->add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
...
omp.yield
}
}
omp.terminator
}
return
}The problem addressed by this PR is related to: the define void @<!-- -->_QQmain() {
%tid.addr = alloca i32, align 4
...
entry:
%omp_global_thread_num = call i32 @<!-- -->__kmpc_global_thread_num(ptr @<!-- -->1)
br label %omp.par.entry
omp.par.entry:
%tid.addr.local = alloca i32, align 4
...
br label %omp.par.region
omp.par.region:
br label %omp.par.region1
omp.par.region1:
...
%5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8Now, when we choose an Possible solution - take 2:This PR contains a more localized solution than #121886. It makes sure that on entry to Full diff: https://github.com/llvm/llvm-project/pull/122079.diff 8 Files Affected:
diff --git a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90 b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
index 3aa5d042463973..fe3a326702e52a 100644
--- a/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
+++ b/flang/test/Integration/OpenMP/parallel-private-reduction-worstcase.f90
@@ -96,9 +96,12 @@ subroutine worst_case(a, b, c, d)
! CHECK: omp.region.cont13: ; preds = %omp.private.copy16
! CHECK-NEXT: %{{.*}} = phi ptr
+! CHECK-NEXT: br label %omp.par.region
+
+! CHECK: omp.par.region: ; preds = %omp.region.cont13
! CHECK-NEXT: br label %omp.reduction.init
-! CHECK: omp.reduction.init: ; preds = %omp.region.cont13
+! CHECK: omp.reduction.init: ; preds = %omp.par.region
! [deffered stores for results of reduction alloc regions]
! CHECK: br label %[[VAL_96:.*]]
@@ -132,12 +135,9 @@ subroutine worst_case(a, b, c, d)
! CHECK: omp.region.cont21: ; preds = %omp.reduction.neutral25
! CHECK-NEXT: %{{.*}} = phi ptr
-! CHECK-NEXT: br label %omp.par.region
-
-! CHECK: omp.par.region: ; preds = %omp.region.cont21
! CHECK-NEXT: br label %omp.par.region27
-! CHECK: omp.par.region27: ; preds = %omp.par.region
+! CHECK: omp.par.region27: ; preds = %omp.region.cont21
! [call SUM runtime function]
! [if (sum(a) == 1)]
! CHECK: br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
diff --git a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90 b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
index 8e6f55abd5671c..b3e25ae7795617 100644
--- a/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
+++ b/flang/test/Lower/OpenMP/parallel-reduction-mixed.f90
@@ -27,11 +27,11 @@ end subroutine proc
!CHECK: %[[F_priv:.*]] = alloca ptr
!CHECK: %[[I_priv:.*]] = alloca i32
+!CHECK: omp.par.region:
+
!CHECK: omp.reduction.init:
!CHECK: store ptr %{{.*}}, ptr %[[F_priv]]
!CHECK: store i32 0, ptr %[[I_priv]]
-
-!CHECK: omp.par.region:
!CHECK: br label %[[MALLOC_BB:.*]]
!CHECK: [[MALLOC_BB]]:
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 87cb7f03fec6aa..c837162d4cd776 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1039,9 +1039,6 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
if (op.getNumReductionVars() == 0)
return success();
- llvm::IRBuilderBase::InsertPointGuard guard(builder);
-
- builder.SetInsertPoint(latestAllocaBlock->getTerminator());
llvm::BasicBlock *initBlock = splitBB(builder, true, "omp.reduction.init");
auto allocaIP = llvm::IRBuilderBase::InsertPoint(
latestAllocaBlock, latestAllocaBlock->getTerminator()->getIterator());
@@ -1061,7 +1058,10 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
}
}
- builder.SetInsertPoint(&*initBlock->getFirstNonPHIOrDbgOrAlloca());
+ if (initBlock->empty() || initBlock->getTerminator() == nullptr)
+ builder.SetInsertPoint(initBlock);
+ else
+ builder.SetInsertPoint(initBlock->getTerminator());
// store result of the alloc region to the allocated pointer to the real
// reduction variable
@@ -1086,7 +1086,12 @@ initReductionVars(OP op, ArrayRef<BlockArgument> reductionArgs,
assert(phis.size() == 1 && "expected one value to be yielded from the "
"reduction neutral element declaration region");
- builder.SetInsertPoint(builder.GetInsertBlock()->getTerminator());
+ if (builder.GetInsertBlock()->empty() ||
+ builder.GetInsertBlock()->getTerminator() == nullptr)
+ builder.SetInsertPoint(builder.GetInsertBlock());
+ else
+ builder.SetInsertPoint(
+ builder.GetInsertBlock()->getTerminator());
if (isByRef[i]) {
if (!reductionDecls[i].getAllocRegion().empty())
@@ -1271,7 +1276,6 @@ static LogicalResult allocAndInitializeReductionVars(
if (op.getNumReductionVars() == 0)
return success();
- llvm::IRBuilderBase::InsertPointGuard guard(builder);
SmallVector<DeferredStore> deferredStores;
if (failed(allocReductionVars(op, reductionArgs, builder, moduleTranslation,
@@ -2080,6 +2084,8 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
return llvm::make_error<PreviouslyReportedError>();
assert(afterAllocas.get()->getSinglePredecessor());
+ builder.restoreIP(codeGenIP);
+
if (failed(
initReductionVars(opInst, reductionArgs, builder, moduleTranslation,
afterAllocas.get()->getSinglePredecessor(),
@@ -2099,7 +2105,6 @@ convertOmpParallel(omp::ParallelOp opInst, llvm::IRBuilderBase &builder,
moduleTranslation, allocaIP);
// ParallelOp has only one region associated with it.
- builder.restoreIP(codeGenIP);
llvm::Expected<llvm::BasicBlock *> regionBlock = convertOmpOpRegions(
opInst.getRegion(), "omp.par.region", builder, moduleTranslation);
if (!regionBlock)
diff --git a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
index 55fb5954548a04..75161bac2faf42 100644
--- a/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-parallel-reduction-multiblock.mlir
@@ -44,7 +44,7 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
// CHECK: br label %[[VAL_10:.*]]
// CHECK: omp.par.exit.split: ; preds = %[[VAL_9]]
// CHECK: ret void
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
// CHECK: %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
// CHECK: %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -56,10 +56,12 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
// CHECK: %[[VAL_20:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_21:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_22:.*]] = alloca [2 x ptr], align 8
-// CHECK: br label %[[VAL_23:.*]]
-// CHECK: omp.reduction.init: ; preds = %[[VAL_24:.*]]
-// CHECK: br label %[[VAL_25:.*]]
-// CHECK: omp.reduction.neutral: ; preds = %[[VAL_23]]
+// CHECK: br label %[[VAL_23:omp.par.region]]
+// CHECK: [[VAL_23]]: ; preds = %[[PAR_ENTRY]]
+// CHECK: br label %[[VAL_42:.*]]
+// CHECK: [[RED_INIT:omp.reduction.init]]:
+// CHECK: br label %[[VAL_25:omp.reduction.neutral]]
+// CHECK: [[VAL_25]]: ; preds = %[[RED_INIT]]
// CHECK: %[[VAL_26:.*]] = ptrtoint ptr %[[VAL_13]] to i64
// CHECK: %[[VAL_27:.*]] = icmp eq i64 %[[VAL_26]], 0
// CHECK: br i1 %[[VAL_27]], label %[[VAL_28:.*]], label %[[VAL_29:.*]]
@@ -79,15 +81,13 @@ llvm.func @missordered_blocks_(%arg0: !llvm.ptr {fir.bindc_name = "x"}, %arg1: !
// CHECK: br label %[[VAL_38:.*]]
// CHECK: omp.reduction.neutral8: ; preds = %[[VAL_36]], %[[VAL_37]]
// CHECK: br label %[[VAL_39:.*]]
-// CHECK: omp.region.cont4: ; preds = %[[VAL_38]]
+// CHECK: [[VAL_39]]: ; preds = %[[VAL_38]]
// CHECK: %[[VAL_40:.*]] = phi ptr [ %[[VAL_15]], %[[VAL_38]] ]
// CHECK: store ptr %[[VAL_40]], ptr %[[VAL_21]], align 8
// CHECK: br label %[[VAL_41:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_39]]
-// CHECK: br label %[[VAL_42:.*]]
-// CHECK: omp.par.region10: ; preds = %[[VAL_41]]
+// CHECK: omp.par.region10: ; preds = %[[VAL_39]]
// CHECK: br label %[[VAL_43:.*]]
-// CHECK: omp.region.cont9: ; preds = %[[VAL_42]]
+// CHECK: omp.region.cont9: ; preds = %[[VAL_41]]
// CHECK: %[[VAL_44:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 0
// CHECK: store ptr %[[VAL_20]], ptr %[[VAL_44]], align 8
// CHECK: %[[VAL_45:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_22]], i64 0, i64 1
diff --git a/mlir/test/Target/LLVMIR/openmp-private.mlir b/mlir/test/Target/LLVMIR/openmp-private.mlir
index 5407f97286eb1a..d2ca03a8fa027a 100644
--- a/mlir/test/Target/LLVMIR/openmp-private.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-private.mlir
@@ -199,6 +199,8 @@ llvm.func @bar(!llvm.ptr)
// CHECK-DAG: %[[RED_ALLOC:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
// CHECK: omp.par.region:
+// CHECK: br label %omp.reduction.init
+// CHECK: omp.reduction.init:
// CHECK: br label %[[PAR_REG_BEG:.*]]
// CHECK: [[PAR_REG_BEG]]:
// CHECK-NEXT: %{{.*}} = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[RED_ALLOC]], align 8
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
index fdfcc66b91012d..912d5568c5f262 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-array-sections.mlir
@@ -77,7 +77,7 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
}
// CHECK-LABEL: define internal void @sectionsreduction_..omp_par
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_6:.*]] = alloca i32, align 4
// CHECK: %[[VAL_7:.*]] = alloca i32, align 4
// CHECK: %[[VAL_8:.*]] = alloca i32, align 4
@@ -90,15 +90,18 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
// CHECK: %[[VAL_21:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_14:.*]] = alloca [1 x ptr], align 8
// CHECK: br label %[[VAL_15:.*]]
-// CHECK: omp.reduction.init: ; preds = %[[VAL_16:.*]]
-// CHECK: store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
-// CHECK: br label %[[VAL_17:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_15]]
+
+// CHECK: omp.par.region: ; preds = %[[PAR_ENTRY]]
// CHECK: br label %[[VAL_18:.*]]
-// CHECK: omp.par.region1: ; preds = %[[VAL_17]]
+// CHECK: omp.par.region1: ; preds = %[[VAL_15]]
// CHECK: %[[VAL_19:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
// CHECK: br label %[[VAL_22:.*]]
-// CHECK: omp_section_loop.preheader: ; preds = %[[VAL_18]]
+
+// CHECK: omp.reduction.init: ; preds = %[[VAL_16:.*]]
+// CHECK: store ptr %[[VAL_20]], ptr %[[VAL_21]], align 8
+// CHECK: br label %[[VAL_17:.*]]
+
+// CHECK: omp_section_loop.preheader: ; preds = %[[VAL_22]]
// CHECK: store i32 0, ptr %[[VAL_7]], align 4
// CHECK: store i32 1, ptr %[[VAL_8]], align 4
// CHECK: store i32 1, ptr %[[VAL_9]], align 4
@@ -109,8 +112,8 @@ llvm.func @sectionsreduction_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attribute
// CHECK: %[[VAL_26:.*]] = sub i32 %[[VAL_25]], %[[VAL_24]]
// CHECK: %[[VAL_27:.*]] = add i32 %[[VAL_26]], 1
// CHECK: br label %[[VAL_28:.*]]
-// CHECK: omp_section_loop.header: ; preds = %[[VAL_29:.*]], %[[VAL_22]]
-// CHECK: %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_22]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
+// CHECK: omp_section_loop.header: ; preds = %[[VAL_29:.*]], %[[VAL_17]]
+// CHECK: %[[VAL_30:.*]] = phi i32 [ 0, %[[VAL_17]] ], [ %[[VAL_31:.*]], %[[VAL_29]] ]
// CHECK: br label %[[VAL_32:.*]]
// CHECK: omp_section_loop.cond: ; preds = %[[VAL_28]]
// CHECK: %[[VAL_33:.*]] = icmp ult i32 %[[VAL_30]], %[[VAL_27]]
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
index 8e28f0b85b259c..7f2424381e846e 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-init-arg.mlir
@@ -50,7 +50,7 @@ module {
// CHECK: br label %[[VAL_10:.*]]
// CHECK: omp.par.exit.split: ; preds = %[[VAL_9]]
// CHECK: ret void
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_11:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12:.*]], i32 0, i32 0
// CHECK: %[[VAL_13:.*]] = load ptr, ptr %[[VAL_11]], align 8
// CHECK: %[[VAL_14:.*]] = getelementptr { ptr, ptr }, ptr %[[VAL_12]], i32 0, i32 1
@@ -62,16 +62,16 @@ module {
// CHECK: %[[VAL_21:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_23:.*]] = alloca ptr, align 8
// CHECK: %[[VAL_24:.*]] = alloca [2 x ptr], align 8
+// CHECK: br label %[[VAL_25:.*]]
+// CHECK: omp.par.region: ; preds = %[[PAR_ENTRY]]
// CHECK: br label %[[INIT_LABEL:.*]]
// CHECK: [[INIT_LABEL]]:
// CHECK: %[[VAL_20:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_13]], align 8
// CHECK: store ptr %[[VAL_13]], ptr %[[VAL_21]], align 8
// CHECK: %[[VAL_22:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[VAL_15]], align 8
// CHECK: store ptr %[[VAL_15]], ptr %[[VAL_23]], align 8
-// CHECK: br label %[[VAL_25:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_26:.*]]
// CHECK: br label %[[VAL_27:.*]]
-// CHECK: omp.par.region1: ; preds = %[[VAL_25]]
+// CHECK: omp.par.region1: ; preds = %[[INIT_LABEL]]
// CHECK: br label %[[VAL_28:.*]]
// CHECK: omp.region.cont: ; preds = %[[VAL_27]]
// CHECK: %[[VAL_29:.*]] = getelementptr inbounds [2 x ptr], ptr %[[VAL_24]], i64 0, i64 0
diff --git a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
index ed7e9fada5fc44..05af32622246a6 100644
--- a/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-reduction-sections.mlir
@@ -36,7 +36,7 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
}
// CHECK-LABEL: define internal void @sections_..omp_par
-// CHECK: omp.par.entry:
+// CHECK: [[PAR_ENTRY:omp.par.entry]]:
// CHECK: %[[VAL_9:.*]] = getelementptr { ptr }, ptr %[[VAL_10:.*]], i32 0, i32 0
// CHECK: %[[VAL_11:.*]] = load ptr, ptr %[[VAL_9]], align 8
// CHECK: %[[VAL_12:.*]] = alloca i32, align 4
@@ -50,14 +50,16 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
// CHECK: %[[VAL_20:.*]] = alloca float, align 4
// CHECK: %[[VAL_21:.*]] = alloca [1 x ptr], align 8
// CHECK: br label %[[VAL_22:.*]]
-// CHECK: omp.reduction.init: ; preds = %[[VAL_23:.*]]
-// CHECK: store float 0.000000e+00, ptr %[[VAL_20]], align 4
-// CHECK: br label %[[VAL_24:.*]]
-// CHECK: omp.par.region: ; preds = %[[VAL_22]]
+// CHECK: omp.par.region: ; preds = %[[PAR_ENTRY]]
// CHECK: br label %[[VAL_25:.*]]
-// CHECK: omp.par.region1: ; preds = %[[VAL_24]]
+// CHECK: omp.par.region1: ; preds = %[[VAL_22]]
// CHECK: br label %[[VAL_26:.*]]
-// CHECK: omp_section_loop.preheader: ; preds = %[[VAL_25]]
+
+// CHECK: [[RED_INIT:omp.reduction.init]]:
+// CHECK: store float 0.000000e+00, ptr %[[VAL_20]], align 4
+// CHECK: br label %[[VAL_24:.*]]
+
+// CHECK: omp_section_loop.preheader: ; preds = %[[RED_INIT]]
// CHECK: store i32 0, ptr %[[VAL_13]], align 4
// CHECK: store i32 1, ptr %[[VAL_14]], align 4
// CHECK: store i32 1, ptr %[[VAL_15]], align 4
@@ -68,8 +70,8 @@ llvm.func @sections_(%arg0: !llvm.ptr {fir.bindc_name = "x"}) attributes {fir.in
// CHECK: %[[VAL_30:.*]] = sub i32 %[[VAL_29]], %[[VAL_28]]
// CHECK: %[[VAL_31:.*]] = add i32 %[[VAL_30]], 1
// CHECK: br label %[[VAL_32:.*]]
-// CHECK: omp_section_loop.header: ; preds = %[[VAL_33:.*]], %[[VAL_26]]
-// CHECK: %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_26]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
+// CHECK: omp_section_loop.header: ; preds = %[[VAL_33:.*]], %[[VAL_24]]
+// CHECK: %[[VAL_34:.*]] = phi i32 [ 0, %[[VAL_24]] ], [ %[[VAL_35:.*]], %[[VAL_33]] ]
// CHECK: br label %[[VAL_36:.*]]
// CHECK: omp_section_loop.cond: ; preds = %[[VAL_32]]
// CHECK: %[[VAL_37:.*]] = icmp ult i32 %[[VAL_34]], %[[VAL_31]]
✅ With the latest revision this PR passed the C/C++ code formatter. |
8be8cec to 2ada277
tblah
left a comment
I much prefer this solution. Nice work! Just a minor comment about documentation.
This passes my downstream tests and the gfortran test suite.
llvm::IRBuilderBase::InsertPointGuard guard(builder);
builder.SetInsertPoint(latestAllocaBlock->getTerminator());
What worries me about removing this is that initReductionVars now depends on the current insertion point of the builder.
It isn't obvious to me what the right insertion point for the builder should be when calling this function. Please could you document what assumptions initReductionVars is now making.
Reply:
Added some docs to explain the pre- and post-conditions of the functions. Let me know if something needs to be clarified further.
2ada277 to 25ba9fb
Problem
Consider the following example:
```fortran
program test
real :: x(1)
integer :: i
!$omp parallel do reduction(+:x)
do i = 1,1
x = 1
end do
!$omp end parallel do
end program
```
The HLFIR+OMP IR for this example looks like this:
```mlir
func.func @_QQmain() {
...
omp.parallel {
%5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
%6 = fir.alloca !fir.box<!fir.array<1xf32>>
...
omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
...
omp.yield
}
}
omp.terminator
}
return
}
```
The problem addressed by this PR involves the `alloca` in the `omp.parallel` region together with the related `reduction` clause on the `omp.wsloop` op. When we try to translate the reduction from MLIR to LLVM, we have to choose an `alloca` insertion point. This happens in `convertOmpWsloop`; on entry to that function, the LLVM module looks like this:
```llvm
define void @_QQmain() {
%tid.addr = alloca i32, align 4
...
entry:
%omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
br label %omp.par.entry
omp.par.entry:
%tid.addr.local = alloca i32, align 4
...
br label %omp.par.region
omp.par.region:
br label %omp.par.region1
omp.par.region1:
...
%5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```
Without the changes in this PR, the `alloca` insertion point chosen for the reduction is the `omp.par.entry` block. The problem is that the allocation needed for the reduction has to reference the `%5` SSA value. As a result, allocations inserted in `omp.par.entry` reference an allocation in a later block, `omp.par.region1`, which causes the `Instruction does not dominate all uses!` error.
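To make the dominance failure concrete, here is a minimal sketch in plain C++. It is a toy model and does not use the LLVM API: the `Inst` and `Block` types, the `defsDominateUses` helper, and the `%red.alloc` value name are all hypothetical, and the function is assumed to be a straight-line chain of blocks so that an earlier block dominates every later one.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model, not the LLVM API. The function is a straight-line chain of
// blocks, so an earlier block dominates every later one.
struct Inst {
  std::string def;                // value this instruction defines
  std::vector<std::string> uses;  // values it reads
};

struct Block {
  std::string label;
  std::vector<Inst> insts;
};

// Walks the chain in order and reports whether every use is preceded by its
// definition, which is the dominance rule for this simplified layout.
bool defsDominateUses(const std::vector<Block> &chain) {
  std::vector<std::string> defined;
  for (const Block &block : chain) {
    for (const Inst &inst : block.insts) {
      for (const std::string &use : inst.uses) {
        if (std::find(defined.begin(), defined.end(), use) == defined.end())
          return false;  // use reached before its definition
      }
      defined.push_back(inst.def);
    }
  }
  return true;
}

// Layout before the fix: the reduction allocation (hypothetical name
// %red.alloc) lands in omp.par.entry but reads %5 from omp.par.region1.
std::vector<Block> layoutBeforeFix() {
  return {
      Block{"omp.par.entry",
            {Inst{"%tid.addr.local", {}}, Inst{"%red.alloc", {"%5"}}}},
      Block{"omp.par.region", {}},
      Block{"omp.par.region1", {Inst{"%5", {}}}},
  };
}

// Same instructions, with the reduction allocation placed after %5.
std::vector<Block> layoutAfterDefinition() {
  return {
      Block{"omp.par.entry", {Inst{"%tid.addr.local", {}}}},
      Block{"omp.par.region", {}},
      Block{"omp.par.region1", {Inst{"%5", {}}, Inst{"%red.alloc", {"%5"}}}},
  };
}
```

In this model `defsDominateUses(layoutBeforeFix())` is false, mirroring the verifier error, while `defsDominateUses(layoutAfterDefinition())` is true. The second layout only illustrates a placement that satisfies dominance; it is not meant to reproduce the exact blocks the PR generates.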
Possible solution - take 2:
This PR contains a more localized solution than llvm#121886. It makes sure that, on entry to `initReductionVars`, the IR builder is at a point where we can start inserting the initialization region; to make things cleaner, we still split the block at the builder insertion point into a dedicated `omp.reduction.init` block. This way we avoid splitting after the latest allocation block, which is what causes the issue.
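The difference between the two split points can be sketched with a toy model in plain C++. Nothing here is the LLVM API: `Block`, `splitBlockAt`, `emitReductionInit`, `parallelRegion`, and the instruction strings are hypothetical stand-ins, and a function is again assumed to be a linear chain of blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not the LLVM API: a block is a label plus instruction strings.
struct Block {
  std::string label;
  std::vector<std::string> insts;
};

// Splits chain[blockIdx] before instruction instIdx. The tail moves into a new
// block named newLabel, inserted right after the original block. Returns the
// index of the new block.
std::size_t splitBlockAt(std::vector<Block> &chain, std::size_t blockIdx,
                         std::size_t instIdx, const std::string &newLabel) {
  assert(blockIdx < chain.size() && instIdx <= chain[blockIdx].insts.size());
  const auto offset = static_cast<std::ptrdiff_t>(instIdx);
  std::vector<std::string> &insts = chain[blockIdx].insts;
  Block tail{newLabel, {}};
  tail.insts.assign(insts.begin() + offset, insts.end());
  insts.erase(insts.begin() + offset, insts.end());
  chain.insert(chain.begin() + static_cast<std::ptrdiff_t>(blockIdx) + 1, tail);
  return blockIdx + 1;
}

// Creates an omp.reduction.init block at the given insertion point and puts a
// placeholder initialization instruction (which reads %5) at its start.
std::size_t emitReductionInit(std::vector<Block> &chain, std::size_t blockIdx,
                              std::size_t instIdx) {
  std::size_t initIdx =
      splitBlockAt(chain, blockIdx, instIdx, "omp.reduction.init");
  chain[initIdx].insts.insert(chain[initIdx].insts.begin(),
                              std::string("init reduction from %5"));
  return initIdx;
}

// Simplified version of the parallel region shown in the PR description.
std::vector<Block> parallelRegion() {
  return {Block{"omp.par.entry", {"%tid.addr.local = alloca i32"}},
          Block{"omp.par.region", {}},
          Block{"omp.par.region1", {"%5 = alloca ..."}}};
}
```

Calling `emitReductionInit(chain, 0, 1)` models the old behavior: the init block is created right after `omp.par.entry`, ahead of the block that defines `%5`. Calling `emitReductionInit(chain, 2, 1)` models splitting at the current code generation point, so the init block follows `omp.par.region1` and the definition of `%5` comes first.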
25ba9fb to 6c170e3
tblah
left a comment
Thanks for the fix!
jsjodin
left a comment
LGTM
As a side note, I have been wondering if it would make sense to enhance the builder (probably by creating a subclass) so that it can keep track of both the alloca and codegen IPs. This is not the first time that restoring the codegen IP isn't done correctly, and improving the builder could potentially be used to check that allocations are done in the right place.
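One possible shape for such a builder, as a rough sketch only. This is a toy in plain C++ that records instructions as strings; `DualInsertPointBuilder` and `CodegenPointGuard` are hypothetical names, not existing LLVM classes, and the substring check for `alloca` merely stands in for a real check on the instruction kind.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy builder, not an LLVM class: it records instructions as
// strings and keeps two separate insertion points.
class DualInsertPointBuilder {
public:
  DualInsertPointBuilder(std::string allocaBlock, std::string codegenBlock)
      : allocaBlock_(std::move(allocaBlock)),
        codegenBlock_(std::move(codegenBlock)) {}

  // Allocations always go to the alloca block, whatever the codegen point is.
  void emitAlloca(const std::string &inst) {
    blocks_[allocaBlock_].push_back(inst);
  }

  // All other instructions go to the current codegen block.
  void emit(const std::string &inst) {
    assert(inst.find("alloca") == std::string::npos &&
           "allocations must go through emitAlloca");
    blocks_[codegenBlock_].push_back(inst);
  }

  void setCodegenBlock(std::string label) { codegenBlock_ = std::move(label); }
  const std::string &codegenBlock() const { return codegenBlock_; }

  // Returns the instructions recorded for a block (empty if none yet).
  std::vector<std::string> instructions(const std::string &label) const {
    auto it = blocks_.find(label);
    return it == blocks_.end() ? std::vector<std::string>{} : it->second;
  }

private:
  std::string allocaBlock_;
  std::string codegenBlock_;
  std::map<std::string, std::vector<std::string>> blocks_;
};

// RAII helper that restores the codegen block on scope exit.
class CodegenPointGuard {
public:
  explicit CodegenPointGuard(DualInsertPointBuilder &builder)
      : builder_(builder), saved_(builder.codegenBlock()) {}
  ~CodegenPointGuard() { builder_.setCodegenBlock(saved_); }
  CodegenPointGuard(const CodegenPointGuard &) = delete;
  CodegenPointGuard &operator=(const CodegenPointGuard &) = delete;

private:
  DualInsertPointBuilder &builder_;
  std::string saved_;
};
```

Keeping the two insertion points as separate state means that moving the codegen point cannot silently redirect allocations, and the guard makes restoring the codegen point automatic instead of a manual step that can be forgotten.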
Replaces #121886
Fixes #120254 (hopefully 🤞)