Skip to content

Commit 098611a

Browse files
committed
[mlir][OpenMP] rewrite conversion of privatisation for omp.parallel
The existing conversion inlined private alloc regions and firstprivate copy regions in mlir, then undoing the modification of the mlir module before completing the conversion. To make this work, LLVM IR had to be generated using the wrong mapping for privatised values and then later fixed inside of OpenMPIRBuilder. This approach violated an assumption in OpenMPIRBuilder that private variables would be values not constants. Flang sometimes generates code where private variables are promoted to globals, the address of which is treated as a constant in LLVM IR. This caused the incorrect values for the private variable from being replaced by OpenMPIRBuilder: ultimately resulting in programs producing incorrect results. This patch rewrites delayed privatisation for omp.parallel to work more similarly to reductions: translating directly into LLVMIR with correct mappings for private variables. RFC: https://discourse.llvm.org/t/rfc-openmp-fix-issue-in-mlir-to-llvmir-translation-for-delayed-privatisation/81225 Tested against the gfortran testsuite and our internal test suite. Linaro's post-commit bots will check against the fujitsu test suite. I decided to add the new tests as flang integration tests rather than in mlir/test/Target/LLVMIR: - The regression test is for an issue filed against flang. i wanted to keep the reproducer similar to the code in the ticket. - I found the "worst case" CFG test difficult to reason about in abstract it helped me to think about what was going on in terms of a Fortran program. Fixes llvm#106297
1 parent 7ec3209 commit 098611a

File tree

5 files changed

+479
-263
lines changed

5 files changed

+479
-263
lines changed
Lines changed: 261 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,261 @@
1+
! RUN: %flang_fc1 -fopenmp -emit-llvm %s -o - | FileCheck %s
2+
3+
! stress test cfg and builder insertion points in mlir-to-llvm conversion:
4+
! - mixing multiple delayed privatisations and multiple reductions
5+
! - multiple blocks in the private alloc region
6+
! - private alloc region has to read from the mold variable
7+
! - firstprivate
8+
! - multiple blocks in the private copy region
9+
! - multiple blocks in the reduction init region
10+
! - reduction init region has to read from the mold variable
11+
! - re-used omp.private ops
12+
! - re-used omp.reduction.declare ops
13+
! - unstructured code inside of the parallel region
14+
! - needs private dealloc region, and this has multiple blocks
15+
! - needs reduction cleanup region, and this has multiple blocks
16+
17+
! This maybe belongs in the mlir tests, but what we are doing here is complex
18+
! enough that I find the kind of minimised mlir code prefered by mlir reviwers
19+
! hard to read without some fortran here for reference. Nothing like this would
20+
! be generated by other upstream users of the MLIR OpenMP dialect.
21+
22+
subroutine worst_case(a, b, c, d)
23+
real, allocatable :: a(:), b(:), c(:), d(:)
24+
integer i
25+
26+
!$omp parallel firstprivate(a,b) reduction(+:c,d)
27+
if (sum(a) == 1) stop 1
28+
!$omp end parallel
29+
end subroutine
30+
31+
! CHECK-LABEL: define internal void @worst_case_..omp_par
32+
! CHECK-NEXT: omp.par.entry:
33+
! [reduction alloc regions inlined here]
34+
! CHECK: br label %omp.private.latealloc
35+
36+
! CHECK: omp.private.latealloc: ; preds = %omp.par.entry
37+
! CHECK-NEXT: br label %omp.private.alloc5
38+
39+
! CHECK: omp.private.alloc5: ; preds = %omp.private.latealloc
40+
! [begin private alloc for first var]
41+
! [read the length from the mold argument]
42+
! [if it is non-zero...]
43+
! CHECK: br i1 {{.*}}, label %omp.private.alloc6, label %omp.private.alloc7
44+
45+
! CHECK: omp.private.alloc7: ; preds = %omp.private.alloc5
46+
! [finish private alloc for first var with zero extent]
47+
! CHECK: br label %omp.private.alloc8
48+
49+
! CHECK: omp.private.alloc8: ; preds = %omp.private.alloc6, %omp.private.alloc7
50+
! CHECK-NEXT: br label %omp.region.cont4
51+
52+
! CHECK: omp.region.cont4: ; preds = %omp.private.alloc8
53+
! CHECK-NEXT: %{{.*}} = phi ptr
54+
! CHECK-NEXT: br label %omp.private.alloc
55+
56+
! CHECK: omp.private.alloc: ; preds = %omp.region.cont4
57+
! [begin private alloc for first var]
58+
! [read the length from the mold argument]
59+
! [if it is non-zero...]
60+
! CHECK: br i1 %{{.*}}, label %omp.private.alloc1, label %omp.private.alloc2
61+
62+
! CHECK: omp.private.alloc2: ; preds = %omp.private.alloc
63+
! [finish private alloc for second var with zero extent]
64+
! CHECK: br label %omp.private.alloc3
65+
66+
! CHECK: omp.private.alloc3: ; preds = %omp.private.alloc1, %omp.private.alloc2
67+
! CHECK-NEXT: br label %omp.region.cont
68+
69+
! CHECK: omp.region.cont: ; preds = %omp.private.alloc3
70+
! CHECK-NEXT: %{{.*}} = phi ptr
71+
! CHECK-NEXT: br label %omp.private.copy
72+
73+
! CHECK: omp.private.copy: ; preds = %omp.region.cont
74+
! CHECK-NEXT: br label %omp.private.copy10
75+
76+
! CHECK: omp.private.copy10: ; preds = %omp.private.copy
77+
! [begin firstprivate copy for first var]
78+
! [read the length, is it non-zero?]
79+
! CHECK: br i1 %{{.*}}, label %omp.private.copy11, label %omp.private.copy12
80+
81+
! CHECK: omp.private.copy12: ; preds = %omp.private.copy11, %omp.private.copy10
82+
! CHECK-NEXT: br label %omp.region.cont9
83+
84+
! CHECK: omp.region.cont9: ; preds = %omp.private.copy12
85+
! CHECK-NEXT: %{{.*}} = phi ptr
86+
! CHECK-NEXT: br label %omp.private.copy14
87+
88+
! CHECK: omp.private.copy14: ; preds = %omp.region.cont9
89+
! [begin firstprivate copy for second var]
90+
! [read the length, is it non-zero?]
91+
! CHECK: br i1 %{{.*}}, label %omp.private.copy15, label %omp.private.copy16
92+
93+
! CHECK: omp.private.copy16: ; preds = %omp.private.copy15, %omp.private.copy14
94+
! CHECK-NEXT: br label %omp.region.cont13
95+
96+
! CHECK: omp.region.cont13: ; preds = %omp.private.copy16
97+
! CHECK-NEXT: %{{.*}} = phi ptr
98+
! CHECK-NEXT: br label %omp.reduction.init
99+
100+
! CHECK: omp.reduction.init: ; preds = %omp.region.cont13
101+
! [deffered stores for results of reduction alloc regions]
102+
! CHECK: br label %[[VAL_96:.*]]
103+
104+
! CHECK: omp.reduction.neutral: ; preds = %omp.reduction.init
105+
! [start of reduction initialization region]
106+
! [null check:]
107+
! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral18, label %omp.reduction.neutral19
108+
109+
! CHECK: omp.reduction.neutral19: ; preds = %omp.reduction.neutral
110+
! [malloc and assign the default value to the reduction variable]
111+
! CHECK: br label %omp.reduction.neutral20
112+
113+
! CHECK: omp.reduction.neutral20: ; preds = %omp.reduction.neutral18, %omp.reduction.neutral19
114+
! CHECK-NEXT: br label %omp.region.cont17
115+
116+
! CHECK: omp.region.cont17: ; preds = %omp.reduction.neutral20
117+
! CHECK-NEXT: %{{.*}} = phi ptr
118+
! CHECK-NEXT: br label %omp.reduction.neutral22
119+
120+
! CHECK: omp.reduction.neutral22: ; preds = %omp.region.cont17
121+
! [start of reduction initialization region]
122+
! [null check:]
123+
! CHECK: br i1 %{{.*}}, label %omp.reduction.neutral23, label %omp.reduction.neutral24
124+
125+
! CHECK: omp.reduction.neutral24: ; preds = %omp.reduction.neutral22
126+
! [malloc and assign the default value to the reduction variable]
127+
! CHECK: br label %omp.reduction.neutral25
128+
129+
! CHECK: omp.reduction.neutral25: ; preds = %omp.reduction.neutral23, %omp.reduction.neutral24
130+
! CHECK-NEXT: br label %omp.region.cont21
131+
132+
! CHECK: omp.region.cont21: ; preds = %omp.reduction.neutral25
133+
! CHECK-NEXT: %{{.*}} = phi ptr
134+
! CHECK-NEXT: br label %omp.par.region
135+
136+
! CHECK: omp.par.region: ; preds = %omp.region.cont21
137+
! CHECK-NEXT: br label %omp.par.region27
138+
139+
! CHECK: omp.par.region27: ; preds = %omp.par.region
140+
! [call SUM runtime function]
141+
! [if (sum(a) == 1)]
142+
! CHECK: br i1 %{{.*}}, label %omp.par.region28, label %omp.par.region29
143+
144+
! CHECK: omp.par.region29: ; preds = %omp.par.region27
145+
! CHECK-NEXT: br label %omp.region.cont26
146+
147+
! CHECK: omp.region.cont26: ; preds = %omp.par.region28, %omp.par.region29
148+
! [omp parallel region done, call into the runtime to complete reduction]
149+
! CHECK: %[[VAL_233:.*]] = call i32 @__kmpc_reduce(
150+
! CHECK: switch i32 %[[VAL_233]], label %reduce.finalize [
151+
! CHECK-NEXT: i32 1, label %reduce.switch.nonatomic
152+
! CHECK-NEXT: i32 2, label %reduce.switch.atomic
153+
! CHECK-NEXT: ]
154+
155+
! CHECK: reduce.switch.atomic: ; preds = %omp.region.cont26
156+
! CHECK-NEXT: unreachable
157+
158+
! CHECK: reduce.switch.nonatomic: ; preds = %omp.region.cont26
159+
! CHECK-NEXT: %[[red_private_value_0:.*]] = load ptr, ptr %{{.*}}, align 8
160+
! CHECK-NEXT: br label %omp.reduction.nonatomic.body
161+
162+
! [various blocks implementing the reduction]
163+
164+
! CHECK: omp.region.cont35: ; preds =
165+
! CHECK-NEXT: %{{.*}} = phi ptr
166+
! CHECK-NEXT: call void @__kmpc_end_reduce(
167+
! CHECK-NEXT: br label %reduce.finalize
168+
169+
! CHECK: reduce.finalize: ; preds =
170+
! CHECK-NEXT: br label %omp.par.pre_finalize
171+
172+
! CHECK: omp.par.pre_finalize: ; preds = %reduce.finalize
173+
! CHECK-NEXT: %{{.*}} = load ptr, ptr
174+
! CHECK-NEXT: br label %omp.reduction.cleanup
175+
176+
! CHECK: omp.reduction.cleanup: ; preds = %omp.par.pre_finalize
177+
! [null check]
178+
! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup41, label %omp.reduction.cleanup42
179+
180+
! CHECK: omp.reduction.cleanup42: ; preds = %omp.reduction.cleanup41, %omp.reduction.cleanup
181+
! CHECK-NEXT: br label %omp.region.cont40
182+
183+
! CHECK: omp.region.cont40: ; preds = %omp.reduction.cleanup42
184+
! CHECK-NEXT: %{{.*}} = load ptr, ptr
185+
! CHECK-NEXT: br label %omp.reduction.cleanup44
186+
187+
! CHECK: omp.reduction.cleanup44: ; preds = %omp.region.cont40
188+
! [null check]
189+
! CHECK: br i1 %{{.*}}, label %omp.reduction.cleanup45, label %omp.reduction.cleanup46
190+
191+
! CHECK: omp.reduction.cleanup46: ; preds = %omp.reduction.cleanup45, %omp.reduction.cleanup44
192+
! CHECK-NEXT: br label %omp.region.cont43
193+
194+
! CHECK: omp.region.cont43: ; preds = %omp.reduction.cleanup46
195+
! CHECK-NEXT: br label %omp.private.dealloc
196+
197+
! CHECK: omp.private.dealloc: ; preds = %omp.region.cont43
198+
! [null check]
199+
! CHECK: br i1 %{{.*}}, label %omp.private.dealloc48, label %omp.private.dealloc49
200+
201+
! CHECK: omp.private.dealloc49: ; preds = %omp.private.dealloc48, %omp.private.dealloc
202+
! CHECK-NEXT: br label %omp.region.cont47
203+
204+
! CHECK: omp.region.cont47: ; preds = %omp.private.dealloc49
205+
! CHECK-NEXT: br label %omp.private.dealloc51
206+
207+
! CHECK: omp.private.dealloc51: ; preds = %omp.region.cont47
208+
! [null check]
209+
! CHECK: br i1 %{{.*}}, label %omp.private.dealloc52, label %omp.private.dealloc53
210+
211+
! CHECK: omp.private.dealloc53: ; preds = %omp.private.dealloc52, %omp.private.dealloc51
212+
! CHECK-NEXT: br label %omp.region.cont50
213+
214+
! CHECK: omp.region.cont50: ; preds = %omp.private.dealloc53
215+
! CHECK-NEXT: br label %omp.par.outlined.exit.exitStub
216+
217+
! CHECK: omp.private.dealloc52: ; preds = %omp.private.dealloc51
218+
! [dealloc memory]
219+
! CHECK: br label %omp.private.dealloc53
220+
221+
! CHECK: omp.private.dealloc48: ; preds = %omp.private.dealloc
222+
! [dealloc memory]
223+
! CHECK: br label %omp.private.dealloc49
224+
225+
! CHECK: omp.reduction.cleanup45: ; preds = %omp.reduction.cleanup44
226+
! CHECK-NEXT: call void @free(
227+
! CHECK-NEXT: br label %omp.reduction.cleanup46
228+
229+
! CHECK: omp.reduction.cleanup41: ; preds = %omp.reduction.cleanup
230+
! CHECK-NEXT: call void @free(
231+
! CHECK-NEXT: br label %omp.reduction.cleanup42
232+
233+
! CHECK: omp.par.region28: ; preds = %omp.par.region27
234+
! CHECK-NEXT: call {} @_FortranAStopStatement
235+
236+
! CHECK: omp.reduction.neutral23: ; preds = %omp.reduction.neutral22
237+
! [source length was zero: finish initializing array]
238+
! CHECK: br label %omp.reduction.neutral25
239+
240+
! CHECK: omp.reduction.neutral18: ; preds = %omp.reduction.neutral
241+
! [source length was zero: finish initializing array]
242+
! CHECK: br label %omp.reduction.neutral20
243+
244+
! CHECK: omp.private.copy15: ; preds = %omp.private.copy14
245+
! [source length was non-zero: call assign runtime]
246+
! CHECK: br label %omp.private.copy16
247+
248+
! CHECK: omp.private.copy11: ; preds = %omp.private.copy10
249+
! [source length was non-zero: call assign runtime]
250+
! CHECK: br label %omp.private.copy12
251+
252+
! CHECK: omp.private.alloc1: ; preds = %omp.private.alloc
253+
! [var extent was non-zero: malloc a private array]
254+
! CHECK: br label %omp.private.alloc3
255+
256+
! CHECK: omp.private.alloc6: ; preds = %omp.private.alloc5
257+
! [var extent was non-zero: malloc a private array]
258+
! CHECK: br label %omp.private.alloc8
259+
260+
! CHECK: omp.par.outlined.exit.exitStub: ; preds = %omp.region.cont50
261+
! CHECK-NEXT: ret void
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
!RUN: %flang_fc1 -emit-llvm -fopenmp %s -o - | FileCheck %s
2+
3+
! Regression test for https://github.com/llvm/llvm-project/issues/106297
4+
5+
program bug
6+
implicit none
7+
integer :: table(10)
8+
!$OMP PARALLEL PRIVATE(table)
9+
table = 50
10+
if (any(table/=50)) then
11+
stop 'fail 3'
12+
end if
13+
!$OMP END PARALLEL
14+
print *,'ok'
15+
End Program
16+
17+
18+
! CHECK-LABEL: define internal void {{.*}}..omp_par(
19+
! CHECK: omp.par.entry:
20+
! CHECK: %[[VAL_9:.*]] = alloca i32, align 4
21+
! CHECK: %[[VAL_10:.*]] = load i32, ptr %[[VAL_11:.*]], align 4
22+
! CHECK: store i32 %[[VAL_10]], ptr %[[VAL_9]], align 4
23+
! CHECK: %[[VAL_12:.*]] = load i32, ptr %[[VAL_9]], align 4
24+
! CHECK: %[[PRIV_TABLE:.*]] = alloca [10 x i32], i64 1, align 4
25+
! ...
26+
! check that we use the private copy of table for the assignment
27+
! CHECK: omp.par.region1:
28+
! CHECK: %[[ELEMENTAL_TMP:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
29+
! CHECK: %[[TABLE_BOX_ADDR:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
30+
! CHECK: %[[BOXED_FIFTY:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }, align 8
31+
! CHECK: %[[TABLE_BOX_ADDR2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, i64 1, align 8
32+
! CHECK: %[[TABLE_BOX_VAL:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } { ptr undef, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64), i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]] {{\[\[}}3 x i64] [i64 1, i64 10, i64 ptrtoint (ptr getelementptr (i32, ptr null, i32 1) to i64)]] }, ptr %[[PRIV_TABLE]], 0
33+
! CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[TABLE_BOX_VAL]], ptr %[[TABLE_BOX_ADDR]], align 8
34+
! CHECK: %[[TABLE_BOX_VAL2:.*]] = load { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, ptr %[[TABLE_BOX_ADDR]], align 8
35+
! CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[TABLE_BOX_VAL2]], ptr %[[TABLE_BOX_ADDR2]], align 8
36+
! CHECK: %[[VAL_26:.*]] = call {} @_FortranAAssign(ptr %[[TABLE_BOX_ADDR2]], ptr %[[BOXED_FIFTY]], ptr @{{.*}}, i32 9)
37+
! ...
38+
! check that we use the private copy of table for table/=50
39+
! CHECK: omp.par.region3:
40+
! CHECK: %[[VAL_44:.*]] = sub nsw i64 %{{.*}}, 1
41+
! CHECK: %[[VAL_45:.*]] = mul nsw i64 %[[VAL_44]], 1
42+
! CHECK: %[[VAL_46:.*]] = mul nsw i64 %[[VAL_45]], 1
43+
! CHECK: %[[VAL_47:.*]] = add nsw i64 %[[VAL_46]], 0
44+
! CHECK: %[[VAL_48:.*]] = getelementptr i32, ptr %[[PRIV_TABLE]], i64 %[[VAL_47]]
45+
! CHECK: %[[VAL_49:.*]] = load i32, ptr %[[VAL_48]], align 4
46+
! CHECK: %[[VAL_50:.*]] = icmp ne i32 %[[VAL_49]], 50

0 commit comments

Comments
 (0)