Commit 5ba0287

Merge branch 'main' into main
2 parents 968d110 + de6c909 commit 5ba0287

14 files changed: +468 -16 lines changed

flang/docs/DoConcurrentConversionToOpenMP.md

Lines changed: 51 additions & 0 deletions
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside the parallel/target
 OpenMP region similar to what the single-range example in previous section
 shows.
 
+### Data environment
+
+By default, variables that are used inside a `do concurrent` loop nest are
+either treated as `shared` when mapping to `host`, or mapped into the
+`target` region using a `map` clause when mapping to `device`. The only
+exceptions to this are:
+1. the loop's iteration variable(s) (IV) of **perfect** loop nests. In that
+case, for each IV, we allocate a local copy as shown by the mapping
+examples above.
+1. any values that are allocated outside the loop nest but used
+exclusively inside of it. In such cases, a local privatized
+copy is created in the OpenMP region to prevent multiple teams of threads
+from accessing and destroying the same memory block, which would cause
+runtime issues. For an example of such cases, see
+`flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
+
+Implicit mapping detection (for mapping to the target device) is still quite
+limited, and work to make it smarter is underway for both OpenMP in general
+and `do concurrent` mapping.
+
+#### Non-perfectly-nested loops' IVs
+
+For non-perfectly-nested loops, the IVs are still treated as `shared` or
+`map` entries as pointed out above. This **might not** be consistent with what
+the Fortran specification tells us. In particular, taking the following
+snippets from the spec (version 2023) into account:
+
+> § 3.35
+> ------
+> construct entity
+> entity whose identifier has the scope of a construct
+
+> § 19.4
+> ------
+> A variable that appears as an index-name in a FORALL or DO CONCURRENT
+> construct [...] is a construct entity. A variable that has LOCAL or
+> LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
+> [...]
+> The name of a variable that appears as an index-name in a DO CONCURRENT
+> construct, FORALL statement, or FORALL construct has a scope of the statement
+> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
+> CONCURRENT construct has the scope of that construct.
+
+From the above quotes, it seems there is an equivalence between the IV of a `do
+concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
+to OpenMP's `private` clause). This means that we should probably
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
+nested in the nest we are parallelizing. For now, however, we **do not** do
+that as pointed out previously. In the near future, we propose a middle-ground
+solution (see the Next steps section for more details).
+
 <!--
 More details about current status will be added along with relevant parts of the
 implementation in later upstreaming patches.
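
The second exception described in the hunk above (values allocated outside the loop nest but used only inside it) can be illustrated with a small, self-contained C++ sketch. This is a toy model and not flang code: `ToyValue`, `isLoopLocal`, and `collectLoopLocals` are hypothetical names, and each value simply records where it is allocated and where its users live.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an IR value. None of these names
// come from flang; they only model the rule stated in the documentation.
struct ToyValue {
  std::string name;
  bool allocatedInsideLoop;         // true if the allocation is in the loop body
  std::vector<bool> userInsideLoop; // one entry per user of the value
};

// A value is "loop-local" when it is allocated outside the loop but every
// one of its users sits inside the loop.
bool isLoopLocal(const ToyValue &v) {
  if (v.allocatedInsideLoop)
    return false; // already local to the loop body; nothing to privatize
  for (bool inside : v.userInsideLoop)
    if (!inside)
      return false; // used outside the loop, so it must stay shared/mapped
  return true;
}

// Collects the names of all loop-local values.
std::set<std::string> collectLoopLocals(const std::vector<ToyValue> &values) {
  std::set<std::string> locals;
  for (const ToyValue &v : values)
    if (isLoopLocal(v))
      locals.insert(v.name);
  return locals;
}

// Small self-check: only "tmp" qualifies as loop-local.
inline void checkLoopLocals() {
  std::vector<ToyValue> values = {
      {"tmp", false, {true, true}}, // outside allocation, inside-only users
      {"a", false, {true, false}},  // also used after the loop
      {"inner", true, {true}},      // allocated inside the loop
  };
  std::set<std::string> locals = collectLoopLocals(values);
  assert(locals.size() == 1 && locals.count("tmp") == 1);
}
```

In this model, only values such as `tmp` would get a privatized copy inside the parallel region; `a` keeps the default `shared`/`map` treatment.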

flang/lib/Evaluate/intrinsics-library.cpp

Lines changed: 17 additions & 0 deletions
@@ -559,6 +559,23 @@ struct HostRuntimeLibrary<long double, LibraryVersion::LibmExtensions> {
 #endif // HAS_FLOAT80 || HAS_LDBL128
 #endif //_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600
 
+#ifdef _WIN32
+template <> struct HostRuntimeLibrary<double, LibraryVersion::LibmExtensions> {
+  using F = FuncPointer<double, double>;
+  using FN = FuncPointer<double, int, double>;
+  static constexpr HostRuntimeFunction table[]{
+      FolderFactory<F, F{::_j0}>::Create("bessel_j0"),
+      FolderFactory<F, F{::_j1}>::Create("bessel_j1"),
+      FolderFactory<FN, FN{::_jn}>::Create("bessel_jn"),
+      FolderFactory<F, F{::_y0}>::Create("bessel_y0"),
+      FolderFactory<F, F{::_y1}>::Create("bessel_y1"),
+      FolderFactory<FN, FN{::_yn}>::Create("bessel_yn"),
+  };
+  static constexpr HostRuntimeMap map{table};
+  static_assert(map.Verify(), "map must be sorted");
+};
+#endif
+
 /// Define pgmath description
 #if LINK_WITH_LIBPGMATH
 // Only use libpgmath for folding if it is available.
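
The `static_assert(map.Verify(), "map must be sorted")` line in the hunk above relies on the table being sorted by name at compile time. The general pattern can be sketched in standalone C++17 as follows. This is an illustration only: `Entry`, `kTable`, `isStrictlySorted`, and `lookup` are hypothetical names, and the entries are placeholder functions rather than the flang `HostRuntimeFunction` machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins; these are not the flang types.
using UnaryFn = double (*)(double);

struct Entry {
  std::string_view name;
  UnaryFn fn;
};

constexpr double half(double x) { return x / 2.0; }
constexpr double square(double x) { return x * x; }
constexpr double twice(double x) { return 2.0 * x; }

// Entries must be listed in ascending name order.
constexpr Entry kTable[] = {
    {"half", half},
    {"square", square},
    {"twice", twice},
};

// Compile-time check that names are strictly increasing.
template <std::size_t N>
constexpr bool isStrictlySorted(const Entry (&table)[N]) {
  for (std::size_t i = 1; i < N; ++i)
    if (!(table[i - 1].name < table[i].name))
      return false;
  return true;
}

static_assert(isStrictlySorted(kTable), "table must be sorted by name");

// Binary search over the sorted table; returns nullptr when not found.
template <std::size_t N>
constexpr UnaryFn lookup(const Entry (&table)[N], std::string_view name) {
  std::size_t lo = 0, hi = N;
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (table[mid].name < name)
      lo = mid + 1;
    else
      hi = mid;
  }
  return (lo < N && table[lo].name == name) ? table[lo].fn : nullptr;
}

// Small self-check exercising a hit and a miss.
inline void checkLookup() {
  assert(lookup(kTable, "square")(3.0) == 9.0);
  assert(lookup(kTable, "missing") == nullptr);
}
```

Checking sortedness in a `static_assert` means a misordered entry fails the build instead of silently breaking the binary search at run time.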

flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp

Lines changed: 67 additions & 1 deletion
@@ -313,6 +313,64 @@ void sinkLoopIVArgs(mlir::ConversionPatternRewriter &rewriter,
     ++idx;
   }
 }
+
+/// Collects values that are local to a loop: "loop-local values". A loop-local
+/// value is one that is used exclusively inside the loop but allocated outside
+/// of it. This usually corresponds to temporary values that are used inside the
+/// loop body for initializing other variables for example.
+///
+/// See `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90` for an
+/// example of why we need this.
+///
+/// \param [in] doLoop - the loop within which the function searches for values
+/// used exclusively inside.
+///
+/// \param [out] locals - the list of loop-local values detected for \p doLoop.
+void collectLoopLocalValues(fir::DoLoopOp doLoop,
+                            llvm::SetVector<mlir::Value> &locals) {
+  doLoop.walk([&](mlir::Operation *op) {
+    for (mlir::Value operand : op->getOperands()) {
+      if (locals.contains(operand))
+        continue;
+
+      bool isLocal = true;
+
+      if (!mlir::isa_and_present<fir::AllocaOp>(operand.getDefiningOp()))
+        continue;
+
+      // Values defined inside the loop are not interesting since they do not
+      // need to be localized.
+      if (doLoop->isAncestor(operand.getDefiningOp()))
+        continue;
+
+      for (auto *user : operand.getUsers()) {
+        if (!doLoop->isAncestor(user)) {
+          isLocal = false;
+          break;
+        }
+      }
+
+      if (isLocal)
+        locals.insert(operand);
+    }
+  });
+}
+
+/// For a "loop-local" value \p local within a loop's scope, localizes that
+/// value within the scope of the parallel region the loop maps to. Towards that
+/// end, this function moves the allocation of \p local within \p allocRegion.
+///
+/// \param local - the value used exclusively within a loop's scope (see
+/// collectLoopLocalValues).
+///
+/// \param allocRegion - the parallel region where \p local's allocation will be
+/// privatized.
+///
+/// \param rewriter - builder used for updating \p allocRegion.
+static void localizeLoopLocalValue(mlir::Value local, mlir::Region &allocRegion,
+                                   mlir::ConversionPatternRewriter &rewriter) {
+  rewriter.moveOpBefore(local.getDefiningOp(), &allocRegion.front().front());
+}
 } // namespace looputils
 
 class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
@@ -339,13 +397,21 @@ class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
           "Some `do concurent` loops are not perfectly-nested. "
           "These will be serialized.");
 
+    llvm::SetVector<mlir::Value> locals;
+    looputils::collectLoopLocalValues(loopNest.back().first, locals);
     looputils::sinkLoopIVArgs(rewriter, loopNest);
+
     mlir::IRMapping mapper;
-    genParallelOp(doLoop.getLoc(), rewriter, loopNest, mapper);
+    mlir::omp::ParallelOp parallelOp =
+        genParallelOp(doLoop.getLoc(), rewriter, loopNest, mapper);
     mlir::omp::LoopNestOperands loopNestClauseOps;
     genLoopNestClauseOps(doLoop.getLoc(), rewriter, loopNest, mapper,
                          loopNestClauseOps);
 
+    for (mlir::Value local : locals)
+      looputils::localizeLoopLocalValue(local, parallelOp.getRegion(),
+                                        rewriter);
+
     mlir::omp::LoopNestOp ompLoopNest =
         genWsLoopOp(rewriter, loopNest.back().first, mapper, loopNestClauseOps,
                     /*isComposite=*/mapToDevice);
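
The effect of `localizeLoopLocalValue` in the hunk above (moving an allocation so that it becomes the first operation of the parallel region) can be pictured with a toy model in plain C++. This sketch does not use MLIR: a `Block` here is a hypothetical ordered list of operation names, and `moveToRegionFront` is an illustrative helper, not an MLIR or flang API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a block is just an ordered list of operation names.
using Block = std::vector<std::string>;

// Moves `op` out of `outer` and places it at the very start of `region`,
// mirroring "move the allocation before the region's first operation".
// Returns false, leaving both blocks untouched, if `op` is not in `outer`.
bool moveToRegionFront(Block &outer, Block &region, const std::string &op) {
  auto it = std::find(outer.begin(), outer.end(), op);
  if (it == outer.end())
    return false;
  std::string moved = *it; // copy first: `op` may alias the erased element
  outer.erase(it);
  region.insert(region.begin(), moved);
  return true;
}

// Small self-check: the temporary's allocation ends up first in the region.
inline void checkMove() {
  Block outer = {"alloca.tmp", "alloca.a", "parallel"};
  Block region = {"wsloop", "terminator"};
  assert(moveToRegionFront(outer, region, "alloca.tmp"));
  assert((outer == Block{"alloca.a", "parallel"}));
  assert((region == Block{"alloca.tmp", "wsloop", "terminator"}));
}
```

After the move, each thread executing the region performs its own allocation, which is the privatization the documentation describes.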
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+! Tests that "loop-local values" are properly handled by localizing them to the
+! body of the loop nest. See `collectLoopLocalValues` and `localizeLoopLocalValue`
+! for a definition of "loop-local values" and how they are handled.
+
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fdo-concurrent-to-openmp=host %s -o - \
+! RUN: | FileCheck %s
+module struct_mod
+  type test_struct
+    integer, allocatable :: x_
+  end type
+
+  interface test_struct
+    pure module function construct_from_components(x) result(struct)
+      implicit none
+      integer, intent(in) :: x
+      type(test_struct) struct
+    end function
+  end interface
+end module
+
+submodule(struct_mod) struct_sub
+  implicit none
+
+contains
+  module procedure construct_from_components
+    struct%x_ = x
+  end procedure
+end submodule struct_sub
+
+program main
+  use struct_mod, only : test_struct
+
+  implicit none
+  type(test_struct), dimension(10) :: a
+  integer :: i
+  integer :: total
+
+  do concurrent (i=1:10)
+    a(i) = test_struct(i)
+  end do
+
+  do i=1,10
+    total = total + a(i)%x_
+  end do
+
+  print *, "total =", total
+end program main
+
+! CHECK: omp.parallel {
+! CHECK: %[[LOCAL_TEMP:.*]] = fir.alloca !fir.type<_QMstruct_modTtest_struct{x_:!fir.box<!fir.heap<i32>>}> {bindc_name = ".result"}
+! CHECK: omp.wsloop {
+! CHECK: omp.loop_nest {{.*}} {
+! CHECK: %[[TEMP_VAL:.*]] = fir.call @_QMstruct_modPconstruct_from_components
+! CHECK: fir.save_result %[[TEMP_VAL]] to %[[LOCAL_TEMP]]
+! CHECK: %[[EMBOXED_LOCAL:.*]] = fir.embox %[[LOCAL_TEMP]]
+! CHECK: %[[CONVERTED_LOCAL:.*]] = fir.convert %[[EMBOXED_LOCAL]]
+! CHECK: fir.call @_FortranADestroy(%[[CONVERTED_LOCAL]])
+! CHECK: omp.yield
+! CHECK: }
+! CHECK: }
+! CHECK: omp.terminator
+! CHECK: }

libcxx/.clang-format

Lines changed: 0 additions & 1 deletion
@@ -30,7 +30,6 @@ AttributeMacros: [
   '_LIBCPP_DEPRECATED_IN_CXX20',
   '_LIBCPP_DEPRECATED_IN_CXX23',
   '_LIBCPP_DEPRECATED',
-  '_LIBCPP_DISABLE_EXTENSION_WARNING',
   '_LIBCPP_EXCLUDE_FROM_EXPLICIT_INSTANTIATION',
   '_LIBCPP_EXPORTED_FROM_ABI',
   '_LIBCPP_EXTERN_TEMPLATE_TYPE_VIS',

libcxx/include/__atomic/support/c11.h

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ struct __cxx_atomic_base_impl {
   }
 #endif // _LIBCPP_CXX03_LANG
   _LIBCPP_CONSTEXPR explicit __cxx_atomic_base_impl(_Tp __value) _NOEXCEPT : __a_value(__value) {}
-  _LIBCPP_DISABLE_EXTENSION_WARNING _Atomic(_Tp) __a_value;
+  _Atomic(_Tp) __a_value;
 };
 
 #define __cxx_atomic_is_lock_free(__s) __c11_atomic_is_lock_free(__s)

libcxx/include/__config

Lines changed: 0 additions & 2 deletions
@@ -352,8 +352,6 @@ typedef __char32_t char32_t;
 
 # define _LIBCPP_ALWAYS_INLINE __attribute__((__always_inline__))
 
-# define _LIBCPP_DISABLE_EXTENSION_WARNING __extension__
-
 # if defined(_LIBCPP_OBJECT_FORMAT_COFF)
 
 # ifdef _DLL

mlir/docs/Dialects/SPIR-V.md

Lines changed: 37 additions & 1 deletion
@@ -528,7 +528,7 @@ MLIR system.
 We introduce a `spirv.mlir.selection` and `spirv.mlir.loop` op for structured selections and
 loops, respectively. The merge targets are the next ops following them. Inside
 their regions, a special terminator, `spirv.mlir.merge` is introduced for branching to
-the merge target.
+the merge target and yielding values.
 
 ### Selection
 
@@ -603,7 +603,43 @@ func.func @selection(%cond: i1) -> () {
 
   // ...
 }
+```
+
+The selection can return values by yielding them with `spirv.mlir.merge`. This
+mechanism allows values defined within the selection region to be used outside of it.
+Without this, values that were sunk into the selection region, but used outside, would
+not be able to escape it.
+
+For example:
+
+```mlir
+func.func @selection(%cond: i1) -> () {
+  %zero = spirv.Constant 0: i32
+  %var1 = spirv.Variable init(%zero) : !spirv.ptr<i32, Function>
+  %var2 = spirv.Variable init(%zero) : !spirv.ptr<i32, Function>
+
+  %yield:2 = spirv.mlir.selection -> i32, i32 {
+    spirv.BranchConditional %cond, ^then, ^else
 
+  ^then:
+    %one = spirv.Constant 1: i32
+    %three = spirv.Constant 3: i32
+    spirv.Branch ^merge(%one, %three : i32, i32)
+
+  ^else:
+    %two = spirv.Constant 2: i32
+    %four = spirv.Constant 4 : i32
+    spirv.Branch ^merge(%two, %four : i32, i32)
+
+  ^merge(%merged_1_2: i32, %merged_3_4: i32):
+    spirv.mlir.merge %merged_1_2, %merged_3_4 : i32, i32
+  }
+
+  spirv.Store "Function" %var1, %yield#0 : i32
+  spirv.Store "Function" %var2, %yield#1 : i32
+
+  spirv.Return
+}
 ```
 
 ### Loop
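
As an aside on the `spirv.mlir.selection` example added in the hunk above: the idea of a structured region yielding values to its parent has a loose analogy in plain C++, sketched below. This is only an analogy and does not use any MLIR API; the function name `selection` and the variable names mirror the MLIR example purely for readability.

```cpp
#include <cassert>
#include <utility>

// Plain C++ analogy, not MLIR code. The immediately-invoked lambda plays the
// role of the selection region: values defined inside it are not visible
// outside, so they leave the region only through the returned ("yielded")
// pair.
std::pair<int, int> selection(bool cond) {
  std::pair<int, int> yielded = [&]() -> std::pair<int, int> {
    if (cond) {
      int one = 1, three = 3; // corresponds to the ^then block
      return {one, three};
    }
    int two = 2, four = 4; // corresponds to the ^else block
    return {two, four};
  }();
  // `one`, `two`, ... are out of scope here; only the yielded pair remains.
  return yielded;
}

// Small self-check covering both branches.
inline void checkSelection() {
  assert((selection(true) == std::make_pair(1, 3)));
  assert((selection(false) == std::make_pair(2, 4)));
}
```

In the MLIR example, the two stores after the selection op correspond to consuming `yielded.first` and `yielded.second` here.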

mlir/include/mlir/Dialect/SPIRV/IR/SPIRVControlFlowOps.td

Lines changed: 25 additions & 4 deletions
@@ -352,7 +352,7 @@ def SPIRV_LoopOp : SPIRV_Op<"mlir.loop", [InFunctionScope]> {
 // -----
 
 def SPIRV_MergeOp : SPIRV_Op<"mlir.merge", [
-    Pure, Terminator, ParentOneOf<["SelectionOp", "LoopOp"]>]> {
+    Pure, Terminator, ParentOneOf<["SelectionOp", "LoopOp"]>, ReturnLike]> {
   let summary = "A special terminator for merging a structured selection/loop.";
 
   let description = [{
@@ -361,13 +361,23 @@ def SPIRV_MergeOp : SPIRV_Op<"mlir.merge", [
     merge point, which is the next op following the `spirv.mlir.selection` or
     `spirv.mlir.loop` op. This op does not have a corresponding instruction in the
     SPIR-V binary format; it's solely for structural purpose.
+
+    This op is also used to yield values from inside the selection/loop region
+    to the outside, as values that were sunk into the region cannot otherwise escape it.
   }];
 
-  let arguments = (ins);
+  let arguments = (ins Variadic<AnyType>:$operands);
 
   let results = (outs);
 
-  let assemblyFormat = "attr-dict";
+  let assemblyFormat = "attr-dict ($operands^ `:` type($operands))?";
+
+  let builders = [
+    OpBuilder<(ins),
+    [{
+      build($_builder, $_state, ValueRange());
+    }]>
+  ];
 
   let hasOpcode = 0;
 
@@ -465,13 +475,17 @@ def SPIRV_SelectionOp : SPIRV_Op<"mlir.selection", [InFunctionScope]> {
     header block, and one selection merge. The selection header block should be
     the first block. The selection merge block should be the last block.
     The merge block should only contain a `spirv.mlir.merge` op.
+
+    Values defined inside the selection regions cannot be directly used
+    outside of them; however, the selection region can yield values. These values are
+    yielded using a `spirv.mlir.merge` op and returned as a result of the selection op.
   }];
 
   let arguments = (ins
     SPIRV_SelectionControlAttr:$selection_control
   );
 
-  let results = (outs);
+  let results = (outs Variadic<AnyType>:$results);
 
   let regions = (region AnyRegion:$body);
 
@@ -494,6 +508,13 @@ def SPIRV_SelectionOp : SPIRV_Op<"mlir.selection", [InFunctionScope]> {
                                     OpBuilder &builder);
   }];
 
+  let builders = [
+    OpBuilder<(ins "spirv::SelectionControl":$selectionControl),
+    [{
+      build($_builder, $_state, TypeRange(), selectionControl);
+    }]>
+  ];
+
   let hasOpcode = 0;
 
   let autogenSerialization = 0;
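
The new `assemblyFormat` for `spirv.mlir.merge` in the hunks above, `attr-dict ($operands^ `:` type($operands))?`, wraps the operands and their types in an optional group anchored on `$operands`. A rough picture of the text this produces is sketched below in plain C++. This is not the ODS-generated printer: `Operand` and `printMerge` are hypothetical names, and the attribute dictionary is assumed empty and ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operand model: (SSA name, type), e.g. {"%x", "i32"}.
using Operand = std::pair<std::string, std::string>;

// Toy rendering of the merge op's textual form. The whole
// "<operands> : <types>" group is emitted only when operands are present,
// which is what the optional group in the format expresses.
std::string printMerge(const std::vector<Operand> &operands) {
  std::string out = "spirv.mlir.merge";
  if (operands.empty())
    return out; // optional group omitted entirely
  std::string names, types;
  for (std::size_t i = 0; i < operands.size(); ++i) {
    if (i != 0) {
      names += ", ";
      types += ", ";
    }
    names += operands[i].first;
    types += operands[i].second;
  }
  return out + " " + names + " : " + types;
}

// Small self-check for the operand-free and two-operand forms.
inline void checkPrintMerge() {
  assert(printMerge({}) == "spirv.mlir.merge");
  assert((printMerge({{"%merged_1_2", "i32"}, {"%merged_3_4", "i32"}}) ==
          "spirv.mlir.merge %merged_1_2, %merged_3_4 : i32, i32"));
}
```

Keeping the group optional means existing IR that writes a bare `spirv.mlir.merge` still round-trips, which matches the zero-operand builder added in the same hunk.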
