[MLIR][OpenMP] Add MLIR Lowering Support for dist_schedule #152736
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir-openmp
Author: Jack Styles (Stylie777)
Changes
`dist_schedule` was previously supported in Flang/Clang but was not implemented in MLIR; instead a user would get a "not yet implemented" error. This patch adds support for the `dist_schedule` clause to be lowered to LLVM IR when used in an `omp.distribute` or `omp.wsloop` section. Some rework was required to ensure that MLIR/LLVM emits the correct schedule type for the clause, as it uses a different schedule type than other OpenMP directives/clauses in the runtime library. The patch also adds LLVM loop metadata and updates the implementation to support processing in the workshare loop.
Patch is 36.24 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/152736.diff
9 Files Affected:
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td
index 79f25bb05f20e..4117e112367c6 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMP.td
+++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td
@@ -458,7 +458,8 @@ def OMP_SCHEDULE_Dynamic : EnumVal<"dynamic", 3, 1> {}
def OMP_SCHEDULE_Guided : EnumVal<"guided", 4, 1> {}
def OMP_SCHEDULE_Auto : EnumVal<"auto", 5, 1> {}
def OMP_SCHEDULE_Runtime : EnumVal<"runtime", 6, 1> {}
-def OMP_SCHEDULE_Default : EnumVal<"default", 7, 0> { let isDefault = 1; }
+def OMP_SCHEDULE_Distribute : EnumVal<"distribute", 7, 1> {}
+def OMP_SCHEDULE_Default : EnumVal<"default", 8, 0> { let isDefault = 1; }
def OMPC_Schedule : Clause<[Spelling<"schedule">]> {
let clangClass = "OMPScheduleClause";
let flangClass = "OmpScheduleClause";
@@ -469,6 +470,7 @@ def OMPC_Schedule : Clause<[Spelling<"schedule">]> {
OMP_SCHEDULE_Guided,
OMP_SCHEDULE_Auto,
OMP_SCHEDULE_Runtime,
+ OMP_SCHEDULE_Distribute,
OMP_SCHEDULE_Default
];
}
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index f70659120e1e6..395df392babde 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -1096,11 +1096,13 @@ class OpenMPIRBuilder {
/// \param NeedsBarrier Indicates whether a barrier must be inserted after
/// the loop.
/// \param LoopType Type of workshare loop.
+ /// \param HasDistSchedule Defines if the clause being lowered is dist_schedule as this is handled slightly differently
+ /// \param DistScheduleSchedType Defines the Schedule Type for the Distribute loop. Defaults to None if no Distribute loop is present.
///
/// \returns Point where to insert code after the workshare construct.
InsertPointOrErrorTy applyStaticWorkshareLoop(
DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP,
- omp::WorksharingLoopType LoopType, bool NeedsBarrier);
+ omp::WorksharingLoopType LoopType, bool NeedsBarrier, bool HasDistSchedule = false, omp::OMPScheduleType DistScheduleSchedType = omp::OMPScheduleType::None);
/// Modifies the canonical loop a statically-scheduled workshare loop with a
/// user-specified chunk size.
@@ -1113,13 +1115,20 @@ class OpenMPIRBuilder {
/// \param NeedsBarrier Indicates whether a barrier must be inserted after the
/// loop.
/// \param ChunkSize The user-specified chunk size.
+ /// \param SchedType Optional type of scheduling to be passed to the init function.
+ /// \param DistScheduleChunkSize The size of dist_shcedule chunk considered as a unit when
+ /// scheduling. If \p nullptr, defaults to 1.
+ /// \param DistScheduleSchedType Defines the Schedule Type for the Distribute loop. Defaults to None if no Distribute loop is present.
///
/// \returns Point where to insert code after the workshare construct.
InsertPointOrErrorTy applyStaticChunkedWorkshareLoop(DebugLoc DL,
CanonicalLoopInfo *CLI,
InsertPointTy AllocaIP,
bool NeedsBarrier,
- Value *ChunkSize);
+ Value *ChunkSize,
+ omp::OMPScheduleType SchedType = omp::OMPScheduleType::UnorderedStaticChunked,
+ Value *DistScheduleChunkSize = nullptr,
+ omp::OMPScheduleType DistScheduleSchedType = omp::OMPScheduleType::None);
/// Modifies the canonical loop to be a dynamically-scheduled workshare loop.
///
@@ -1139,6 +1148,8 @@ class OpenMPIRBuilder {
/// the loop.
/// \param Chunk The size of loop chunk considered as a unit when
/// scheduling. If \p nullptr, defaults to 1.
+ /// \param DistScheduleChunk The size of dist_shcedule chunk considered as a unit when
+ /// scheduling. If \p nullptr, defaults to 1.
///
/// \returns Point where to insert code after the workshare construct.
InsertPointOrErrorTy applyDynamicWorkshareLoop(DebugLoc DL,
@@ -1146,7 +1157,8 @@ class OpenMPIRBuilder {
InsertPointTy AllocaIP,
omp::OMPScheduleType SchedType,
bool NeedsBarrier,
- Value *Chunk = nullptr);
+ Value *Chunk = nullptr,
+ Value *DistScheduleChunk = nullptr);
/// Create alternative version of the loop to support if clause
///
@@ -1197,6 +1209,9 @@ class OpenMPIRBuilder {
/// present.
/// \param LoopType Information about type of loop worksharing.
/// It corresponds to type of loop workshare OpenMP pragma.
+ /// \param HasDistSchedule Defines if the clause being lowered is dist_schedule as this is handled slightly differently
+ ///
+ /// \param ChunkSize The chunk size for dist_schedule loop
///
/// \returns Point where to insert code after the workshare construct.
LLVM_ABI InsertPointOrErrorTy applyWorkshareLoop(
@@ -1207,7 +1222,9 @@ class OpenMPIRBuilder {
bool HasMonotonicModifier = false, bool HasNonmonotonicModifier = false,
bool HasOrderedClause = false,
omp::WorksharingLoopType LoopType =
- omp::WorksharingLoopType::ForStaticLoop);
+ omp::WorksharingLoopType::ForStaticLoop,
+ bool HasDistSchedule = false,
+ Value* DistScheduleChunkSize = nullptr);
/// Tile a loop nest.
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index ea027e48fa2f1..18da0d772912f 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -136,6 +136,8 @@ static bool isValidWorkshareLoopScheduleType(OMPScheduleType SchedType) {
case OMPScheduleType::NomergeOrderedRuntime:
case OMPScheduleType::NomergeOrderedAuto:
case OMPScheduleType::NomergeOrderedTrapezoidal:
+ case OMPScheduleType::OrderedDistributeChunked:
+ case OMPScheduleType::OrderedDistribute:
break;
default:
return false;
@@ -170,7 +172,7 @@ static const omp::GV &getGridValue(const Triple &T, Function *Kernel) {
/// arguments.
static OMPScheduleType
getOpenMPBaseScheduleType(llvm::omp::ScheduleKind ClauseKind, bool HasChunks,
- bool HasSimdModifier) {
+ bool HasSimdModifier, bool HasDistScheduleChunks) {
// Currently, the default schedule it static.
switch (ClauseKind) {
case OMP_SCHEDULE_Default:
@@ -187,6 +189,9 @@ getOpenMPBaseScheduleType(llvm::omp::ScheduleKind ClauseKind, bool HasChunks,
case OMP_SCHEDULE_Runtime:
return HasSimdModifier ? OMPScheduleType::BaseRuntimeSimd
: OMPScheduleType::BaseRuntime;
+ case OMP_SCHEDULE_Distribute:
+ return HasDistScheduleChunks ? OMPScheduleType::BaseDistributeChunked
+ : OMPScheduleType::BaseDistribute;
}
llvm_unreachable("unhandled schedule clause argument");
}
@@ -255,9 +260,10 @@ getOpenMPMonotonicityScheduleType(OMPScheduleType ScheduleType,
static OMPScheduleType
computeOpenMPScheduleType(ScheduleKind ClauseKind, bool HasChunks,
bool HasSimdModifier, bool HasMonotonicModifier,
- bool HasNonmonotonicModifier, bool HasOrderedClause) {
- OMPScheduleType BaseSchedule =
- getOpenMPBaseScheduleType(ClauseKind, HasChunks, HasSimdModifier);
+ bool HasNonmonotonicModifier, bool HasOrderedClause,
+ bool HasDistScheduleChunks) {
+ OMPScheduleType BaseSchedule = getOpenMPBaseScheduleType(
+ ClauseKind, HasChunks, HasSimdModifier, HasDistScheduleChunks);
OMPScheduleType OrderedSchedule =
getOpenMPOrderingScheduleType(BaseSchedule, HasOrderedClause);
OMPScheduleType Result = getOpenMPMonotonicityScheduleType(
@@ -4637,7 +4643,8 @@ static FunctionCallee getKmpcForStaticInitForType(Type *Ty, Module &M,
OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop(
DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP,
- WorksharingLoopType LoopType, bool NeedsBarrier) {
+ WorksharingLoopType LoopType, bool NeedsBarrier, bool HasDistSchedule,
+ OMPScheduleType DistScheduleSchedType) {
assert(CLI->isValid() && "Requires a valid canonical loop");
assert(!isConflictIP(AllocaIP, CLI->getPreheaderIP()) &&
"Require dedicated allocate IP");
@@ -4693,15 +4700,26 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop(
// Call the "init" function and update the trip count of the loop with the
// value it produced.
- SmallVector<Value *, 10> Args(
- {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound, PUpperBound});
- if (LoopType == WorksharingLoopType::DistributeForStaticLoop) {
- Value *PDistUpperBound =
- Builder.CreateAlloca(IVTy, nullptr, "p.distupperbound");
- Args.push_back(PDistUpperBound);
+ auto BuildInitCall = [LoopType, SrcLoc, ThreadNum, PLastIter, PLowerBound,
+ PUpperBound, IVTy, PStride, One, Zero,
+ StaticInit](Value *SchedulingType, auto &Builder) {
+ SmallVector<Value *, 10> Args({SrcLoc, ThreadNum, SchedulingType, PLastIter,
+ PLowerBound, PUpperBound});
+ if (LoopType == WorksharingLoopType::DistributeForStaticLoop) {
+ Value *PDistUpperBound =
+ Builder.CreateAlloca(IVTy, nullptr, "p.distupperbound");
+ Args.push_back(PDistUpperBound);
+ }
+ Args.append({PStride, One, Zero});
+ Builder.CreateCall(StaticInit, Args);
+ };
+ BuildInitCall(SchedulingType, Builder);
+ if (HasDistSchedule &&
+ LoopType != WorksharingLoopType::DistributeStaticLoop) {
+ Constant *DistScheduleSchedType = ConstantInt::get(
+ I32Type, static_cast<int>(omp::OMPScheduleType::OrderedDistribute));
+ BuildInitCall(DistScheduleSchedType, Builder);
}
- Args.append({PStride, One, Zero});
- Builder.CreateCall(StaticInit, Args);
Value *LowerBound = Builder.CreateLoad(IVTy, PLowerBound);
Value *InclusiveUpperBound = Builder.CreateLoad(IVTy, PUpperBound);
Value *TripCountMinusOne = Builder.CreateSub(InclusiveUpperBound, LowerBound);
@@ -4740,14 +4758,42 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop(
return AfterIP;
}
+static void addAccessGroupMetadata(BasicBlock *Block, MDNode *AccessGroup,
+ LoopInfo &LI);
+static void addLoopMetadata(CanonicalLoopInfo *Loop,
+ArrayRef<Metadata *> Properties);
+
+static void applyParallelAccessesMetadata(CanonicalLoopInfo *CLI, LLVMContext &Ctx, Loop *Loop, LoopInfo &LoopInfo, SmallVector<Metadata *> &LoopMDList) {
+ SmallSet<BasicBlock *, 8> Reachable;
+
+ // Get the basic blocks from the loop in which memref instructions
+ // can be found.
+ // TODO: Generalize getting all blocks inside a CanonicalizeLoopInfo,
+ // preferably without running any passes.
+ for (BasicBlock *Block : Loop->getBlocks()) {
+ if (Block == CLI->getCond() ||
+ Block == CLI->getHeader())
+ continue;
+ Reachable.insert(Block);
+ }
+
+ // Add access group metadata to memory-access instructions.
+ MDNode *AccessGroup = MDNode::getDistinct(Ctx, {});
+ for (BasicBlock *BB : Reachable)
+ addAccessGroupMetadata(BB, AccessGroup, LoopInfo);
+ // TODO: If the loop has existing parallel access metadata, have
+ // to combine two lists.
+ LoopMDList.push_back(MDNode::get(
+ Ctx, {MDString::get(Ctx, "llvm.loop.parallel_accesses"), AccessGroup}));
+}
+
OpenMPIRBuilder::InsertPointOrErrorTy
-OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
- CanonicalLoopInfo *CLI,
- InsertPointTy AllocaIP,
- bool NeedsBarrier,
- Value *ChunkSize) {
+OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(
+ DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP,
+ bool NeedsBarrier, Value *ChunkSize, OMPScheduleType SchedType,
+ Value *DistScheduleChunkSize, OMPScheduleType DistScheduleSchedType) {
assert(CLI->isValid() && "Requires a valid canonical loop");
- assert(ChunkSize && "Chunk size is required");
+ assert(ChunkSize || DistScheduleChunkSize && "Chunk size is required");
LLVMContext &Ctx = CLI->getFunction()->getContext();
Value *IV = CLI->getIndVar();
@@ -4761,6 +4807,18 @@ OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
Constant *Zero = ConstantInt::get(InternalIVTy, 0);
Constant *One = ConstantInt::get(InternalIVTy, 1);
+ Function *F = CLI->getFunction();
+ FunctionAnalysisManager FAM;
+ FAM.registerPass([]() { return DominatorTreeAnalysis(); });
+ FAM.registerPass([]() { return PassInstrumentationAnalysis(); });
+ LoopAnalysis LIA;
+ LoopInfo &&LI = LIA.run(*F, FAM);
+ Loop *L = LI.getLoopFor(CLI->getHeader());
+ SmallVector<Metadata *> LoopMDList;
+ if (ChunkSize || DistScheduleChunkSize)
+ applyParallelAccessesMetadata(CLI, Ctx, L, LI, LoopMDList);
+ addLoopMetadata(CLI, LoopMDList);
+
// Declare useful OpenMP runtime functions.
FunctionCallee StaticInit =
getKmpcForStaticInitForType(InternalIVTy, M, *this);
@@ -4783,13 +4841,18 @@ OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
Builder.SetCurrentDebugLocation(DL);
// TODO: Detect overflow in ubsan or max-out with current tripcount.
- Value *CastedChunkSize =
- Builder.CreateZExtOrTrunc(ChunkSize, InternalIVTy, "chunksize");
+ Value *CastedChunkSize = Builder.CreateZExtOrTrunc(
+ ChunkSize ? ChunkSize : Zero, InternalIVTy, "chunksize");
+ Value *CastestDistScheduleChunkSize = Builder.CreateZExtOrTrunc(
+ DistScheduleChunkSize ? DistScheduleChunkSize : Zero, InternalIVTy,
+ "distschedulechunksize");
Value *CastedTripCount =
Builder.CreateZExt(OrigTripCount, InternalIVTy, "tripcount");
- Constant *SchedulingType = ConstantInt::get(
- I32Type, static_cast<int>(OMPScheduleType::UnorderedStaticChunked));
+ Constant *SchedulingType =
+ ConstantInt::get(I32Type, static_cast<int>(SchedType));
+ Constant *DistSchedulingType =
+ ConstantInt::get(I32Type, static_cast<int>(DistScheduleSchedType));
Builder.CreateStore(Zero, PLowerBound);
Value *OrigUpperBound = Builder.CreateSub(CastedTripCount, One);
Builder.CreateStore(OrigUpperBound, PUpperBound);
@@ -4801,12 +4864,25 @@ OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
Constant *SrcLocStr = getOrCreateSrcLocStr(DL, SrcLocStrSize);
Value *SrcLoc = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
Value *ThreadNum = getOrCreateThreadID(SrcLoc);
- Builder.CreateCall(StaticInit,
- {/*loc=*/SrcLoc, /*global_tid=*/ThreadNum,
- /*schedtype=*/SchedulingType, /*plastiter=*/PLastIter,
- /*plower=*/PLowerBound, /*pupper=*/PUpperBound,
- /*pstride=*/PStride, /*incr=*/One,
- /*chunk=*/CastedChunkSize});
+ auto BuildInitCall =
+ [StaticInit, SrcLoc, ThreadNum, PLastIter, PLowerBound, PUpperBound,
+ PStride, One](Value *SchedulingType, Value *ChunkSize, auto &Builder) {
+ Builder.CreateCall(
+ StaticInit, {/*loc=*/SrcLoc, /*global_tid=*/ThreadNum,
+ /*schedtype=*/SchedulingType, /*plastiter=*/PLastIter,
+ /*plower=*/PLowerBound, /*pupper=*/PUpperBound,
+ /*pstride=*/PStride, /*incr=*/One,
+ /*chunk=*/ChunkSize});
+ };
+ BuildInitCall(SchedulingType, CastedChunkSize, Builder);
+ if (DistScheduleSchedType != OMPScheduleType::None &&
+ SchedType != OMPScheduleType::OrderedDistributeChunked &&
+ SchedType != OMPScheduleType::OrderedDistribute) {
+ // We want to emit a second init function call for the dist_schedule clause
+ // to the Distribute construct. This should only be done however if a
+ // Workshare Loop is nested within a Distribute Construct
+ BuildInitCall(DistSchedulingType, CastestDistScheduleChunkSize, Builder);
+ }
// Load values written by the "init" function.
Value *FirstChunkStart =
@@ -5130,31 +5206,47 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyWorkshareLoop(
bool NeedsBarrier, omp::ScheduleKind SchedKind, Value *ChunkSize,
bool HasSimdModifier, bool HasMonotonicModifier,
bool HasNonmonotonicModifier, bool HasOrderedClause,
- WorksharingLoopType LoopType) {
+ WorksharingLoopType LoopType, bool HasDistSchedule,
+ Value *DistScheduleChunkSize) {
if (Config.isTargetDevice())
return applyWorkshareLoopTarget(DL, CLI, AllocaIP, LoopType);
OMPScheduleType EffectiveScheduleType = computeOpenMPScheduleType(
SchedKind, ChunkSize, HasSimdModifier, HasMonotonicModifier,
- HasNonmonotonicModifier, HasOrderedClause);
+ HasNonmonotonicModifier, HasOrderedClause, DistScheduleChunkSize);
bool IsOrdered = (EffectiveScheduleType & OMPScheduleType::ModifierOrdered) ==
OMPScheduleType::ModifierOrdered;
+ OMPScheduleType DistScheduleSchedType = OMPScheduleType::None;
+ if (HasDistSchedule) {
+ DistScheduleSchedType = DistScheduleChunkSize
+ ? OMPScheduleType::OrderedDistributeChunked
+ : OMPScheduleType::OrderedDistribute;
+ }
switch (EffectiveScheduleType & ~OMPScheduleType::ModifierMask) {
case OMPScheduleType::BaseStatic:
- assert(!ChunkSize && "No chunk size with static-chunked schedule");
- if (IsOrdered)
+ case OMPScheduleType::BaseDistribute:
+ assert(!ChunkSize || !DistScheduleChunkSize &&
+ "No chunk size with static-chunked schedule");
+ if (IsOrdered && !HasDistSchedule)
return applyDynamicWorkshareLoop(DL, CLI, AllocaIP, EffectiveScheduleType,
NeedsBarrier, ChunkSize);
// FIXME: Monotonicity ignored?
- return applyStaticWorkshareLoop(DL, CLI, AllocaIP, LoopType, NeedsBarrier);
+ if (DistScheduleChunkSize)
+ return applyStaticChunkedWorkshareLoop(
+ DL, CLI, AllocaIP, NeedsBarrier, ChunkSize, EffectiveScheduleType,
+ DistScheduleChunkSize, DistScheduleSchedType);
+ return applyStaticWorkshareLoop(DL, CLI, AllocaIP, LoopType, NeedsBarrier,
+ HasDistSchedule);
case OMPScheduleType::BaseStaticChunked:
- if (IsOrdered)
+ case OMPScheduleType::BaseDistributeChunked:
+ if (IsOrdered && !HasDistSchedule)
return applyDynamicWorkshareLoop(DL, CLI, AllocaIP, EffectiveScheduleType,
NeedsBarrier, ChunkSize);
// FIXME: Monotonicity ignored?
- return applyStaticChunkedWorkshareLoop(DL, CLI, AllocaIP, NeedsBarrier,
- ChunkSize);
+ return applyStaticChunkedWorkshareLoop(
+ DL, CLI, AllocaIP, NeedsBarrier, ChunkSize, EffectiveScheduleType,
+ DistScheduleChunkSize, DistScheduleSchedType);
case OMPScheduleType::BaseRuntime:
case OMPScheduleType::BaseAuto:
@@ -5230,7 +5322,8 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::applyDynamicWorkshareLoop(DebugLoc DL, CanonicalLoopInfo *CLI,
InsertPointTy AllocaIP,
OMPScheduleType SchedType,
- bool NeedsBarrier, Value *Chunk) {
+ bool NeedsBarrier, Value *Chunk,
+ Value *DistScheduleChunk) {
assert(CLI->isValid() && "Requires a valid canonical loop");
assert(!...
[truncated]
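For context outside the diff above, the short C++/OpenMP program below illustrates the clause this patch lowers. It is not taken from the patch or its tests (the loop, array, and chunk sizes are made up for illustration); the PR itself targets Flang/MLIR, where the clause has the same meaning. The new MLIR-to-LLVM test quoted in the review below checks that the chunked form lowers to a `__kmpc_for_static_init_4u` call carrying a distribute schedule-type value (91 in the quoted CHECK lines) and the chunk size (1024).

```cpp
// Illustrative only: standard C++/OpenMP usage of dist_schedule, not part of this patch.
// Compile with an OpenMP-enabled compiler, e.g. clang++ -fopenmp example.cpp
#include <cstdio>

int main() {
  constexpr int N = 1 << 20;
  static float a[N];

  // dist_schedule on a distribute construct: loop iterations are handed out
  // to the teams in chunks of 1024.
  #pragma omp target teams distribute dist_schedule(static, 1024) map(tofrom: a)
  for (int i = 0; i < N; ++i)
    a[i] = 0.5f * i;

  // dist_schedule combined with a worksharing-loop schedule. Per the comment in
  // the diff above, the lowering emits a second static-init call for the
  // distribute part when a worksharing loop is nested inside a distribute.
  #pragma omp target teams distribute parallel for dist_schedule(static, 1024) \
      schedule(static, 64) map(tofrom: a)
  for (int i = 0; i < N; ++i)
    a[i] += 1.0f;

  std::printf("a[0] = %f, a[N-1] = %f\n", a[0], a[N - 1]);
  return 0;
}
```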
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed: 464cd87 to 08ed236
tblah left a comment:
Please could you update flang/docs/OpenMPSupport.md to show dist_schedule is now supported.
I have converted this to a draft for the time being while I investigate some inconsistencies between the code generated through MLIR with this patch and Clang.
Force-pushed: bbcb902 to 96fdc04
Stylie777 left a comment:
This PR is now ready for review again.
tblah left a comment:
LGTM with a few nits
Force-pushed: 2e1c286 to 039e86e
Review comment on the new test:
llvm.func @distribute_dist_schedule_chunk_size(%lb : i32, %ub : i32, %step : i32, %x : i32) {
// CHECK: call void @__kmpc_for_static_init_4u(ptr @1, i32 %omp_global_thread_num, i32 91, ptr %p.lastiter, ptr %p.lowerbound, ptr %p.upperbound, ptr %p.stride, i32 1, i32 1024)
// CHECK: call void @[[RUNTIME_FUNC:.*]](ptr @1, i32 %omp_global_thread_num, i32 91, ptr %p.lastiter, ptr %p.lowerbound, ptr %p.upperbound, ptr %p.stride, i32 1, i32 1024)
Suggested change:
- // CHECK: call void @[[RUNTIME_FUNC:.*]](ptr @1, i32 %omp_global_thread_num, i32 91, ptr %p.lastiter, ptr %p.lowerbound, ptr %p.upperbound, ptr %p.stride, i32 1, i32 1024)
+ // CHECK: call void @[[RUNTIME_FUNC:__kmpc_for_static_init_4u]](ptr @1, i32 %omp_global_thread_num, i32 91, ptr %p.lastiter, ptr %p.lowerbound, ptr %p.upperbound, ptr %p.stride, i32 1, i32 1024)
This way we still ensure the right runtime function is called.
Done
`dist_schedule` was previously supported in Flang/Clang but was not implemented in MLIR; instead, a user would get a "not yet implemented" error. This patch adds support for lowering the `dist_schedule` clause to LLVM IR when it is used on an `omp.distribute` or `omp.wsloop` operation. Some rework was required to ensure that MLIR/LLVM emits the correct schedule type for the clause, as it uses a different schedule type from other OpenMP directives/clauses in the runtime library. This patch also ensures that when dist_schedule or a chunked schedule clause is used, the correct llvm loop parallel-accesses metadata is added.
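For reference, a rough sketch of the runtime schedule-type values this is talking about. Only the value 91 is confirmed by the CHECK lines quoted in this review (the "i32 91" argument to __kmpc_for_static_init_4u); the enumerator names are invented for illustration, and the other values are recalled from the OpenMP runtime rather than taken from this patch.

// Illustrative only: hypothetical enumerators mirroring the runtime's
// schedule-type values. 91 matches the "i32 91" argument in this PR's tests;
// 33, 34 and 92 are assumptions about the runtime, not part of this patch.
enum ExampleSchedType : int {
  ExampleStaticChunked = 33,     // schedule(static, chunk) on a worksharing loop
  ExampleStatic = 34,            // schedule(static)
  ExampleDistributeChunked = 91, // dist_schedule(static, chunk)
  ExampleDistribute = 92,        // dist_schedule(static)
};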
dist_schedule is not supported for dynamically scheduled loops, so the DistScheduleChunkSize parameter can be removed from this function.
Force-pushed from 039e86e to 444531d.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/207/builds/10223. Here is the relevant piece of the build log for reference.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/19564. Here is the relevant piece of the build log for reference.
After the merging of llvm#152736, a number of OpenMP features are now fully supported. The initial patch missed updating their status from `P` to `Y` to indicate this. The notes about `dist_schedule` not being supported were removed in the initial patch.
assert(!ChunkSize || !DistScheduleChunkSize &&
       "No chunk size with static-chunked schedule");
Small bug, this should be:
assert((!ChunkSize || !DistScheduleChunkSize) &&
       "No chunk size with static-chunked schedule");
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, I will raise a patch to address this.
Raised #170269 to address this.
When #152736 was initially merged, the assert that checks for the chunk size when applying a static-chunked schedule was incorrect. While it would not have changed the behaviour of the assert, the string attached to it would have been emitted in cases where it was simplified. This was raised here: #152736 (comment). Testing for this was explored, but this assert is a last-chance failure point that should never be reached: applyWorkshareLoop decides the `EffectiveScheduleType` based on the existence of `ChunkSize` or `DistScheduleChunkSize`, so it will only trigger if there are issues with that conversion, and unit testing already exists for `applyWorkshareLoop`.
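To visualise that last point, here is a hypothetical sketch of the decision the commit message describes; it is not the real OpenMPIRBuilder implementation, and the names below are invented. The effective schedule type only becomes a chunked variant when a chunk value is actually present, so the assert should be unreachable unless that selection goes wrong.

// Hypothetical sketch only; the actual logic lives in
// OpenMPIRBuilder::applyWorkshareLoop and operates on omp::OMPScheduleType.
enum class EffectiveSched { Static, StaticChunked, Distribute, DistributeChunked };

static EffectiveSched pickEffectiveSchedule(bool HasDistSchedule,
                                            const void *ChunkSize,
                                            const void *DistScheduleChunkSize) {
  // A chunked schedule is only selected when the corresponding chunk-size
  // value exists, so "chunked schedule without a chunk size" cannot arise here.
  if (HasDistSchedule)
    return DistScheduleChunkSize ? EffectiveSched::DistributeChunked
                                 : EffectiveSched::Distribute;
  return ChunkSize ? EffectiveSched::StaticChunked : EffectiveSched::Static;
}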