Commit 49ccf46

[OpenMP] [IR Builder] Changes to Support Scan Operation (#136035)
Scan reductions are supported in OpenMP with the help of the `scan` directive. The reduction clause of the for loop/simd directive can take an `inscan` modifier, with the body of the directive specifying a `scan` directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP. The body of the for loop is split into two loops (Input phase loop and Scan phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with the initial values that are to be reduced. The buffer is used by the reduction loop to perform the scan reduction. The Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase. Below is a high-level view of the generated code.

```
<declare pointer to buffer> ptr
omp parallel {
  size num_iters = <num_iters>
  // temp buffer allocation
  omp masked {
    buff = malloc(num_iters*scanvarstype)
    *ptr = buff
  }
  barrier;
  // input phase loop
  for (i: 0..<num_iters>) {
    <input phase>;
    buffer = *ptr;
    buffer[i] = red;
  }
  // scan reduction
  omp masked {
    for (int k = 0; k != ceil(log2(num_iters)); ++k) {
      i=pow(2,k)
      for (size cnt = last_iter; cnt >= i; --cnt) {
        buffer = *ptr;
        buffer[cnt] op= buffer[cnt-i];
      }
    }
  }
  barrier;
  // scan phase loop
  for (0..<num_iters>) {
    buffer = *ptr;
    red = buffer[i] ;
    <scan phase>;
  }
  // temp buffer deletion
  omp masked {
    free(*ptr)
  }
  barrier;
}
```

The temporary buffer needs to be shared between all threads performing the reduction, since it is read and written in the Input and Scan workshare loops. This is achieved by declaring a pointer to the buffer in the shared region and having the master thread allocate the buffer dynamically. This is why allocation, deallocation and the scan reduction are performed within `masked`. The code is verified to produce correct results for Fortran programs with the code changes in PR #133149.
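The scan-reduction step in the outline above can be sketched in plain C++ as follows. This is a minimal serial sketch under stated assumptions, not the IR that the builder emits: the function name `scanReduceInPlace`, the `std::vector<long>` buffer, and `+` as the reduction operator are all chosen for illustration only. It shows why walking `cnt` downward lets each pass update the buffer in place.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OMPIRBuilder): performs an inclusive
// prefix sum in place using the log-step scheme from the outline above.
// `+` stands in for the generic reduction operator `op`.
std::vector<long> scanReduceInPlace(std::vector<long> buffer) {
  const std::size_t n = buffer.size();
  // stride takes the values 1, 2, 4, ... (i = pow(2, k) in the outline),
  // so the outer loop runs ceil(log2(n)) times.
  for (std::size_t stride = 1; stride < n; stride *= 2) {
    // Walk from the last element down to `stride`. Going downward means
    // buffer[cnt - stride] still holds its value from the previous pass
    // when it is read, so no second buffer is needed.
    for (std::size_t cnt = n - 1; cnt >= stride; --cnt)
      buffer[cnt] += buffer[cnt - stride];
  }
  return buffer;
}
```

For the input `{1, 2, 3, 4, 5}` the three passes (strides 1, 2 and 4) yield `{1, 3, 5, 7, 9}`, then `{1, 3, 6, 10, 14}`, then `{1, 3, 6, 10, 15}`, which is the inclusive prefix sum.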
1 parent 9faac93 commit 49ccf46

3 files changed: +765 -2 lines changed


llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h

Lines changed: 218 additions & 1 deletion
@@ -31,6 +31,7 @@
 
 namespace llvm {
 class CanonicalLoopInfo;
+class ScanInfo;
 struct TargetRegionEntryInfo;
 class OffloadEntriesInfoManager;
 class OpenMPIRBuilder;
@@ -707,6 +708,9 @@ class OpenMPIRBuilder {
   LLVM_ABI InsertPointOrErrorTy createCancellationPoint(
       const LocationDescription &Loc, omp::Directive CanceledDirective);
 
+  /// Creates a ScanInfo object, allocates and returns the pointer.
+  Expected<ScanInfo *> scanInfoInitialize();
+
   /// Generator for '#omp parallel'
   ///
   /// \param Loc The insert and source location description.
@@ -750,6 +754,42 @@ class OpenMPIRBuilder {
                       LoopBodyGenCallbackTy BodyGenCB, Value *TripCount,
                       const Twine &Name = "loop");
 
+  /// Generator for the control flow structure of an OpenMP canonical loops if
+  /// the parent directive has an `inscan` modifier specified.
+  /// If the `inscan` modifier is specified, the region of the parent is
+  /// expected to have a `scan` directive. Based on the clauses in
+  /// scan directive, the body of the loop is split into two loops: Input loop
+  /// and Scan Loop. Input loop contains the code generated for input phase of
+  /// scan and Scan loop contains the code generated for scan phase of scan.
+  /// From the bodyGen callback of these loops, `createScan` would be called
+  /// when a scan directive is encountered from the loop body. `createScan`
+  /// based on whether 1. inclusive or exclusive scan is specified and, 2. input
+  /// loop or scan loop is generated, lowers the body of the for loop
+  /// accordingly.
+  ///
+  /// \param Loc The insert and source location description.
+  /// \param BodyGenCB Callback that will generate the loop body code.
+  /// \param Start Value of the loop counter for the first iterations.
+  /// \param Stop Loop counter values past this will stop the loop.
+  /// \param Step Loop counter increment after each iteration; negative
+  /// means counting down.
+  /// \param IsSigned Whether Start, Stop and Step are signed integers.
+  /// \param InclusiveStop Whether \p Stop itself is a valid value for the loop
+  /// counter.
+  /// \param ComputeIP Insertion point for instructions computing the trip
+  /// count. Can be used to ensure the trip count is available
+  /// at the outermost loop of a loop nest. If not set,
+  /// defaults to the preheader of the generated loop.
+  /// \param Name Base name used to derive BB and instruction names.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  ///
+  /// \returns A vector containing Loop Info of Input Loop and Scan Loop.
+  Expected<SmallVector<llvm::CanonicalLoopInfo *>> createCanonicalScanLoops(
+      const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB,
+      Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop,
+      InsertPointTy ComputeIP, const Twine &Name, ScanInfo *ScanRedInfo);
+
   /// Calculate the trip count of a canonical loop.
   ///
   /// This allows specifying user-defined loop counter values using increment,
@@ -818,13 +858,17 @@ class OpenMPIRBuilder {
   /// at the outermost loop of a loop nest. If not set,
   /// defaults to the preheader of the generated loop.
   /// \param Name Base name used to derive BB and instruction names.
+  /// \param InScan Whether loop has a scan reduction specified.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
   ///
   /// \returns An object representing the created control flow structure which
   /// can be used for loop-associated directives.
   LLVM_ABI Expected<CanonicalLoopInfo *> createCanonicalLoop(
       const LocationDescription &Loc, LoopBodyGenCallbackTy BodyGenCB,
       Value *Start, Value *Stop, Value *Step, bool IsSigned, bool InclusiveStop,
-      InsertPointTy ComputeIP = {}, const Twine &Name = "loop");
+      InsertPointTy ComputeIP = {}, const Twine &Name = "loop",
+      bool InScan = false, ScanInfo *ScanRedInfo = nullptr);
 
   /// Collapse a loop nest into a single loop.
   ///
@@ -1556,6 +1600,47 @@ class OpenMPIRBuilder {
       ArrayRef<OpenMPIRBuilder::ReductionInfo> ReductionInfos,
       Function *ReduceFn, AttributeList FuncAttrs);
 
+  /// Helper function for CreateCanonicalScanLoops to create InputLoop
+  /// in the firstGen and Scan Loop in the SecondGen
+  /// \param InputLoopGen Callback for generating the loop for input phase
+  /// \param ScanLoopGen Callback for generating the loop for scan phase
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  ///
+  /// \return error if any produced, else return success.
+  Error emitScanBasedDirectiveIR(
+      llvm::function_ref<Error()> InputLoopGen,
+      llvm::function_ref<Error(LocationDescription Loc)> ScanLoopGen,
+      ScanInfo *ScanRedInfo);
+
+  /// Creates the basic blocks required for scan reduction.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  void createScanBBs(ScanInfo *ScanRedInfo);
+
+  /// Dynamically allocates the buffer needed for scan reduction.
+  /// \param AllocaIP The IP where possibly-shared pointer of buffer needs to
+  /// be declared.
+  /// \param ScanVars Scan Variables.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  ///
+  /// \return error if any produced, else return success.
+  Error emitScanBasedDirectiveDeclsIR(InsertPointTy AllocaIP,
+                                      ArrayRef<llvm::Value *> ScanVars,
+                                      ArrayRef<llvm::Type *> ScanVarsType,
+                                      ScanInfo *ScanRedInfo);
+
+  /// Copies the result back to the reduction variable.
+  /// \param ReductionInfos Array type containing the ReductionOps.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  ///
+  /// \return error if any produced, else return success.
+  Error emitScanBasedDirectiveFinalsIR(
+      ArrayRef<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos,
+      ScanInfo *ScanInfo);
+
   /// This function emits a helper that gathers Reduce lists from the first
   /// lane of every active warp to lanes in the first warp.
   ///
@@ -2184,6 +2269,9 @@ class OpenMPIRBuilder {
   /// free'd.
   std::forward_list<CanonicalLoopInfo> LoopInfos;
 
+  /// Collection of owned ScanInfo objects that eventually need to be free'd.
+  std::forward_list<ScanInfo> ScanInfos;
+
   /// Add a new region that will be outlined later.
   void addOutlineInfo(OutlineInfo &&OI) { OutlineInfos.emplace_back(OI); }
 
@@ -2639,6 +2727,48 @@ class OpenMPIRBuilder {
                                              FinalizeCallbackTy FiniCB,
                                              Value *Filter);
 
+  /// This function performs the scan reduction of the values updated in
+  /// the input phase. The reduction logic needs to be emitted between input
+  /// and scan loop returned by `CreateCanonicalScanLoops`. The following
+  /// is the code that is generated, `buffer` and `span` are expected to be
+  /// populated before executing the generated code.
+  /// \code{c}
+  /// for (int k = 0; k != ceil(log2(span)); ++k) {
+  ///   i=pow(2,k)
+  ///   for (size cnt = last_iter; cnt >= i; --cnt)
+  ///     buffer[cnt] op= buffer[cnt-i];
+  /// }
+  /// \endcode
+  /// \param Loc The insert and source location description.
+  /// \param ReductionInfos Array type containing the ReductionOps.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  ///
+  /// \returns The insertion position *after* the masked.
+  InsertPointOrErrorTy emitScanReduction(
+      const LocationDescription &Loc,
+      ArrayRef<llvm::OpenMPIRBuilder::ReductionInfo> ReductionInfos,
+      ScanInfo *ScanRedInfo);
+
+  /// This directive split and directs the control flow to input phase
+  /// blocks or scan phase blocks based on 1. whether input loop or scan loop
+  /// is executed, 2. whether exclusive or inclusive scan is used.
+  ///
+  /// \param Loc The insert and source location description.
+  /// \param AllocaIP The IP where the temporary buffer for scan reduction
+  // needs to be allocated.
+  /// \param ScanVars Scan Variables.
+  /// \param IsInclusive Whether it is an inclusive or exclusive scan.
+  /// \param ScanRedInfo Pointer to the ScanInfo objected created using
+  /// `ScanInfoInitialize`.
+  ///
+  /// \returns The insertion position *after* the scan.
+  InsertPointOrErrorTy createScan(const LocationDescription &Loc,
+                                  InsertPointTy AllocaIP,
+                                  ArrayRef<llvm::Value *> ScanVars,
+                                  ArrayRef<llvm::Type *> ScanVarsType,
+                                  bool IsInclusive, ScanInfo *ScanRedInfo);
+
   /// Generator for '#omp critical'
   ///
   /// \param Loc The insert and source location description.
@@ -3779,6 +3909,93 @@ class CanonicalLoopInfo {
   LLVM_ABI void invalidate();
 };
 
+/// ScanInfo holds the information to assist in lowering of Scan reduction.
+/// Before lowering, the body of the for loop specifying scan reduction is
+/// expected to have the following structure
+///
+/// Loop Body Entry
+/// |
+/// Code before the scan directive
+/// |
+/// Scan Directive
+/// |
+/// Code after the scan directive
+/// |
+/// Loop Body Exit
+/// When `createCanonicalScanLoops` is executed, the bodyGen callback of it
+/// transforms the body to:
+///
+/// Loop Body Entry
+/// |
+/// OMPScanDispatch
+///
+/// OMPBeforeScanBlock
+/// |
+/// OMPScanLoopExit
+/// |
+/// Loop Body Exit
+///
+/// The insert point is updated to the first insert point of OMPBeforeScanBlock.
+/// It dominates the control flow of code generated until
+/// scan directive is encountered and OMPAfterScanBlock dominates the
+/// control flow of code generated after scan is encountered. The successor
+/// of OMPScanDispatch can be OMPBeforeScanBlock or OMPAfterScanBlock based
+/// on 1.whether it is in Input phase or Scan Phase , 2. whether it is an
+/// exclusive or inclusive scan. This jump is added when `createScan` is
+/// executed. If input loop is being generated, if it is inclusive scan,
+/// `OMPAfterScanBlock` succeeds `OMPScanDispatch` , if exclusive,
+/// `OMPBeforeScanBlock` succeeds `OMPDispatch` and vice versa for scan loop. At
+/// the end of the input loop, temporary buffer is populated and at the
+/// beginning of the scan loop, temporary buffer is read. After scan directive
+/// is encountered, insertion point is updated to `OMPAfterScanBlock` as it is
+/// expected to dominate the code after the scan directive. Both Before and
+/// After scan blocks are succeeded by `OMPScanLoopExit`.
+/// Temporary buffer allocations are done in `ScanLoopInit` block before the
+/// lowering of for-loop. The results are copied back to reduction variable in
+/// `ScanLoopFinish` block.
+class ScanInfo {
+public:
+  /// Dominates the body of the loop before scan directive
+  llvm::BasicBlock *OMPBeforeScanBlock = nullptr;
+
+  /// Dominates the body of the loop before scan directive
+  llvm::BasicBlock *OMPAfterScanBlock = nullptr;
+
+  /// Controls the flow to before or after scan blocks
+  llvm::BasicBlock *OMPScanDispatch = nullptr;
+
+  /// Exit block of loop body
+  llvm::BasicBlock *OMPScanLoopExit = nullptr;
+
+  /// Block before loop body where scan initializations are done
+  llvm::BasicBlock *OMPScanInit = nullptr;
+
+  /// Block after loop body where scan finalizations are done
+  llvm::BasicBlock *OMPScanFinish = nullptr;
+
+  /// If true, it indicates Input phase is lowered; else it indicates
+  /// ScanPhase is lowered
+  bool OMPFirstScanLoop = false;
+
+  /// Maps the private reduction variable to the pointer of the temporary
+  /// buffer
+  llvm::SmallDenseMap<llvm::Value *, llvm::Value *> *ScanBuffPtrs;
+
+  /// Keeps track of value of iteration variable for input/scan loop to be
+  /// used for Scan directive lowering
+  llvm::Value *IV;
+
+  /// Stores the span of canonical loop being lowered to be used for temporary
+  /// buffer allocation or Finalization.
+  llvm::Value *Span;
+
+  ScanInfo() {
+    ScanBuffPtrs = new llvm::SmallDenseMap<llvm::Value *, llvm::Value *>();
+  }
+
+  ~ScanInfo() { delete (ScanBuffPtrs); }
+};
+
 } // end namespace llvm
 
 #endif // LLVM_FRONTEND_OPENMP_OMPIRBUILDER_H
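
For orientation, the source-level construct whose loop body has the "code before the scan directive / scan directive / code after the scan directive" shape described in the `ScanInfo` comment looks roughly like the following in C++. This is an illustrative sketch only: the commit message reports verification with Fortran programs, and the function name `runningTotals` and the use of `+` on `int` are assumptions made for the example. The `reduction(inscan, ...)` clause and the `scan inclusive(...)` directive are standard OpenMP syntax.

```cpp
#include <vector>

// Hypothetical example function: computes running totals with an OpenMP
// inclusive scan reduction. If the file is compiled without OpenMP
// support, the pragmas are ignored and the loop runs serially, producing
// the same values.
std::vector<int> runningTotals(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  int sum = 0;
  const int n = static_cast<int>(in.size());
#pragma omp parallel for reduction(inscan, + : sum)
  for (int i = 0; i < n; ++i) {
    sum += in[i]; // input phase: code before the scan directive
#pragma omp scan inclusive(sum)
    out[i] = sum; // scan phase: code after the scan directive
  }
  return out;
}
```

With an inclusive scan, `out[i]` receives the sum of `in[0]` through `in[i]`, so an input of `{1, 2, 3, 4}` gives `{1, 3, 6, 10}`.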
