Skip to content

Commit 8f58397

Browse files
committed
[OpenMP][Offload] Handle for non-memberof present/to/from entries irrespective of order.
For cases like: ```c map(alloc: x) map(to: x) ``` If the entry of `map(to: x)` is encountered after the entry for `map(alloc:x)`, we still want to do a data-transfer even though the ref-count of `x` was already 0, because the new allocation for `x` happened as part of the current directive. Similarly, for: ```c ... map(alloc: x) map(from: x) ``` If the entry for `map(from:x)` is encountered before the entry for `map(alloc:x)`, we want to do a data-transfer even though the ref-count was not 0 when looking at the `from` entry, because by the end of the directive, the ref-count of `x` will go down to zero. And for: ```c ... map(from : x) map(alloc, present: x) ``` If the "present" entry is encountered after the "from" entry, then it becomes a no-op, as the "from" entry will do an allocation if no match was found. In this PR, these are handled by the runtime via the following: * For `to` and `present`, we also look-up in the existing table where we tracked new allocations when making the decision for the entry. * For `from`, we keep track of any deferred data transfers and when the ref-count of a pointer goes to zero, see if there were any previously deferred `from` transfers for that pointer. This can be done in the compiler, and that would avoid any runtime overhead, but it would require creating two separate offload struct entries for the entry and exit mappings (even for the `target` construct), with properly decayed maps, and either: (1) sorted in order of: * `present > to > ...` for the implied `target enter data`; and * `from > ...` for the `target exit data` e.g. ```c #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x) // has to be broken into: // from becomes alloc on entry: // #pragma omp target enter data map(present, alloc: x) // map(to: x) // map(alloc: x) // // "present" and "to" just "decay" into "alloc" // #pragma omp target exit data map(always, from: x) // map(alloc: x) // map(alloc: x) ``` Or, (2) Merged into one entry each on the `target enter/exit data` directives. ```c #pragma omp target map(to: x) map(present, alloc: x) map(always, from: x) // has to be broken into: // from becomes alloc on entry: // #pragma omp target enter data map(present, to: x) // // "present" and "to" just "decay" into "alloc" // #pragma omp target exit data map(always, from: x) ``` The number of entries on the two would need to stay the same on the two to avoid ref-count mismatch. (1) would be simpler, but won't likely work for cases like: ```c ... map(delete: x) map(from:x) ``` as there is no clear "winner" between the two. So, for such cases, the compiler would likely have to do (2), which is the cleanest solution, but will take longer to implement. For EXPR comparisons, it can build-upon the `AttachPtrExprComparator` that was implemented as part of #153683, but that should probably wait for the PR to be merged to avoid conflicts. Another alternative is to sort the entries in the runtime, which may be slower than on-demand lookups/updates that this PR does, because we always would be doing this sorting even when not needed, but may be faster in others where the constant-time overhead of map/set insertions/lookups becomes too large because of the number of maps. But that will still have to worry about the `from` + `delete` case.
1 parent c3d905e commit 8f58397

10 files changed

+249
-57
lines changed

offload/include/OpenMP/Mapping.h

Lines changed: 19 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -484,28 +484,34 @@ struct AttachMapInfo {
484484
MapType(Type), Pointername(Name) {}
485485
};
486486

487-
/// Structure to track ATTACH entries and new allocations across recursive calls
488-
/// (for handling mappers) to targetDataBegin for a given construct.
489-
struct AttachInfoTy {
490-
/// ATTACH map entries for deferred processing.
487+
/// Structure to track new allocations, ATTACH entries and deferred data
488+
/// transfer information for a given construct, across recursive calls (for
489+
/// handling mappers) to targetDataBegin/targetDataEnd.
490+
struct StateInfoTy {
491+
/// ATTACH map entries for deferred processing until all other maps are done.
491492
llvm::SmallVector<AttachMapInfo> AttachEntries;
492493

494+
/// Host pointers for which new memory was allocated.
493495
/// Key: host pointer, Value: allocation size.
494496
llvm::DenseMap<void *, int64_t> NewAllocations;
495497

496-
AttachInfoTy() = default;
498+
/// Host pointers that had a FROM entry, but for which a data transfer didn't
499+
/// occur due to the ref-count not being zero.
500+
llvm::SmallSet<void *, 32> DeferredFromPtrs;
501+
502+
StateInfoTy() = default;
497503

498504
// Delete copy constructor and copy assignment operator to prevent copying
499-
AttachInfoTy(const AttachInfoTy &) = delete;
500-
AttachInfoTy &operator=(const AttachInfoTy &) = delete;
505+
StateInfoTy(const StateInfoTy &) = delete;
506+
StateInfoTy &operator=(const StateInfoTy &) = delete;
501507
};
502508

503509
// Function pointer type for targetData* functions (targetDataBegin,
504510
// targetDataEnd and targetDataUpdate).
505511
typedef int (*TargetDataFuncPtrTy)(ident_t *, DeviceTy &, int32_t, void **,
506512
void **, int64_t *, int64_t *,
507513
map_var_info_t *, void **, AsyncInfoTy &,
508-
AttachInfoTy *, bool);
514+
StateInfoTy *, bool);
509515

510516
void dumpTargetPointerMappings(const ident_t *Loc, DeviceTy &Device,
511517
bool toStdOut = false);
@@ -514,24 +520,22 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
514520
void **ArgsBase, void **Args, int64_t *ArgSizes,
515521
int64_t *ArgTypes, map_var_info_t *ArgNames,
516522
void **ArgMappers, AsyncInfoTy &AsyncInfo,
517-
AttachInfoTy *AttachInfo = nullptr,
518-
bool FromMapper = false);
523+
StateInfoTy *StateInfo = nullptr, bool FromMapper = false);
519524

520525
int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
521526
void **ArgBases, void **Args, int64_t *ArgSizes,
522527
int64_t *ArgTypes, map_var_info_t *ArgNames,
523528
void **ArgMappers, AsyncInfoTy &AsyncInfo,
524-
AttachInfoTy *AttachInfo = nullptr, bool FromMapper = false);
529+
StateInfoTy *StateInfo = nullptr, bool FromMapper = false);
525530

526531
int targetDataUpdate(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
527532
void **ArgsBase, void **Args, int64_t *ArgSizes,
528533
int64_t *ArgTypes, map_var_info_t *ArgNames,
529534
void **ArgMappers, AsyncInfoTy &AsyncInfo,
530-
AttachInfoTy *AttachInfo = nullptr,
531-
bool FromMapper = false);
535+
StateInfoTy *StateInfo = nullptr, bool FromMapper = false);
532536

533537
// Process deferred ATTACH map entries collected during targetDataBegin.
534-
int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo,
538+
int processAttachEntries(DeviceTy &Device, StateInfoTy &StateInfo,
535539
AsyncInfoTy &AsyncInfo);
536540

537541
struct MappingInfoTy {
@@ -572,7 +576,7 @@ struct MappingInfoTy {
572576
bool HasFlagTo, bool HasFlagAlways, bool IsImplicit, bool UpdateRefCount,
573577
bool HasCloseModifier, bool HasPresentModifier, bool HasHoldModifier,
574578
AsyncInfoTy &AsyncInfo, HostDataToTargetTy *OwnedTPR = nullptr,
575-
bool ReleaseHDTTMap = true);
579+
bool ReleaseHDTTMap = true, StateInfoTy *StateInfo = nullptr);
576580

577581
/// Return the target pointer for \p HstPtrBegin in \p HDTTMap. The accessor
578582
/// ensures exclusive access to the HDTT map.

offload/libomptarget/OpenMP/Mapping.cpp

Lines changed: 13 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -202,7 +202,8 @@ TargetPointerResultTy MappingInfoTy::getTargetPointer(
202202
int64_t TgtPadding, int64_t Size, map_var_info_t HstPtrName, bool HasFlagTo,
203203
bool HasFlagAlways, bool IsImplicit, bool UpdateRefCount,
204204
bool HasCloseModifier, bool HasPresentModifier, bool HasHoldModifier,
205-
AsyncInfoTy &AsyncInfo, HostDataToTargetTy *OwnedTPR, bool ReleaseHDTTMap) {
205+
AsyncInfoTy &AsyncInfo, HostDataToTargetTy *OwnedTPR, bool ReleaseHDTTMap,
206+
StateInfoTy *StateInfo) {
206207

207208
LookupResult LR = lookupMapping(HDTTMap, HstPtrBegin, Size, OwnedTPR);
208209
LR.TPR.Flags.IsPresent = true;
@@ -324,8 +325,18 @@ TargetPointerResultTy MappingInfoTy::getTargetPointer(
324325

325326
// If the target pointer is valid, and we need to transfer data, issue the
326327
// data transfer.
328+
auto WasNewlyAllocatedOnCurrentConstruct = [&]() {
329+
if (!StateInfo)
330+
return false;
331+
return StateInfo->NewAllocations.contains(HstPtrBegin);
332+
};
333+
334+
// Even if this isn't a new entry, we still need to do a data-transfer if
335+
// the pointer was newly allocated previously on the same construct.
327336
if (LR.TPR.TargetPointer && !LR.TPR.Flags.IsHostPointer && HasFlagTo &&
328-
(LR.TPR.Flags.IsNewEntry || HasFlagAlways) && Size != 0) {
337+
(LR.TPR.Flags.IsNewEntry || HasFlagAlways ||
338+
WasNewlyAllocatedOnCurrentConstruct()) &&
339+
Size != 0) {
329340

330341
// If we have something like:
331342
// #pragma omp target map(to: s.myarr[0:10]) map(to: s.myarr[0:10])

offload/libomptarget/interface.cpp

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -167,19 +167,22 @@ targetData(ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
167167

168168
int Rc = OFFLOAD_SUCCESS;
169169

170-
// Only allocate AttachInfo for targetDataBegin
171-
std::unique_ptr<AttachInfoTy> AttachInfo;
172-
if (TargetDataFunction == targetDataBegin)
173-
AttachInfo = std::make_unique<AttachInfoTy>();
170+
// Allocate StateInfo for targetDataBegin and targetDataEnd to track
171+
// allocations, pointer attachments and deferred transfers.
172+
// This is not needed for targetDataUpdate.
173+
std::unique_ptr<StateInfoTy> StateInfo;
174+
if (TargetDataFunction == targetDataBegin ||
175+
TargetDataFunction == targetDataEnd)
176+
StateInfo = std::make_unique<StateInfoTy>();
174177

175178
Rc = TargetDataFunction(Loc, *DeviceOrErr, ArgNum, ArgsBase, Args, ArgSizes,
176179
ArgTypes, ArgNames, ArgMappers, AsyncInfo,
177-
AttachInfo.get(), /*FromMapper=*/false);
180+
StateInfo.get(), /*FromMapper=*/false);
178181

179182
if (Rc == OFFLOAD_SUCCESS) {
180183
// Process deferred ATTACH entries BEFORE synchronization
181-
if (AttachInfo && !AttachInfo->AttachEntries.empty())
182-
Rc = processAttachEntries(*DeviceOrErr, *AttachInfo, AsyncInfo);
184+
if (StateInfo && !StateInfo->AttachEntries.empty())
185+
Rc = processAttachEntries(*DeviceOrErr, *StateInfo, AsyncInfo);
183186

184187
if (Rc == OFFLOAD_SUCCESS)
185188
Rc = AsyncInfo.synchronize();

0 commit comments

Comments
 (0)