Merged
68 changes: 19 additions & 49 deletions clang/docs/HIPSupport.rst
@@ -545,37 +545,22 @@ The following restrictions imposed on user code apply to both modes:
 1. Pointers to function, and all associated features, such as e.g. dynamic
    polymorphism, cannot be used (directly or transitively) by the user provided
    callable passed to an algorithm invocation;
-2. Global / namespace scope / ``static`` / ``thread`` storage duration variables
-   cannot be used (directly or transitively) in name by the user provided
-   callable;
-
-   - When executing in **HMM Mode** they can be used in address e.g.:
-
-     .. code-block:: C++
-
-        namespace { int foo = 42; }
-
-        bool never(const std::vector<int>& v) {
-          return std::any_of(std::execution::par_unseq, std::cbegin(v), std::cend(v), [](auto&& x) {
-            return x == foo;
-          });
-        }
-
-        bool only_in_hmm_mode(const std::vector<int>& v) {
-          return std::any_of(std::execution::par_unseq, std::cbegin(v), std::cend(v),
-                             [p = &foo](auto&& x) { return x == *p; });
-        }
-
-3. Only algorithms that are invoked with the ``parallel_unsequenced_policy`` are
+2. ``static`` (except for program-wide unique ones) / ``thread`` storage
+   duration variables cannot be used (directly or transitively) in name by the
+   user provided callable;
+3. User code must be compiled in ``-fgpu-rdc`` mode in order for global /
+   namespace scope variables / program-wide unique ``static`` storage duration
+   variables to be usable in name by the user provided callable;
+4. Only algorithms that are invoked with the ``parallel_unsequenced_policy`` are
    candidates for offload;
-4. Only algorithms that are invoked with iterator arguments that model
+5. Only algorithms that are invoked with iterator arguments that model
    `random_access_iterator <https://en.cppreference.com/w/cpp/iterator/random_access_iterator>`_
    are candidates for offload;
-5. `Exceptions <https://en.cppreference.com/w/cpp/language/exceptions>`_ cannot
+6. `Exceptions <https://en.cppreference.com/w/cpp/language/exceptions>`_ cannot
    be used by the user provided callable;
-6. Dynamic memory allocation (e.g. ``operator new``) cannot be used by the user
+7. Dynamic memory allocation (e.g. ``operator new``) cannot be used by the user
    provided callable;
-7. Selective offload is not possible i.e. it is not possible to indicate that
+8. Selective offload is not possible i.e. it is not possible to indicate that
    only some algorithms invoked with the ``parallel_unsequenced_policy`` are to
    be executed on the accelerator.
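
A minimal sketch of one way to stay inside the restrictions on naming variables regardless of compilation mode: read the namespace-scope variable on the host and capture a copy, so the callable names no global at all. The helper name `contains_foo` is hypothetical, and the sequential `std::any_of` overload is used only so the sketch builds with any standard library; an offload candidate would pass `std::execution::par_unseq` as the first argument.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace {
int foo = 42; // namespace-scope variable, as in the documentation example
}

// Hypothetical helper: the lambda captures a copy of `foo`, so the callable
// neither names a global nor takes its address.
bool contains_foo(const std::vector<int> &v) {
  const int needle = foo; // read on the host, before the algorithm runs
  // An offload candidate would pass std::execution::par_unseq as the first
  // argument; the sequential overload keeps this sketch portable.
  return std::any_of(std::cbegin(v), std::cend(v),
                     [needle](auto &&x) { return x == needle; });
}
```

Capturing by value trades a small copy for independence from ``-fgpu-rdc``; it does not help when the callable must observe later writes to the variable.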

@@ -585,15 +570,6 @@ additional restrictions:
 1. All code that is expected to interoperate has to be recompiled with the
    ``--hipstdpar-interpose-alloc`` flag i.e. it is not safe to compose libraries
    that have been independently compiled;
-2. automatic storage duration (i.e. stack allocated) variables cannot be used
-   (directly or transitively) by the user provided callable e.g.
-
-   .. code-block:: c++
-
-      bool never(const std::vector<int>& v, int n) {
-        return std::any_of(std::execution::par_unseq, std::cbegin(v), std::cend(v),
-                           [p = &n](auto&& x) { return x == *p; });
-      }

 Current Support
 ===============
@@ -626,17 +602,12 @@ Linux operating system. Support is synthesised in the following table:

 The minimum Linux kernel version for running in HMM mode is 6.4.

-The forwarding header can be obtained from
-`its GitHub repository <https://github.com/ROCm/roc-stdpar>`_.
-It will be packaged with a future `ROCm <https://rocm.docs.amd.com/en/latest/>`_
-release. Because accelerated algorithms are provided via
-`rocThrust <https://rocm.docs.amd.com/projects/rocThrust/en/latest/>`_, a
-transitive dependency on
-`rocPrim <https://rocm.docs.amd.com/projects/rocPRIM/en/latest/>`_ exists. Both
-can be obtained either by installing their associated components of the
-`ROCm <https://rocm.docs.amd.com/en/latest/>`_ stack, or from their respective
-repositories. The list algorithms that can be offloaded is available
-`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_.
+The forwarding header is packaged by
+`ROCm <https://rocm.docs.amd.com/en/latest/>`_, and is obtainable by installing
+the `hipstdpar` package. The list of algorithms that can be offloaded is available
+`here <https://github.com/ROCm/roc-stdpar#algorithm-support-status>`_. More
+details are available via the dedicated blog
+`<https://rocm.blogs.amd.com/software-tools-optimization/hipstdpar/README.html>`_.

 HIP Specific Elements
 ---------------------
@@ -690,9 +661,8 @@ HIP Specific Elements
 Open Questions / Future Developments
 ====================================

-1. The restriction on the use of global / namespace scope / ``static`` /
-   ``thread`` storage duration variables in offloaded algorithms will be lifted
-   in the future, when running in **HMM Mode**;
+1. The restriction on the use of ``static`` / ``thread`` storage duration
+   variables in offloaded algorithms might be lifted;
 2. The restriction on the use of dynamic memory allocation in offloaded
    algorithms will be lifted in the future.
 3. The restriction on the use of pointers to function, and associated features
218 changes: 208 additions & 10 deletions llvm/lib/Transforms/HipStdPar/HipStdPar.cpp
@@ -48,6 +48,7 @@
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

@@ -114,24 +115,221 @@ static inline void clearModule(Module &M) { // TODO: simplify.
eraseFromModule(*M.ifuncs().begin());
}

static inline SmallVector<std::reference_wrapper<Use>>
collectIndirectableUses(GlobalVariable *G) {
// We are interested only in use chains that end in an Instruction.
SmallVector<std::reference_wrapper<Use>> Uses;

SmallVector<std::reference_wrapper<Use>> Tmp(G->use_begin(), G->use_end());
while (!Tmp.empty()) {
Use &U = Tmp.back();
Tmp.pop_back();
if (isa<Instruction>(U.getUser()))
Uses.emplace_back(U);
else
transform(U.getUser()->uses(), std::back_inserter(Tmp),
[](auto &&U) { return std::ref(U); });
}

return Uses;
}
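
The traversal in `collectIndirectableUses` is a depth-first worklist walk that looks through non-instruction users until it reaches instructions. A rough, LLVM-free sketch of the same idea follows; `Node` and `collectInstructionUsers` are hypothetical stand-ins, and the toy records user nodes rather than individual `Use` edges, which is a simplification of what the pass does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for llvm::Value / llvm::User; this is not the LLVM API.
struct Node {
  std::string Name;
  bool IsInstruction;
  std::vector<Node *> Users; // nodes that use this node
};

// Collect every instruction reachable from Root by following user edges,
// looking through non-instruction users (e.g. constant expressions).
std::vector<const Node *> collectInstructionUsers(const Node &Root) {
  std::vector<const Node *> Found;
  std::vector<const Node *> Worklist(Root.Users.begin(), Root.Users.end());
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    if (N->IsInstruction)
      Found.push_back(N); // chain ends in an instruction: keep it
    else
      Worklist.insert(Worklist.end(), N->Users.begin(), N->Users.end());
  }
  return Found;
}
```

A user that is not an instruction and has no users of its own simply drops out of the worklist, which matches the "only chains that end in an Instruction" comment in the pass.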

static inline GlobalVariable *getGlobalForName(GlobalVariable *G) {
// Create an anonymous global which stores the variable's name, which will be
// used by the HIPSTDPAR runtime to look up the program-wide symbol.
LLVMContext &Ctx = G->getContext();
auto *CDS = ConstantDataArray::getString(Ctx, G->getName());

GlobalVariable *N = G->getParent()->getOrInsertGlobal("", CDS->getType());
N->setInitializer(CDS);
N->setLinkage(GlobalValue::LinkageTypes::PrivateLinkage);
N->setConstant(true);

return N;
}

static inline GlobalVariable *getIndirectionGlobal(Module *M) {
// Create an anonymous global which stores a pointer to a pointer, which will
// be externally initialised by the HIPSTDPAR runtime with the address of the
// program-wide symbol.
Type *PtrTy = PointerType::get(
M->getContext(), M->getDataLayout().getDefaultGlobalsAddressSpace());
GlobalVariable *NewG = M->getOrInsertGlobal("", PtrTy);

NewG->setInitializer(PoisonValue::get(NewG->getValueType()));
NewG->setLinkage(GlobalValue::LinkageTypes::PrivateLinkage);
NewG->setConstant(true);
NewG->setExternallyInitialized(true);

return NewG;
}

static inline Constant *
appendIndirectedGlobal(const GlobalVariable *IndirectionTable,
SmallVector<Constant *> &SymbolIndirections,
GlobalVariable *ToIndirect) {
Module *M = ToIndirect->getParent();

auto *InitTy = cast<StructType>(IndirectionTable->getValueType());
auto *SymbolListTy = cast<StructType>(InitTy->getStructElementType(2));
Type *NameTy = SymbolListTy->getElementType(0);
Type *IndirectTy = SymbolListTy->getElementType(1);

Constant *NameG = getGlobalForName(ToIndirect);
Constant *IndirectG = getIndirectionGlobal(M);
Constant *Entry = ConstantStruct::get(
SymbolListTy, {ConstantExpr::getAddrSpaceCast(NameG, NameTy),
ConstantExpr::getAddrSpaceCast(IndirectG, IndirectTy)});
SymbolIndirections.push_back(Entry);

return IndirectG;
}

static void fillIndirectionTable(GlobalVariable *IndirectionTable,
SmallVector<Constant *> Indirections) {
Module *M = IndirectionTable->getParent();
size_t SymCnt = Indirections.size();

auto *InitTy = cast<StructType>(IndirectionTable->getValueType());
Type *SymbolListTy = InitTy->getStructElementType(1);
auto *SymbolTy = cast<StructType>(InitTy->getStructElementType(2));

Constant *Count = ConstantInt::get(InitTy->getStructElementType(0), SymCnt);
M->removeGlobalVariable(IndirectionTable);
GlobalVariable *Symbols =
M->getOrInsertGlobal("", ArrayType::get(SymbolTy, SymCnt));
Symbols->setLinkage(GlobalValue::LinkageTypes::PrivateLinkage);
Symbols->setInitializer(
ConstantArray::get(ArrayType::get(SymbolTy, SymCnt), {Indirections}));
Symbols->setConstant(true);

Constant *ASCSymbols = ConstantExpr::getAddrSpaceCast(Symbols, SymbolListTy);
Constant *Init = ConstantStruct::get(
InitTy, {Count, ASCSymbols, PoisonValue::get(SymbolTy)});
M->insertGlobalVariable(IndirectionTable);
IndirectionTable->setInitializer(Init);
}
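
To make the table shape concrete, here is a hedged C++ mirror of the three-element layout the pass expects (an integer count, a pointer to the entries, and one entry kept only for its type), together with a guess at how a runtime might resolve it. How the HIPSTDPAR runtime actually consumes the table is not shown in this change, so `SymbolEntry`, `IndirectionTable`, `HostSymbol` and `resolveAll` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of one table entry: a symbol name plus the location
// that should receive the symbol's program-wide address.
struct SymbolEntry {
  const char *Name;
  void **Slot;
};

// Hypothetical mirror of the table: { count, entries, prototype entry }.
struct IndirectionTable {
  std::size_t Count;
  SymbolEntry *Entries;
  SymbolEntry Prototype; // placeholder; the pass stores poison here
};

// Stand-in for whatever host-side lookup a runtime would use.
struct HostSymbol {
  const char *Name;
  void *Address;
};

// Resolve each entry by name and write the address through its slot.
// Returns how many entries were resolved.
std::size_t resolveAll(IndirectionTable &T, const HostSymbol *Syms,
                       std::size_t N) {
  std::size_t Resolved = 0;
  for (std::size_t I = 0; I != T.Count; ++I)
    for (std::size_t J = 0; J != N; ++J)
      if (std::strcmp(T.Entries[I].Name, Syms[J].Name) == 0) {
        *T.Entries[I].Slot = Syms[J].Address;
        ++Resolved;
        break;
      }
  return Resolved;
}
```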

static void replaceWithIndirectUse(const Use &U, const GlobalVariable *G,
Constant *IndirectedG) {
auto *I = cast<Instruction>(U.getUser());

IRBuilder<> Builder(I);
Value *Op = I->getOperand(U.getOperandNo());

// We walk back up the use chain, which could be an arbitrarily long sequence
// of constexpr AS casts, ptr-to-int and GEP instructions, until we reach the
// indirected global.
while (auto *CE = dyn_cast<ConstantExpr>(Op)) {
assert((CE->getOpcode() == Instruction::GetElementPtr ||
CE->getOpcode() == Instruction::AddrSpaceCast ||
CE->getOpcode() == Instruction::PtrToInt) &&
"Only GEP, ASCAST or PTRTOINT constant uses supported!");

Instruction *NewI = Builder.Insert(CE->getAsInstruction());
I->replaceUsesOfWith(Op, NewI);
I = NewI;
Op = I->getOperand(0);
Builder.SetInsertPoint(I);
}

assert(Op == G && "Must reach indirected global!");

Builder.GetInsertPoint()->setOperand(
0, Builder.CreateLoad(G->getType(), IndirectedG));
}
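
At source level, the effect of `replaceWithIndirectUse` can be pictured as follows: an access that named the global directly becomes a load of its address from a separate pointer, followed by the original access through that pointer. This is only an illustration in plain C++ under that reading; `CounterSlot` and `TableSlot` are hypothetical stand-ins for the externally initialised indirection globals.

```cpp
#include <cassert>

int HostCounter = 0;        // stands in for the program-wide symbol
int *CounterSlot = nullptr; // stands in for the externally initialised pointer

// Before the rewrite: the global is named directly.
void bumpDirect() { ++HostCounter; }

// After the rewrite: load the address from the slot, then access through it.
void bumpIndirect() {
  int *P = CounterSlot; // corresponds to the load the pass inserts
  ++*P;
}

int Table[4] = {10, 20, 30, 40};
int *TableSlot = nullptr;

// Before: the element address is a constant expression on the global.
int readThirdDirect() { return Table[2]; }

// After: the offset is applied, as an ordinary instruction, to the loaded base.
int readThirdIndirect() {
  int *Base = TableSlot;
  return *(Base + 2);
}
```

The second pair corresponds to the loop in the pass that turns constant GEP, address-space cast and ptr-to-int expressions into instructions, since a constant expression cannot take a loaded value as its operand.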

static inline bool isValidIndirectionTable(GlobalVariable *IndirectionTable) {
std::string W;
raw_string_ostream OS(W);

Type *Ty = IndirectionTable->getValueType();
bool Valid = false;

if (!isa<StructType>(Ty)) {
OS << "The Indirection Table must be a struct type; ";
Ty->print(OS);
OS << " is incorrect.\n";
} else if (cast<StructType>(Ty)->getNumElements() != 3u) {
OS << "The Indirection Table must have 3 elements; "
<< cast<StructType>(Ty)->getNumElements() << " is incorrect.\n";
} else if (!isa<IntegerType>(cast<StructType>(Ty)->getStructElementType(0))) {
OS << "The first element in the Indirection Table must be an integer; ";
cast<StructType>(Ty)->getStructElementType(0)->print(OS);
OS << " is incorrect.\n";
} else if (!isa<PointerType>(cast<StructType>(Ty)->getStructElementType(1))) {
OS << "The second element in the Indirection Table must be a pointer; ";
cast<StructType>(Ty)->getStructElementType(1)->print(OS);
OS << " is incorrect.\n";
} else if (!isa<StructType>(cast<StructType>(Ty)->getStructElementType(2))) {
OS << "The third element in the Indirection Table must be a struct type; ";
cast<StructType>(Ty)->getStructElementType(2)->print(OS);
OS << " is incorrect.\n";
} else {
Valid = true;
}

if (!Valid)
IndirectionTable->getContext().diagnose(DiagnosticInfoGeneric(W, DS_Error));

return Valid;
}
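
The validation above is a first-failure chain: it reports only the earliest mismatch, in element order. A small LLVM-free sketch of the same ordering, using a hypothetical `Kind` enum in place of LLVM types and shortened messages:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the LLVM type categories the pass inspects.
enum class Kind { Integer, Pointer, Struct, Other };

// Returns an empty string when the shape is acceptable, otherwise the first
// problem found, in the same order as the pass's if / else-if chain.
std::string checkTableShape(const std::vector<Kind> &Elements) {
  if (Elements.size() != 3)
    return "The Indirection Table must have 3 elements";
  if (Elements[0] != Kind::Integer)
    return "The first element in the Indirection Table must be an integer";
  if (Elements[1] != Kind::Pointer)
    return "The second element in the Indirection Table must be a pointer";
  if (Elements[2] != Kind::Struct)
    return "The third element in the Indirection Table must be a struct type";
  return {};
}
```

The three shapes exercised by the IR tests in this change, `{ struct, ptr, struct }`, `{ i64, struct, struct }` and `{ i64, ptr, i64 }`, each trip exactly one of these checks.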

static void indirectGlobals(GlobalVariable *IndirectionTable,
SmallVector<GlobalVariable *> ToIndirect) {
// We replace globals with an indirected access via a pointer that will get
// set by the HIPSTDPAR runtime, using their accessible, program-wide unique
// address as set by the host linker-loader.
SmallVector<Constant *> SymbolIndirections;
for (auto &&G : ToIndirect) {
SmallVector<std::reference_wrapper<Use>> Uses = collectIndirectableUses(G);

if (Uses.empty())
continue;

Constant *IndirectedGlobal =
appendIndirectedGlobal(IndirectionTable, SymbolIndirections, G);

for_each(Uses,
[=](auto &&U) { replaceWithIndirectUse(U, G, IndirectedGlobal); });

eraseFromModule(*G);
}

if (SymbolIndirections.empty())
return;

fillIndirectionTable(IndirectionTable, std::move(SymbolIndirections));
}

 static inline void maybeHandleGlobals(Module &M) {
   unsigned GlobAS = M.getDataLayout().getDefaultGlobalsAddressSpace();
-  for (auto &&G : M.globals()) { // TODO: should we handle these in the FE?
+
+  SmallVector<GlobalVariable *> ToIndirect;
+  for (auto &&G : M.globals()) {
     if (!checkIfSupported(G))
       return clearModule(M);

-    if (G.isThreadLocal())
-      continue;
-    if (G.isConstant())
-      continue;
     if (G.getAddressSpace() != GlobAS)
       continue;
-    if (G.getLinkage() != GlobalVariable::ExternalLinkage)
+    if (G.isConstant() && G.hasInitializer() && G.hasAtLeastLocalUnnamedAddr())
       continue;

-    G.setLinkage(GlobalVariable::ExternalWeakLinkage);
-    G.setInitializer(nullptr);
-    G.setExternallyInitialized(true);
+    ToIndirect.push_back(&G);
   }
+
+  if (ToIndirect.empty())
+    return;
+
+  if (auto *IT = M.getNamedGlobal("__hipstdpar_symbol_indirection_table")) {
+    if (!isValidIndirectionTable(IT))
+      return clearModule(M);
+    return indirectGlobals(IT, std::move(ToIndirect));
+  } else {
+    for (auto &&G : ToIndirect) {
+      // We will internalise these, so we provide a poison initialiser.
+      if (!G->hasInitializer())
+        G->setInitializer(PoisonValue::get(G->getValueType()));
+    }
+  }
 }

@@ -0,0 +1,15 @@
; REQUIRES: amdgpu-registered-target
; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa -passes=hipstdpar-select-accelerator-code \
; RUN: %s 2>&1 | FileCheck %s

; CHECK: error: The first element in the Indirection Table must be an integer; %struct.anon.1 = type { ptr, ptr } is incorrect.
%struct.anon.1 = type { ptr, ptr }
%class.anon = type { %struct.anon.1, ptr, %struct.anon.1 }
@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) externally_initialized constant %class.anon zeroinitializer, align 8

define amdgpu_kernel void @store(ptr %p) {
entry:
store ptr %p, ptr addrspace(1) @a, align 8
ret void
}
@@ -0,0 +1,15 @@
; REQUIRES: amdgpu-registered-target
; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa -passes=hipstdpar-select-accelerator-code \
; RUN: %s 2>&1 | FileCheck %s

; CHECK: error: The second element in the Indirection Table must be a pointer; %struct.anon.1 = type { ptr, ptr } is incorrect.
%struct.anon.1 = type { ptr, ptr }
%class.anon = type { i64, %struct.anon.1, %struct.anon.1 }
@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) externally_initialized constant %class.anon zeroinitializer, align 8

define amdgpu_kernel void @store(ptr %p) {
entry:
store ptr %p, ptr addrspace(1) @a, align 8
ret void
}
@@ -0,0 +1,15 @@
; REQUIRES: amdgpu-registered-target
; RUN: not opt -S -mtriple=amdgcn-amd-amdhsa -passes=hipstdpar-select-accelerator-code \
; RUN: %s 2>&1 | FileCheck %s

; CHECK: error: The third element in the Indirection Table must be a struct type; i64 is incorrect.
%struct.anon.1 = type { ptr, ptr }
%class.anon = type { i64, ptr, i64 }
@a = external hidden local_unnamed_addr addrspace(1) global ptr, align 8
@__hipstdpar_symbol_indirection_table = weak_odr protected addrspace(4) externally_initialized constant %class.anon zeroinitializer, align 8

define amdgpu_kernel void @store(ptr %p) {
entry:
store ptr %p, ptr addrspace(1) @a, align 8
ret void
}