-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[OpenMP][clang] Indirect and Virtual function call mapping from host to device #159857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Runtime / Registration support for indirect and virtual function calls in OpenMP target regions - Register Vtable's to OpenMP offload table - Modify PluginInterface to register Vtables to indirect call table This Patch does not have the logic for calling __llvm_omp_indirect_call_lookup, and lacks implementation logic --------- Co-authored-by: Chi-Chun Chen <[email protected]> Co-authored-by: Jeffery Sandoval <[email protected]>
…to device This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup on the device when an indirect function call or a virtual function call is made within an OpenMP target region. --------- Co-authored-by: Youngsuk Kim
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: None (Jason-VanBeusekom) ChangesThis adds a feature to insert CPU-to-GPU function pointer translation at GPU How it works: This is the second out of Two PR's to implement this, commit 5247c1f is not a part of the Pull request and is handled in: #159856 Patch is 89.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159857.diff 21 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index e6e4947882544..cc4c21a719f4c 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -6583,6 +6583,26 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
Address(Handle, Handle->getType(), CGM.getPointerAlign()));
Callee.setFunctionPointer(Stub);
}
+
+ // Check whether the associated CallExpr is in the set OMPTargetCalls.
+ // If YES, insert a call to devicertl function __llvm_omp_indirect_call_lookup
+ //
+ // This is used for the indriect function Case, virtual function case is
+ // handled in ItaniumCXXABI.cpp
+ if (getLangOpts().OpenMPIsTargetDevice && CGM.OMPTargetCalls.contains(E)) {
+ auto *PtrTy = CGM.VoidPtrTy;
+ llvm::Type *RtlFnArgs[] = {PtrTy};
+ llvm::FunctionCallee DeviceRtlFn = CGM.CreateRuntimeFunction(
+ llvm::FunctionType::get(PtrTy, RtlFnArgs, false),
+ "__llvm_omp_indirect_call_lookup");
+ llvm::Value *Func = Callee.getFunctionPointer();
+ llvm::Type *BackupTy = Func->getType();
+ Func = Builder.CreatePointerBitCastOrAddrSpaceCast(Func, PtrTy);
+ Func = EmitRuntimeCall(DeviceRtlFn, {Func});
+ Func = Builder.CreatePointerBitCastOrAddrSpaceCast(Func, BackupTy);
+ Callee.setFunctionPointer(Func);
+ }
+
llvm::CallBase *LocalCallOrInvoke = nullptr;
RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
E == MustTailCall, E->getExprLoc());
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a503aaf613e30..ac1d467affc00 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -24,6 +24,7 @@
#include "clang/AST/OpenMPClause.h"
#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Basic/OpenMPKinds.h"
#include "clang/Basic/SourceManager.h"
#include "clang/CodeGen/ConstantInitBuilder.h"
@@ -1771,12 +1772,126 @@ void CGOpenMPRuntime::emitDeclareTargetFunction(const FunctionDecl *FD,
Addr->setVisibility(llvm::GlobalValue::ProtectedVisibility);
}
+ // Register the indirect Vtable:
+ // This is similar to OMPTargetGlobalVarEntryIndirect, except that the
+ // size field refers to the size of memory pointed to, not the size of
+ // the pointer symbol itself (which is implicitly the size of a pointer).
OMPBuilder.OffloadInfoManager.registerDeviceGlobalVarEntryInfo(
Name, Addr, CGM.GetTargetTypeStoreSize(CGM.VoidPtrTy).getQuantity(),
llvm::OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect,
llvm::GlobalValue::WeakODRLinkage);
}
+void CGOpenMPRuntime::registerVTableOffloadEntry(llvm::GlobalVariable *VTable,
+ const VarDecl *VD) {
+ // TODO: add logic to avoid duplicate vtable registrations per
+ // translation unit; though for external linkage, this should no
+ // longer be an issue - or at least we can avoid the issue by
+ // checking for an existing offloading entry. But, perhaps the
+ // better approach is to defer emission of the vtables and offload
+ // entries until later (by tracking a list of items that need to be
+ // emitted).
+
+ llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder();
+
+ // Generate a new externally visible global to point to the
+ // internally visible vtable. Doing this allows us to keep the
+ // visibility and linkage of the associated vtable unchanged while
+ // allowing the runtime to access its value. The externally
+ // visible global var needs to be emitted with a unique mangled
+ // name that won't conflict with similarly named (internal)
+ // vtables in other translation units.
+
+ // Register vtable with source location of dynamic object in map
+ // clause.
+ llvm::TargetRegionEntryInfo EntryInfo = getEntryInfoFromPresumedLoc(
+ CGM, OMPBuilder, VD->getCanonicalDecl()->getBeginLoc(),
+ VTable->getName());
+
+ llvm::GlobalVariable *Addr = VTable;
+ size_t PointerSize = CGM.getDataLayout().getPointerSize();
+ SmallString<128> AddrName;
+ OMPBuilder.OffloadInfoManager.getTargetRegionEntryFnName(AddrName, EntryInfo);
+ AddrName.append("addr");
+
+ if (CGM.getLangOpts().OpenMPIsTargetDevice) {
+ Addr = new llvm::GlobalVariable(
+ CGM.getModule(), VTable->getType(),
+ /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, VTable,
+ AddrName,
+ /*InsertBefore*/ nullptr, llvm::GlobalValue::NotThreadLocal,
+ CGM.getModule().getDataLayout().getDefaultGlobalsAddressSpace());
+ Addr->setVisibility(llvm::GlobalValue::ProtectedVisibility);
+ }
+ OMPBuilder.OffloadInfoManager.registerDeviceGlobalVarEntryInfo(
+ AddrName, VTable,
+ CGM.getDataLayout().getTypeAllocSize(VTable->getInitializer()->getType()),
+ llvm::OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirectVTable,
+ llvm::GlobalValue::WeakODRLinkage);
+}
+
+// Register VTable by scanning through the map clause of OpenMP target region.
+void CGOpenMPRuntime::registerVTable(const OMPExecutableDirective &D) {
+ // Get CXXRecordDecl and VarDecl from Expr.
+ auto getVTableDecl = [](const Expr *E) {
+ QualType VDTy = E->getType();
+ CXXRecordDecl *CXXRecord = nullptr;
+ if (const auto *RefType = VDTy->getAs<LValueReferenceType>())
+ VDTy = RefType->getPointeeType();
+ if (VDTy->isPointerType())
+ CXXRecord = VDTy->getPointeeType()->getAsCXXRecordDecl();
+ else
+ CXXRecord = VDTy->getAsCXXRecordDecl();
+
+ const VarDecl *VD = nullptr;
+ if (auto *DRE = dyn_cast<DeclRefExpr>(E))
+ VD = cast<VarDecl>(DRE->getDecl());
+ return std::pair<CXXRecordDecl *, const VarDecl *>(CXXRecord, VD);
+ };
+
+ // Emit VTable and register the VTable to OpenMP offload entry recursively.
+ std::function<void(CodeGenModule &, CXXRecordDecl *, const VarDecl *)>
+ emitAndRegisterVTable = [&emitAndRegisterVTable](CodeGenModule &CGM,
+ CXXRecordDecl *CXXRecord,
+ const VarDecl *VD) {
+ // Register C++ VTable to OpenMP Offload Entry if it's a new
+ // CXXRecordDecl.
+ if (CXXRecord && CXXRecord->isDynamicClass() &&
+ CGM.getOpenMPRuntime().VTableDeclMap.find(CXXRecord) ==
+ CGM.getOpenMPRuntime().VTableDeclMap.end()) {
+ CGM.getOpenMPRuntime().VTableDeclMap.try_emplace(CXXRecord, VD);
+ CGM.EmitVTable(CXXRecord);
+ auto VTables = CGM.getVTables();
+ auto *VTablesAddr = VTables.GetAddrOfVTable(CXXRecord);
+ if (VTablesAddr) {
+ CGM.getOpenMPRuntime().registerVTableOffloadEntry(VTablesAddr, VD);
+ }
+ // Emit VTable for all the fields containing dynamic CXXRecord
+ for (const FieldDecl *Field : CXXRecord->fields()) {
+ if (CXXRecordDecl *RecordDecl =
+ Field->getType()->getAsCXXRecordDecl()) {
+ emitAndRegisterVTable(CGM, RecordDecl, VD);
+ }
+ }
+ // Emit VTable for all dynamic parent class
+ for (CXXBaseSpecifier &Base : CXXRecord->bases()) {
+ if (CXXRecordDecl *BaseDecl =
+ Base.getType()->getAsCXXRecordDecl()) {
+ emitAndRegisterVTable(CGM, BaseDecl, VD);
+ }
+ }
+ }
+ };
+
+ // Collect VTable from OpenMP map clause.
+ for (const auto *C : D.getClausesOfKind<OMPMapClause>()) {
+ for (const auto *E : C->varlist()) {
+ auto DeclPair = getVTableDecl(E);
+ emitAndRegisterVTable(CGM, DeclPair.first, DeclPair.second);
+ }
+ }
+}
+
Address CGOpenMPRuntime::getAddrOfArtificialThreadPrivate(CodeGenFunction &CGF,
QualType VarType,
StringRef Name) {
@@ -6221,6 +6336,25 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
+ class OMPTargetCallCollector
+ : public RecursiveASTVisitor<OMPTargetCallCollector> {
+ public:
+ OMPTargetCallCollector(CodeGenFunction &CGF,
+ llvm::SmallPtrSetImpl<const CallExpr *> &TargetCalls)
+ : CGF(CGF), TargetCalls(TargetCalls) {}
+
+ bool VisitCallExpr(CallExpr *CE) {
+ if (!CE->getDirectCallee()) {
+ TargetCalls.insert(CE);
+ }
+ return true;
+ }
+
+ private:
+ CodeGenFunction &CGF;
+ llvm::SmallPtrSetImpl<const CallExpr *> &TargetCalls;
+ };
+
llvm::TargetRegionEntryInfo EntryInfo =
getEntryInfoFromPresumedLoc(CGM, OMPBuilder, D.getBeginLoc(), ParentName);
@@ -6229,6 +6363,16 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
[&CGF, &D, &CodeGen](StringRef EntryFnName) {
const CapturedStmt &CS = *D.getCapturedStmt(OMPD_target);
+ // Search Clang AST within "omp target" region for CallExprs.
+ // Store them in the set OMPTargetCalls (kept by CodeGenModule).
+ // This is used for the translation of indirect function calls.
+ const auto &LangOpts = CGF.getLangOpts();
+ if (LangOpts.OpenMPIsTargetDevice) {
+ // Search AST for target "CallExpr"s of "OMPTargetAutoLookup".
+ OMPTargetCallCollector Visitor(CGF, CGF.CGM.OMPTargetCalls);
+ Visitor.TraverseStmt(const_cast<Stmt*>(CS.getCapturedStmt()));
+ }
+
CGOpenMPTargetRegionInfo CGInfo(CS, CodeGen, EntryFnName);
CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);
return CGF.GenerateOpenMPCapturedStmtFunction(CS, D);
@@ -6249,6 +6393,7 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
CGM.handleAMDGPUWavesPerEUAttr(OutlinedFn, Attr);
}
}
+ registerVTable(D);
}
/// Checks if the expression is constant or does not have non-trivial function
@@ -9955,6 +10100,19 @@ void CGOpenMPRuntime::scanForTargetRegionsFunctions(const Stmt *S,
if (!S)
return;
+ // Register vtable from device for target data and target directives.
+ // Add this block here since scanForTargetRegionsFunctions ignores
+ // target data by checking if S is a executable directive (target).
+ if (isa<OMPExecutableDirective>(S) &&
+ isOpenMPTargetDataManagementDirective(
+ cast<OMPExecutableDirective>(S)->getDirectiveKind())) {
+ auto &E = *cast<OMPExecutableDirective>(S);
+ // Don't need to check if it's device compile
+ // since scanForTargetRegionsFunctions currently only called
+ // in device compilation.
+ registerVTable(E);
+ }
+
// Codegen OMP target directives that offload compute to the device.
bool RequiresDeviceCodegen =
isa<OMPExecutableDirective>(S) &&
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index eb04eceee236c..0f7937ae95c06 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -605,6 +605,9 @@ class CGOpenMPRuntime {
LValue PosLVal, const OMPTaskDataTy::DependData &Data,
Address DependenciesArray);
+ /// Keep track of VTable Declarations so we don't register duplicate VTable.
+ llvm::DenseMap<CXXRecordDecl*, const VarDecl*> VTableDeclMap;
+
public:
explicit CGOpenMPRuntime(CodeGenModule &CGM);
virtual ~CGOpenMPRuntime() {}
@@ -1111,6 +1114,16 @@ class CGOpenMPRuntime {
virtual void emitDeclareTargetFunction(const FunctionDecl *FD,
llvm::GlobalValue *GV);
+ /// Register VTable to OpenMP offload entry.
+ /// \param VTable VTable of the C++ class.
+ /// \param RD C++ class decl.
+ virtual void registerVTableOffloadEntry(llvm::GlobalVariable *VTable,
+ const VarDecl *VD);
+ /// Emit code for registering vtable by scanning through map clause
+ /// in OpenMP target region.
+ /// \param D OpenMP target directive.
+ virtual void registerVTable(const OMPExecutableDirective &D);
+
/// Creates artificial threadprivate variable with name \p Name and type \p
/// VarType.
/// \param VarType Type of the artificial threadprivate variable.
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index d72cd8fbfd608..582dd0f3ade65 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -7617,6 +7617,10 @@ void CodeGenFunction::EmitOMPUseDeviceAddrClause(
// Generate the instructions for '#pragma omp target data' directive.
void CodeGenFunction::EmitOMPTargetDataDirective(
const OMPTargetDataDirective &S) {
+ // Emit vtable only from host for target data directive.
+ if (!CGM.getLangOpts().OpenMPIsTargetDevice) {
+ CGM.getOpenMPRuntime().registerVTable(S);
+ }
CGOpenMPRuntime::TargetDataInfo Info(/*RequiresDevicePointerInfo=*/true,
/*SeparateBeginEndCalls=*/true);
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index e14e883a55ac5..de4a67db313ea 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -38,6 +38,12 @@ llvm::Constant *CodeGenModule::GetAddrOfThunk(StringRef Name, llvm::Type *FnTy,
/*DontDefer=*/true, /*IsThunk=*/true);
}
+llvm::GlobalVariable *CodeGenVTables::GetAddrOfVTable(const CXXRecordDecl *RD) {
+ llvm::GlobalVariable *VTable =
+ CGM.getCXXABI().getAddrOfVTable(RD, CharUnits());
+ return VTable;
+}
+
static void setThunkProperties(CodeGenModule &CGM, const ThunkInfo &Thunk,
llvm::Function *ThunkFn, bool ForVTable,
GlobalDecl GD) {
diff --git a/clang/lib/CodeGen/CGVTables.h b/clang/lib/CodeGen/CGVTables.h
index 5c45e355fb145..37458eee02e34 100644
--- a/clang/lib/CodeGen/CGVTables.h
+++ b/clang/lib/CodeGen/CGVTables.h
@@ -122,6 +122,10 @@ class CodeGenVTables {
llvm::GlobalVariable::LinkageTypes Linkage,
const CXXRecordDecl *RD);
+ /// GetAddrOfVTable - Get the address of the VTable for the given record
+ /// decl.
+ llvm::GlobalVariable *GetAddrOfVTable(const CXXRecordDecl *RD);
+
/// EmitThunks - Emit the associated thunks for the given global decl.
void EmitThunks(GlobalDecl GD);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 3971b296b3f80..4ace1abcb5246 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -754,6 +754,9 @@ class CodeGenModule : public CodeGenTypeCache {
// i32 @__isPlatformVersionAtLeast(i32, i32, i32, i32)
llvm::FunctionCallee IsPlatformVersionAtLeastFn = nullptr;
+ // Store indirect CallExprs that are within an omp target region
+ llvm::SmallPtrSet<const CallExpr *, 16> OMPTargetCalls;
+
InstrProfStats &getPGOStats() { return PGOStats; }
llvm::IndexedInstrProfReader *getPGOReader() const { return PGOReader.get(); }
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 7dc2eaf1e9f75..1dbfe23cef127 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -2261,6 +2261,24 @@ CGCallee ItaniumCXXABI::getVirtualFunctionPointer(CodeGenFunction &CGF,
llvm::Type *PtrTy = CGM.GlobalsInt8PtrTy;
auto *MethodDecl = cast<CXXMethodDecl>(GD.getDecl());
llvm::Value *VTable = CGF.GetVTablePtr(This, PtrTy, MethodDecl->getParent());
+ /*
+ * For the translate of virtual functions we need to map the (potential) host vtable
+ * to the device vtable. This is done by calling the runtime function
+ * __llvm_omp_indirect_call_lookup.
+ */
+ if (CGM.getLangOpts().OpenMPIsTargetDevice) {
+ auto *NewPtrTy = CGM.VoidPtrTy;
+ llvm::Type *RtlFnArgs[] = {NewPtrTy};
+ llvm::FunctionCallee DeviceRtlFn = CGM.CreateRuntimeFunction(
+ llvm::FunctionType::get(NewPtrTy, RtlFnArgs, false),
+ "__llvm_omp_indirect_call_lookup");
+ auto *BackupTy = VTable->getType();
+ // Need to convert to generic address space
+ VTable = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(VTable, NewPtrTy);
+ VTable = CGF.EmitRuntimeCall(DeviceRtlFn, {VTable});
+ // convert to original address space
+ VTable = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(VTable, BackupTy);
+ }
uint64_t VTableIndex = CGM.getItaniumVTableContext().getMethodVTableIndex(GD);
llvm::Value *VFunc, *VTableSlotPtr = nullptr;
diff --git a/clang/test/OpenMP/target_vtable_codegen.cpp b/clang/test/OpenMP/target_vtable_codegen.cpp
new file mode 100644
index 0000000000000..276cef4eb8801
--- /dev/null
+++ b/clang/test/OpenMP/target_vtable_codegen.cpp
@@ -0,0 +1,280 @@
+///==========================================================================///
+// RUN: %clang_cc1 -DCK1 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK1 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK1
+//
+// RUN: %clang_cc1 -DCK2 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK2 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK2
+//
+// RUN: %clang_cc1 -DCK3 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK3 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK3
+//
+// RUN: %clang_cc1 -DCK4 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK4 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK4
+//
+// RUN: %clang_cc1 -DCK5 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52 -stdlib=libc++
+// RUN: %clang_cc1 -DCK5 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 -stdlib=libc++ | FileCheck %s --check-prefix=CK5
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+#ifdef CK1
+
+// Make sure both host and device compilation emit vtable for Dervied
+// CK1-DAG: $_ZN7DerivedD1Ev = comdat any
+// CK1-DAG: $_ZN7DerivedD0Ev = comdat any
+// CK1-DAG: $_ZN7Derived5BaseAEi = comdat any
+// CK1-DAG: $_ZN7Derived8DerivedBEv = comdat any
+// CK1-DAG: $_ZN7DerivedD2Ev = comdat any
+// CK1-DAG: $_ZN4BaseD2Ev = comdat any
+// CK1-DAG: $_ZTV7Derived = comdat any
+class Base {
+public:
+ virtual ~Base() = default;
+ virtual void BaseA(int a) { }
+};
+
+// CK1: @_ZTV7Derived = linkonce_odr unnamed_addr constant { [6 x ptr] }
+class Derived : public Base {
+public:
+ ~Derived(...
[truncated]
|
|
@llvm/pr-subscribers-offload Author: None (Jason-VanBeusekom) ChangesThis adds a feature to insert CPU-to-GPU function pointer translation at GPU How it works: This is the second out of Two PR's to implement this, commit 5247c1f is not a part of the Pull request and is handled in: #159856 Patch is 89.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159857.diff 21 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index e6e4947882544..cc4c21a719f4c 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -6583,6 +6583,26 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType,
Address(Handle, Handle->getType(), CGM.getPointerAlign()));
Callee.setFunctionPointer(Stub);
}
+
+ // Check whether the associated CallExpr is in the set OMPTargetCalls.
+ // If YES, insert a call to devicertl function __llvm_omp_indirect_call_lookup
+ //
+ // This is used for the indriect function Case, virtual function case is
+ // handled in ItaniumCXXABI.cpp
+ if (getLangOpts().OpenMPIsTargetDevice && CGM.OMPTargetCalls.contains(E)) {
+ auto *PtrTy = CGM.VoidPtrTy;
+ llvm::Type *RtlFnArgs[] = {PtrTy};
+ llvm::FunctionCallee DeviceRtlFn = CGM.CreateRuntimeFunction(
+ llvm::FunctionType::get(PtrTy, RtlFnArgs, false),
+ "__llvm_omp_indirect_call_lookup");
+ llvm::Value *Func = Callee.getFunctionPointer();
+ llvm::Type *BackupTy = Func->getType();
+ Func = Builder.CreatePointerBitCastOrAddrSpaceCast(Func, PtrTy);
+ Func = EmitRuntimeCall(DeviceRtlFn, {Func});
+ Func = Builder.CreatePointerBitCastOrAddrSpaceCast(Func, BackupTy);
+ Callee.setFunctionPointer(Func);
+ }
+
llvm::CallBase *LocalCallOrInvoke = nullptr;
RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
E == MustTailCall, E->getExprLoc());
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a503aaf613e30..ac1d467affc00 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -24,6 +24,7 @@
#include "clang/AST/OpenMPClause.h"
#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Basic/OpenMPKinds.h"
#include "clang/Basic/SourceManager.h"
#include "clang/CodeGen/ConstantInitBuilder.h"
@@ -1771,12 +1772,126 @@ void CGOpenMPRuntime::emitDeclareTargetFunction(const FunctionDecl *FD,
Addr->setVisibility(llvm::GlobalValue::ProtectedVisibility);
}
+ // Register the indirect Vtable:
+ // This is similar to OMPTargetGlobalVarEntryIndirect, except that the
+ // size field refers to the size of memory pointed to, not the size of
+ // the pointer symbol itself (which is implicitly the size of a pointer).
OMPBuilder.OffloadInfoManager.registerDeviceGlobalVarEntryInfo(
Name, Addr, CGM.GetTargetTypeStoreSize(CGM.VoidPtrTy).getQuantity(),
llvm::OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirect,
llvm::GlobalValue::WeakODRLinkage);
}
+void CGOpenMPRuntime::registerVTableOffloadEntry(llvm::GlobalVariable *VTable,
+ const VarDecl *VD) {
+ // TODO: add logic to avoid duplicate vtable registrations per
+ // translation unit; though for external linkage, this should no
+ // longer be an issue - or at least we can avoid the issue by
+ // checking for an existing offloading entry. But, perhaps the
+ // better approach is to defer emission of the vtables and offload
+ // entries until later (by tracking a list of items that need to be
+ // emitted).
+
+ llvm::OpenMPIRBuilder &OMPBuilder = CGM.getOpenMPRuntime().getOMPBuilder();
+
+ // Generate a new externally visible global to point to the
+ // internally visible vtable. Doing this allows us to keep the
+ // visibility and linkage of the associated vtable unchanged while
+ // allowing the runtime to access its value. The externally
+ // visible global var needs to be emitted with a unique mangled
+ // name that won't conflict with similarly named (internal)
+ // vtables in other translation units.
+
+ // Register vtable with source location of dynamic object in map
+ // clause.
+ llvm::TargetRegionEntryInfo EntryInfo = getEntryInfoFromPresumedLoc(
+ CGM, OMPBuilder, VD->getCanonicalDecl()->getBeginLoc(),
+ VTable->getName());
+
+ llvm::GlobalVariable *Addr = VTable;
+ size_t PointerSize = CGM.getDataLayout().getPointerSize();
+ SmallString<128> AddrName;
+ OMPBuilder.OffloadInfoManager.getTargetRegionEntryFnName(AddrName, EntryInfo);
+ AddrName.append("addr");
+
+ if (CGM.getLangOpts().OpenMPIsTargetDevice) {
+ Addr = new llvm::GlobalVariable(
+ CGM.getModule(), VTable->getType(),
+ /*isConstant=*/true, llvm::GlobalValue::ExternalLinkage, VTable,
+ AddrName,
+ /*InsertBefore*/ nullptr, llvm::GlobalValue::NotThreadLocal,
+ CGM.getModule().getDataLayout().getDefaultGlobalsAddressSpace());
+ Addr->setVisibility(llvm::GlobalValue::ProtectedVisibility);
+ }
+ OMPBuilder.OffloadInfoManager.registerDeviceGlobalVarEntryInfo(
+ AddrName, VTable,
+ CGM.getDataLayout().getTypeAllocSize(VTable->getInitializer()->getType()),
+ llvm::OffloadEntriesInfoManager::OMPTargetGlobalVarEntryIndirectVTable,
+ llvm::GlobalValue::WeakODRLinkage);
+}
+
+// Register VTable by scanning through the map clause of OpenMP target region.
+void CGOpenMPRuntime::registerVTable(const OMPExecutableDirective &D) {
+ // Get CXXRecordDecl and VarDecl from Expr.
+ auto getVTableDecl = [](const Expr *E) {
+ QualType VDTy = E->getType();
+ CXXRecordDecl *CXXRecord = nullptr;
+ if (const auto *RefType = VDTy->getAs<LValueReferenceType>())
+ VDTy = RefType->getPointeeType();
+ if (VDTy->isPointerType())
+ CXXRecord = VDTy->getPointeeType()->getAsCXXRecordDecl();
+ else
+ CXXRecord = VDTy->getAsCXXRecordDecl();
+
+ const VarDecl *VD = nullptr;
+ if (auto *DRE = dyn_cast<DeclRefExpr>(E))
+ VD = cast<VarDecl>(DRE->getDecl());
+ return std::pair<CXXRecordDecl *, const VarDecl *>(CXXRecord, VD);
+ };
+
+ // Emit VTable and register the VTable to OpenMP offload entry recursively.
+ std::function<void(CodeGenModule &, CXXRecordDecl *, const VarDecl *)>
+ emitAndRegisterVTable = [&emitAndRegisterVTable](CodeGenModule &CGM,
+ CXXRecordDecl *CXXRecord,
+ const VarDecl *VD) {
+ // Register C++ VTable to OpenMP Offload Entry if it's a new
+ // CXXRecordDecl.
+ if (CXXRecord && CXXRecord->isDynamicClass() &&
+ CGM.getOpenMPRuntime().VTableDeclMap.find(CXXRecord) ==
+ CGM.getOpenMPRuntime().VTableDeclMap.end()) {
+ CGM.getOpenMPRuntime().VTableDeclMap.try_emplace(CXXRecord, VD);
+ CGM.EmitVTable(CXXRecord);
+ auto VTables = CGM.getVTables();
+ auto *VTablesAddr = VTables.GetAddrOfVTable(CXXRecord);
+ if (VTablesAddr) {
+ CGM.getOpenMPRuntime().registerVTableOffloadEntry(VTablesAddr, VD);
+ }
+ // Emit VTable for all the fields containing dynamic CXXRecord
+ for (const FieldDecl *Field : CXXRecord->fields()) {
+ if (CXXRecordDecl *RecordDecl =
+ Field->getType()->getAsCXXRecordDecl()) {
+ emitAndRegisterVTable(CGM, RecordDecl, VD);
+ }
+ }
+ // Emit VTable for all dynamic parent class
+ for (CXXBaseSpecifier &Base : CXXRecord->bases()) {
+ if (CXXRecordDecl *BaseDecl =
+ Base.getType()->getAsCXXRecordDecl()) {
+ emitAndRegisterVTable(CGM, BaseDecl, VD);
+ }
+ }
+ }
+ };
+
+ // Collect VTable from OpenMP map clause.
+ for (const auto *C : D.getClausesOfKind<OMPMapClause>()) {
+ for (const auto *E : C->varlist()) {
+ auto DeclPair = getVTableDecl(E);
+ emitAndRegisterVTable(CGM, DeclPair.first, DeclPair.second);
+ }
+ }
+}
+
Address CGOpenMPRuntime::getAddrOfArtificialThreadPrivate(CodeGenFunction &CGF,
QualType VarType,
StringRef Name) {
@@ -6221,6 +6336,25 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
+ class OMPTargetCallCollector
+ : public RecursiveASTVisitor<OMPTargetCallCollector> {
+ public:
+ OMPTargetCallCollector(CodeGenFunction &CGF,
+ llvm::SmallPtrSetImpl<const CallExpr *> &TargetCalls)
+ : CGF(CGF), TargetCalls(TargetCalls) {}
+
+ bool VisitCallExpr(CallExpr *CE) {
+ if (!CE->getDirectCallee()) {
+ TargetCalls.insert(CE);
+ }
+ return true;
+ }
+
+ private:
+ CodeGenFunction &CGF;
+ llvm::SmallPtrSetImpl<const CallExpr *> &TargetCalls;
+ };
+
llvm::TargetRegionEntryInfo EntryInfo =
getEntryInfoFromPresumedLoc(CGM, OMPBuilder, D.getBeginLoc(), ParentName);
@@ -6229,6 +6363,16 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
[&CGF, &D, &CodeGen](StringRef EntryFnName) {
const CapturedStmt &CS = *D.getCapturedStmt(OMPD_target);
+ // Search Clang AST within "omp target" region for CallExprs.
+ // Store them in the set OMPTargetCalls (kept by CodeGenModule).
+ // This is used for the translation of indirect function calls.
+ const auto &LangOpts = CGF.getLangOpts();
+ if (LangOpts.OpenMPIsTargetDevice) {
+ // Search AST for target "CallExpr"s of "OMPTargetAutoLookup".
+ OMPTargetCallCollector Visitor(CGF, CGF.CGM.OMPTargetCalls);
+ Visitor.TraverseStmt(const_cast<Stmt*>(CS.getCapturedStmt()));
+ }
+
CGOpenMPTargetRegionInfo CGInfo(CS, CodeGen, EntryFnName);
CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);
return CGF.GenerateOpenMPCapturedStmtFunction(CS, D);
@@ -6249,6 +6393,7 @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
CGM.handleAMDGPUWavesPerEUAttr(OutlinedFn, Attr);
}
}
+ registerVTable(D);
}
/// Checks if the expression is constant or does not have non-trivial function
@@ -9955,6 +10100,19 @@ void CGOpenMPRuntime::scanForTargetRegionsFunctions(const Stmt *S,
if (!S)
return;
+ // Register vtable from device for target data and target directives.
+ // Add this block here since scanForTargetRegionsFunctions ignores
+ // target data by checking if S is a executable directive (target).
+ if (isa<OMPExecutableDirective>(S) &&
+ isOpenMPTargetDataManagementDirective(
+ cast<OMPExecutableDirective>(S)->getDirectiveKind())) {
+ auto &E = *cast<OMPExecutableDirective>(S);
+ // Don't need to check if it's device compile
+ // since scanForTargetRegionsFunctions currently only called
+ // in device compilation.
+ registerVTable(E);
+ }
+
// Codegen OMP target directives that offload compute to the device.
bool RequiresDeviceCodegen =
isa<OMPExecutableDirective>(S) &&
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index eb04eceee236c..0f7937ae95c06 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -605,6 +605,9 @@ class CGOpenMPRuntime {
LValue PosLVal, const OMPTaskDataTy::DependData &Data,
Address DependenciesArray);
+ /// Keep track of VTable Declarations so we don't register duplicate VTable.
+ llvm::DenseMap<CXXRecordDecl*, const VarDecl*> VTableDeclMap;
+
public:
explicit CGOpenMPRuntime(CodeGenModule &CGM);
virtual ~CGOpenMPRuntime() {}
@@ -1111,6 +1114,16 @@ class CGOpenMPRuntime {
virtual void emitDeclareTargetFunction(const FunctionDecl *FD,
llvm::GlobalValue *GV);
+ /// Register VTable to OpenMP offload entry.
+ /// \param VTable VTable of the C++ class.
+ /// \param RD C++ class decl.
+ virtual void registerVTableOffloadEntry(llvm::GlobalVariable *VTable,
+ const VarDecl *VD);
+ /// Emit code for registering vtable by scanning through map clause
+ /// in OpenMP target region.
+ /// \param D OpenMP target directive.
+ virtual void registerVTable(const OMPExecutableDirective &D);
+
/// Creates artificial threadprivate variable with name \p Name and type \p
/// VarType.
/// \param VarType Type of the artificial threadprivate variable.
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index d72cd8fbfd608..582dd0f3ade65 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -7617,6 +7617,10 @@ void CodeGenFunction::EmitOMPUseDeviceAddrClause(
// Generate the instructions for '#pragma omp target data' directive.
void CodeGenFunction::EmitOMPTargetDataDirective(
const OMPTargetDataDirective &S) {
+ // Emit vtable only from host for target data directive.
+ if (!CGM.getLangOpts().OpenMPIsTargetDevice) {
+ CGM.getOpenMPRuntime().registerVTable(S);
+ }
CGOpenMPRuntime::TargetDataInfo Info(/*RequiresDevicePointerInfo=*/true,
/*SeparateBeginEndCalls=*/true);
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index e14e883a55ac5..de4a67db313ea 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -38,6 +38,12 @@ llvm::Constant *CodeGenModule::GetAddrOfThunk(StringRef Name, llvm::Type *FnTy,
/*DontDefer=*/true, /*IsThunk=*/true);
}
+llvm::GlobalVariable *CodeGenVTables::GetAddrOfVTable(const CXXRecordDecl *RD) {
+ llvm::GlobalVariable *VTable =
+ CGM.getCXXABI().getAddrOfVTable(RD, CharUnits());
+ return VTable;
+}
+
static void setThunkProperties(CodeGenModule &CGM, const ThunkInfo &Thunk,
llvm::Function *ThunkFn, bool ForVTable,
GlobalDecl GD) {
diff --git a/clang/lib/CodeGen/CGVTables.h b/clang/lib/CodeGen/CGVTables.h
index 5c45e355fb145..37458eee02e34 100644
--- a/clang/lib/CodeGen/CGVTables.h
+++ b/clang/lib/CodeGen/CGVTables.h
@@ -122,6 +122,10 @@ class CodeGenVTables {
llvm::GlobalVariable::LinkageTypes Linkage,
const CXXRecordDecl *RD);
+ /// GetAddrOfVTable - Get the address of the VTable for the given record
+ /// decl.
+ llvm::GlobalVariable *GetAddrOfVTable(const CXXRecordDecl *RD);
+
/// EmitThunks - Emit the associated thunks for the given global decl.
void EmitThunks(GlobalDecl GD);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 3971b296b3f80..4ace1abcb5246 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -754,6 +754,9 @@ class CodeGenModule : public CodeGenTypeCache {
// i32 @__isPlatformVersionAtLeast(i32, i32, i32, i32)
llvm::FunctionCallee IsPlatformVersionAtLeastFn = nullptr;
+ // Store indirect CallExprs that are within an omp target region
+ llvm::SmallPtrSet<const CallExpr *, 16> OMPTargetCalls;
+
InstrProfStats &getPGOStats() { return PGOStats; }
llvm::IndexedInstrProfReader *getPGOReader() const { return PGOReader.get(); }
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 7dc2eaf1e9f75..1dbfe23cef127 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -2261,6 +2261,24 @@ CGCallee ItaniumCXXABI::getVirtualFunctionPointer(CodeGenFunction &CGF,
llvm::Type *PtrTy = CGM.GlobalsInt8PtrTy;
auto *MethodDecl = cast<CXXMethodDecl>(GD.getDecl());
llvm::Value *VTable = CGF.GetVTablePtr(This, PtrTy, MethodDecl->getParent());
+ /*
+ * For the translate of virtual functions we need to map the (potential) host vtable
+ * to the device vtable. This is done by calling the runtime function
+ * __llvm_omp_indirect_call_lookup.
+ */
+ if (CGM.getLangOpts().OpenMPIsTargetDevice) {
+ auto *NewPtrTy = CGM.VoidPtrTy;
+ llvm::Type *RtlFnArgs[] = {NewPtrTy};
+ llvm::FunctionCallee DeviceRtlFn = CGM.CreateRuntimeFunction(
+ llvm::FunctionType::get(NewPtrTy, RtlFnArgs, false),
+ "__llvm_omp_indirect_call_lookup");
+ auto *BackupTy = VTable->getType();
+ // Need to convert to generic address space
+ VTable = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(VTable, NewPtrTy);
+ VTable = CGF.EmitRuntimeCall(DeviceRtlFn, {VTable});
+ // convert to original address space
+ VTable = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(VTable, BackupTy);
+ }
uint64_t VTableIndex = CGM.getItaniumVTableContext().getMethodVTableIndex(GD);
llvm::Value *VFunc, *VTableSlotPtr = nullptr;
diff --git a/clang/test/OpenMP/target_vtable_codegen.cpp b/clang/test/OpenMP/target_vtable_codegen.cpp
new file mode 100644
index 0000000000000..276cef4eb8801
--- /dev/null
+++ b/clang/test/OpenMP/target_vtable_codegen.cpp
@@ -0,0 +1,280 @@
+///==========================================================================///
+// RUN: %clang_cc1 -DCK1 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK1 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK1
+//
+// RUN: %clang_cc1 -DCK2 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK2 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK2
+//
+// RUN: %clang_cc1 -DCK3 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK3 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK3
+//
+// RUN: %clang_cc1 -DCK4 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52
+// RUN: %clang_cc1 -DCK4 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 | FileCheck %s --check-prefix=CK4
+//
+// RUN: %clang_cc1 -DCK5 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=52 -stdlib=libc++
+// RUN: %clang_cc1 -DCK5 -verify -fopenmp -Wno-openmp-mapping -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -debug-info-kind=limited -fopenmp-version=52 -stdlib=libc++ | FileCheck %s --check-prefix=CK5
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+#ifdef CK1
+
+// Make sure both host and device compilation emit vtable for Dervied
+// CK1-DAG: $_ZN7DerivedD1Ev = comdat any
+// CK1-DAG: $_ZN7DerivedD0Ev = comdat any
+// CK1-DAG: $_ZN7Derived5BaseAEi = comdat any
+// CK1-DAG: $_ZN7Derived8DerivedBEv = comdat any
+// CK1-DAG: $_ZN7DerivedD2Ev = comdat any
+// CK1-DAG: $_ZN4BaseD2Ev = comdat any
+// CK1-DAG: $_ZTV7Derived = comdat any
+class Base {
+public:
+ virtual ~Base() = default;
+ virtual void BaseA(int a) { }
+};
+
+// CK1: @_ZTV7Derived = linkonce_odr unnamed_addr constant { [6 x ptr] }
+class Derived : public Base {
+public:
+ ~Derived(...
[truncated]
|
jhuber6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but this is a pretty big feature addition so it should probably be brought up for group discussion. Unfortunately I think the next two meetings are cancelled because it's the IWOMP conference and F2F meeting.
clang/lib/CodeGen/CGExpr.cpp
Outdated
| llvm::Type *RtlFnArgs[] = {PtrTy}; | ||
| llvm::FunctionCallee DeviceRtlFn = CGM.CreateRuntimeFunction( | ||
| llvm::FunctionType::get(PtrTy, RtlFnArgs, false), | ||
| "__llvm_omp_indirect_call_lookup"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these are going to be permanent API functions we should probably rename them to something more in-line with the other API functions. We use __kmpc_ as a prefix, though the name is pretty outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang/lib/CodeGen/CGExpr.cpp
Outdated
| } | ||
|
|
||
| // Check whether the associated CallExpr is in the set OMPTargetCalls. | ||
| // If YES, insert a call to devicertl function __llvm_omp_indirect_call_lookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment could be more straightforward, just insert functoin pointer lookup if this is a target call or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced comment with suggested comment in 11b1f08.
| if (!CE->getDirectCallee()) { | ||
| TargetCalls.insert(CE); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (!CE->getDirectCallee()) { | |
| TargetCalls.insert(CE); | |
| } | |
| if (!CE->getDirectCallee()) | |
| TargetCalls.insert(CE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in 11b1f08
clang/lib/CodeGen/ItaniumCXXABI.cpp
Outdated
| /* | ||
| * For the translate of virtual functions we need to map the (potential) host vtable | ||
| * to the device vtable. This is done by calling the runtime function | ||
| * __llvm_omp_indirect_call_lookup. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect comment style and grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 11b1f08
This and #159856 were discussed in the OpenMP meeting on 10/22/25 with no objections. |
Key Changes: -Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations -Modified setupIndirectCallTable to support both VTable entries and indirect function pointers This is commit (1/3) to support indirect call and virtual function mapping to the device: Register Vtable PR (2/3): llvm#159856 Codegen / _llvm_omp_indirect_call_lookup PR (3/3): llvm#159857
…#167011) This is a branch off of #159856, in which consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host. Key Changes - Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations - Modified setupIndirectCallTable to support both VTable entries and indirect function pointers Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the Vtables / Classes registered as indirect can be larger than 8 bytes, and the vtables may not be at the first address we either need to pass the size to __llvm_omp_indirect_call_lookup and have a check at each step of the binary search, or add multiple entries to the indirect table for each address registered. The latter was chosen. Commit: a00def3 is not a part of this PR and is handled / reviewed in: #159856, This is PR (2/3) Register Vtable PR (1/3): #159856, Codegen / _llvm_omp_indirect_call_lookup PR (3/3): #159857
…lls runtime (#167011) This is a branch off of llvm/llvm-project#159856, in which consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host. Key Changes - Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations - Modified setupIndirectCallTable to support both VTable entries and indirect function pointers Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the Vtables / Classes registered as indirect can be larger than 8 bytes, and the vtables may not be at the first address we either need to pass the size to __llvm_omp_indirect_call_lookup and have a check at each step of the binary search, or add multiple entries to the indirect table for each address registered. The latter was chosen. Commit: a00def3f20e166d4fb9328e6f0bc0742cd0afa31 is not a part of this PR and is handled / reviewed in: llvm/llvm-project#159856, This is PR (2/3) Register Vtable PR (1/3): llvm/llvm-project#159856, Codegen / _llvm_omp_indirect_call_lookup PR (3/3): llvm/llvm-project#159857
…llvm#167011) This is a branch off of llvm#159856, in which consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host. Key Changes - Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations - Modified setupIndirectCallTable to support both VTable entries and indirect function pointers Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the Vtables / Classes registered as indirect can be larger than 8 bytes, and the vtables may not be at the first address we either need to pass the size to __llvm_omp_indirect_call_lookup and have a check at each step of the binary search, or add multiple entries to the indirect table for each address registered. The latter was chosen. Commit: a00def3 is not a part of this PR and is handled / reviewed in: llvm#159856, This is PR (2/3) Register Vtable PR (1/3): llvm#159856, Codegen / _llvm_omp_indirect_call_lookup PR (3/3): llvm#159857
…llvm#167011) This is a branch off of llvm#159856, in which consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host. Key Changes - Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations - Modified setupIndirectCallTable to support both VTable entries and indirect function pointers Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the Vtables / Classes registered as indirect can be larger than 8 bytes, and the vtables may not be at the first address we either need to pass the size to __llvm_omp_indirect_call_lookup and have a check at each step of the binary search, or add multiple entries to the indirect table for each address registered. The latter was chosen. Commit: a00def3 is not a part of this PR and is handled / reviewed in: llvm#159856, This is PR (2/3) Register Vtable PR (1/3): llvm#159856, Codegen / _llvm_omp_indirect_call_lookup PR (3/3): llvm#159857
This adds a feature to insert CPU-to-GPU function pointer translation at GPU
call-sites
How it works:
This is PR(3/3)
Runtime / indirect call table setup Pr(2/3): #167011
Register Vtable PR (1/3): #159856