-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[ThinLTO] Properly support targets that require importing all external functions #133588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…l functions `ForceImportAll` mostly works, but it uses `available_externally` linkage. For targets that don't support object linking, like AMDGPU, this approach doesn't work. With this PR, all imported functions use "default" linkage. I'm not entirely sure if that's the best approach, but for AMDGPU it's not really a problem since we internalize everything at a later stage.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-lto Author: Shilei Tian (shiltian) Changes
With this PR, all imported functions use "default" linkage. I'm not entirely Full diff: https://github.com/llvm/llvm-project/pull/133588.diff 5 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/FunctionImportUtils.h b/llvm/include/llvm/Transforms/Utils/FunctionImportUtils.h
index 6d83b615d5f13..2be0a20c8afa8 100644
--- a/llvm/include/llvm/Transforms/Utils/FunctionImportUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/FunctionImportUtils.h
@@ -16,10 +16,13 @@
#include "llvm/ADT/SetVector.h"
#include "llvm/IR/ModuleSummaryIndex.h"
+#include "llvm/Support/CommandLine.h"
namespace llvm {
class Module;
+extern cl::opt<bool> ForceImportAll;
+
/// Class to handle necessary GlobalValue changes required by ThinLTO
/// function importing, including linkage changes and any necessary renaming.
class FunctionImportGlobalProcessing {
diff --git a/llvm/lib/Transforms/IPO/FunctionImport.cpp b/llvm/lib/Transforms/IPO/FunctionImport.cpp
index f0daf1a558316..41b2b53959224 100644
--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -79,10 +79,6 @@ static cl::opt<int> ImportCutoff(
"import-cutoff", cl::init(-1), cl::Hidden, cl::value_desc("N"),
cl::desc("Only import first N functions if N>=0 (default -1)"));
-static cl::opt<bool>
- ForceImportAll("force-import-all", cl::init(false), cl::Hidden,
- cl::desc("Import functions with noinline attribute"));
-
static cl::opt<float>
ImportInstrFactor("import-instr-evolution-factor", cl::init(0.7),
cl::Hidden, cl::value_desc("x"),
@@ -1663,7 +1659,8 @@ void llvm::thinLTOFinalizeInModule(Module &TheModule,
// interposable property and possibly get inlined. Simply drop
// the definition in that case.
if (GlobalValue::isAvailableExternallyLinkage(NewLinkage) &&
- GlobalValue::isInterposableLinkage(GV.getLinkage())) {
+ GlobalValue::isInterposableLinkage(GV.getLinkage()) &&
+ !ForceImportAll) {
if (!convertToDeclaration(GV))
// FIXME: Change this to collect replaced GVs and later erase
// them from the parent module once thinLTOResolvePrevailingGUID is
@@ -1680,11 +1677,12 @@ void llvm::thinLTOFinalizeInModule(Module &TheModule,
assert(GV.canBeOmittedFromSymbolTable());
GV.setVisibility(GlobalValue::HiddenVisibility);
}
-
- LLVM_DEBUG(dbgs() << "ODR fixing up linkage for `" << GV.getName()
- << "` from " << GV.getLinkage() << " to " << NewLinkage
- << "\n");
- GV.setLinkage(NewLinkage);
+ if (!ForceImportAll) {
+ LLVM_DEBUG(dbgs() << "ODR fixing up linkage for `" << GV.getName()
+ << "` from " << GV.getLinkage() << " to "
+ << NewLinkage << "\n");
+ GV.setLinkage(NewLinkage);
+ }
}
// Remove declarations from comdats, including available_externally
// as this is a declaration for the linker, and will be dropped eventually.
diff --git a/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp b/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp
index ae1af943bc11c..c93fa6f7fbfd8 100644
--- a/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp
+++ b/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp
@@ -12,7 +12,7 @@
//===----------------------------------------------------------------------===//
#include "llvm/Transforms/Utils/FunctionImportUtils.h"
-#include "llvm/Support/CommandLine.h"
+
using namespace llvm;
/// Uses the "source_filename" instead of a Module hash ID for the suffix of
@@ -24,6 +24,12 @@ static cl::opt<bool> UseSourceFilenameForPromotedLocals(
"This requires that the source filename has a unique name / "
"path to avoid name collisions."));
+namespace llvm {
+cl::opt<bool>
+ ForceImportAll("force-import-all", cl::init(false), cl::Hidden,
+ cl::desc("Import functions with noinline attribute"));
+}
+
/// Checks if we should import SGV as a definition, otherwise import as a
/// declaration.
bool FunctionImportGlobalProcessing::doImportAsDefinition(
@@ -147,7 +153,8 @@ FunctionImportGlobalProcessing::getLinkage(const GlobalValue *SGV,
// and/or optimization, but are turned into declarations later
// during the EliminateAvailableExternally pass.
if (doImportAsDefinition(SGV) && !isa<GlobalAlias>(SGV))
- return GlobalValue::AvailableExternallyLinkage;
+ return ForceImportAll ? GlobalValue::ExternalLinkage
+ : GlobalValue::AvailableExternallyLinkage;
// An imported external declaration stays external.
return SGV->getLinkage();
@@ -175,7 +182,7 @@ FunctionImportGlobalProcessing::getLinkage(const GlobalValue *SGV,
// equivalent, so the issue described above for weak_any does not exist,
// and the definition can be imported. It can be treated similarly
// to an imported externally visible global value.
- if (doImportAsDefinition(SGV) && !isa<GlobalAlias>(SGV))
+ if (doImportAsDefinition(SGV) && !isa<GlobalAlias>(SGV) && !ForceImportAll)
return GlobalValue::AvailableExternallyLinkage;
else
return GlobalValue::ExternalLinkage;
@@ -193,7 +200,8 @@ FunctionImportGlobalProcessing::getLinkage(const GlobalValue *SGV,
// If we are promoting the local to global scope, it is handled
// similarly to a normal externally visible global.
if (DoPromote) {
- if (doImportAsDefinition(SGV) && !isa<GlobalAlias>(SGV))
+ if (doImportAsDefinition(SGV) && !isa<GlobalAlias>(SGV) &&
+ !ForceImportAll)
return GlobalValue::AvailableExternallyLinkage;
else
return GlobalValue::ExternalLinkage;
diff --git a/llvm/test/Transforms/FunctionImport/adjustable_threshold.ll b/llvm/test/Transforms/FunctionImport/adjustable_threshold.ll
index f60758c3d3bf4..5a4a275f449d1 100644
--- a/llvm/test/Transforms/FunctionImport/adjustable_threshold.ll
+++ b/llvm/test/Transforms/FunctionImport/adjustable_threshold.ll
@@ -15,11 +15,11 @@
; RUN: opt -passes=function-import -summary-file %t3.thinlto.bc %t.bc \
; RUN: -import-instr-limit=1 -force-import-all -S \
; RUN: | FileCheck %s --check-prefix=IMPORTALL
-; IMPORTALL-DAG: define available_externally void @globalfunc1()
-; IMPORTALL-DAG: define available_externally void @trampoline()
-; IMPORTALL-DAG: define available_externally void @largefunction()
-; IMPORTALL-DAG: define available_externally hidden void @staticfunc2.llvm.0()
-; IMPORTALL-DAG: define available_externally void @globalfunc2()
+; IMPORTALL-DAG: define void @globalfunc1()
+; IMPORTALL-DAG: define void @trampoline()
+; IMPORTALL-DAG: define void @largefunction()
+; IMPORTALL-DAG: define hidden void @staticfunc2.llvm.0()
+; IMPORTALL-DAG: define void @globalfunc2()
declare void @globalfunc1()
declare void @globalfunc2()
diff --git a/llvm/test/Transforms/FunctionImport/noinline.ll b/llvm/test/Transforms/FunctionImport/noinline.ll
index 8687f1b3b5f7c..eef4d6b5c25c8 100644
--- a/llvm/test/Transforms/FunctionImport/noinline.ll
+++ b/llvm/test/Transforms/FunctionImport/noinline.ll
@@ -19,5 +19,5 @@ entry:
}
; NOIMPORT: declare void @foo(ptr)
-; IMPORT: define available_externally void @foo
+; IMPORT: define void @foo
declare void @foo(ptr) #1
|
|
https://reviews.llvm.org/D99683 added
Why is there such a limitation and would it be better to lift the limitation? |
|
Looks like this flag was added several years ago for AMDGPU (https://reviews.llvm.org/D99683). Do you know why it was sufficient then but not now? While this was originally added for AMDGPU I'm concerned about changing the linkage type as other use cases of this option may have appeared, and this is inconsistent with the linkage type normally used for ThinLTO importing, and would result in linker multiply defined errors in the usual case. If you aren't able to link objects, why not just use full LTO, which is what this essentially becomes? |
From what I understand, seeing all device code is necessary for resource usage analysis, which is required to compute occupancy. Occupancy determines the maximum number of threads a thread block can have. At runtime, when a kernel is launched, the thread block size must not exceed the limit based on that occupancy.
Ideally, yes, but in practice, probably not. Technically, we could support external function calls by conservatively assuming the callee uses all possible resources. But that would severely degrade performance. Since GPUs are primarily used for performance workloads, this kind of support may not be practically useful. |
I'm not sure if ThinLTO actually worked on AMDGPU back then. At least, it doesn't seem to be working at the moment. I'd assume @yxsamliu might have more insight on this.
Hmm, that's a valid concern, but as far as I can tell, unless someone explicitly sets this flag, it's only enabled for AMDGPU. I'm not sure if there are any other targets that support global variable object linking but not function object linking.
We have full LTO enabled by default in GPU RDC mode, but it's extremely expensive. If the entire linked module ends up as a single SCC, then there's really no benefit to using ThinLTO. However, if there are multiple SCCs, ThinLTO can be quite beneficial. There are also other related efforts, like module splitting, but some benchmark results contributed by the community suggest that ThinLTO can still outperform full LTO by up to 9X. |
There are 2 issues. The primary issue is reporting register and other resource usage. We need some kind of module summary in the linked module to avoid assuming an unacceptable worst case scenario in the presence of any external calls. A second issue is we don't have a way to codegen LDS which is referenced in external modules, but this is more of a niche usage that doesn't appear that often. |
It worked for some HIP apps. I am still not clear about what issue the linkage available_externally causes. Can you give an example? Thanks. |
|
Functions with |
what if the same callee is called by more than one callers in two different object files? If global linkage is used, will that cause duplicate symbols in the final executable? Can we use linkonce_odr linkage instead so that duplicate symbols are allowed or merged? |
|
After a second thought, I think the case mentioned above should not happen, if the optimization is right. For AMDGPU, we have internalization so even though if they are marked as However, to make this practically useful for GPU, we might want to mark all imported symbols as internal. We really want to see all the call sites, but I think that might be beyond the general scope of ThinLTO. |
Does this patch change the linkage of both functions and variables? Duplicating variables may not work. |
|
I'm gonna use something else to fix the issue. Close this one for now. |

ForceImportAllmostly works, but it usesavailable_externallylinkage. Fortargets that don't support object linking, like AMDGPU, this approach doesn't
work.
With this PR, all imported functions use "default" linkage. I'm not entirely
sure if that's the best approach, but for AMDGPU it's not really a problem since
we internalize everything at a later stage.