-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[Clang]: Support opt-in speculative devirtualization #159685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
1cb42ab to
e4ff7b3
Compare
shafik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a summary describing what this does, why it makes sense and how you are implementing at a high level? Also, if you are looking for some specific feedback, please specify.
9354a21 to
415b431
Compare
|
Hi @shafik |
|
@llvm/pr-subscribers-llvm-transforms Author: Hassnaa Hamdi (hassnaaHamdi) ChangesThis patch enables speculative devirtualization feature in Clang/frontend.
Patch is 37.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159685.diff 16 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index a8bbf146431ea..241f374bcf4c3 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2313,6 +2313,13 @@ are listed below.
This enables better devirtualization. Turned off by default, because it is
still experimental.
+.. option:: -fdevirtualize-speculatively
+
+ Enable speculative devirtualization optimization, such as single-implementation
+ devirtualization. This optimization is used out of LTO mode for now.
+ Turned off by default.
+ TODO: Enable for LTO mode.
+
.. option:: -fwhole-program-vtables
Enable whole-program vtable optimizations, such as single-implementation
@@ -5161,6 +5168,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
+ -fdevirtualize-speculatively
+ Enables speculative devirtualization optimization.
-fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 872f73ebf3810..38174cf13cadf 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -358,6 +358,8 @@ VALUE_CODEGENOPT(WarnStackSize , 32, UINT_MAX, Benign) ///< Set via -fwarn-s
CODEGENOPT(NoStackArgProbe, 1, 0, Benign) ///< Set when -mno-stack-arg-probe is used
CODEGENOPT(EmitLLVMUseLists, 1, 0, Benign) ///< Control whether to serialize use-lists.
+CODEGENOPT(DevirtualizeSpeculatively, 1, 0, Benign) ///< Whether to apply the speculative
+ /// devirtualization optimization.
CODEGENOPT(WholeProgramVTables, 1, 0, Benign) ///< Whether to apply whole-program
/// vtable optimization.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 47d328f862e07..a3656b80cac32 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4432,6 +4432,13 @@ defm new_infallible : BoolFOption<"new-infallible",
BothFlags<[], [ClangOption, CC1Option],
" treating throwing global C++ operator new as always returning valid memory "
"(annotates with __attribute__((returns_nonnull)) and throw()). This is detectable in source.">>;
+defm devirtualize_speculatively
+ : BoolFOption<"devirtualize-speculatively",
+ CodeGenOpts<"DevirtualizeSpeculatively">, DefaultFalse,
+ PosFlag<SetTrue, [], [],
+ "Enables speculative devirtualization optimization.">,
+ NegFlag<SetFalse>,
+ BothFlags<[], [ClangOption, CLOption, CC1Option]>>;
defm whole_program_vtables : BoolFOption<"whole-program-vtables",
CodeGenOpts<"WholeProgramVTables">, DefaultFalse,
PosFlag<SetTrue, [], [ClangOption, CC1Option],
@@ -7012,9 +7019,8 @@ defm variable_expansion_in_unroller : BooleanFFlag<"variable-expansion-in-unroll
Group<clang_ignored_gcc_optimization_f_Group>;
defm web : BooleanFFlag<"web">, Group<clang_ignored_gcc_optimization_f_Group>;
defm whole_program : BooleanFFlag<"whole-program">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize : BooleanFFlag<"devirtualize">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize_speculatively : BooleanFFlag<"devirtualize-speculatively">,
- Group<clang_ignored_gcc_optimization_f_Group>;
+defm devirtualize : BooleanFFlag<"devirtualize">,
+ Group<clang_ignored_gcc_optimization_f_Group>;
// Generic gfortran options.
def A_DASH : Joined<["-"], "A-">, Group<gfortran_Group>;
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 8c99af2bdff83..790467dc557a5 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -907,6 +907,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+ PTO.DevirtualizeSpeculatively = CodeGenOpts.DevirtualizeSpeculatively;
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 8346ee3aa6a8d..bf1724e347a7f 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2771,10 +2771,11 @@ void CodeGenFunction::EmitTypeMetadataCodeForVCall(const CXXRecordDecl *RD,
SourceLocation Loc) {
if (SanOpts.has(SanitizerKind::CFIVCall))
EmitVTablePtrCheckForCall(RD, VTable, CodeGenFunction::CFITCK_VCall, Loc);
- else if (CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type test assumes if we are forcing public
- // visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD)) {
+ else if ((CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type test assumes if we are forcing public
+ // visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CanQualType Ty = CGM.getContext().getCanonicalTagType(RD);
llvm::Metadata *MD = CGM.CreateMetadataIdentifierForType(Ty);
llvm::Value *TypeId =
@@ -2932,8 +2933,9 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
}
bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
- if (!CGM.getCodeGenOpts().WholeProgramVTables ||
- !CGM.HasHiddenLTOVisibility(RD))
+ if ((!CGM.getCodeGenOpts().WholeProgramVTables ||
+ !CGM.HasHiddenLTOVisibility(RD)) &&
+ !CGM.getCodeGenOpts().DevirtualizeSpeculatively)
return false;
if (CGM.getCodeGenOpts().VirtualFunctionElimination)
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index e14e883a55ac5..959ba2031acf4 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -1358,10 +1358,12 @@ llvm::GlobalObject::VCallVisibility CodeGenModule::GetVCallVisibilityLevel(
void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
llvm::GlobalVariable *VTable,
const VTableLayout &VTLayout) {
- // Emit type metadata on vtables with LTO or IR instrumentation.
+ // Emit type metadata on vtables with LTO or IR instrumentation or
+ // speculative devirtualization.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
- if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+ if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+ !getCodeGenOpts().DevirtualizeSpeculatively)
return;
CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 7dc2eaf1e9f75..71d36b5d6b153 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -717,9 +717,10 @@ CGCallee ItaniumCXXABI::EmitLoadOfMemberFunctionPointer(
bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination &&
CGM.HasHiddenLTOVisibility(RD);
bool ShouldEmitWPDInfo =
- CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type tests if we are forcing public visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD);
+ (CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type tests if we are forcing public visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively;
llvm::Value *VirtualFn = nullptr;
{
@@ -2114,13 +2115,15 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,
// definitions to ensure we associate derived classes with base classes
// defined in headers but with a strong definition only in a shared library.
if (!VTable->isDeclarationForLinker() ||
- CGM.getCodeGenOpts().WholeProgramVTables) {
+ CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CGM.EmitVTableTypeMetadata(RD, VTable, VTLayout);
// For available_externally definitions, add the vtable to
// @llvm.compiler.used so that it isn't deleted before whole program
// analysis.
if (VTable->isDeclarationForLinker()) {
- assert(CGM.getCodeGenOpts().WholeProgramVTables);
+ assert(CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively);
CGM.addCompilerUsedGlobal(VTable);
}
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 63efb0f02baa8..d91afcbf3fd6c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7785,6 +7785,13 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs);
+ // Temporarily disable this for LTO if it's not explicitly enabled.
+ // TODO: enable it by default for LTO also.
+ if (Args.hasFlag(options::OPT_fdevirtualize_speculatively,
+ options::OPT_fno_devirtualize_speculatively,
+ /*Default value*/ false))
+ CmdArgs.push_back("-fdevirtualize-speculatively");
+
bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);
diff --git a/clang/test/CodeGenCXX/type-metadata.cpp b/clang/test/CodeGenCXX/type-metadata.cpp
index 1cb2fed8db3e6..61d36204942dc 100644
--- a/clang/test/CodeGenCXX/type-metadata.cpp
+++ b/clang/test/CodeGenCXX/type-metadata.cpp
@@ -14,6 +14,9 @@
// RUN: %clang_cc1 -O2 -flto -flto-unit -triple x86_64-unknown-linux -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=ITANIUM-OPT --check-prefix=ITANIUM-OPT-LAYOUT %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TT-MS %s
+// Test for the speculative devirtualization feature in nonlto mode:
+// RUN: %clang_cc1 -triple x86_64-unknown-linux -fdevirtualize-speculatively -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT %s
+
// Tests for cfi + whole-program-vtables:
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-unknown-linux -fvisibility=hidden -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=ITANIUM-HIDDEN --check-prefix=ITANIUM-COMMON-MD --check-prefix=TC-ITANIUM --check-prefix=ITANIUM-NO-RV-MD %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TC-MS %s
@@ -178,6 +181,7 @@ void af(A *a) {
// TT-ITANIUM-HIDDEN: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-ITANIUM-DEFAULT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-MS: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TC-ITANIUM: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-ITANIUM-RV: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-MS: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -212,6 +216,7 @@ void df1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -224,6 +229,7 @@ void dg1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUB@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTS1B")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTS1B")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUB@@")
@@ -236,6 +242,7 @@ void dh1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 16, metadata ![[DTYPE]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE:[0-9]+]])
@@ -297,6 +304,7 @@ void f(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@test2@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTSN5test21DE")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTSN5test21DE")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@test2@@")
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index eb3994ddabcd3..1e6e1253a7a09 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -371,7 +371,6 @@
// RUN: -ftree-ter \
// RUN: -ftree-vrp \
// RUN: -fno-devirtualize \
-// RUN: -fno-devirtualize-speculatively \
// RUN: -fslp-vectorize-aggressive \
// RUN: -fno-slp-vectorize-aggressive \
// RUN: %s 2>&1 | FileCheck --check-prefix=CHECK-WARNING %s
@@ -430,7 +429,6 @@
// CHECK-WARNING-DAG: optimization flag '-ftree-ter' is not supported
// CHECK-WARNING-DAG: optimization flag '-ftree-vrp' is not supported
// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize' is not supported
-// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize-speculatively' is not supported
// CHECK-WARNING-DAG: the flag '-fslp-vectorize-aggressive' has been deprecated and will be ignored
// CHECK-WARNING-DAG: the flag '-fno-slp-vectorize-aggressive' has been deprecated and will be ignored
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 2742ec1b71b7e..c1fcde24cc420 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -102,6 +102,10 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;
+
+ // Tuning option to enable/disable speculative devirtualization.
+ // Its default value is false.
+ bool DevirtualizeSpeculatively;
};
/// This class provides access to building LLVM's passes.
diff --git a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
index 7a03405b4f462..2e33a4098be1b 100644
--- a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
+++ b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,14 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
+ bool DevirtSpeculatively = false;
WholeProgramDevirtPass()
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
- const ModuleSummaryIndex *ImportSummary)
- : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+ const ModuleSummaryIndex *ImportSummary,
+ bool DevirtSpeculatively = false)
+ : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+ DevirtSpeculatively(DevirtSpeculatively) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 79642e650ac83..bbaa7493445a3 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -322,6 +322,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+ DevirtualizeSpeculatively = false;
}
namespace llvm {
@@ -1635,6 +1636,24 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());
+ if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) {
+ MPM.addPass(WholeProgramDevirtPass(
+ /*ExportSummary*/ nullptr,
+ /*ImportSummary*/ nullptr,
+ /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively));
+ MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+ lowertypetests::DropTestKind::Assume));
+ if (EnableModuleInliner) {
+ MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
+ UseInlineAdvisor,
+ ThinOrFullLTOPhase::None));
+ } else {
+ MPM.addPass(ModuleInlinerWrapperPass(
+ getInlineParamsFromOp...
[truncated]
|
|
@llvm/pr-subscribers-clang-codegen Author: Hassnaa Hamdi (hassnaaHamdi) ChangesThis patch enables speculative devirtualization feature in Clang/frontend.
Patch is 37.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159685.diff 16 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index a8bbf146431ea..241f374bcf4c3 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2313,6 +2313,13 @@ are listed below.
This enables better devirtualization. Turned off by default, because it is
still experimental.
+.. option:: -fdevirtualize-speculatively
+
+ Enable speculative devirtualization optimization, such as single-implementation
+ devirtualization. This optimization is used out of LTO mode for now.
+ Turned off by default.
+ TODO: Enable for LTO mode.
+
.. option:: -fwhole-program-vtables
Enable whole-program vtable optimizations, such as single-implementation
@@ -5161,6 +5168,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
+ -fdevirtualize-speculatively
+ Enables speculative devirtualization optimization.
-fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 872f73ebf3810..38174cf13cadf 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -358,6 +358,8 @@ VALUE_CODEGENOPT(WarnStackSize , 32, UINT_MAX, Benign) ///< Set via -fwarn-s
CODEGENOPT(NoStackArgProbe, 1, 0, Benign) ///< Set when -mno-stack-arg-probe is used
CODEGENOPT(EmitLLVMUseLists, 1, 0, Benign) ///< Control whether to serialize use-lists.
+CODEGENOPT(DevirtualizeSpeculatively, 1, 0, Benign) ///< Whether to apply the speculative
+ /// devirtualization optimization.
CODEGENOPT(WholeProgramVTables, 1, 0, Benign) ///< Whether to apply whole-program
/// vtable optimization.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 47d328f862e07..a3656b80cac32 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4432,6 +4432,13 @@ defm new_infallible : BoolFOption<"new-infallible",
BothFlags<[], [ClangOption, CC1Option],
" treating throwing global C++ operator new as always returning valid memory "
"(annotates with __attribute__((returns_nonnull)) and throw()). This is detectable in source.">>;
+defm devirtualize_speculatively
+ : BoolFOption<"devirtualize-speculatively",
+ CodeGenOpts<"DevirtualizeSpeculatively">, DefaultFalse,
+ PosFlag<SetTrue, [], [],
+ "Enables speculative devirtualization optimization.">,
+ NegFlag<SetFalse>,
+ BothFlags<[], [ClangOption, CLOption, CC1Option]>>;
defm whole_program_vtables : BoolFOption<"whole-program-vtables",
CodeGenOpts<"WholeProgramVTables">, DefaultFalse,
PosFlag<SetTrue, [], [ClangOption, CC1Option],
@@ -7012,9 +7019,8 @@ defm variable_expansion_in_unroller : BooleanFFlag<"variable-expansion-in-unroll
Group<clang_ignored_gcc_optimization_f_Group>;
defm web : BooleanFFlag<"web">, Group<clang_ignored_gcc_optimization_f_Group>;
defm whole_program : BooleanFFlag<"whole-program">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize : BooleanFFlag<"devirtualize">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize_speculatively : BooleanFFlag<"devirtualize-speculatively">,
- Group<clang_ignored_gcc_optimization_f_Group>;
+defm devirtualize : BooleanFFlag<"devirtualize">,
+ Group<clang_ignored_gcc_optimization_f_Group>;
// Generic gfortran options.
def A_DASH : Joined<["-"], "A-">, Group<gfortran_Group>;
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 8c99af2bdff83..790467dc557a5 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -907,6 +907,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+ PTO.DevirtualizeSpeculatively = CodeGenOpts.DevirtualizeSpeculatively;
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 8346ee3aa6a8d..bf1724e347a7f 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2771,10 +2771,11 @@ void CodeGenFunction::EmitTypeMetadataCodeForVCall(const CXXRecordDecl *RD,
SourceLocation Loc) {
if (SanOpts.has(SanitizerKind::CFIVCall))
EmitVTablePtrCheckForCall(RD, VTable, CodeGenFunction::CFITCK_VCall, Loc);
- else if (CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type test assumes if we are forcing public
- // visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD)) {
+ else if ((CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type test assumes if we are forcing public
+ // visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CanQualType Ty = CGM.getContext().getCanonicalTagType(RD);
llvm::Metadata *MD = CGM.CreateMetadataIdentifierForType(Ty);
llvm::Value *TypeId =
@@ -2932,8 +2933,9 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
}
bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
- if (!CGM.getCodeGenOpts().WholeProgramVTables ||
- !CGM.HasHiddenLTOVisibility(RD))
+ if ((!CGM.getCodeGenOpts().WholeProgramVTables ||
+ !CGM.HasHiddenLTOVisibility(RD)) &&
+ !CGM.getCodeGenOpts().DevirtualizeSpeculatively)
return false;
if (CGM.getCodeGenOpts().VirtualFunctionElimination)
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index e14e883a55ac5..959ba2031acf4 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -1358,10 +1358,12 @@ llvm::GlobalObject::VCallVisibility CodeGenModule::GetVCallVisibilityLevel(
void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
llvm::GlobalVariable *VTable,
const VTableLayout &VTLayout) {
- // Emit type metadata on vtables with LTO or IR instrumentation.
+ // Emit type metadata on vtables with LTO or IR instrumentation or
+ // speculative devirtualization.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
- if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+ if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+ !getCodeGenOpts().DevirtualizeSpeculatively)
return;
CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 7dc2eaf1e9f75..71d36b5d6b153 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -717,9 +717,10 @@ CGCallee ItaniumCXXABI::EmitLoadOfMemberFunctionPointer(
bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination &&
CGM.HasHiddenLTOVisibility(RD);
bool ShouldEmitWPDInfo =
- CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type tests if we are forcing public visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD);
+ (CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type tests if we are forcing public visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively;
llvm::Value *VirtualFn = nullptr;
{
@@ -2114,13 +2115,15 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,
// definitions to ensure we associate derived classes with base classes
// defined in headers but with a strong definition only in a shared library.
if (!VTable->isDeclarationForLinker() ||
- CGM.getCodeGenOpts().WholeProgramVTables) {
+ CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CGM.EmitVTableTypeMetadata(RD, VTable, VTLayout);
// For available_externally definitions, add the vtable to
// @llvm.compiler.used so that it isn't deleted before whole program
// analysis.
if (VTable->isDeclarationForLinker()) {
- assert(CGM.getCodeGenOpts().WholeProgramVTables);
+ assert(CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively);
CGM.addCompilerUsedGlobal(VTable);
}
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 63efb0f02baa8..d91afcbf3fd6c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7785,6 +7785,13 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs);
+ // Temporarily disable this for LTO if it's not explicitly enabled.
+ // TODO: enable it by default for LTO also.
+ if (Args.hasFlag(options::OPT_fdevirtualize_speculatively,
+ options::OPT_fno_devirtualize_speculatively,
+ /*Default value*/ false))
+ CmdArgs.push_back("-fdevirtualize-speculatively");
+
bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);
diff --git a/clang/test/CodeGenCXX/type-metadata.cpp b/clang/test/CodeGenCXX/type-metadata.cpp
index 1cb2fed8db3e6..61d36204942dc 100644
--- a/clang/test/CodeGenCXX/type-metadata.cpp
+++ b/clang/test/CodeGenCXX/type-metadata.cpp
@@ -14,6 +14,9 @@
// RUN: %clang_cc1 -O2 -flto -flto-unit -triple x86_64-unknown-linux -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=ITANIUM-OPT --check-prefix=ITANIUM-OPT-LAYOUT %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TT-MS %s
+// Test for the speculative devirtualization feature in nonlto mode:
+// RUN: %clang_cc1 -triple x86_64-unknown-linux -fdevirtualize-speculatively -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT %s
+
// Tests for cfi + whole-program-vtables:
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-unknown-linux -fvisibility=hidden -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=ITANIUM-HIDDEN --check-prefix=ITANIUM-COMMON-MD --check-prefix=TC-ITANIUM --check-prefix=ITANIUM-NO-RV-MD %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TC-MS %s
@@ -178,6 +181,7 @@ void af(A *a) {
// TT-ITANIUM-HIDDEN: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-ITANIUM-DEFAULT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-MS: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TC-ITANIUM: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-ITANIUM-RV: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-MS: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -212,6 +216,7 @@ void df1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -224,6 +229,7 @@ void dg1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUB@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTS1B")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTS1B")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUB@@")
@@ -236,6 +242,7 @@ void dh1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 16, metadata ![[DTYPE]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE:[0-9]+]])
@@ -297,6 +304,7 @@ void f(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@test2@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTSN5test21DE")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTSN5test21DE")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@test2@@")
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index eb3994ddabcd3..1e6e1253a7a09 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -371,7 +371,6 @@
// RUN: -ftree-ter \
// RUN: -ftree-vrp \
// RUN: -fno-devirtualize \
-// RUN: -fno-devirtualize-speculatively \
// RUN: -fslp-vectorize-aggressive \
// RUN: -fno-slp-vectorize-aggressive \
// RUN: %s 2>&1 | FileCheck --check-prefix=CHECK-WARNING %s
@@ -430,7 +429,6 @@
// CHECK-WARNING-DAG: optimization flag '-ftree-ter' is not supported
// CHECK-WARNING-DAG: optimization flag '-ftree-vrp' is not supported
// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize' is not supported
-// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize-speculatively' is not supported
// CHECK-WARNING-DAG: the flag '-fslp-vectorize-aggressive' has been deprecated and will be ignored
// CHECK-WARNING-DAG: the flag '-fno-slp-vectorize-aggressive' has been deprecated and will be ignored
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 2742ec1b71b7e..c1fcde24cc420 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -102,6 +102,10 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;
+
+ // Tuning option to enable/disable speculative devirtualization.
+ // Its default value is false.
+ bool DevirtualizeSpeculatively;
};
/// This class provides access to building LLVM's passes.
diff --git a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
index 7a03405b4f462..2e33a4098be1b 100644
--- a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
+++ b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,14 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
+ bool DevirtSpeculatively = false;
WholeProgramDevirtPass()
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
- const ModuleSummaryIndex *ImportSummary)
- : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+ const ModuleSummaryIndex *ImportSummary,
+ bool DevirtSpeculatively = false)
+ : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+ DevirtSpeculatively(DevirtSpeculatively) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 79642e650ac83..bbaa7493445a3 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -322,6 +322,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+ DevirtualizeSpeculatively = false;
}
namespace llvm {
@@ -1635,6 +1636,24 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());
+ if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) {
+ MPM.addPass(WholeProgramDevirtPass(
+ /*ExportSummary*/ nullptr,
+ /*ImportSummary*/ nullptr,
+ /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively));
+ MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+ lowertypetests::DropTestKind::Assume));
+ if (EnableModuleInliner) {
+ MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
+ UseInlineAdvisor,
+ ThinOrFullLTOPhase::None));
+ } else {
+ MPM.addPass(ModuleInlinerWrapperPass(
+ getInlineParamsFromOp...
[truncated]
|
|
@llvm/pr-subscribers-clang Author: Hassnaa Hamdi (hassnaaHamdi) ChangesThis patch enables speculative devirtualization feature in Clang/frontend.
Patch is 37.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159685.diff 16 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index a8bbf146431ea..241f374bcf4c3 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2313,6 +2313,13 @@ are listed below.
This enables better devirtualization. Turned off by default, because it is
still experimental.
+.. option:: -fdevirtualize-speculatively
+
+ Enable speculative devirtualization optimization, such as single-implementation
+ devirtualization. This optimization is used out of LTO mode for now.
+ Turned off by default.
+ TODO: Enable for LTO mode.
+
.. option:: -fwhole-program-vtables
Enable whole-program vtable optimizations, such as single-implementation
@@ -5161,6 +5168,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
+ -fdevirtualize-speculatively
+ Enables speculative devirtualization optimization.
-fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 872f73ebf3810..38174cf13cadf 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -358,6 +358,8 @@ VALUE_CODEGENOPT(WarnStackSize , 32, UINT_MAX, Benign) ///< Set via -fwarn-s
CODEGENOPT(NoStackArgProbe, 1, 0, Benign) ///< Set when -mno-stack-arg-probe is used
CODEGENOPT(EmitLLVMUseLists, 1, 0, Benign) ///< Control whether to serialize use-lists.
+CODEGENOPT(DevirtualizeSpeculatively, 1, 0, Benign) ///< Whether to apply the speculative
+ /// devirtualization optimization.
CODEGENOPT(WholeProgramVTables, 1, 0, Benign) ///< Whether to apply whole-program
/// vtable optimization.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 47d328f862e07..a3656b80cac32 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4432,6 +4432,13 @@ defm new_infallible : BoolFOption<"new-infallible",
BothFlags<[], [ClangOption, CC1Option],
" treating throwing global C++ operator new as always returning valid memory "
"(annotates with __attribute__((returns_nonnull)) and throw()). This is detectable in source.">>;
+defm devirtualize_speculatively
+ : BoolFOption<"devirtualize-speculatively",
+ CodeGenOpts<"DevirtualizeSpeculatively">, DefaultFalse,
+ PosFlag<SetTrue, [], [],
+ "Enables speculative devirtualization optimization.">,
+ NegFlag<SetFalse>,
+ BothFlags<[], [ClangOption, CLOption, CC1Option]>>;
defm whole_program_vtables : BoolFOption<"whole-program-vtables",
CodeGenOpts<"WholeProgramVTables">, DefaultFalse,
PosFlag<SetTrue, [], [ClangOption, CC1Option],
@@ -7012,9 +7019,8 @@ defm variable_expansion_in_unroller : BooleanFFlag<"variable-expansion-in-unroll
Group<clang_ignored_gcc_optimization_f_Group>;
defm web : BooleanFFlag<"web">, Group<clang_ignored_gcc_optimization_f_Group>;
defm whole_program : BooleanFFlag<"whole-program">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize : BooleanFFlag<"devirtualize">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize_speculatively : BooleanFFlag<"devirtualize-speculatively">,
- Group<clang_ignored_gcc_optimization_f_Group>;
+defm devirtualize : BooleanFFlag<"devirtualize">,
+ Group<clang_ignored_gcc_optimization_f_Group>;
// Generic gfortran options.
def A_DASH : Joined<["-"], "A-">, Group<gfortran_Group>;
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 8c99af2bdff83..790467dc557a5 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -907,6 +907,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+ PTO.DevirtualizeSpeculatively = CodeGenOpts.DevirtualizeSpeculatively;
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 8346ee3aa6a8d..bf1724e347a7f 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2771,10 +2771,11 @@ void CodeGenFunction::EmitTypeMetadataCodeForVCall(const CXXRecordDecl *RD,
SourceLocation Loc) {
if (SanOpts.has(SanitizerKind::CFIVCall))
EmitVTablePtrCheckForCall(RD, VTable, CodeGenFunction::CFITCK_VCall, Loc);
- else if (CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type test assumes if we are forcing public
- // visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD)) {
+ else if ((CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type test assumes if we are forcing public
+ // visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CanQualType Ty = CGM.getContext().getCanonicalTagType(RD);
llvm::Metadata *MD = CGM.CreateMetadataIdentifierForType(Ty);
llvm::Value *TypeId =
@@ -2932,8 +2933,9 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
}
bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
- if (!CGM.getCodeGenOpts().WholeProgramVTables ||
- !CGM.HasHiddenLTOVisibility(RD))
+ if ((!CGM.getCodeGenOpts().WholeProgramVTables ||
+ !CGM.HasHiddenLTOVisibility(RD)) &&
+ !CGM.getCodeGenOpts().DevirtualizeSpeculatively)
return false;
if (CGM.getCodeGenOpts().VirtualFunctionElimination)
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index e14e883a55ac5..959ba2031acf4 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -1358,10 +1358,12 @@ llvm::GlobalObject::VCallVisibility CodeGenModule::GetVCallVisibilityLevel(
void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
llvm::GlobalVariable *VTable,
const VTableLayout &VTLayout) {
- // Emit type metadata on vtables with LTO or IR instrumentation.
+ // Emit type metadata on vtables with LTO or IR instrumentation or
+ // speculative devirtualization.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
- if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+ if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+ !getCodeGenOpts().DevirtualizeSpeculatively)
return;
CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 7dc2eaf1e9f75..71d36b5d6b153 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -717,9 +717,10 @@ CGCallee ItaniumCXXABI::EmitLoadOfMemberFunctionPointer(
bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination &&
CGM.HasHiddenLTOVisibility(RD);
bool ShouldEmitWPDInfo =
- CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type tests if we are forcing public visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD);
+ (CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type tests if we are forcing public visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively;
llvm::Value *VirtualFn = nullptr;
{
@@ -2114,13 +2115,15 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,
// definitions to ensure we associate derived classes with base classes
// defined in headers but with a strong definition only in a shared library.
if (!VTable->isDeclarationForLinker() ||
- CGM.getCodeGenOpts().WholeProgramVTables) {
+ CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CGM.EmitVTableTypeMetadata(RD, VTable, VTLayout);
// For available_externally definitions, add the vtable to
// @llvm.compiler.used so that it isn't deleted before whole program
// analysis.
if (VTable->isDeclarationForLinker()) {
- assert(CGM.getCodeGenOpts().WholeProgramVTables);
+ assert(CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively);
CGM.addCompilerUsedGlobal(VTable);
}
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 63efb0f02baa8..d91afcbf3fd6c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7785,6 +7785,13 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs);
+ // Temporarily disable this for LTO if it's not explicitly enabled.
+ // TODO: enable it by default for LTO also.
+ if (Args.hasFlag(options::OPT_fdevirtualize_speculatively,
+ options::OPT_fno_devirtualize_speculatively,
+ /*Default value*/ false))
+ CmdArgs.push_back("-fdevirtualize-speculatively");
+
bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);
diff --git a/clang/test/CodeGenCXX/type-metadata.cpp b/clang/test/CodeGenCXX/type-metadata.cpp
index 1cb2fed8db3e6..61d36204942dc 100644
--- a/clang/test/CodeGenCXX/type-metadata.cpp
+++ b/clang/test/CodeGenCXX/type-metadata.cpp
@@ -14,6 +14,9 @@
// RUN: %clang_cc1 -O2 -flto -flto-unit -triple x86_64-unknown-linux -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=ITANIUM-OPT --check-prefix=ITANIUM-OPT-LAYOUT %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TT-MS %s
+// Test for the speculative devirtualization feature in nonlto mode:
+// RUN: %clang_cc1 -triple x86_64-unknown-linux -fdevirtualize-speculatively -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT %s
+
// Tests for cfi + whole-program-vtables:
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-unknown-linux -fvisibility=hidden -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=ITANIUM-HIDDEN --check-prefix=ITANIUM-COMMON-MD --check-prefix=TC-ITANIUM --check-prefix=ITANIUM-NO-RV-MD %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TC-MS %s
@@ -178,6 +181,7 @@ void af(A *a) {
// TT-ITANIUM-HIDDEN: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-ITANIUM-DEFAULT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-MS: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TC-ITANIUM: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-ITANIUM-RV: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-MS: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -212,6 +216,7 @@ void df1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -224,6 +229,7 @@ void dg1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUB@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTS1B")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTS1B")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUB@@")
@@ -236,6 +242,7 @@ void dh1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 16, metadata ![[DTYPE]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE:[0-9]+]])
@@ -297,6 +304,7 @@ void f(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@test2@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTSN5test21DE")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTSN5test21DE")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@test2@@")
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index eb3994ddabcd3..1e6e1253a7a09 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -371,7 +371,6 @@
// RUN: -ftree-ter \
// RUN: -ftree-vrp \
// RUN: -fno-devirtualize \
-// RUN: -fno-devirtualize-speculatively \
// RUN: -fslp-vectorize-aggressive \
// RUN: -fno-slp-vectorize-aggressive \
// RUN: %s 2>&1 | FileCheck --check-prefix=CHECK-WARNING %s
@@ -430,7 +429,6 @@
// CHECK-WARNING-DAG: optimization flag '-ftree-ter' is not supported
// CHECK-WARNING-DAG: optimization flag '-ftree-vrp' is not supported
// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize' is not supported
-// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize-speculatively' is not supported
// CHECK-WARNING-DAG: the flag '-fslp-vectorize-aggressive' has been deprecated and will be ignored
// CHECK-WARNING-DAG: the flag '-fno-slp-vectorize-aggressive' has been deprecated and will be ignored
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 2742ec1b71b7e..c1fcde24cc420 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -102,6 +102,10 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;
+
+ // Tuning option to enable/disable speculative devirtualization.
+ // Its default value is false.
+ bool DevirtualizeSpeculatively;
};
/// This class provides access to building LLVM's passes.
diff --git a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
index 7a03405b4f462..2e33a4098be1b 100644
--- a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
+++ b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,14 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
+ bool DevirtSpeculatively = false;
WholeProgramDevirtPass()
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
- const ModuleSummaryIndex *ImportSummary)
- : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+ const ModuleSummaryIndex *ImportSummary,
+ bool DevirtSpeculatively = false)
+ : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+ DevirtSpeculatively(DevirtSpeculatively) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 79642e650ac83..bbaa7493445a3 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -322,6 +322,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+ DevirtualizeSpeculatively = false;
}
namespace llvm {
@@ -1635,6 +1636,24 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());
+ if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) {
+ MPM.addPass(WholeProgramDevirtPass(
+ /*ExportSummary*/ nullptr,
+ /*ImportSummary*/ nullptr,
+ /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively));
+ MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+ lowertypetests::DropTestKind::Assume));
+ if (EnableModuleInliner) {
+ MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
+ UseInlineAdvisor,
+ ThinOrFullLTOPhase::None));
+ } else {
+ MPM.addPass(ModuleInlinerWrapperPass(
+ getInlineParamsFromOp...
[truncated]
|
|
@llvm/pr-subscribers-clang-driver Author: Hassnaa Hamdi (hassnaaHamdi) ChangesThis patch enables speculative devirtualization feature in Clang/frontend.
Patch is 37.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159685.diff 16 Files Affected:
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index a8bbf146431ea..241f374bcf4c3 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2313,6 +2313,13 @@ are listed below.
This enables better devirtualization. Turned off by default, because it is
still experimental.
+.. option:: -fdevirtualize-speculatively
+
+ Enable speculative devirtualization optimization, such as single-implementation
+ devirtualization. This optimization is used out of LTO mode for now.
+ Turned off by default.
+ TODO: Enable for LTO mode.
+
.. option:: -fwhole-program-vtables
Enable whole-program vtable optimizations, such as single-implementation
@@ -5161,6 +5168,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
+ -fdevirtualize-speculatively
+ Enables speculative devirtualization optimization.
-fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 872f73ebf3810..38174cf13cadf 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -358,6 +358,8 @@ VALUE_CODEGENOPT(WarnStackSize , 32, UINT_MAX, Benign) ///< Set via -fwarn-s
CODEGENOPT(NoStackArgProbe, 1, 0, Benign) ///< Set when -mno-stack-arg-probe is used
CODEGENOPT(EmitLLVMUseLists, 1, 0, Benign) ///< Control whether to serialize use-lists.
+CODEGENOPT(DevirtualizeSpeculatively, 1, 0, Benign) ///< Whether to apply the speculative
+ /// devirtualization optimization.
CODEGENOPT(WholeProgramVTables, 1, 0, Benign) ///< Whether to apply whole-program
/// vtable optimization.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 47d328f862e07..a3656b80cac32 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4432,6 +4432,13 @@ defm new_infallible : BoolFOption<"new-infallible",
BothFlags<[], [ClangOption, CC1Option],
" treating throwing global C++ operator new as always returning valid memory "
"(annotates with __attribute__((returns_nonnull)) and throw()). This is detectable in source.">>;
+defm devirtualize_speculatively
+ : BoolFOption<"devirtualize-speculatively",
+ CodeGenOpts<"DevirtualizeSpeculatively">, DefaultFalse,
+ PosFlag<SetTrue, [], [],
+ "Enables speculative devirtualization optimization.">,
+ NegFlag<SetFalse>,
+ BothFlags<[], [ClangOption, CLOption, CC1Option]>>;
defm whole_program_vtables : BoolFOption<"whole-program-vtables",
CodeGenOpts<"WholeProgramVTables">, DefaultFalse,
PosFlag<SetTrue, [], [ClangOption, CC1Option],
@@ -7012,9 +7019,8 @@ defm variable_expansion_in_unroller : BooleanFFlag<"variable-expansion-in-unroll
Group<clang_ignored_gcc_optimization_f_Group>;
defm web : BooleanFFlag<"web">, Group<clang_ignored_gcc_optimization_f_Group>;
defm whole_program : BooleanFFlag<"whole-program">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize : BooleanFFlag<"devirtualize">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize_speculatively : BooleanFFlag<"devirtualize-speculatively">,
- Group<clang_ignored_gcc_optimization_f_Group>;
+defm devirtualize : BooleanFFlag<"devirtualize">,
+ Group<clang_ignored_gcc_optimization_f_Group>;
// Generic gfortran options.
def A_DASH : Joined<["-"], "A-">, Group<gfortran_Group>;
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 8c99af2bdff83..790467dc557a5 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -907,6 +907,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+ PTO.DevirtualizeSpeculatively = CodeGenOpts.DevirtualizeSpeculatively;
LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 8346ee3aa6a8d..bf1724e347a7f 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2771,10 +2771,11 @@ void CodeGenFunction::EmitTypeMetadataCodeForVCall(const CXXRecordDecl *RD,
SourceLocation Loc) {
if (SanOpts.has(SanitizerKind::CFIVCall))
EmitVTablePtrCheckForCall(RD, VTable, CodeGenFunction::CFITCK_VCall, Loc);
- else if (CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type test assumes if we are forcing public
- // visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD)) {
+ else if ((CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type test assumes if we are forcing public
+ // visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CanQualType Ty = CGM.getContext().getCanonicalTagType(RD);
llvm::Metadata *MD = CGM.CreateMetadataIdentifierForType(Ty);
llvm::Value *TypeId =
@@ -2932,8 +2933,9 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
}
bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
- if (!CGM.getCodeGenOpts().WholeProgramVTables ||
- !CGM.HasHiddenLTOVisibility(RD))
+ if ((!CGM.getCodeGenOpts().WholeProgramVTables ||
+ !CGM.HasHiddenLTOVisibility(RD)) &&
+ !CGM.getCodeGenOpts().DevirtualizeSpeculatively)
return false;
if (CGM.getCodeGenOpts().VirtualFunctionElimination)
diff --git a/clang/lib/CodeGen/CGVTables.cpp b/clang/lib/CodeGen/CGVTables.cpp
index e14e883a55ac5..959ba2031acf4 100644
--- a/clang/lib/CodeGen/CGVTables.cpp
+++ b/clang/lib/CodeGen/CGVTables.cpp
@@ -1358,10 +1358,12 @@ llvm::GlobalObject::VCallVisibility CodeGenModule::GetVCallVisibilityLevel(
void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
llvm::GlobalVariable *VTable,
const VTableLayout &VTLayout) {
- // Emit type metadata on vtables with LTO or IR instrumentation.
+ // Emit type metadata on vtables with LTO or IR instrumentation or
+ // speculative devirtualization.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
- if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+ if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+ !getCodeGenOpts().DevirtualizeSpeculatively)
return;
CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
diff --git a/clang/lib/CodeGen/ItaniumCXXABI.cpp b/clang/lib/CodeGen/ItaniumCXXABI.cpp
index 7dc2eaf1e9f75..71d36b5d6b153 100644
--- a/clang/lib/CodeGen/ItaniumCXXABI.cpp
+++ b/clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -717,9 +717,10 @@ CGCallee ItaniumCXXABI::EmitLoadOfMemberFunctionPointer(
bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination &&
CGM.HasHiddenLTOVisibility(RD);
bool ShouldEmitWPDInfo =
- CGM.getCodeGenOpts().WholeProgramVTables &&
- // Don't insert type tests if we are forcing public visibility.
- !CGM.AlwaysHasLTOVisibilityPublic(RD);
+ (CGM.getCodeGenOpts().WholeProgramVTables &&
+ // Don't insert type tests if we are forcing public visibility.
+ !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively;
llvm::Value *VirtualFn = nullptr;
{
@@ -2114,13 +2115,15 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,
// definitions to ensure we associate derived classes with base classes
// defined in headers but with a strong definition only in a shared library.
if (!VTable->isDeclarationForLinker() ||
- CGM.getCodeGenOpts().WholeProgramVTables) {
+ CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CGM.EmitVTableTypeMetadata(RD, VTable, VTLayout);
// For available_externally definitions, add the vtable to
// @llvm.compiler.used so that it isn't deleted before whole program
// analysis.
if (VTable->isDeclarationForLinker()) {
- assert(CGM.getCodeGenOpts().WholeProgramVTables);
+ assert(CGM.getCodeGenOpts().WholeProgramVTables ||
+ CGM.getCodeGenOpts().DevirtualizeSpeculatively);
CGM.addCompilerUsedGlobal(VTable);
}
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 63efb0f02baa8..d91afcbf3fd6c 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7785,6 +7785,13 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs);
+ // Temporarily disable this for LTO if it's not explicitly enabled.
+ // TODO: enable it by default for LTO also.
+ if (Args.hasFlag(options::OPT_fdevirtualize_speculatively,
+ options::OPT_fno_devirtualize_speculatively,
+ /*Default value*/ false))
+ CmdArgs.push_back("-fdevirtualize-speculatively");
+
bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);
diff --git a/clang/test/CodeGenCXX/type-metadata.cpp b/clang/test/CodeGenCXX/type-metadata.cpp
index 1cb2fed8db3e6..61d36204942dc 100644
--- a/clang/test/CodeGenCXX/type-metadata.cpp
+++ b/clang/test/CodeGenCXX/type-metadata.cpp
@@ -14,6 +14,9 @@
// RUN: %clang_cc1 -O2 -flto -flto-unit -triple x86_64-unknown-linux -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=ITANIUM-OPT --check-prefix=ITANIUM-OPT-LAYOUT %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TT-MS %s
+// Test for the speculative devirtualization feature in nonlto mode:
+// RUN: %clang_cc1 -triple x86_64-unknown-linux -fdevirtualize-speculatively -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT %s
+
// Tests for cfi + whole-program-vtables:
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-unknown-linux -fvisibility=hidden -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=ITANIUM-HIDDEN --check-prefix=ITANIUM-COMMON-MD --check-prefix=TC-ITANIUM --check-prefix=ITANIUM-NO-RV-MD %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TC-MS %s
@@ -178,6 +181,7 @@ void af(A *a) {
// TT-ITANIUM-HIDDEN: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-ITANIUM-DEFAULT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-MS: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TC-ITANIUM: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-ITANIUM-RV: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-MS: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -212,6 +216,7 @@ void df1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
@@ -224,6 +229,7 @@ void dg1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUB@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTS1B")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTS1B")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUB@@")
@@ -236,6 +242,7 @@ void dh1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 16, metadata ![[DTYPE]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE:[0-9]+]])
@@ -297,6 +304,7 @@ void f(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@test2@@")
+ // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTSN5test21DE")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTSN5test21DE")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@test2@@")
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index eb3994ddabcd3..1e6e1253a7a09 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -371,7 +371,6 @@
// RUN: -ftree-ter \
// RUN: -ftree-vrp \
// RUN: -fno-devirtualize \
-// RUN: -fno-devirtualize-speculatively \
// RUN: -fslp-vectorize-aggressive \
// RUN: -fno-slp-vectorize-aggressive \
// RUN: %s 2>&1 | FileCheck --check-prefix=CHECK-WARNING %s
@@ -430,7 +429,6 @@
// CHECK-WARNING-DAG: optimization flag '-ftree-ter' is not supported
// CHECK-WARNING-DAG: optimization flag '-ftree-vrp' is not supported
// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize' is not supported
-// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize-speculatively' is not supported
// CHECK-WARNING-DAG: the flag '-fslp-vectorize-aggressive' has been deprecated and will be ignored
// CHECK-WARNING-DAG: the flag '-fno-slp-vectorize-aggressive' has been deprecated and will be ignored
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 2742ec1b71b7e..c1fcde24cc420 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -102,6 +102,10 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;
+
+ // Tuning option to enable/disable speculative devirtualization.
+ // Its default value is false.
+ bool DevirtualizeSpeculatively;
};
/// This class provides access to building LLVM's passes.
diff --git a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
index 7a03405b4f462..2e33a4098be1b 100644
--- a/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
+++ b/llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,14 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
+ bool DevirtSpeculatively = false;
WholeProgramDevirtPass()
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
- const ModuleSummaryIndex *ImportSummary)
- : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+ const ModuleSummaryIndex *ImportSummary,
+ bool DevirtSpeculatively = false)
+ : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+ DevirtSpeculatively(DevirtSpeculatively) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 79642e650ac83..bbaa7493445a3 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -322,6 +322,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+ DevirtualizeSpeculatively = false;
}
namespace llvm {
@@ -1635,6 +1636,24 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());
+ if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) {
+ MPM.addPass(WholeProgramDevirtPass(
+ /*ExportSummary*/ nullptr,
+ /*ImportSummary*/ nullptr,
+ /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively));
+ MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+ lowertypetests::DropTestKind::Assume));
+ if (EnableModuleInliner) {
+ MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
+ UseInlineAdvisor,
+ ThinOrFullLTOPhase::None));
+ } else {
+ MPM.addPass(ModuleInlinerWrapperPass(
+ getInlineParamsFromOp...
[truncated]
|
|
Right now this PR includes all the changes from base PR159048. If you aren't using stacked PRs like with graphite, it will probably be better to review once that one is in and this can be rebased. |
|
This patch is ready for review. Thanks in advance. |
erichkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty unconvinced as to the value here or the soundness here of the optimization.
clang/docs/UsersManual.rst
Outdated
|
|
||
| .. option:: -fdevirtualize-speculatively | ||
|
|
||
| Enable speculative devirtualization optimization, such as single-implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option explaination needs to be much better. it doesn't explain what it is doing at all, and seems to be conflicting with itself as to what happens with LTO mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that this could be improved with a better description of the optimization. Regarding the LTO note, I think the confusion stems from the wording "used out of LTO mode". Maybe "This optimization is not used in LTO mode for now"
| // Don't insert type test assumes if we are forcing public | ||
| // visibility. | ||
| !CGM.AlwaysHasLTOVisibilityPublic(RD)) || | ||
| CGM.getCodeGenOpts().DevirtualizeSpeculatively) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we document the implications that this has? This seems like it would break linking, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linking? I think linking is not related here, maybe you got confused by the LTO related code ?
LTO related code is old code, I just modified it to also emit the type test assumes when the flag of devirt is enabled. But the change here is not related to LTO or linking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with devirtualization is always that we don't know what is going on in other TUs, right? There could be other variants of the function in other types in the same virtual tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are right. That's why this is only 'speculative' so it should be safe regardless of how other modules could affect the inheritance tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain "should be safe"? Do you mean: "The user is promising us that it is safe", or "we are proving it is safe"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speculative devirtualization transforms an indirect call to a guarded direct call. It is guarded by a comparison of the virtual function pointer to the expected target. So it is always safe. We also do this with profile information (in that case aka IndirectCallPromotion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speculative devirtualization transforms an indirect call to a guarded direct call. It is guarded by a comparison of the virtual function pointer to the expected target. So it is always safe. We also do this with profile information (in that case aka IndirectCallPromotion).
Ok, this seems interesting and more sound. Can you share some IR of the 'after' here that shows that? More importantly, can we see some tests to that effect?
| !CGM.HasHiddenLTOVisibility(RD)) | ||
| if ((!CGM.getCodeGenOpts().WholeProgramVTables || | ||
| !CGM.HasHiddenLTOVisibility(RD)) && | ||
| !CGM.getCodeGenOpts().DevirtualizeSpeculatively) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, is this devirtualizing ALL calls? How can that work? That doesn't seem like an appropriate implementation?
I find myself wondering if this needs to be a 'smaller hammer' here other than TU level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @erichkeane
Thanks for reviewing :)
This patch enables speculative devirtualization for all calls 👀 but ONLY as long as there is a single possible callee equivalent to that virtual function (when there is a single implementation of the virtual function or there is a single initialisation of its related virtual table).
Here is an example in gcc:
https://godbolt.org/z/f4z5Wofaf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how can we KNOW there is only a single implementation of the virtual function cross TU? thats the point about linking above. In the GCC case, we know the concrete type of ArrayVectorBase, so doing a devirtualization should be a pretty much no-brainer (and I think we already do something similar?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang does not do the same: https://godbolt.org/z/fE48b3eK3
Here are other cases that the type is not clear:
https://godbolt.org/z/Gs73WTvbM
https://godbolt.org/z/zrM6svM8r
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other answer, but basically it is guarded by a function pointer comparison.
@hassnaaHamdi I'll take a look at the code changes later today, but perhaps the PR description needs to have an explanation of what is meant by speculative devirtualization, with a short before and after code snippet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are different cases of usages of virtual function, there are cpp code which shows the c++ test case, another tap for IR without enabling devirtualization and another tap for the IR when devietualization is enabled.
There are some comments in the CPP code and the generated Vtables in the IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks! The branch based on the equality of the vtables wasn't something that was clear to me, and I would love to see a clang-level test that shows us generating that without the pass happening? (or am I missing something else?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang here only emits the the intrinsics of (public_type_test and assume) and type metadata for each related vtable (as part of the CGVtables.cpp logic). You can see those metadata nodes at the end of the IR.
For example here the vtable of Derived class without the devirt flag:
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI7Derived, ptr @_ZN4Base16virtual_functionEv] }, comdat, align 8
it doesn't refer to any metadata nodes.
While when we use the devirt flag, it does:
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI7Derived, ptr @_ZN4Base16virtual_functionEv] }, comdat, align 8, !type !0, !type !1, !type !2, !type !3
That logic already exists in Clang for the LTO/wholeprogramdevirt feature, in this patch I just ask Clang to also emit it when the speculative_devirt flag is enabled.
So, Clang only emits those data, while the backend (WholeProgramDevirt.cpp pass) makes use of those data and decide if we devirtualize or not.
I would love to see a clang-level test that shows us generating that without the pass happening?
Does this test fulfil your ask ? clang/test/CodeGenCXX/type-metadata.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test isn't quite 'clear' enough? Though perhaps it is good enough for lit.
Either way, I think I understand enought o be happy, thanks for your patience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there are a lot of other test cases in that test file. I will create another test dedicated to show the clang effects after enabling the flag.
erichkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve docs in UsersManual + add a release note, and the FE stuff is good. Otherwise we need someone to review the LLVM parts.
| !CGM.HasHiddenLTOVisibility(RD)) | ||
| if ((!CGM.getCodeGenOpts().WholeProgramVTables || | ||
| !CGM.HasHiddenLTOVisibility(RD)) && | ||
| !CGM.getCodeGenOpts().DevirtualizeSpeculatively) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test isn't quite 'clear' enough? Though perhaps it is good enough for lit.
Either way, I think I understand enought o be happy, thanks for your patience.
teresajohnson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay! Some comments below, mostly minor.
|
|
||
| addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs); | ||
|
|
||
| // Temporarily disable this for LTO if it's not explicitly enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this being disabled for LTO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think it is happening in the LLVM side? If that's what you are referring to you should probably clarify that here (e.g. that on the LLVM side this is being temporarily ignored)
| @@ -1641,6 +1642,24 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level, | |||
| if (!LTOPreLink) | |||
| MPM.addPass(RelLookupTableConverterPass()); | |||
|
|
|||
| if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a TODO about LTO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, TODO about what ? This check here is because that if LTO is enabled, the WPD pass will be already added to the LTO pipeline, so we should not add it again here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the late reply, I initially thought I understood the comment, but now not 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, this isn't the right place to add the LTO speculative devirt support so a TODO doesn't make sense. Current comment is fine.
| /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively)); | ||
| MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, | ||
| lowertypetests::DropTestKind::Assume)); | ||
| if (EnableModuleInliner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me why you need another round of inlining vs doing this before the earlier inline pass? Is this an optimization for a specific use case that needs 2 rounds of inlining? I'm concerned about the potential side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added comments about this:
// Given that the devirtualization creates more opportunities for inlining,
// we run the Inliner again here to maximize the optimization gain we
// get from devirtualization.
// Also, we can't run devirtualization before inlining because the
// devirtualization depends on the passes optimizing/eliminating vtable GVs
// and those passes are only effective after inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what vtable GV optimization is needed to expose these opportunities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is there a test being added that would fail without this inliner invocation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what vtable GV optimization is needed to expose these opportunities?
I mean optimisation passes like GlobalOpt.cpp pass which eliminates the unused GVs, and then when devirtualization works, it finds a single GV that is refering to the virtual function, so it can devirtualize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look at how the DevirtSCCRepeatedPass is invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the compile-time on ct-mark, and the compilation overhead seems big for each of the inliner and the WPD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see that integrating the WPD pass into another pipeline is non-trivial and would require changes across multiple places. Additionally, I'm uncertain if this approach will succeed given that WPD is a module pass.
And maybe (maybe) it could be better to have a different pass for non-to speculative devirtualization isolated from all the LTO stuff and in that case it's easier to add any needed changes into the pass without touching LTO things.
But anyway, I'll need to investigate the suggested approach further before implementing.
Does it make sense to keep the current opt-in speculative devirtualization changes for now, and revisit the pipeline integration as a follow-up?
Of course, I'm still open to pursuing the correct approach if the current implementation is not good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to putting it in as is for now, since it is disabled by default, and adding a TODO to explore a better pipeline configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the TODO.
clang/docs/UsersManual.rst
Outdated
|
|
||
| .. option:: -fdevirtualize-speculatively | ||
|
|
||
| Enable speculative devirtualization optimization, such as single-implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that this could be improved with a better description of the optimization. Regarding the LTO note, I think the confusion stems from the wording "used out of LTO mode". Maybe "This optimization is not used in LTO mode for now"
clang/lib/CodeGen/ItaniumCXXABI.cpp
Outdated
|
|
||
| // Always emit type metadata on non-available_externally definitions, and on | ||
| // available_externally definitions if we are performing whole program | ||
| // devirtualization. For WPD we need the type metadata on all vtable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs an update to note we now use this for speculative devirtualization too
|
|
||
| bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination && | ||
| CGM.HasHiddenLTOVisibility(RD); | ||
| bool ShouldEmitWPDInfo = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that the WPD terminology is legacy and we now also perform speculative devirtualization using the same info.
| @@ -0,0 +1,78 @@ | |||
| // Test that the vtable metadata that are emitted by Clang when speculative devirtualization | |||
| // is enabled can be used by the WholeProgramDevirt pass without being dropped on the way. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test will fail just for now until the patch gets updated. I will do the update after receiving all reviews because I'll have to force push.
This test fails because the DropUnnecessaryAssumes pass is dropping public_type_test intrinsic, and then WPD doesn't get it. But that pass got updated recently to keep the intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is testing LLVM behavior, it should be an llvm side test - just take the LLVM IR generated from clang and move it under llvm/test/Transforms/DropUnnecessaryAssumes and use "opt" to test. You could test "opt -O3" as well as "opt -passes=drop-unnecessary-assumes".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to test the DropUnnecessaryAssumes pass specifically, because the intrinsics could be dropped by any other pass but I moved it to llvm/test/Transforms/WholeProgramDevirt. Does that make sense ?
erichkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy when @teresajohnson is :)
clang/docs/UsersManual.rst
Outdated
| if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ | ||
| ret void | ||
| This feature is temporarily ignored at the LLVM side when LTO is enabled. | ||
| TODO: Update the comment when the LLVM side supports it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Update the comment when LLVM supports this feature for LTO." ?
clang/docs/UsersManual.rst
Outdated
| the assumption before making the direct call, and if the check fails, | ||
| the original virtual call is made instead. This optimization can enable | ||
| more inlining opportunities and better optimization of the direct call. | ||
| This is different from other whole program devirtualization optimizations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "other" since this isn't a whole program optimization.
clang/docs/UsersManual.rst
Outdated
| that the object is always of a particular type at a virtual call site. | ||
| This optimization doesn't require global analysis or hidden visibility. | ||
| This optimization doesn't devirtualize all virtual calls, but only | ||
| when there's a single implementation of the virtual function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add "in the module" ?
clang/docs/UsersManual.rst
Outdated
| This optimization doesn't require global analysis or hidden visibility. | ||
| This optimization doesn't devirtualize all virtual calls, but only | ||
| when there's a single implementation of the virtual function. | ||
| There could be a single implementaiton of the virtual function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "implementaiton"
clang/docs/UsersManual.rst
Outdated
| when there's a single implementation of the virtual function. | ||
| There could be a single implementaiton of the virtual function | ||
| either because the function is not overridden in any derived class, | ||
| or because there is a sinlge instantiated object that is using the funciton. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos "sinlge" and "funciton". Also, I don't understand the second case here after the "or" - why does it matter if there is a single instantiated object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm explaining the cases at which the devirtualization will be able to devirtualize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really mean it must be a singleton object though? Or do you mean all objects must be instantiated from a single type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not expressing it well.
I mean that the devirtualization would work if the virtual function is not overridden in the inheritance tree, or the second case when the vfunction is overridden but single class in the inheritance tree is instantiated.
ex for the first case:
struct Base {
__attribute__((noinline))
virtual void virtual_function1() { asm volatile("NOP"); }
};
struct Derived : Base {
};
__attribute__((noinline))
void foo(Base *BV) {
BV->virtual_function1();
}
void bar() {
Base *d = new Derived();
Base *b = new Base();
foo(d);
foo(b);
}
second case:
struct Base {
__attribute__((noinline))
virtual void virtual_function1() { asm volatile("NOP"); }
};
struct Derived : Base {
void virtual_function1() override { asm volatile("NOP"); }
};
__attribute__((noinline))
void foo(Base *BV) {
BV->virtual_function1();
}
void bar() {
Base *d = new Derived();
foo(d);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think the confusion is that you are saying a single instantiated "object" in the manual, but what you mean is all objects are a single instantiated class / type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored.
| @@ -1641,6 +1642,24 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level, | |||
| if (!LTOPreLink) | |||
| MPM.addPass(RelLookupTableConverterPass()); | |||
|
|
|||
| if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, this isn't the right place to add the LTO speculative devirt support so a TODO doesn't make sense. Current comment is fine.
| /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively)); | ||
| MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, | ||
| lowertypetests::DropTestKind::Assume)); | ||
| if (EnableModuleInliner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what vtable GV optimization is needed to expose these opportunities?
| } | ||
|
|
||
| void af(A *a) { | ||
| // TT-ITANIUM-DEFAULT-NOLTO-SPECULATIVE-DEVIRT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: does this check label need to be so long? Does it even need to be distinct from VTABLE-OPT since they must be used together? They could both simply be CHECK without any loss of clarity I think. Actually that would probably be clearer.
| @@ -0,0 +1,78 @@ | |||
| // Test that the vtable metadata that are emitted by Clang when speculative devirtualization | |||
| // is enabled can be used by the WholeProgramDevirt pass without being dropped on the way. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is testing LLVM behavior, it should be an llvm side test - just take the LLVM IR generated from clang and move it under llvm/test/Transforms/DropUnnecessaryAssumes and use "opt" to test. You could test "opt -O3" as well as "opt -passes=drop-unnecessary-assumes".
| /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively)); | ||
| MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, | ||
| lowertypetests::DropTestKind::Assume)); | ||
| if (EnableModuleInliner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is there a test being added that would fail without this inliner invocation?
| static cl::opt<bool> EnableDevirtualizeSpeculatively( | ||
| "enable-devirtualize-speculatively", | ||
| cl::desc("Enable speculative devirtualization optimization"), | ||
| cl::init(false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To enable PhaseOrdering testing, I created that flag since devirtualization is conditionally added to the pipeline. However, since a similar flag exists in the WholeProgramDevirt pass itself, the optimal solution would be to: remove both flags, add flag parsing logic to PassBuilder (similar to some other passes), and update the WholeProgramDevirt pass accordingly. This would provide single flag handling across both PassBuilderPipeline and the pass implementation.
So, I added a TODO to add that change later.
clang/lib/CodeGen/CGClass.cpp
Outdated
| // Emit the type test assumes for the features of WPD (only when LTO | ||
| // visibility is NOT public) and speculative devirtualization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this lacks punctuation, or the grammar is off - I can't quite make sense of the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment says:
// Emit the intrinsics of (type_test and assume) for the features of WPD and speculative devirtualization.
// For WPD, emit the intrinsics only for the case of non_public LTO visibility)
If the new phrase makes sense to you, I will update it.
| if ((!CGM.getCodeGenOpts().WholeProgramVTables || | ||
| !CGM.HasHiddenLTOVisibility(RD)) && | ||
| !CGM.getCodeGenOpts().DevirtualizeSpeculatively) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to do that check in multiple places - maybe a ShouldPerformSpeculativeDevirtualization() function would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which check specifically? this one: CGM.getCodeGenOpts().DevirtualizeSpeculatively? or the whole if-statement ?
If you mean the whole if-statement, this patch is not related to the part of WPD, but only the speculative devirtualization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to refactor the whole if statement, although there are a few different variations of the combination of conditions being checked. But yes, it should be named something more general, like shouldEmitVTableTypeInfo or something like that.
fhahn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the title? Currently it sounds like speculative devirtualization gets enabled by default, but more precisely the patch adds an option to enable it, but it is off by default
| /*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively)); | ||
| MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, | ||
| lowertypetests::DropTestKind::Assume)); | ||
| if (EnableModuleInliner) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing the data. So it can have quite a big compile-time impact, curious if this could be improved, as otherwise it's hard to see a path to enabling this by default.
teresajohnson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Thanks so much everyone :)) @teresajohnson This is the order of the follow-up patches I will publish, please let me know if you have a different opinion or something is missing, thank you :)
|
- Build ExportSummary locally when they are not given.
- Improve documentation and comments - Update release notes.
…ay to the WPD at backend
|
Latest commit is Rebasing because the path of |
This patch adds Clang support for speculative devirtualization and integrates the related pass into the pass pipeline.
It's building on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual function) to a guarded direct call.
It is guarded by a comparison of the virtual function pointer to the expected target.
This optimization is still safe without LTO because it doesn't do direct calls, it's conditional according to the function ptr.
This optimization:
-fdevirtualize-speculativelyFor this C++ example:
Here is the IR without enabling speculative devirtualization:
IR after enabling speculative devirtualization: