Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions clang/docs/UsersManual.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2275,9 +2275,13 @@ are listed below.

.. option:: -fwhole-program-vtables

In LTO mode:
Enable whole-program vtable optimizations, such as single-implementation
devirtualization and virtual constant propagation, for classes with
:doc:`hidden LTO visibility <LTOVisibility>`. Requires ``-flto``.
:doc:`hidden LTO visibility <LTOVisibility>`.
In non-LTO mode:
Enables speculative devirtualization only without other features.
Doesn't require ``-flto`` or visibility.

.. option:: -f[no]split-lto-unit

Expand Down Expand Up @@ -5170,7 +5174,7 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
-fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
-fwhole-program-vtables Enables whole-program vtable optimization.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, the option name doesn't really make sense in this case. Perhaps there should be a separate option controlling the non-LTO behavior. Let me think more about this part...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcc has an option called -fdevirtualize-speculatively, that describes what is being enabled here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fdevirtualize-speculatively

How about adding that to clang and using it to enable this instead of -fwhole-program-vtables, which doesn't really make sense as what is being enabled isn't whole program in this case.

-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
-gline-directives-only Emit debug line info directives only
Expand Down
1 change: 1 addition & 0 deletions clang/lib/CodeGen/BackendUtil.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -902,6 +902,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
PTO.WholeProgramDevirt = CodeGenOpts.WholeProgramVTables;

LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
Expand Down
3 changes: 2 additions & 1 deletion clang/lib/CodeGen/CGVTables.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1359,7 +1359,8 @@ void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
// Emit type metadata on vtables with LTO or IR instrumentation.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
!getCodeGenOpts().WholeProgramVTables)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Teresa, thanks for the review.
Sorry about late reply, I was on holiday.

Without checking WholeProgramVTables, the needed metadata will not be emitted.

return;

CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
Expand Down
17 changes: 6 additions & 11 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7837,24 +7837,19 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
// ThinLTO mode.
bool IsPS4 = getToolChain().getTriple().isPS4();

// Check if we passed LTO options but they were suppressed because this is a
// device offloading action, or we passed device offload LTO options which
// were suppressed because this is not the device offload action.
// Check if we are using PS4 in regular LTO mode.
// Otherwise, issue an error.

auto OtherLTOMode =
IsDeviceOffloadAction ? D.getLTOMode() : D.getOffloadLTOMode();
auto OtherIsUsingLTO = OtherLTOMode != LTOK_None;

if ((!IsUsingLTO && !OtherIsUsingLTO) ||
(IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
if (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full))
D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fwhole-program-vtables"
<< ((IsPS4 && !UnifiedLTO) ? "-flto=full" : "-flto");

// Propagate -fwhole-program-vtables if this is an LTO compile.
if (IsUsingLTO)
// Propagate -fwhole-program-vtables if this is an LTO compile or at least
// optimization level is O1 so that the devirtualization be effective
// outside LTO.
if (const Arg *A = Args.getLastArg(options::OPT_O_Group);
(A && !A->getOption().matches(options::OPT_O0)) || IsUsingLTO)
CmdArgs.push_back("-fwhole-program-vtables");
}

Expand Down
8 changes: 8 additions & 0 deletions clang/test/CodeGenCXX/type-metadata.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,9 @@
// RUN: %clang_cc1 -O2 -flto -flto-unit -triple x86_64-unknown-linux -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=ITANIUM-OPT --check-prefix=ITANIUM-OPT-LAYOUT %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TT-MS %s

// Test for the whole-program-vtables feature in nonlto mode:
// RUN: %clang_cc1 -triple x86_64-unknown-linux -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=VTABLE-OPT --check-prefix=TT-ITANIUM-DEFAULT-NOLTO %s

// Tests for cfi + whole-program-vtables:
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-unknown-linux -fvisibility=hidden -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=ITANIUM-HIDDEN --check-prefix=ITANIUM-COMMON-MD --check-prefix=TC-ITANIUM --check-prefix=ITANIUM-NO-RV-MD %s
// RUN: %clang_cc1 -flto -flto-unit -triple x86_64-pc-windows-msvc -fsanitize=cfi-vcall -fsanitize-trap=cfi-vcall -fwhole-program-vtables -emit-llvm -o - %s | FileCheck --check-prefix=CFI --check-prefix=CFI-VT --check-prefix=MS --check-prefix=MS-TYPEMETADATA --check-prefix=TC-MS %s
Expand Down Expand Up @@ -178,6 +181,7 @@ void af(A *a) {
// TT-ITANIUM-HIDDEN: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-ITANIUM-DEFAULT: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TT-MS: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !"?AUA@@")
// TT-ITANIUM-DEFAULT-NOLTO: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// TC-ITANIUM: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-ITANIUM-RV: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata !"_ZTS1A")
// TC-MS: [[PAIR:%[^ ]*]] = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
Expand Down Expand Up @@ -212,6 +216,7 @@ void df1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@@")
// TT-ITANIUM-DEFAULT-NOLTO: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 0, metadata ![[DTYPE:[0-9]+]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@@")
Expand All @@ -224,6 +229,7 @@ void dg1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUB@@")
// TT-ITANIUM-DEFAULT-NOLTO: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTS1B")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTS1B")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTS1B")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUB@@")
Expand All @@ -236,6 +242,7 @@ void dh1(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE:[0-9]+]])
// TT-ITANIUM-DEFAULT-NOLTO: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata ![[DTYPE]])
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 16, metadata ![[DTYPE]])
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE]])
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata ![[DTYPE:[0-9]+]])
Expand Down Expand Up @@ -297,6 +304,7 @@ void f(D *d) {
// TT-ITANIUM-HIDDEN: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-ITANIUM-DEFAULT: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TT-MS: {{%[^ ]*}} = call i1 @llvm.type.test(ptr {{%[^ ]*}}, metadata !"?AUA@test2@@")
// TT-ITANIUM-DEFAULT-NOLTO: {{%[^ ]*}} = call i1 @llvm.public.type.test(ptr {{%[^ ]*}}, metadata !"_ZTSN5test21DE")
// TC-ITANIUM: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 8, metadata !"_ZTSN5test21DE")
// TC-ITANIUM-RV: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load.relative(ptr {{%[^ ]*}}, i32 4, metadata !"_ZTSN5test21DE")
// TC-MS: {{%[^ ]*}} = call { ptr, i1 } @llvm.type.checked.load(ptr {{%[^ ]*}}, i32 0, metadata !"?AUA@test2@@")
Expand Down
10 changes: 3 additions & 7 deletions clang/test/Driver/whole-program-vtables.c
Original file line number Diff line number Diff line change
@@ -1,15 +1,11 @@
// RUN: not %clang -target x86_64-unknown-linux -fwhole-program-vtables -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -### -- %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
// NO-LTO: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -O1 -### %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
// RUN: %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -O1 -### -- %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
// WPD-NO-LTO: "-fwhole-program-vtables"

// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO %s
// LTO: "-fwhole-program-vtables"

/// -funified-lto does not imply -flto, so we still get an error that fwhole-program-vtables has no effect without -flto
// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -funified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -fno-unified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s

// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -fno-whole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -fno-whole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
// LTO-DISABLE-NOT: "-fwhole-program-vtables"
6 changes: 6 additions & 0 deletions llvm/include/llvm/Passes/PassBuilder.h
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,12 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;

/// Tuning option to enable/disable whole program devirtualization.
/// Its default value is false.
/// This is controlled by the `-whole-program-vtables` flag.
/// Used only in non-LTO mode.
bool WholeProgramDevirt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to match the proposed new option, and make it independent of LTO. We can presumably use this to do the speculative devirtualization when we are in LTO mode but don't have whole program visibility too.

};

/// This class provides access to building LLVM's passes.
Expand Down
10 changes: 7 additions & 3 deletions llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
Original file line number Diff line number Diff line change
Expand Up @@ -226,11 +226,15 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
const bool InLTOMode;
WholeProgramDevirtPass()
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true),
InLTOMode(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
const ModuleSummaryIndex *ImportSummary)
: ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
const ModuleSummaryIndex *ImportSummary,
bool InLTOMode = true)
: ExportSummary(ExportSummary), ImportSummary(ImportSummary),
InLTOMode(InLTOMode) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
Expand Down
18 changes: 18 additions & 0 deletions llvm/lib/Passes/PassBuilderPipelines.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
WholeProgramDevirt = false;
}

namespace llvm {
Expand Down Expand Up @@ -1629,6 +1630,23 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());

if (PTO.WholeProgramDevirt && LTOPhase == ThinOrFullLTOPhase::None) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be guarded by whether whole program vtables was enabled? We don't for LTO. What happens if it is simply always invoked for non-LTO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Teresa, thanks for looking at the patch.
It's okay to enable it by default. I just added the flag as a first step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least WPD and LTT should not do anything if there is no type metadata (the inliner invocation is a separate story, more comments about that below).

Copy link
Member Author

@hassnaaHamdi hassnaaHamdi Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have that check to avoid running passes that are not needed if the option is not enabled, not only the Inliner pass, but also the WPD and LowerTypeTests pass.

MPM.addPass(WholeProgramDevirtPass(/*ExportSummary*/ nullptr,
/*ImportSummary*/ nullptr,
/*InLTOMode=*/false));
MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
lowertypetests::DropTestKind::Assume));
if (EnableModuleInliner) {
MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want to reinvoke the inliner. Better to invoke WPD from the module simplifier right before the inliner is invoked there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the passes that eliminate the unused GVs are depending on the inliner where constructors get inlined so it becomes clear which GVs are used and which are not.
And the WPD needs single GV user of the vtable that contains the virtual function we need to devirtualize.
So WPD depends on (GlobalOpt/GlobalDCE) which depend on the inliner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in LTO mode we effectively get 2 rounds of inlining too: one during the pre-LTO compile, which must be what is inlining the constructors, and then one in the LTO backends after WPD. This is unfortunately a bigger change, adding another full round of inlining. But I guess ok because this won't be on by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you summarize this info in a comment (i.e. specifically why another round of inlining is needed in non-LTO mode but effectively already exists elsewhere in LTO mode)?

UseInlineAdvisor,
ThinOrFullLTOPhase::None));
} else {
MPM.addPass(ModuleInlinerWrapperPass(
getInlineParamsFromOptLevel(Level),
/* MandatoryFirst */ true,
InlineContext{ThinOrFullLTOPhase::None, InlinePass::CGSCCInliner}));
}
}
return MPM;
}

Expand Down
Loading