1 change: 1 addition & 0 deletions clang/docs/ReleaseNotes.rst
@@ -343,6 +343,7 @@ Modified Compiler Flags
-----------------------
- The `-gkey-instructions` compiler flag is now enabled by default when DWARF is emitted for plain C/C++ and optimizations are enabled. (#GH149509)
- The `-fconstexpr-steps` compiler flag now accepts value `0` to opt out of this limit. (#GH160440)
- The `-fdevirtualize-speculatively` compiler flag is now supported to enable speculative devirtualization of virtual function calls. It is disabled by default. (#GH159685)

Removed Compiler Flags
-------------------------
52 changes: 52 additions & 0 deletions clang/docs/UsersManual.rst
@@ -2352,6 +2352,56 @@ are listed below.
pure ThinLTO, as all split regular LTO modules are merged and LTO linked
with regular LTO.

.. option:: -fdevirtualize-speculatively

Enable speculative devirtualization optimization where a virtual call
can be transformed into a direct call under the assumption that its
object is of a particular type. A runtime check is inserted to validate
the assumption before making the direct call, and if the check fails,
the original virtual call is made instead. This optimization can enable
more inlining opportunities and better optimization of the direct call.
This is different from whole program devirtualization optimization,
which relies on global analysis and hidden visibility of the objects to prove
that the object is always of a particular type at a virtual call site.
This optimization doesn't require global analysis or hidden visibility.
This optimization doesn't devirtualize all virtual calls, but only
when there's a single implementation of the virtual function in the module.
There could be a single implementation of the virtual function
either because the function is not overridden in any derived class,
or because all objects are instances of the same class/type.

Example of IR before the optimization:

.. code-block:: llvm

%vtable = load ptr, ptr %BV, align 8, !tbaa !6
%0 = tail call i1 @llvm.public.type.test(ptr %vtable, metadata !"_ZTS4Base")
tail call void @llvm.assume(i1 %0)
%1 = load ptr, ptr %vtable, align 8
tail call void %1(ptr noundef nonnull align 8 dereferenceable(8) %BV)
ret void

IR after the optimization:

.. code-block:: llvm

%vtable = load ptr, ptr %BV, align 8, !tbaa !12
%0 = load ptr, ptr %vtable, align 8
%1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15
if.true.direct_targ: ; preds = %entry
tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
br label %if.end.icp
if.false.orig_indirect: ; preds = %entry
tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
br label %if.end.icp
if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ
ret void

This feature is temporarily ignored at the LLVM side when LTO is enabled.
TODO: Update the comment when the LLVM side supports this feature for LTO.
This feature is turned off by default.

.. option:: -f[no-]unique-source-file-names

When enabled, allows the compiler to assume that each object file
@@ -5216,6 +5266,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
-fstandalone-debug Emit full debug info for all types used by the program
-fstrict-aliasing Enable optimizations based on strict aliasing rules
-fsyntax-only Run the preprocessor, parser and semantic analysis stages
-fdevirtualize-speculatively
Enables speculative devirtualization optimization.
-fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
-gcodeview-ghash Emit type record hashes in a .debug$H section
-gcodeview Generate CodeView debug information
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/CodeGenOptions.def
@@ -364,6 +364,8 @@ VALUE_CODEGENOPT(WarnStackSize , 32, UINT_MAX, Benign) ///< Set via -fwarn-s
CODEGENOPT(NoStackArgProbe, 1, 0, Benign) ///< Set when -mno-stack-arg-probe is used
CODEGENOPT(EmitLLVMUseLists, 1, 0, Benign) ///< Control whether to serialize use-lists.

CODEGENOPT(DevirtualizeSpeculatively, 1, 0, Benign) ///< Whether to apply the speculative
/// devirtualization optimization.
CODEGENOPT(WholeProgramVTables, 1, 0, Benign) ///< Whether to apply whole-program
/// vtable optimization.

12 changes: 9 additions & 3 deletions clang/include/clang/Options/Options.td
@@ -4512,6 +4512,13 @@ defm new_infallible : BoolFOption<"new-infallible",
BothFlags<[], [ClangOption, CC1Option],
" treating throwing global C++ operator new as always returning valid memory "
"(annotates with __attribute__((returns_nonnull)) and throw()). This is detectable in source.">>;
defm devirtualize_speculatively
: BoolFOption<"devirtualize-speculatively",
CodeGenOpts<"DevirtualizeSpeculatively">, DefaultFalse,
PosFlag<SetTrue, [], [],
"Enables speculative devirtualization optimization.">,
NegFlag<SetFalse>,
BothFlags<[], [ClangOption, CLOption, CC1Option]>>;
defm whole_program_vtables : BoolFOption<"whole-program-vtables",
CodeGenOpts<"WholeProgramVTables">, DefaultFalse,
PosFlag<SetTrue, [], [ClangOption, CC1Option],
@@ -7122,9 +7129,8 @@ defm variable_expansion_in_unroller : BooleanFFlag<"variable-expansion-in-unroll
Group<clang_ignored_gcc_optimization_f_Group>;
defm web : BooleanFFlag<"web">, Group<clang_ignored_gcc_optimization_f_Group>;
defm whole_program : BooleanFFlag<"whole-program">, Group<clang_ignored_gcc_optimization_f_Group>;
defm devirtualize : BooleanFFlag<"devirtualize">, Group<clang_ignored_gcc_optimization_f_Group>;
defm devirtualize_speculatively : BooleanFFlag<"devirtualize-speculatively">,
Group<clang_ignored_gcc_optimization_f_Group>;
defm devirtualize : BooleanFFlag<"devirtualize">,
Group<clang_ignored_gcc_optimization_f_Group>;

// Generic gfortran options.
def A_DASH : Joined<["-"], "A-">, Group<gfortran_Group>;
1 change: 1 addition & 0 deletions clang/lib/CodeGen/BackendUtil.cpp
@@ -940,6 +940,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
// non-integrated assemblers don't recognize .cgprofile section.
PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
PTO.DevirtualizeSpeculatively = CodeGenOpts.DevirtualizeSpeculatively;

LoopAnalysisManager LAM;
FunctionAnalysisManager FAM;
18 changes: 12 additions & 6 deletions clang/lib/CodeGen/CGClass.cpp
@@ -2827,10 +2827,15 @@ void CodeGenFunction::EmitTypeMetadataCodeForVCall(const CXXRecordDecl *RD,
SourceLocation Loc) {
if (SanOpts.has(SanitizerKind::CFIVCall))
EmitVTablePtrCheckForCall(RD, VTable, CodeGenFunction::CFITCK_VCall, Loc);
else if (CGM.getCodeGenOpts().WholeProgramVTables &&
// Don't insert type test assumes if we are forcing public
// visibility.
!CGM.AlwaysHasLTOVisibilityPublic(RD)) {
// Emit the intrinsics of (type_test and assume) for the features of WPD and
// speculative devirtualization. For WPD, emit the intrinsics only for the
// case of non_public LTO visibility.
// TODO: refactor this condition and similar ones into a function (e.g.,
// ShouldEmitDevirtualizationMD) to encapsulate the details of the different
// types of devirtualization.
else if ((CGM.getCodeGenOpts().WholeProgramVTables &&
!CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
Collaborator

can we document the implications that this has? This seems like it would break linking, right?

Member Author

Linking? I think linking is not related here; maybe you got confused by the LTO-related code?
The LTO-related code is old code. I just modified it to also emit the type test assumes when the devirtualization flag is enabled. The change here is not related to LTO or linking.

Collaborator

The problem with devirtualization is always that we don't know what is going on in other TUs, right? There could be other variants of the function in other types in the same virtual tree.

Member Author

Yeah you are right. That's why this is only 'speculative' so it should be safe regardless of how other modules could affect the inheritance tree.

Collaborator

Can you explain "should be safe"? Do you mean: "The user is promising us that it is safe", or "we are proving it is safe"?

Contributor

Speculative devirtualization transforms an indirect call to a guarded direct call. It is guarded by a comparison of the virtual function pointer to the expected target. So it is always safe. We also do this with profile information (in that case aka IndirectCallPromotion).
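
The guarded direct call described above can be sketched at source level. This is only an illustration under stated assumptions: `Object`, `Method`, `baseImpl`, `otherImpl`, `callIndirect`, and `callSpeculative` are hypothetical names, and a plain function pointer stands in for the loaded vtable slot. It is not the code the pass generates.

```cpp
#include <cassert>

// Hypothetical stand-in for an object with a single virtual function slot.
struct Object;
using Method = int (*)(const Object &);

struct Object {
  Method slot; // plays the role of the function pointer loaded from the vtable
  int value;
};

int baseImpl(const Object &o) { return o.value + 1; }
int otherImpl(const Object &o) { return o.value * 2; }

// The original form: an indirect call through the loaded pointer.
int callIndirect(const Object &o) {
  assert(o.slot != nullptr);
  return o.slot(o);
}

// The speculated form: compare the loaded pointer with the expected target.
// On a match, call the target directly (which makes inlining possible);
// otherwise fall back to the original indirect call.
int callSpeculative(const Object &o) {
  assert(o.slot != nullptr);
  Method target = o.slot;
  if (target == &baseImpl)
    return baseImpl(o); // direct call
  return target(o);     // fallback keeps the original behaviour
}
```

Both functions return the same result for every object, whichever implementation the slot points to; only the shape of the call changes.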

Collaborator

> Speculative devirtualization transforms an indirect call to a guarded direct call. It is guarded by a comparison of the virtual function pointer to the expected target. So it is always safe. We also do this with profile information (in that case aka IndirectCallPromotion).

Ok, this seems interesting and more sound. Can you share some IR of the 'after' here that shows that? More importantly, can we see some tests to that effect?

CanQualType Ty = CGM.getContext().getCanonicalTagType(RD);
llvm::Metadata *MD = CGM.CreateMetadataIdentifierForType(Ty);
llvm::Value *TypeId =
@@ -2988,8 +2993,9 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
}

bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
if (!CGM.getCodeGenOpts().WholeProgramVTables ||
!CGM.HasHiddenLTOVisibility(RD))
if ((!CGM.getCodeGenOpts().WholeProgramVTables ||
!CGM.HasHiddenLTOVisibility(RD)) &&
!CGM.getCodeGenOpts().DevirtualizeSpeculatively)
Collaborator

wait, is this devirtualizing ALL calls? How can that work? That doesn't seem like an appropriate implementation?

I find myself wondering if this needs to be a 'smaller hammer' here other than TU level.

Member Author @hassnaaHamdi, Nov 12, 2025

Hi @erichkeane
Thanks for reviewing :)
This patch enables speculative devirtualization for all calls 👀 but ONLY as long as there is a single possible callee equivalent to that virtual function (when there is a single implementation of the virtual function or there is a single initialisation of its related virtual table).
Here is an example in gcc:
https://godbolt.org/z/f4z5Wofaf
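
As a small illustration of the "single possible callee" case (a sketch only; `Base`, `Derived`, `Late`, and `call` are hypothetical names, and `Late` stands in for a class that might be defined in another translation unit):

```cpp
#include <cassert>

struct Base {
  virtual ~Base() = default;
  virtual int f() const { return 1; }
};

// Derived does not override f, so within this translation unit Base::f is
// the only implementation reachable through the call in call().
struct Derived : Base {};

// A candidate for speculation: the call may be guarded on "is the loaded
// function pointer Base::f?" and fall back to the virtual call otherwise.
int call(const Base &b) { return b.f(); }

// Imagine this class living in another translation unit. The guard fails for
// it at run time, so the fallback virtual call still reaches Late::f.
struct Late : Base {
  int f() const override { return 2; }
};
```

Whether or not the compiler speculates, `call` must return 1 for a `Derived` and 2 for a `Late`. The guard exists to preserve that.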

Collaborator

But how can we KNOW there is only a single implementation of the virtual function across TUs? That's the point about linking above. In the GCC case, we know the concrete type of ArrayVectorBase, so doing a devirtualization should be pretty much a no-brainer (and I think we already do something similar?).

Member Author @hassnaaHamdi, Nov 13, 2025

Clang does not do the same: https://godbolt.org/z/fE48b3eK3
Here are other cases that the type is not clear:
https://godbolt.org/z/Gs73WTvbM
https://godbolt.org/z/zrM6svM8r

Contributor

See my other answer, but basically it is guarded by a function pointer comparison.

@hassnaaHamdi I'll take a look at the code changes later today, but perhaps the PR description needs to have an explanation of what is meant by speculative devirtualization, with a short before and after code snippet.

Member Author @hassnaaHamdi, Nov 13, 2025

Here are different cases of virtual function usage. Each link has the C++ test case, another tab with the IR without devirtualization enabled, and another tab with the IR when devirtualization is enabled.
There are some comments in the C++ code and on the generated vtables in the IR.

  1. https://godbolt.org/z/sxWh5G3v6
  2. https://godbolt.org/z/ffK8qWTYo
  3. https://godbolt.org/z/MTd89obfE
  4. https://godbolt.org/z/PenWobYn8
  5. https://godbolt.org/z/q8E3dqdxa
  6. https://godbolt.org/z/PjGs6Kjx1
  7. https://godbolt.org/z/bbKPTx4Wr

Collaborator

Got it, thanks! The branch based on the equality of the vtables wasn't something that was clear to me, and I would love to see a clang-level test that shows us generating that without the pass happening? (or am I missing something else?).

Member Author @hassnaaHamdi, Nov 13, 2025

Clang here only emits the intrinsics (public_type_test and assume) and type metadata for each related vtable (as part of the CGVtables.cpp logic). You can see those metadata nodes at the end of the IR.

For example here the vtable of Derived class without the devirt flag:
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI7Derived, ptr @_ZN4Base16virtual_functionEv] }, comdat, align 8
it doesn't refer to any metadata nodes.

While when we use the devirt flag, it does:
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI7Derived, ptr @_ZN4Base16virtual_functionEv] }, comdat, align 8, !type !0, !type !1, !type !2, !type !3

That logic already exists in Clang for the LTO/wholeprogramdevirt feature. In this patch I just ask Clang to also emit it when the speculative_devirt flag is enabled.
So Clang only emits that data, while the backend (the WholeProgramDevirt.cpp pass) makes use of it and decides whether to devirtualize or not.

> I would love to see a clang-level test that shows us generating that without the pass happening?

Does this test fulfil your ask? clang/test/CodeGenCXX/type-metadata.cpp
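
To make the split of responsibilities concrete, here is a toy model of the decision the backend makes from the emitted metadata. It is a simplified sketch and not LLVM's data structures: `VTable` and `uniqueTarget` are hypothetical names, type metadata is reduced to a list of type-id strings (real `!type` metadata also carries offsets), and each slot is just a symbol name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy vtable: the type ids it is compatible with, and one symbol per slot.
struct VTable {
  std::string name;
  std::vector<std::string> typeIds;
  std::vector<std::string> slots;
};

// Returns the single symbol that every vtable compatible with typeId holds
// in the given slot, or an empty string if there is no unique target.
std::string uniqueTarget(const std::vector<VTable> &vtables,
                         const std::string &typeId, std::size_t slot) {
  std::string target;
  for (const VTable &vt : vtables) {
    if (std::find(vt.typeIds.begin(), vt.typeIds.end(), typeId) ==
        vt.typeIds.end())
      continue; // this vtable cannot be seen at a call site for typeId
    if (slot >= vt.slots.size())
      return "";
    if (target.empty())
      target = vt.slots[slot];
    else if (target != vt.slots[slot])
      return ""; // two different implementations: nothing to speculate on
  }
  return target;
}
```

In this model, a call site is a candidate only when `uniqueTarget` returns a non-empty symbol. Because other modules may add vtables this one never sees, the transformed call still needs the run-time guard.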

Collaborator

That test isn't quite 'clear' enough? Though perhaps it is good enough for lit.

Either way, I think I understand enough to be happy, thanks for your patience.

Member Author @hassnaaHamdi, Nov 13, 2025

Yeah, there are a lot of other test cases in that test file. I will create another test dedicated to show the clang effects after enabling the flag.

Comment on lines +2996 to +2998
Contributor

We seem to do that check in multiple places - maybe a ShouldPerformSpeculativeDevirtualization() function would be useful

Member Author

Which check specifically? This one: CGM.getCodeGenOpts().DevirtualizeSpeculatively? Or the whole if-statement?
If you mean the whole if-statement, this patch is not related to the WPD part, only to speculative devirtualization.

Contributor

It might be nice to refactor the whole if statement, although there are a few different variations of the combination of conditions being checked. But yes, it should be named something more general, like shouldEmitVTableTypeInfo or something like that.
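
One possible shape for such a helper, sketched outside of Clang so it can stand alone. `DevirtOpts` and `shouldEmitTypeTestAssume` are hypothetical names, and the `alwaysPublicLTOVisibility` parameter stands in for the result of `CGM.AlwaysHasLTOVisibilityPublic(RD)`. The condition mirrors the one this patch writes in `EmitTypeMetadataCodeForVCall`.

```cpp
#include <cassert>

// Hypothetical subset of the code generation options involved.
struct DevirtOpts {
  bool WholeProgramVTables = false;
  bool DevirtualizeSpeculatively = false;
};

// True when the type test and assume intrinsics should be emitted for a
// virtual call: either WPD is on and the class is not forced to public LTO
// visibility, or speculative devirtualization is on.
bool shouldEmitTypeTestAssume(const DevirtOpts &opts,
                              bool alwaysPublicLTOVisibility) {
  return (opts.WholeProgramVTables && !alwaysPublicLTOVisibility) ||
         opts.DevirtualizeSpeculatively;
}
```

The other call sites combine the options slightly differently (for example with `HasHiddenLTOVisibility`), so a real refactoring would probably need more than one predicate or an extra parameter.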

return false;

if (CGM.getCodeGenOpts().VirtualFunctionElimination)
6 changes: 4 additions & 2 deletions clang/lib/CodeGen/CGVTables.cpp
@@ -1363,10 +1363,12 @@ llvm::GlobalObject::VCallVisibility CodeGenModule::GetVCallVisibilityLevel(
void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
llvm::GlobalVariable *VTable,
const VTableLayout &VTLayout) {
// Emit type metadata on vtables with LTO or IR instrumentation.
// Emit type metadata on vtables with LTO or IR instrumentation or
// speculative devirtualization.
// In IR instrumentation, the type metadata is used to find out vtable
// definitions (for type profiling) among all global variables.
if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
!getCodeGenOpts().DevirtualizeSpeculatively)
return;

CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
23 changes: 15 additions & 8 deletions clang/lib/CodeGen/ItaniumCXXABI.cpp
@@ -716,10 +716,14 @@ CGCallee ItaniumCXXABI::EmitLoadOfMemberFunctionPointer(

bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination &&
CGM.HasHiddenLTOVisibility(RD);
// TODO: Update this name not to be restricted to WPD only
// as we now emit the vtable info for speculative devirtualization as
// well.
bool ShouldEmitWPDInfo =
Contributor

Maybe add a comment that the WPD terminology is legacy and we now also perform speculative devirtualization using the same info.

CGM.getCodeGenOpts().WholeProgramVTables &&
// Don't insert type tests if we are forcing public visibility.
!CGM.AlwaysHasLTOVisibilityPublic(RD);
(CGM.getCodeGenOpts().WholeProgramVTables &&
// Don't insert type tests if we are forcing public visibility.
!CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
CGM.getCodeGenOpts().DevirtualizeSpeculatively;
llvm::Value *VirtualFn = nullptr;

{
@@ -2110,17 +2114,20 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,

// Always emit type metadata on non-available_externally definitions, and on
// available_externally definitions if we are performing whole program
// devirtualization. For WPD we need the type metadata on all vtable
// definitions to ensure we associate derived classes with base classes
// defined in headers but with a strong definition only in a shared library.
// devirtualization or speculative devirtualization. We need the type metadata
// on all vtable definitions to ensure we associate derived classes with base
// classes defined in headers but with a strong definition only in a shared
// library.
if (!VTable->isDeclarationForLinker() ||
CGM.getCodeGenOpts().WholeProgramVTables) {
CGM.getCodeGenOpts().WholeProgramVTables ||
CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
CGM.EmitVTableTypeMetadata(RD, VTable, VTLayout);
// For available_externally definitions, add the vtable to
// @llvm.compiler.used so that it isn't deleted before whole program
// analysis.
if (VTable->isDeclarationForLinker()) {
assert(CGM.getCodeGenOpts().WholeProgramVTables);
assert(CGM.getCodeGenOpts().WholeProgramVTables ||
CGM.getCodeGenOpts().DevirtualizeSpeculatively);
CGM.addCompilerUsedGlobal(VTable);
}
}
5 changes: 5 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
@@ -7745,6 +7745,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,

addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs);

if (Args.hasFlag(options::OPT_fdevirtualize_speculatively,
options::OPT_fno_devirtualize_speculatively,
/*Default value*/ false))
CmdArgs.push_back("-fdevirtualize-speculatively");

bool VirtualFunctionElimination =
Args.hasFlag(options::OPT_fvirtual_function_elimination,
options::OPT_fno_virtual_function_elimination, false);
78 changes: 78 additions & 0 deletions clang/test/CodeGenCXX/speculative-devirt-metadata.cpp
@@ -0,0 +1,78 @@
// Test that Clang emits vtable metadata when speculative devirtualization is enabled.
// RUN: %clang_cc1 -triple x86_64-unknown-linux -fdevirtualize-speculatively -emit-llvm -o - %s | FileCheck --check-prefix=CHECK %s

struct A {
A();
virtual void f();
};

struct B : virtual A {
B();
virtual void g();
virtual void h();
};

namespace {

struct D : B {
D();
virtual void f();
virtual void h();
};

}

A::A() {}
B::B() {}
D::D() {}

void A::f() {
}

void B::g() {
}

void D::f() {
}

void D::h() {
}

void af(A *a) {
// CHECK: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
// CHECK-NEXT: call void @llvm.assume(i1 [[P]])
a->f();
}

void dg1(D *d) {
// CHECK: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1B")
// CHECK-NEXT: call void @llvm.assume(i1 [[P]])
d->g();
}

void df1(D *d) {
// CHECK: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !11)
// CHECK-NEXT: call void @llvm.assume(i1 [[P]])
d->f();
}

void dh1(D *d) {
// CHECK: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !11)
// CHECK-NEXT: call void @llvm.assume(i1 [[P]])
d->h();
}


D d;

void foo() {
dg1(&d);
df1(&d);
dh1(&d);


struct FA : A {
void f() {}
} fa;
af(&fa);
}
2 changes: 0 additions & 2 deletions clang/test/Driver/clang_f_opts.c
@@ -377,7 +377,6 @@
// RUN: -ftree-ter \
// RUN: -ftree-vrp \
// RUN: -fno-devirtualize \
// RUN: -fno-devirtualize-speculatively \
// RUN: -fslp-vectorize-aggressive \
// RUN: -fno-slp-vectorize-aggressive \
// RUN: %s 2>&1 | FileCheck --check-prefix=CHECK-WARNING %s
@@ -436,7 +435,6 @@
// CHECK-WARNING-DAG: optimization flag '-ftree-ter' is not supported
// CHECK-WARNING-DAG: optimization flag '-ftree-vrp' is not supported
// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize' is not supported
// CHECK-WARNING-DAG: optimization flag '-fno-devirtualize-speculatively' is not supported
// CHECK-WARNING-DAG: the flag '-fslp-vectorize-aggressive' has been deprecated and will be ignored
// CHECK-WARNING-DAG: the flag '-fno-slp-vectorize-aggressive' has been deprecated and will be ignored

4 changes: 4 additions & 0 deletions llvm/include/llvm/Passes/PassBuilder.h
@@ -99,6 +99,10 @@ class PipelineTuningOptions {
// analyses after various module->function or cgscc->function adaptors in the
// default pipelines.
bool EagerlyInvalidateAnalyses;

// Tuning option to enable/disable speculative devirtualization.
// Its default value is false.
bool DevirtualizeSpeculatively;
};

/// This class provides access to building LLVM's passes.
7 changes: 5 additions & 2 deletions llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h
@@ -226,11 +226,14 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
ModuleSummaryIndex *ExportSummary;
const ModuleSummaryIndex *ImportSummary;
bool UseCommandLine = false;
bool DevirtSpeculatively = false;
WholeProgramDevirtPass()
: ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
const ModuleSummaryIndex *ImportSummary)
: ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
const ModuleSummaryIndex *ImportSummary,
bool DevirtSpeculatively = false)
: ExportSummary(ExportSummary), ImportSummary(ImportSummary),
DevirtSpeculatively(DevirtSpeculatively) {
assert(!(ExportSummary && ImportSummary));
}
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
36 changes: 36 additions & 0 deletions llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -305,6 +305,13 @@ static cl::opt<std::string> InstrumentColdFuncOnlyPath(
"with --pgo-instrument-cold-function-only)"),
cl::Hidden);

// TODO: There is a similar flag in WPD pass, we should consolidate them by
// parsing the option only once in PassBuilder and share it across both places.
static cl::opt<bool> EnableDevirtualizeSpeculatively(
"enable-devirtualize-speculatively",
cl::desc("Enable speculative devirtualization optimization"),
cl::init(false));
Member Author @hassnaaHamdi, Dec 1, 2025

To enable PhaseOrdering testing, I created that flag since devirtualization is conditionally added to the pipeline. However, since a similar flag exists in the WholeProgramDevirt pass itself, the optimal solution would be to: remove both flags, add flag parsing logic to PassBuilder (similar to some other passes), and update the WholeProgramDevirt pass accordingly. This would provide single flag handling across both PassBuilderPipeline and the pass implementation.
So, I added a TODO to add that change later.


extern cl::opt<std::string> UseCtxProfile;
extern cl::opt<bool> PGOInstrumentColdFunctionOnly;

@@ -326,6 +333,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
MergeFunctions = EnableMergeFunctions;
InlinerThreshold = -1;
EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
DevirtualizeSpeculatively = EnableDevirtualizeSpeculatively;
}

namespace llvm {
@@ -1655,6 +1663,34 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (!LTOPreLink)
MPM.addPass(RelLookupTableConverterPass());

// Add devirtualization pass only when LTO is not enabled, as otherwise
// the pass is already enabled in the LTO pipeline.
if (PTO.DevirtualizeSpeculatively && LTOPhase == ThinOrFullLTOPhase::None) {
Contributor

Add a TODO about LTO

Member Author

Sorry, a TODO about what? This check is here because, if LTO is enabled, the WPD pass will already be added to the LTO pipeline, so we should not add it again here.

Member Author

Sorry about the late reply, I initially thought I understood the comment, but now not 😅

Contributor

Nevermind, this isn't the right place to add the LTO speculative devirt support so a TODO doesn't make sense. Current comment is fine.
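
The gating discussed in this thread can be modelled in a few lines. This is a toy sketch, not the PassBuilder API: `LTOPhase` and `devirtTailPasses` are hypothetical names, and passes are represented by their names as strings. It follows the order in this hunk: WholeProgramDevirtPass, then LowerTypeTestsPass, then one more inliner run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy counterpart of the LTO phase passed to the pipeline builder.
enum class LTOPhase { None, ThinLTOPreLink, ThinLTOPostLink, FullLTOPreLink, FullLTOPostLink };

// Returns the names of the passes appended at the end of the module
// optimization pipeline for speculative devirtualization.
std::vector<std::string> devirtTailPasses(bool devirtualizeSpeculatively,
                                          LTOPhase phase,
                                          bool enableModuleInliner) {
  std::vector<std::string> passes;
  // With LTO, nothing is added here: the pass is already part of the LTO
  // pipeline.
  if (!devirtualizeSpeculatively || phase != LTOPhase::None)
    return passes;
  passes.push_back("WholeProgramDevirtPass");
  passes.push_back("LowerTypeTestsPass");
  // Devirtualized calls are new inlining candidates, so inline once more.
  passes.push_back(enableModuleInliner ? "ModuleInlinerPass"
                                       : "ModuleInlinerWrapperPass");
  return passes;
}
```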

// TODO: explore a better pipeline configuration that can improve
// compilation time overhead.
MPM.addPass(WholeProgramDevirtPass(
/*ExportSummary*/ nullptr,
/*ImportSummary*/ nullptr,
/*DevirtSpeculatively*/ PTO.DevirtualizeSpeculatively));
MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
lowertypetests::DropTestKind::Assume));
// Given that the devirtualization creates more opportunities for inlining,
// we run the Inliner again here to maximize the optimization gain we
// get from devirtualization.
// Also, we can't run devirtualization before inlining because the
// devirtualization depends on the passes optimizing/eliminating vtable GVs
// and those passes are only effective after inlining.
if (EnableModuleInliner) {
Contributor

Can you remind me why you need another round of inlining vs doing this before the earlier inline pass? Is this an optimization for a specific use case that needs 2 rounds of inlining? I'm concerned about the potential side effects.

Member Author @hassnaaHamdi, Nov 19, 2025

I have added comments about this:

// Given that the devirtualization creates more opportunities for inlining,
// we run the Inliner again here to maximize the optimization gain we
// get from devirtualization.
// Also, we can't run devirtualization before inlining because the
// devirtualization depends on the passes optimizing/eliminating vtable GVs
// and those passes are only effective after inlining.

Contributor

Can you clarify what vtable GV optimization is needed to expose these opportunities?

Contributor

Also, is there a test being added that would fail without this inliner invocation?

Member Author @hassnaaHamdi, Nov 27, 2025

> Can you clarify what vtable GV optimization is needed to expose these opportunities?

I mean optimisation passes like GlobalOpt, which eliminates unused GVs. Then, when devirtualization runs, it finds a single GV referring to the virtual function, so it can devirtualize.

Member Author

I will look at how the DevirtSCCRepeatedPass is invoked.

Member Author

I tested the compile-time on ct-mark, and the compilation overhead seems big for each of the inliner and the WPD.

Member Author @hassnaaHamdi, Dec 2, 2025

Okay, I see that integrating the WPD pass into another pipeline is non-trivial and would require changes across multiple places. Additionally, I'm uncertain if this approach will succeed given that WPD is a module pass.
And maybe it would be better to have a separate pass for non-LTO speculative devirtualization, isolated from all the LTO stuff; in that case it's easier to add any needed changes to the pass without touching LTO things.
But anyway, I'll need to investigate the suggested approach further before implementing.

Does it make sense to keep the current opt-in speculative devirtualization changes for now, and revisit the pipeline integration as a follow-up?
Of course, I'm still open to pursuing the correct approach if the current implementation is not good enough.

Contributor

I'm open to putting it in as is for now, since it is disabled by default, and adding a TODO to explore a better pipeline configuration.

Member Author

Added the TODO.

MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),
UseInlineAdvisor,
ThinOrFullLTOPhase::None));
} else {
MPM.addPass(ModuleInlinerWrapperPass(
getInlineParamsFromOptLevel(Level),
/* MandatoryFirst */ true,
InlineContext{ThinOrFullLTOPhase::None, InlinePass::CGSCCInliner}));
}
}
return MPM;
}
