Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions clang/include/clang/Driver/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
PosFlag<SetTrue, [], [ClangOption, CC1Option],
"Emit extra debug info to make sample profile more accurate">,
NegFlag<SetFalse>>;
def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to expose this through the clang frontend?

Copy link
Contributor Author

@wlei-llvm wlei-llvm Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if my PR description is not clear. Note that there is no use for -fprofile-generate and -fprofile-instr-generate here, so a driver flag here is to configure the instr file path and make linker to link the compiler_rt.profile object files (just similar to -fprofile-[instr]-generate=).

The reason for not using -fprofile-[instr]-generate is because those flags conflict with -fprofile-sample-use, see PGOOptions, ProfileFile is a shared file path which means the two flags are actually mutually exclusive.

Another option is to make -fprofile-generate compatible with -fprofile-samle-use(like refactoring PGOOptions or adding another flag to configure the file path things), this seems to me they are more complicated than the current one. But I’m open to any suggestions on this.

Copy link
Member

@mtrofin mtrofin Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

I see, thanks for the suggestion! We also need to link runtime lib(compiler_rt.profile.a)
but yeah, I agree it's possible to pass directly to llvm and linker without through clang. Then the full command line would be like

clang ... -mllvm -instrument-cold-function-coverage -mllvm -instrument-sample-cold-function-path=<file-path> -mllvm --pgo-function-entry-coverage 
ld.lld ... --u__llvm_runtime_variable  .../lib/x86_64-unknown-linux-gnu/libclang_rt.profile.a

So yes, adding the clang driver flag is kind of to mirror current IRPGO flags, for easy maintenance purpose(given that -fprofile-generate doesn't work with -fprofile-sample-use) and also to centralize the configuration for the convenience. IMO, the compiler_rt is probably the main blocker here, I didn't find an easy way to bundle it with a llvm flag.

Appreciate any further suggestions!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -Wl,-lclang_rt.profile work?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also to centralize the configuration for the convenience

+1 I think a frontend flag is useful. It allows us to identify incompatibilities early and provide clear error messages instead of more obscure failures down the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would incompatibilities and ability to provide useful error messages be dependent on the flag being in the frontend? Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict, so a message would be diluted to "sampling won't work with this"?

(probably a subject for a different forum tho)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would -Wl,-lclang_rt.profile work?

Got it, I think it should work, except it requires another linker flag: the --u__llvm_runtime_variable, we can configure it to linker too, but basically those instr PGO flags control addProfileRTLibs (which seems not equivalent to -Wl,-lclang_rt.profile), I just wanted to mirror those flags so that we don't need to maintain if anything changes to addProfileRTLibs in the future. (Though I feel this code should be very stable, should not be a big problem, so mainly for the convenience for compiler user to use one flag instead of using/maintaining multiple flags )
Overall, I agree that it's feasible to do it without clang flag. I'm not very familiar with the convention for adding driver flags, so if you think this doesn't justify it, I will drop it from this patch. Thanks for the discussion!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict

Yes, if we know that this instrumentation mode should not be mixed with e.g. sanitizers or something else we can enforce these checks early. I don't see a particular downside to adding a frontend flag. The convenience of bundling the 2 lld flags and 3 mllvm flags into a single frontend flag seems like a good enough motivation to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on single driver flag for convenience.

Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
Expand Down
4 changes: 3 additions & 1 deletion clang/lib/Driver/ToolChain.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
Args.hasArg(options::OPT_fprofile_instr_generate) ||
Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
Args.hasArg(options::OPT_fcreate_profile) ||
Args.hasArg(options::OPT_forder_file_instrumentation);
Args.hasArg(options::OPT_forder_file_instrumentation) ||
Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
}

bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
Expand Down
18 changes: 18 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
}
}

if (auto *ColdFuncCoverageArg = Args.getLastArg(
options::OPT_fprofile_generate_cold_function_coverage,
options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
SmallString<128> Path(
ColdFuncCoverageArg->getOption().matches(
options::OPT_fprofile_generate_cold_function_coverage_EQ)
? ColdFuncCoverageArg->getValue()
: "");
llvm::sys::path::append(Path, "default_%m.profraw");
CmdArgs.push_back("-mllvm");
CmdArgs.push_back(Args.MakeArgString(
Twine("--instrument-cold-function-coverage-path=") + Path));
CmdArgs.push_back("-mllvm");
CmdArgs.push_back("--instrument-cold-function-coverage");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two flags seem duplicative. Path is guaranteed to be not empty here, so the boolean flag seems unnecessary?

Relatedly, I know this might not be easy. But while it's convenient to accept a new driver flag, I still kept wondering if we can communicate the single driver flag to LLVM using existing and orthogonal flags as much as possible. Like this:

-fprofile-instrument-path= // reuse existing flag to communicate file path
-pgo-function-entry-coverage // reuse existing flag to communicate coverage only
-pgo-instrument-cold-function-only // add new flag to communicate cold function only

I know that the current implementation has assumption when InstrProfileOutput is not empty, but I still wonder is there a way recognize the use of these three flags together, and special case for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea, however, this seems indeed not easy(needs a lot of refactoring).

As you said, the InstrProfileOutput filed is shared with other PGO flag(in our case, it's the fprofile-sample-use=) , see PGOOptions.ProfileFIle. One thing we can do is to split this field into two: ProfileUseFile and ProfileGenFole, let all profile-<instr>-gererate go to ProfileGenFile and let sample-use/instr-use go to ProfileUseFile. I think that's doable and not a big change(maybe a separate diff). Then in sample-use case, we can use the ProfileGenFIle path for cold coverage work. WDYT?

Though it would still not be perfect, the logic here https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/BackendUtil.cpp#L811-L841 is complicated/ would still need more refactoring. I was stuck here when thinking if -fprofile-sample-use and --fprofile-instrument-path= are both used, which branch(CodeGenOpts.hasProfileIRInstr() or !CodeGenOpts.SampleProfileFile.empty()) should it belongs to.

For this version, I just removed the duplicated flag(--pgo-instrument-cold-function-only) and added a FIXME comment to explain this.

CmdArgs.push_back("-mllvm");
CmdArgs.push_back("--pgo-function-entry-coverage");
}

Arg *PGOGenArg = nullptr;
if (PGOGenerateArg) {
assert(!CSPGOGenerateArg);
Expand Down
19 changes: 19 additions & 0 deletions clang/test/CodeGen/pgo-cold-function-coverage.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
// Test -fprofile-generate-cold-function-coverage

// RUN: split-file %s %t
// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%t/pgo-cold-func.prof -S -emit-llvm -o - %t/pgo-cold-func.c | FileCheck %s

// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"

// CHECK: store i8 0, ptr @__profc_bar, align 1
// CHECK-NOT: @__profc_foo

//--- pgo-cold-func.prof
foo:1:1
1: 1

//--- pgo-cold-func.c
int bar(int x) { return x;}
int foo(int x) {
return x;
}
9 changes: 9 additions & 0 deletions clang/test/Driver/fprofile-generate-cold-function-coverage.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
// RUN: %clang -### -c -fprofile-generate-cold-function-coverage %s 2>&1 | FileCheck %s
// CHECK: "--instrument-cold-function-coverage-path=default_%m.profraw"
// CHECK: "--instrument-cold-function-coverage"
// CHECK: "--pgo-function-entry-coverage"
// CHECK-NOT: "-fprofile-instrument"
// CHECK-NOT: "-fprofile-instrument-path=

// RUN: %clang -### -c -fprofile-generate-cold-function-coverage=dir %s 2>&1 | FileCheck %s --check-prefix=CHECK-EQ
// CHECK-EQ: "--instrument-cold-function-coverage-path=dir{{/|\\\\}}default_%m.profraw"
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,8 @@ class PGOInstrumentationGenCreateVar
bool ProfileSampling;
};

enum class InstrColdFuncCovMode { Conservative = 0, Optimistic };

enum class PGOInstrumentationType { Invalid = 0, FDO, CSFDO, CTXPROF };
/// The instrumentation (profile-instr-gen) pass for IR based PGO.
class PGOInstrumentationGen : public PassInfoMixin<PGOInstrumentationGen> {
Expand Down
18 changes: 17 additions & 1 deletion llvm/lib/Passes/PassBuilderPipelines.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
"enable-loop-versioning-licm", cl::init(false), cl::Hidden,
cl::desc("Enable the experimental Loop Versioning LICM pass"));

static cl::opt<std::string> InstrumentColdFuncCoveragePath(
"instrument-cold-function-coverage-path", cl::init(""),
cl::desc("File path for cold function coverage instrumentation"),
cl::Hidden);

extern cl::opt<std::string> UseCtxProfile;
extern cl::opt<bool> InstrumentColdFunctionCoverage;

namespace llvm {
extern cl::opt<bool> EnableMemProfContextDisambiguation;
Expand Down Expand Up @@ -1184,8 +1190,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
const bool IsCtxProfUse =
!UseCtxProfile.empty() && Phase == ThinOrFullLTOPhase::ThinLTOPreLink;

// Enable cold function coverage instrumentation if
// InstrumentColdFuncCoveragePath is provided.
const bool IsColdFuncCoverageGen = InstrumentColdFunctionCoverage =
IsPGOPreLink && !InstrumentColdFuncCoveragePath.empty();

if (IsPGOInstrGen || IsPGOInstrUse || IsMemprofUse || IsCtxProfGen ||
IsCtxProfUse)
IsCtxProfUse || IsColdFuncCoverageGen)
addPreInlinerPasses(MPM, Level, Phase);

// Add all the requested passes for instrumentation PGO, if requested.
Expand All @@ -1207,6 +1218,11 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
return MPM;
addPostPGOLoopRotation(MPM, Level);
MPM.addPass(PGOCtxProfLoweringPass());
} else if (IsColdFuncCoverageGen) {
addPGOInstrPasses(
MPM, Level, /* RunProfileGen */ true, /* IsCS */ false,
/* AtomicCounterUpdate */ false, InstrumentColdFuncCoveragePath,
/* ProfileRemappingFile */ "", IntrusiveRefCntPtr<vfs::FileSystem>());
}

if (IsPGOInstrGen || IsPGOInstrUse || IsCtxProfGen)
Expand Down
29 changes: 29 additions & 0 deletions llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,29 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
cl::desc("Do not instrument functions with the number of critical edges "
" greater than this threshold."));

static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
"cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
cl::desc("When enabling cold function coverage instrumentation, skip "
"instrumenting the function whose entry count is above the given "
"value"));

static cl::opt<InstrColdFuncCovMode> InstrumentColdFunctionCoverageMode(
"instrument-cold-function-coverage-mode",
cl::init(InstrColdFuncCovMode::Conservative), cl::Hidden,
cl::desc("Control whether instrumenting unprofiled functions for cold "
"function coverage."),
cl::values(
clEnumValN(InstrColdFuncCovMode::Conservative, "conservative",
"Assume unprofiled functions are not cold, skip "
"instrumenting them."),
clEnumValN(InstrColdFuncCovMode::Optimistic, "optimistic",
"Treat unprofiled functions as cold and instrument them.")));

cl::opt<bool> InstrumentColdFunctionCoverage(
"instrument-cold-function-coverage", cl::init(false), cl::Hidden,
cl::desc("Enable cold function coverage instrumentation (currently only "
"used under sampling PGO pipeline)"));

extern cl::opt<unsigned> MaxNumVTableAnnotations;

namespace llvm {
Expand Down Expand Up @@ -1891,6 +1914,12 @@ static bool skipPGOGen(const Function &F) {
return true;
if (F.getInstructionCount() < PGOFunctionSizeThreshold)
return true;
if (InstrumentColdFunctionCoverage) {
if (!F.getEntryCount())
return InstrumentColdFunctionCoverageMode ==
InstrColdFuncCovMode::Conservative;
return F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount;
}
return false;
}

Expand Down
35 changes: 35 additions & 0 deletions llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -pgo-function-entry-coverage -S | FileCheck --check-prefixes=COLD %s
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -pgo-function-entry-coverage -cold-function-coverage-max-entry-count=1 -S | FileCheck --check-prefixes=ENTRY-COUNT %s
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -pgo-function-entry-coverage -instrument-cold-function-coverage-mode=optimistic -S | FileCheck --check-prefixes=UNPROFILED-FUNC %s

; COLD: call void @llvm.instrprof.cover(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
; COLD-NOT: __profn_main
; COLD-NOT: __profn_bar

; ENTRY-COUNT: call void @llvm.instrprof.cover(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
; ENTRY-COUNT: call void @llvm.instrprof.cover(ptr @__profn_main, i64 [[#]], i32 1, i32 0)

; UNPROFILED-FUNC: call void @llvm.instrprof.cover(ptr @__profn_bar, i64 [[#]], i32 1, i32 0)
; UNPROFILED-FUNC: call void @llvm.instrprof.cover(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)


target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @bar() {
entry:
ret void
}

define void @foo() !prof !0 {
entry:
ret void
}

define i32 @main() !prof !1 {
entry:
ret i32 0
}

!0 = !{!"function_entry_count", i64 0}
!1 = !{!"function_entry_count", i64 1}
Loading