Merged
11 changes: 7 additions & 4 deletions llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
@@ -131,13 +131,16 @@ struct Spec {
// Profitability of the specialization.
unsigned Score;

// Cost of the specialization, in terms of codesize.
unsigned CodeSizeCost;

// List of call sites, matching this specialization.
SmallVector<CallBase *> CallSites;

Spec(Function *F, const SpecSig &S, unsigned Score)
: F(F), Sig(S), Score(Score) {}
Spec(Function *F, const SpecSig &&S, unsigned Score)
: F(F), Sig(S), Score(Score) {}
Spec(Function *F, const SpecSig &S, unsigned Score, unsigned CodeSizeCost)
: F(F), Sig(S), Score(Score), CodeSizeCost(CodeSizeCost) {}
Spec(Function *F, const SpecSig &&S, unsigned Score, unsigned CodeSizeCost)
: F(F), Sig(S), Score(Score), CodeSizeCost(CodeSizeCost) {}
};

class InstCostVisitor : public InstVisitor<InstCostVisitor, Constant *> {
44 changes: 25 additions & 19 deletions llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
@@ -646,6 +646,18 @@ FunctionSpecializer::~FunctionSpecializer() {
cleanUpSSA();
}

/// Get the unsigned Value of given Cost object. Assumes the Cost is always
/// non-negative, which is true for both TCK_CodeSize and TCK_Latency, and
/// always Valid.
static unsigned getCostValue(const Cost &C) {
int64_t Value = *C.getValue();

assert(Value >= 0 && "CodeSize and Latency cannot be negative");
// It is safe to down cast since we know the arguments cannot be negative and
// Cost is of type int64_t.
return static_cast<unsigned>(Value);
}

/// Attempt to specialize functions in the module to enable constant
/// propagation across function boundaries.
///
@@ -759,6 +771,14 @@ bool FunctionSpecializer::run() {
SmallVector<Function *> Clones;
for (unsigned I = 0; I < NSpecs; ++I) {
Spec &S = AllSpecs[BestSpecs[I]];

// Check that creating this specialization doesn't exceed the maximum
// codesize growth.
unsigned FuncSize = getCostValue(FunctionMetrics[S.F].NumInsts);
if ((FunctionGrowth[S.F] + S.CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
continue;
@labrinea (Collaborator) commented on Oct 24, 2024:

I think I was wrong before when I suggested moving this here. At this point we have committed to creating NSpecs specializations, but now we are skipping some. The filtering needs to happen earlier, inside findSpecializations but after the lambda (actually before returning true). At that point we know it is profitable.

The thing is, we know we are creating NSpecs specializations but we don't know which ones. They may correspond to different functions. However, FunctionGrowth is kept per function, not per module. Inside findSpecializations we are only examining specializations for the same function.

It would make sense to apply the filtering here if we kept codesize growth across functions (per module), which isn't the case. Correct me if I am wrong, but that's my understanding. Apologies for the confusion.

Contributor Author replied:

Yes, I think that is definitely the trade-off with moving the logic here: we may skip some specializations, and because we do so after building the priority queue of specializations to perform, we may fail to specialize a "next-best" candidate that would have fit within the FunctionGrowth budget for its function, because its place has been taken in the NSpecs best candidates.

However, if we move it to where you suggest, we are making a slightly different trade-off:
#113448 (comment)

Here we successfully got past the FuncGrowth check. We know for sure the specialization will be created. We can safely do FunctionGrowth[F] += SpecSize; at this point, before returning true.

If I understand correctly, we do not know for sure the specialization will be created, because we do not know that it will have a high enough score to end up in the NSpecs we decide to specialize for the module. If for a given function F we analyze N specialization candidates that are all deemed profitable and these N candidates use up the maximum FunctionGrowth for F, then we would deem the N+1-th candidate unprofitable based on FunctionGrowth even if it had a higher score (Inlining/CodeSize/Latency) than the candidates analyzed beforehand.

Maybe you still think this is the better tradeoff?

I suppose to some extent it might make more sense to instead have a codesize growth across functions (per module) budget, and to build the priority queue of BestSpecs based on that rather than a budget based on number of specializations.

@labrinea (Collaborator) commented on Oct 25, 2024:

I think it is worth mentioning the background story. We initially came up with the FuncGrowth threshold to stop the overall codesize growth from being linear in the number of times the specializer runs. At first funcspec-max-iters was set to 1, and we increased it to 10 to enable specialization of recursive functions. Imagine running 10 times yielding 10 times the original function size.

We have MaxClones to limit the number of specializations per iteration. Within a single iteration of run, the growth threshold basically has no effect unless you have many candidates (not the case for functions relying on promoteConstantStackValues). It is used to reject candidates in the next iteration. How can there be an (N+1)-th candidate if at the N-th step we decided to stop? (I am thinking of recursive functions.)

Maybe we can keep FunctionGrowth[S.F] += S.CodeSizeCost; here but remove the early exit from the line above this comment. It should only remain in the isProfitable lambda in my opinion.

Contributor Author replied:

Ah ok that makes a lot of sense, thank you for providing the additional context.

I have updated the code so that we only check the codesize increase inside the isProfitable lambda, but do the accumulation into FunctionGrowth[S.F] at the point where we actually create the specialization. This has the downside that we may violate MaxCodeSizeGrowth within a single iteration, when candidates that individually pass the check in the isProfitable lambda exceed the threshold in combination once they are all performed. This is why I have removed the second test case from the regression test, as it no longer makes sense.

However, I think this is an acceptable trade-off: it still preserves the intention of the MaxCodeSizeGrowth threshold by preventing linear codesize increase across multiple iterations. Even if the threshold is exceeded in one iteration, we would not consider further specializations in successive iterations once the growth has been accounted for, and we have MaxClones to limit the number of specializations per iteration, as you say.

We could fix the potential violation of MaxCodeSizeGrowth within a single iteration by moving the accumulation into FunctionGrowth[S.F] to the end of the isProfitable lambda, as you suggested. However, I don't really like this, because we may end up discounting the most profitable specialization candidate for a function. To clarify my previous comment: by N-th and (N+1)-th candidates I meant candidates within a single iteration. If we accumulate the codesize growth while we are analyzing the candidates, we may use up MaxCodeSizeGrowth before encountering the best specialization and therefore fail to consider it:

// (Within a single iteration, MaxCodeSizeGrowth=100%)
spec1 [Score 500, Growth 50%]
spec2 [Score 500, Growth 50%]
// After examining the above, FunctionGrowth[S.F] = 100% so no more candidates will be considered profitable!
// Candidate with highest Score:
spec3 [Score 1000, Growth 10%]

FunctionGrowth[S.F] += S.CodeSizeCost;

S.Clone = createSpecialization(S.F, S.Sig);

// Update the known call sites to call the clone.
@@ -837,18 +857,6 @@ static Function *cloneCandidateFunction(Function *F, unsigned NSpecs) {
return Clone;
}

/// Get the unsigned Value of given Cost object. Assumes the Cost is always
/// non-negative, which is true for both TCK_CodeSize and TCK_Latency, and
/// always Valid.
static unsigned getCostValue(const Cost &C) {
int64_t Value = *C.getValue();

assert(Value >= 0 && "CodeSize and Latency cannot be negative");
// It is safe to down cast since we know the arguments cannot be negative and
// Cost is of type int64_t.
return static_cast<unsigned>(Value);
}

bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
SmallVectorImpl<Spec> &AllSpecs,
SpecMap &SM) {
@@ -924,16 +932,14 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
}
CodeSize += Visitor.getCodeSizeSavingsFromPendingPHIs();

unsigned CodeSizeSavings = getCostValue(CodeSize);
unsigned CodeSizeCost = FuncSize - CodeSizeSavings;

auto IsProfitable = [&]() -> bool {
// No check required.
if (ForceSpecialization)
return true;

unsigned CodeSizeSavings = getCostValue(CodeSize);
// TODO: We should only accumulate codesize increase of specializations
// that are actually created.
FunctionGrowth[F] += FuncSize - CodeSizeSavings;

LLVM_DEBUG(
dbgs() << "FnSpecialization: Specialization bonus {Inlining = "
<< Score << " (" << (Score * 100 / FuncSize) << "%)}\n");
@@ -964,7 +970,7 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
if (LatencySavings < MinLatencySavings * FuncSize / 100)
return false;
// Maximum codesize growth.
if (FunctionGrowth[F] / FuncSize > MaxCodeSizeGrowth)
if ((FunctionGrowth[F] + CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
return false;

Score += std::max(CodeSizeSavings, LatencySavings);
@@ -976,7 +982,7 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
continue;

// Create a new specialisation entry.
auto &Spec = AllSpecs.emplace_back(F, S, Score);
auto &Spec = AllSpecs.emplace_back(F, S, Score, CodeSizeCost);
if (CS.getFunction() != F)
Spec.CallSites.push_back(&CS);
const unsigned Index = AllSpecs.size() - 1;
@@ -0,0 +1,97 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --include-generated-funcs --version 5
; RUN: opt -passes="ipsccp<func-spec>" -funcspec-min-function-size=1 \
; RUN: -funcspec-for-literal-constant=true \
; RUN: -funcspec-min-codesize-savings=50 \
; RUN: -funcspec-min-latency-savings=50 \
; RUN: -funcspec-max-codesize-growth=1 \
; RUN: -S < %s | FileCheck %s

; Verify that we are able to specialize a function successfully after analysis
; of other specializations that are found to not be profitable.
define void @test_specialize_after_failed_analysis(i32 %n) {
entry:
%notspec0 = call i32 @add0(i32 0, i32 %n)
%notspec1 = call i32 @add0(i32 1, i32 %n)
%spec = call i32 @add0(i32 1, i32 1)
ret void
}

define i32 @add0(i32 %x, i32 %y) {
entry:
%res = add i32 %x, %y
ret i32 %res
}

; Verify that we do not specialize once maximum codesize growth has been
; exceeded.
define void @test_max_codesize_growth_exceeded(i32 %n) {
entry:
%spec0 = call i32 @add1(i32 0, i32 0)
%spec1 = call i32 @add1(i32 1, i32 1)
%spec2 = call i32 @add1(i32 2, i32 2)
%notspec = call i32 @add1(i32 3, i32 3)
ret void
}

define i32 @add1(i32 %x, i32 %y) {
entry:
%res = add i32 %x, %y
ret i32 %res
}

; CHECK-LABEL: define void @test_specialize_after_failed_analysis(
; CHECK-SAME: i32 [[N:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[NOTSPEC0:%.*]] = call i32 @add0(i32 0, i32 [[N]])
; CHECK-NEXT: [[NOTSPEC1:%.*]] = call i32 @add0(i32 1, i32 [[N]])
; CHECK-NEXT: [[SPEC:%.*]] = call i32 @add0.specialized.1(i32 1, i32 1)
; CHECK-NEXT: ret void
;
;
; CHECK-LABEL: define i32 @add0(
; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[RES:%.*]] = add i32 [[X]], [[Y]]
; CHECK-NEXT: ret i32 [[RES]]
;
;
; CHECK-LABEL: define void @test_max_codesize_growth_exceeded(
; CHECK-SAME: i32 [[N:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[SPEC0:%.*]] = call i32 @add1.specialized.2(i32 0, i32 0)
; CHECK-NEXT: [[SPEC1:%.*]] = call i32 @add1.specialized.3(i32 1, i32 1)
; CHECK-NEXT: [[SPEC2:%.*]] = call i32 @add1.specialized.4(i32 2, i32 2)
; CHECK-NEXT: [[NOTSPEC:%.*]] = call i32 @add1(i32 3, i32 3)
; CHECK-NEXT: ret void
;
;
; CHECK-LABEL: define i32 @add1(
; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[RES:%.*]] = add i32 [[X]], [[Y]]
; CHECK-NEXT: ret i32 [[RES]]
;
;
; CHECK-LABEL: define internal i32 @add0.specialized.1(
; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: ret i32 poison
;
;
; CHECK-LABEL: define internal i32 @add1.specialized.2(
; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: ret i32 poison
;
;
; CHECK-LABEL: define internal i32 @add1.specialized.3(
; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: ret i32 poison
;
;
; CHECK-LABEL: define internal i32 @add1.specialized.4(
; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: ret i32 poison
;