wlei-llvm
Contributor

Add a new flag (--force-profile-preinlined) that marks all function samples with the ContextShouldBeInlined attribute during post-processing. This can be useful for experiments outside of the preinliner, e.g. to fully replay the inlining for a given profile.
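As a rough illustration of what "marks all function samples with the ContextShouldBeInlined attribute" means, here is a minimal, self-contained sketch. Every type and function in it (`ToyFunctionSamples`, `ToyProfileMap`, `markAllPreinlined`, the attribute bit value) is a hypothetical stand-in invented for this example, not the actual llvm-profgen code or the real `llvm::sampleprof` types.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy stand-ins for illustration only; these are NOT the real LLVM types.
enum ToyContextAttribute : uint32_t {
  ToyContextNone = 0x0,
  ToyContextShouldBeInlined = 0x2, // bit value chosen arbitrarily for the sketch
};

struct ToyFunctionSamples {
  uint64_t TotalSamples = 0;
  uint32_t Attributes = ToyContextNone;
};

using ToyProfileMap = std::map<std::string, ToyFunctionSamples>;

// Hypothetical post-processing step: when the flag is set, tag every
// function profile, regardless of its sample count.
inline void markAllPreinlined(ToyProfileMap &Profiles, bool ForcePreinlined) {
  if (!ForcePreinlined)
    return;
  for (auto &Entry : Profiles)
    Entry.second.Attributes |= ToyContextShouldBeInlined;
}

// Query helper: does this profile carry the "should be inlined" bit?
inline bool shouldBeInlined(const ToyFunctionSamples &FS) {
  return (FS.Attributes & ToyContextShouldBeInlined) != 0;
}
```

The point of the sketch is only that the marking is unconditional: with the flag on, hot and cold profiles alike end up tagged.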

@WenleiHe
Member

WenleiHe commented Sep 2, 2025

Surprised that we don't have a flag for this already. How did we do probe-only (which implies replay inlining) experiments earlier?

@wlei-llvm
Contributor Author

Surprised that we don't have a flag for this already. How did we do probe-only (which implies replay inlining) experiments earlier?

Good question! Let me first confirm we're aligned on terminology: the previous "probe-only" refers to using -ignore-stack-samples on llvm-profgen with no extra flags on the compiler side, right? If so:

Yes, we did do something to make it behave like "replay inlining". There are settings on the compiler side that remove the limits: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L2053-L2056. They set the max/min inline limits to int::max, so inlining is then replayed without any limit.

However, there are still some important and subtle differences:

One major difference is the inlining call-site threshold: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1386-L1406

  • In the previous probe-only setting, the cost is compared against a SampleThreshold, which is either SampleColdCallSiteThreshold (45) or SampleHotCallSiteThreshold (3000) depending on the hotness of the call site.

  • In the preinlined setting (this PR), it uses the "always inline" cost (getAlways("preinliner")).
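To make the difference concrete, here is a simplified model of the two decision paths. It is a sketch under assumptions, not the code in SampleProfile.cpp: the `Toy*` names are hypothetical, the threshold values are the ones quoted above, and treating "cost strictly below threshold" as the inline condition is an assumption made for the example.

```cpp
// Threshold values as quoted in the discussion; names are hypothetical.
constexpr int ToySampleColdCallSiteThreshold = 45;
constexpr int ToySampleHotCallSiteThreshold = 3000;

struct ToyCandidate {
  int Cost;              // estimated inline cost of the callee
  bool CallsiteIsHot;    // hotness of the call site according to the profile
  bool MarkedPreinlined; // profile carries the "should be inlined" attribute
};

// Simplified decision: a preinlined-marked candidate is always inlined,
// otherwise the cost is compared against a hotness-dependent threshold.
inline bool toyShouldInline(const ToyCandidate &C) {
  if (C.MarkedPreinlined)
    return true; // models the "always inline" cost path
  const int Threshold = C.CallsiteIsHot ? ToySampleHotCallSiteThreshold
                                        : ToySampleColdCallSiteThreshold;
  return C.Cost < Threshold; // assumed comparison direction
}
```

In this model, a callee with cost 5000 at a hot call site is rejected in the probe-only setting but inlined once its profile is marked as preinlined, which is the behavioral gap described above.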

Another difference concerns external (imported) functions: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/SampleProfile.cpp#L1060-L1062
In the preinliner setting the threshold is zero, while in the previous probe-only setting it is the HotCountThreshold.
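The import-side difference can be modeled the same way. This is again a hypothetical sketch: `toyImportThreshold` and `toyPassesImportFilter` are invented names, and filtering on "sample count at least the threshold" is an assumption for illustration rather than the exact check in the compiler.

```cpp
#include <cstdint>

// Threshold used when deciding which external callees to consider:
// zero for a preinlined profile, the hot-count threshold otherwise.
inline uint64_t toyImportThreshold(bool ProfileIsPreinlined,
                                   uint64_t HotCountThreshold) {
  return ProfileIsPreinlined ? uint64_t{0} : HotCountThreshold;
}

// Assumed filter: keep a callee whose sample count reaches the threshold.
inline bool toyPassesImportFilter(uint64_t CalleeSamples, uint64_t Threshold) {
  return CalleeSamples >= Threshold;
}
```

With a zero threshold every callee passes the filter, so a preinlined profile never drops an external callee for being too cold, whereas the probe-only setting does.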

Therefore, the previous "probe-only" was not a pure replay.

I don't remember whether this is a "bug" (we intended a pure replay but missed the settings above) or by design, since we may have wanted it to behave more like classic line-number-based AutoFDO, which has those limits.

Alternatively, if it is a "bug", we can "fix" it in the compiler by removing the limits for all pseudo-probe-based profiles.
