JIT: create inferred GDVs for enumerator var uses that lack PGO #118461
Conversation
Long-running enumerator loops at Tier0+instr will see their executions transition over to a non-instrumented OSR version of the code. This can cause loss of PGO data in the portions of the method that execute after leaving the loop that inspires OSR. For enumerator vars we can safely deduce the likely classes from probes made earlier in the method. So when we see a class profile for an enumerator var, remember it and use it for subsequent calls that lack their own profile data. Addresses part of dotnet#118420.
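To make the scenario concrete, here is a hedged C# sketch of the shape of code this affects (illustrative only; the method and names are not from the PR):

```cs
using System.Collections.Generic;

static class Example
{
    static int Sum(IEnumerable<int> source)
    {
        // At Tier0+instr, class-profile probes record the concrete enumerator type
        // observed here and at the calls inside the loop.
        IEnumerator<int> e = source.GetEnumerator();
        int sum = 0;

        // A long-running loop transitions execution to a non-instrumented OSR version,
        // so code after the loop never runs under instrumentation.
        while (e.MoveNext())
        {
            sum += e.Current;
        }

        // This call site has no class profile of its own; the change lets the JIT reuse
        // the likely class already recorded for 'e' to create an inferred GDV here.
        e.Dispose();
        return sum;
    }
}
```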
Pull Request Overview
This PR implements inferred Guarded Devirtualization (GDV) for enumerator variables that lack Profile-Guided Optimization (PGO) data. The change addresses a specific issue where long-running enumerator loops transition to non-instrumented OSR (On-Stack Replacement) versions, causing loss of PGO data for code executing after loop exit.
Key changes:
- Adds logic to remember likely class types for enumerator variables when PGO data is available
- Creates a fallback mechanism to use previously observed class profiles when current call sites lack PGO data
- Introduces data structures to store and retrieve inferred type information for enumerator variables
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
| --- | --- |
| src/coreclr/jit/importercalls.cpp | Implements the core GDV inference logic, including type lookup for calls without PGO data and storage of dominant class types for enumerator variables |
| src/coreclr/jit/compiler.h | Adds data structures (InferredGdvEntry struct and VarToLikelyClassMap typedef) and accessor methods for managing inferred type information |
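For readers unfamiliar with the JIT internals, here is a rough C# sketch of the inference idea. The actual implementation is C++ in importercalls.cpp/compiler.h; the class, methods, and entry below are hypothetical stand-ins that only mirror the VarToLikelyClassMap/InferredGdvEntry concepts:

```cs
using System.Collections.Generic;

// Hypothetical stand-in for a per-method map keyed by the enumerator local's number,
// holding the dominant class seen by earlier class-profile probes.
sealed class InferredGdvMapSketch
{
    private readonly record struct Entry(string LikelyClass, uint Likelihood);

    private readonly Dictionary<uint, Entry> _map = new();

    // When a call on an enumerator var has a class profile, remember its dominant class.
    public void Record(uint enumeratorVarNum, string likelyClass, uint likelihood) =>
        _map[enumeratorVarNum] = new Entry(likelyClass, likelihood);

    // When a later call on the same var lacks PGO data, fall back to the remembered
    // class so a guarded devirtualization candidate can still be created.
    public bool TryInfer(uint enumeratorVarNum, out string likelyClass, out uint likelihood)
    {
        if (_map.TryGetValue(enumeratorVarNum, out Entry entry))
        {
            (likelyClass, likelihood) = (entry.LikelyClass, entry.Likelihood);
            return true;
        }
        (likelyClass, likelihood) = (string.Empty, 0u);
        return false;
    }
}
```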
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@EgorBo PTAL

This should improve enumeration perf for enumeration sites that have long enumerations (long meaning more than 10k) such that OSR kicks in. This requires #118425 to see benefits for cases where the collection is a

We could possibly use a similar mechanism to boost inlining for enumerator methods (see #116266).

A few SPMI diffs, and some missing contexts.
Fix spelling
Co-authored-by: Copilot <[email protected]>
I wonder if we can re-use the same map for the general case of call/delegate devirtualization when we don't have DynamicPGO at all, so that we can create "fake" candidates for GDV.
Yes, something like this, if we have some other notion of what type is worth guessing for.
@EgorBot -arm -amd -intel

```cs
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser(false)]
public class Bench
{
    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumEnumerable(IEnumerable<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    public static IEnumerable<List<int>> GetLists() =>
        from count in new int[] { 1, 10, 1_000, 10_000, 100_000 }
        select Enumerable.Range(0, count).ToList();
}
```
EgorBot results show the allocations are gone for the longer list lengths (10K, 100K). Trying even longer lengths will eventually show an allocation again: the benchmark invocation then takes so long that BDN (with default settings) only invokes the benchmark method once per iteration, there aren't enough invocations to reach Tier1, and BDN ends up measuring the Tier0/OSR codegen, which allocates (in Tier0). It looks like that requires several hundred million elements on my box.
We can't fix the allocation via the OSR codegen; it has already happened by the time OSR kicks in. But we might be able to fix the overall ~3x perf hit by promoting the enumerator fields in OSR. That's likely not easy; it falls under the "generalized promotion" umbrella, with some OSR-specific aspects.
I'm comfortable saying that if you've got a
Hint: if you use anything other than a List/Array as a benchmark argument, it's better to add a fake arg that describes its length; otherwise BDN groups the cases by Type.ToString(), which is
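For example (a sketch, not something from this PR): returning object[] rows from the ArgumentsSource lets the count appear as its own parameter, so BDN labels the cases by length even for collections with uninformative ToString() output. The members below would sit inside the Bench class from the benchmark above (which already has the needed usings):

```cs
[Benchmark]
[ArgumentsSource(nameof(GetCountedEnumerables))]
public int SumEnumerableCounted(int count, IEnumerable<int> items)
{
    int sum = 0;
    foreach (int item in items)
    {
        sum += item;
    }
    return sum;
}

// Each object[] row maps to the benchmark's parameters in order; 'count' makes the
// case label unambiguous even when the collection's ToString() is uninformative.
public static IEnumerable<object[]> GetCountedEnumerables() =>
    from count in new[] { 1, 10, 1_000, 10_000, 100_000 }
    select new object[] { count, new Queue<int>(Enumerable.Range(0, count)) };
```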