
Conversation

AndyAyersMS
Member

Long-running enumerator loops at Tier0+instr will see their executions transition over to a non-instrumented OSR version of the code.

This can cause loss of PGO data in the portions of the method that execute after leaving the loop that inspires OSR.

For enumerator vars we can safely deduce the likely classes from probes made earlier in the method. So when we see a class profile for an enumerator var, remember it and use it for subsequent calls that lack their own profile data.

Addresses part of #118420.
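
Conceptually, the fallback works something like the sketch below. This is a minimal standalone model, not the actual JIT code: ClassHandle, LclVarNum, recordEnumeratorClass, and inferEnumeratorClass are illustrative stand-ins, and only the InferredGdvEntry / VarToLikelyClassMap names mirror the ones added in compiler.h.

// Illustrative model of the inferred-GDV fallback; not the real importer code.
#include <cstdint>
#include <optional>
#include <unordered_map>

using ClassHandle = std::intptr_t;   // stand-in for CORINFO_CLASS_HANDLE
using LclVarNum   = unsigned;        // local var number of the enumerator

struct InferredGdvEntry
{
    ClassHandle likelyClass;         // dominant class seen at an earlier probe
    unsigned    likelihood;          // how likely that class was (percent)
};

// Per-method map: enumerator local -> class inferred from earlier probes.
using VarToLikelyClassMap = std::unordered_map<LclVarNum, InferredGdvEntry>;

// Called when a probe on an enumerator var *does* have class-profile data.
void recordEnumeratorClass(VarToLikelyClassMap& map, LclVarNum lcl, ClassHandle cls, unsigned likelihood)
{
    map[lcl] = {cls, likelihood};
}

// Called when a later call on the same var has *no* profile data, e.g. code
// that only runs after the long loop whose execution moved over to OSR code.
std::optional<InferredGdvEntry> inferEnumeratorClass(const VarToLikelyClassMap& map, LclVarNum lcl)
{
    auto it = map.find(lcl);
    if (it == map.end())
    {
        return std::nullopt;         // no earlier probe, so no GDV guess
    }
    return it->second;               // reuse the earlier observation
}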

@Copilot Copilot AI review requested due to automatic review settings August 6, 2025 18:45
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR implements inferred Guarded Devirtualization (GDV) for enumerator variables that lack Profile-Guided Optimization (PGO) data. The change addresses a specific issue where long-running enumerator loops transition to non-instrumented OSR (On-Stack Replacement) versions, causing loss of PGO data for code executing after loop exit.

Key changes:

  • Adds logic to remember likely class types for enumerator variables when PGO data is available
  • Creates a fallback mechanism to use previously observed class profiles when current call sites lack PGO data
  • Introduces data structures to store and retrieve inferred type information for enumerator variables

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/coreclr/jit/importercalls.cpp | Implements the core GDV inference logic, including type lookup for calls without PGO data and storage of dominant class types for enumerator variables |
| src/coreclr/jit/compiler.h | Adds data structures (`InferredGdvEntry` struct and `VarToLikelyClassMap` typedef) and accessor methods for managing inferred type information |
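
To make the lookup order concrete, here is a sketch of how a call site could pick a guarded-devirtualization candidate, reusing the declarations from the conceptual sketch in the description above. It is illustrative only; pickGdvCandidate, pgoClass, and pgoLikelihood are hypothetical stand-ins for the call site's own class-profile data, not names from the PR.

// Reuses ClassHandle, LclVarNum, InferredGdvEntry, VarToLikelyClassMap,
// recordEnumeratorClass, and inferEnumeratorClass from the sketch above.
#include <optional>

std::optional<InferredGdvEntry> pickGdvCandidate(
    VarToLikelyClassMap&       map,
    LclVarNum                  enumeratorLcl,
    std::optional<ClassHandle> pgoClass,      // the call's own profile, if any
    unsigned                   pgoLikelihood)
{
    if (pgoClass.has_value())
    {
        // This call site has its own class profile: remember it for later
        // calls on the same enumerator var, then use it directly.
        recordEnumeratorClass(map, enumeratorLcl, *pgoClass, pgoLikelihood);
        return InferredGdvEntry{*pgoClass, pgoLikelihood};
    }

    // No profile here (e.g. the call sits after the loop that triggered OSR):
    // fall back to the class remembered from an earlier probe, if any.
    return inferEnumeratorClass(map, enumeratorLcl);
}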

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 6, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member Author

AndyAyersMS commented Aug 6, 2025

@EgorBo PTAL
@dotnet/jit-contrib @stephentoub FYI

This should improve enumeration perf at sites with long enumerations (long meaning more than ~10K iterations, enough for OSR to kick in).

This requires #118425 to see benefits for cases where the collection is a List.

We could possibly use a similar mechanism to boost inlining for enumerator methods (see #116266).

A few SPMI diffs, and some missing contexts.

Fix spelling

Co-authored-by: Copilot <[email protected]>
Member

@EgorBo EgorBo left a comment


I wonder if we can re-use the same map for the general case of call/delegate devirtualization when we don't have DynamicPGO at all, so that we can create "fake" candidates for GDV

@AndyAyersMS
Member Author

I wonder if we can re-use the same map for the general case of call/delegate devirtualization when we don't have DynamicPGO at all, so that we can create "fake" candidates for GDV

Yes, something like this, if we have some other notion of what type is worth guessing for.

@AndyAyersMS
Member Author

@EgorBot -arm -amd -intel

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser(false)]
public class Bench
{
    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    [Benchmark]
    [ArgumentsSource(nameof(GetLists))]
    public int SumEnumerable(IEnumerable<int> list)
    {
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    public static IEnumerable<List<int>> GetLists() =>
        from count in new int[] { 1, 10, 1_000, 10_000, 100_000 }
        select Enumerable.Range(0, count).ToList();
}

@AndyAyersMS
Member Author

EgorBot results show the allocations are gone for the longer list lengths (10K, 100K).

Trying even longer lengths will eventually result in allocation: the benchmark invocation takes so long that BDN (with default settings) only invokes the benchmark method once per iteration, so there aren't enough iterations to make it to Tier1, and BDN ends up measuring the Tier0/OSR codegen, which allocates (in Tier0). It looks like this requires several hundred million elements on my box.

| Method | Count | Mean | Allocated |
| --- | ---: | ---: | ---: |
| Sum2 | 1,000 | 636.4 ns | - |
| Sum2 | 10,000 | 6,311.5 ns | - |
| Sum2 | 100,000 | 62,854.5 ns | - |
| Sum2 | 1,000,000 | 629,740.5 ns | - |
| Sum2 | 10,000,000 | 6,449,448.8 ns | - |
| Sum2 | 100,000,000 | 64,557,182.8 ns | - |
| Sum2 | 1,000,000,000 | 2,239,850,891.7 ns | 40 B |

We can't fix the allocation via OSR codegen; it happened already. But we might be able to fix the overall ~3x perf hit by promoting the enumerator fields in OSR. That's likely not easy. It falls under the "generalized promotion" umbrella with some OSR-specific aspects.

@stephentoub
Member

I'm comfortable saying that if you've got a List<T> with a billion elements and you're enumerating all of them, you can afford a 40 byte allocation :)

@AndyAyersMS AndyAyersMS merged commit 386cb8d into dotnet:main Aug 7, 2025
103 of 105 checks passed
@EgorBo
Member

EgorBo commented Aug 7, 2025

Hint: if you use anything other than a List/Array as a benchmark argument, it's better to add a fake arg describing its length; otherwise BDN groups cases by Type.ToString(), which is Syste(...)nt32] [47] 🙂

@github-actions github-actions bot locked and limited conversation to collaborators Sep 6, 2025