
Introduce size-optimized IList Select iterator #118156


Open · wants to merge 12 commits into main

Conversation

agocke
Member

@agocke agocke commented Jul 29, 2025

This lets us keep some of the constant-time indexing advantages of the IList iterator, without the GVM overhead of Select. There is a small size increase here, but nowhere near the cost of the GVM.

In a pathological generated example for GVMs the cost was:

  1. .NET 9: 12 MB
  2. .NET 10 w/out this change: 2.2 MB
  3. .NET 10 w/ this change: 2.3 MB

In a real-world example (AzureMCP), the size attributed to System.Linq was:

  1. .NET 9: 1.2 MB
  2. .NET 10 w/out this change: 340 KB
  3. .NET 10 w/ this change: 430 KB

This seems like a good tradeoff. We mostly keep the algorithmic complexity the same across the size- and speed-optimized versions, and just trade off on the margins. We could probably continue to improve this in the future.

Fixes #115033
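
To make the idea concrete, here is a minimal sketch (my own illustration, not the code in this PR; all type and member names are hypothetical) of a Select iterator that special-cases IList<T> sources so counting and element access go through the list directly, while defining no generic virtual methods:

using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch only. The real iterator lives in Iterator.SizeOpt.cs and
// its shape may differ; the point is that Count and element access go through
// the IList<T> source, and no generic virtual methods are involved.
internal sealed class IListSelectSketch<TSource, TResult> : IEnumerable<TResult>
{
    private readonly IList<TSource> _source;
    private readonly Func<TSource, TResult> _selector;

    public IListSelectSketch(IList<TSource> source, Func<TSource, TResult> selector)
    {
        _source = source;
        _selector = selector;
    }

    // O(1): the projection has exactly as many elements as the source list.
    public int Count => _source.Count;

    // O(1): project a single element via the indexer instead of enumerating to it.
    public TResult ElementAt(int index) => _selector(_source[index]);

    public IEnumerator<TResult> GetEnumerator()
    {
        for (int i = 0; i < _source.Count; i++)
        {
            yield return _selector(_source[i]);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}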

Contributor

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

@agocke agocke marked this pull request as ready for review July 29, 2025 20:32
@Copilot Copilot AI review requested due to automatic review settings July 29, 2025 20:32
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a size-optimized IList-based Select iterator to reduce binary size while maintaining constant-time indexing advantages. The change removes the size-optimized variants of Take and Skip operations in favor of a more targeted optimization for Select operations on IList collections.

Key changes:

  • Introduces SizeOptIListSelectIterator for size-optimized Select operations on IList collections (see the dispatch sketch after this list)
  • Removes size-optimized Take and Skip iterator implementations
  • Updates Take and Skip to always use speed-optimized variants
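
A rough sketch of the kind of dispatch described above (again an illustration under assumptions; the real Select has additional fast paths, argument validation, and different type names):

using System;
using System.Collections.Generic;

internal static class SelectDispatchSketch
{
    public static IEnumerable<TResult> Select<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        // IList detection: sources with O(1) Count and indexing are handed to
        // the size-optimized IList iterator (IListSelectSketch is the
        // hypothetical iterator sketched in the PR description above), so
        // Count()/ElementAt() over the projection can stay constant time.
        if (source is IList<TSource> ilist)
        {
            return new IListSelectSketch<TSource, TResult>(ilist, selector);
        }

        // Everything else falls back to plain deferred enumeration.
        return PlainSelect(source, selector);

        static IEnumerable<TResult> PlainSelect(IEnumerable<TSource> src, Func<TSource, TResult> sel)
        {
            foreach (TSource item in src)
            {
                yield return sel(item);
            }
        }
    }
}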

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/libraries/System.Linq/src/System/Linq/Take.cs: removes the conditional size optimization; always uses the speed-optimized iterator
  • src/libraries/System.Linq/src/System/Linq/Take.SizeOpt.cs: removes the size-optimized Take iterator implementation
  • src/libraries/System.Linq/src/System/Linq/Skip.cs: removes the conditional size optimization; always uses the speed-optimized iterator
  • src/libraries/System.Linq/src/System/Linq/Skip.SizeOpt.cs: removes the file containing the size-optimized Skip iterator
  • src/libraries/System.Linq/src/System/Linq/Select.cs: adds IList detection and uses the new size-optimized IList iterator
  • src/libraries/System.Linq/src/System/Linq/Iterator.SizeOpt.cs: new file implementing the size-optimized IList Select iterator
  • src/libraries/System.Linq/src/System.Linq.csproj: includes the new Iterator.SizeOpt.cs and removes Skip.SizeOpt.cs

@agocke
Member Author

agocke commented Aug 1, 2025

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

This takes a more aggressive direction and removes the size-optimized versions of the Skip and Take iterators. As far as I can tell the resulting size increases are relatively small, and always using the 'speed' versions preserves their O(1) optimizations.
@agocke
Member Author

agocke commented Aug 2, 2025

Updated to be slightly more aggressive: while doing more thorough testing I found more code paths that went from O(1) to O(n) with sizeopt enabled. I've fixed them by just removing sizeopt for Take and Skip. This doesn't seem to seriously hurt the binary size. I bet there are some small improvements we can make (e.g. folding array and IList into the same code path), but this seems like a pretty good middle ground.
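
For illustration (my own example, not taken from the PR), this is the kind of pattern that regresses from O(1) to O(n) when Skip and Take lose their IList-aware iterators:

using System.Collections.Generic;
using System.Linq;

List<int> data = Enumerable.Range(0, 2_000_000).ToList();

// With the IList-aware ("speed") Skip/Take iterators, both calls are answered
// from the list's Count and indexer in O(1). With a plain enumerator-based
// Skip/Take, they walk up to a million elements, i.e. O(n).
int first = data.Skip(1_000_000).First();
int count = data.Take(1_500_000).Count();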

@agocke
Member Author

agocke commented Aug 2, 2025

Also, I added a unit test for the Select().Count() case you mentioned. Couldn't find an existing one.
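
A hedged sketch of what such a test could look like (assuming xunit; the test actually added in the PR may assert different details):

using System.Collections.Generic;
using System.Linq;
using Xunit;

public class SelectCountTests
{
    [Fact]
    public void SelectOverIList_Count_EqualsSourceCount()
    {
        IList<int> source = new List<int> { 1, 2, 3, 4, 5 };

        // Count() over the projection must equal the source count regardless of
        // which Select iterator (size- or speed-optimized) is in use.
        Assert.Equal(source.Count, source.Select(x => x * 2).Count());
    }
}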

@agocke agocke requested a review from MichalStrehovsky August 2, 2025 02:48
@reflectronic
Contributor

reflectronic commented Aug 2, 2025

There are similar complexity issues with Count, ElementAt, Contains, and Last. They still persist with this PR, right?

@agocke
Member Author

agocke commented Aug 2, 2025

There are similar complexity issues with Count, ElementAt, Contains, and Last. They still persist with this PR, right?

Yes. I haven't tried to walk through every scenario in this PR, just one Select/Take/Skip scenario.

@agocke
Member Author

agocke commented Aug 2, 2025

Also, I'm surprised Contains could have this problem, since it's inherently O(n). But I haven't looked at that code.

@reflectronic
Contributor

HashSet<int> x = [...];
HashSet<int> y = [...];

bool contains = x.Union(y).Contains(100);

UnionIterator2<int>.Contains has complexity x.Contains(100) || y.Contains(100), which is O(1)

@reflectronic
Contributor

If the complexity has to be fixed for all of those methods, there isn't much left in Iterator.SpeedOpt.cs: just ToList, ToArray, and TryGetFirst. Pessimizing just those operators seems a little unbalanced--is that even worth it?

I think the biggest issue here is the quadratic growth with Select. That has led to serious customer failures (#102131). How bad are the regressions if you remove all the size optimizations except non-GVM Select? That seems like it reaps most of the size benefits without "playing performance games."

@agocke
Member Author

agocke commented Aug 2, 2025

How bad are the regressions if you remove all the size optimizations except non-GVM Select? That seems like it reaps most of the size benefits without "playing performance games."

I agree, but I’d still rather do the other ones in a different PR.

UnionIterator2.Contains

Ah I didn’t realize there was special support for sets.

@MichalStrehovsky
Member

I agree, but I’d still rather do the other ones in a different PR.

I think we'd want to evaluate the overall tradeoff here instead of doing several individual papercuts. Take all the changes we'd want to make here, evaluate whether the difference between sizeopt and speedopt is still meaningful, and whether we reached the stated goal (no differences in algorithmic complexity between sizeopt and speedopt). The changes can still go in through individual PRs for reviewability, but the full scope of the regression and the achievability of the stated goal are important. If we cannot achieve 100% the same algorithmic complexity, or if the sizeopt/speedopt size difference becomes insignificant, it is a very different conversation.

For example, the first iteration of this PR was a 0.3% size regression for WebApiAot. The second iteration is already at 1.3%. How much more can we expect? We know the impact on throughput is going to be zero, because I measured it in the past.

We'd want similar numbers for Mono iDevices/Android/WASM, since this regresses those too, and that would be a version-to-version regression. It might also be a tougher sell: we received zero TP complaints over the years this was the default, and in .NET 10 we make it possible for the user to just disable this (it was not even possible to disable in the past, so if someone had raised it, we'd not have been able to offer a solution).

@MichalStrehovsky
Member

How bad are the regressions if you remove all the size optimizations except non-GVM Select? That seems like it reaps most of the size benefits without "playing performance games."

We have those numbers for native AOT in various places. If you're in the pathological case, the GVM impact is the biggest contributor: #109978 (comment). If you're in the non-pathological case, the non-GVM expansion dominates (e.g. #109978 (comment) is Avalonia without the GVM fix, about 1 MB; the GVM bypass saves another 300 kB).

For non-Native AOT scenarios, I don't think GVM bypass does anything because Mono doesn't try to analyze GVMs and falls back to universal slow generic code.

We could reconsider placing the GVM bypass behind a separate feature switch but that runs into problems discussed in #109978 (comment). The GVM bypass has interactions with SizeOpt, so an app will behave differently if you enable one, the other, or both. And I don't know how we'd doc two feature switches that do various tradeoffs to LINQ (and set different defaults based on platforms).

Sometimes simpler is just better. I'd not lose sleep over won't fixing #115033 despite the heated conversation.

@agocke
Member Author

agocke commented Aug 5, 2025

I tested out my suggestion above: folding List, Array, and IList for Select all into IList seems to have significant benefits. It seems to win back ~30% of the difference. I'd still rather do those changes in a separate PR: if we have to revert any of these PRs, I'd rather revert only one.

I'm still looking at the other places where we separate List/Array/IList, like Where.
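
For context on why the folding works (my own illustration, not from the PR): arrays and List<T> both implement IList<T>, so a single IList<T>-based Select iterator can serve all three source shapes; the cost is presumably the array/List-specific micro-optimizations, not algorithmic complexity.

using System;
using System.Collections.Generic;

int[] array = { 1, 2, 3 };
List<int> list = new() { 4, 5, 6 };

// T[] and List<T> both implement IList<T>, so a single IList<T>-based Select
// iterator can serve arrays, lists, and any other IList<T> source, instead of
// keeping three specialized iterator types per (TSource, TResult) pair.
IList<int> fromArray = array;
IList<int> fromList = list;

Console.WriteLine($"{fromArray.Count} + {fromList.Count} elements reachable via IList<int>");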

agocke added 3 commits August 8, 2025 16:29
These are more (relatively common) cases where you could end up with an
O(n) implementation instead of O(1) even when the backing enumerable is
capable of doing O(1) index access.
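
A hypothetical example of the kind of case this commit group covers: when the backing source supports O(1) index access, an IList-aware iterator chain can answer these queries through the indexer, while a plain enumerator-based chain walks up to the requested position.

using System.Collections.Generic;
using System.Linq;

List<int> data = Enumerable.Range(0, 1_000_000).ToList();

// O(1) when the Select iterator forwards element access to the source indexer;
// O(n) if it has to enumerate and apply the selector up to the requested position.
int nearEnd = data.Select(v => v * 2).ElementAt(999_999);
int last = data.Select(v => v * 2).Last();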
@agocke
Member Author

agocke commented Aug 9, 2025

Added a number of commits here:

  • A huge amount of the extra size turned out to be not in LINQ but in equality comparers. That seemed to come in through a new path to OfType, but the IList use in OfType was easy to skip via the feature switch.
  • I addressed what I saw as the high-priority O(1) -> O(n) cases. I think everything that remains we can probably leave alone.
  • To gain back some of the size, I trimmed away the separate specializations for Array and List in favor of just IList.

My local testing shows the webaot project going from ~100 KB extra to only ~40 KB extra. I haven't yet re-run the test for AzureMCP.


Successfully merging this pull request may close these issues.

Massive performance regression .NET 9 vs .NET 10
4 participants