@ChayimFriedman2
Contributor

This switches `Vec::extend_desugared()` to internal iteration, because LLVM cannot optimize external iteration well for some iterator kinds (e.g. chain()).

To do that I had to hoist the size_hint() call to before the loop (since I no longer have access to the iterator inside the loop), which might slightly pessimize iterators that can give more accurate size bounds during iteration (e.g. flatten()). However, the effect should be small, and common iterators of that kind (flatten() included) also suffer from the external iteration optimizability problem.
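For readers unfamiliar with the distinction, here is a minimal sketch of the two loop shapes being compared. It is not the actual `extend_desugared()` code from the standard library; the function names `extend_external` and `extend_internal` are made up for illustration, and the reservation strategy is simplified.

```rust
/// External iteration: the loop drives the iterator through `next()`, so it
/// can re-query `size_hint()` every time the vector runs out of capacity.
fn extend_external<T, I: Iterator<Item = T>>(vec: &mut Vec<T>, mut iter: I) {
    while let Some(item) = iter.next() {
        if vec.len() == vec.capacity() {
            // The hint is read mid-iteration, so it reflects what is left.
            let (lower, _) = iter.size_hint();
            vec.reserve(lower.saturating_add(1));
        }
        vec.push(item);
    }
}

/// Internal iteration: `for_each` consumes the iterator, so the size hint
/// has to be read once, before the loop starts.
fn extend_internal<T, I: Iterator<Item = T>>(vec: &mut Vec<T>, iter: I) {
    let (lower, _) = iter.size_hint();
    vec.reserve(lower);
    // `push` still grows the vector if the hint was too low.
    iter.for_each(|item| vec.push(item));
}
```

Both functions produce the same contents; they differ only in who drives the loop and in when the size hint is consulted.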

@rustbot
Collaborator

rustbot commented Mar 20, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot added the S-waiting-on-review and T-libs labels Mar 20, 2025

@compiler-errors
Member

Is there some sort of codegen test you could use to demonstrate that this has a beneficial effect?

@ChayimFriedman2
Contributor Author

I'm pretty sure a benchmark will show a difference, but I will check.

@ChayimFriedman2
Contributor Author

So, @compiler-errors, contrary to what I expected, this is not a 100% win (never trust your instincts in performance!). I benchmarked three scenarios: a flatten() call on a small array containing small slices; a flatten() call on a large vector (100 elements) containing large vectors (0-100 elements, ascending); and small (15x10) vectors with two chain() calls and one flatten().

The results: in the first two cases the two algorithms are almost equivalent, with the old algorithm slightly ahead in the first case and the new one slightly ahead in the second. In the third case, however, the new algorithm is almost 3x faster.

So my conclusion is: with one such operation the difference is negligible, but once you start adding more, this version is significantly faster.
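The benchmark code itself is not shown in this thread. As a rough illustration only, the third scenario could look something like the sketch below; the helper names `small_nested` and `extend_chained`, the element values, and the exact order of the adapters are all assumptions.

```rust
/// Hypothetical input for the third scenario: 15 vectors of 10 elements each.
fn small_nested() -> Vec<Vec<u32>> {
    (0..15)
        .map(|row| (0..10).map(|col| row * 10 + col).collect())
        .collect()
}

/// Two `chain()` calls around one `flatten()`, fed into `Vec::extend`.
fn extend_chained(head: &[u32], nested: &[Vec<u32>], tail: &[u32]) -> Vec<u32> {
    let mut out = Vec::new();
    out.extend(
        head.iter()
            .copied()
            .chain(nested.iter().flatten().copied())
            .chain(tail.iter().copied()),
    );
    out
}
```

A real measurement would wrap `extend_chained` in a benchmark harness and compare the old and new `extend` implementations on the same input.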

@joboet
Member

joboet commented Mar 21, 2025

I think you could preserve the more precise length estimation by using try_for_each combined with Vec::push_within_capacity.

@ChayimFriedman2
Contributor Author

@joboet I don't think so, since the iterator will still be mutably borrowed.

@joboet
Member

joboet commented Mar 23, 2025

No, that's not an issue. I've opened ChayimFriedman2#3 with what I had in mind.
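A minimal sketch of why the borrow is not a problem, not the code from the linked PR: `try_for_each` takes `&mut self`, so the mutable borrow ends as soon as it returns, and `size_hint()` can be queried again before resuming. Since `Vec::push_within_capacity` is unstable, the free function `push_within_capacity` below is a stand-in written for this example, and `extend_try_for_each` is a hypothetical name.

```rust
use std::ops::ControlFlow;

/// Stand-in for the unstable `Vec::push_within_capacity`: pushes only if no
/// reallocation is needed, otherwise hands the value back to the caller.
fn push_within_capacity<T>(vec: &mut Vec<T>, value: T) -> Result<(), T> {
    if vec.len() < vec.capacity() {
        vec.push(value);
        Ok(())
    } else {
        Err(value)
    }
}

fn extend_try_for_each<T, I: Iterator<Item = T>>(vec: &mut Vec<T>, mut iter: I) {
    loop {
        // Internal iteration until the vector is full or the iterator ends.
        let overflow = iter.try_for_each(|item| match push_within_capacity(vec, item) {
            Ok(()) => ControlFlow::Continue(()),
            Err(item) => ControlFlow::Break(item),
        });
        match overflow {
            // Iterator exhausted: done.
            ControlFlow::Continue(()) => return,
            // Out of capacity: the iterator is usable again here, so the
            // hint reflects the remaining items.
            ControlFlow::Break(item) => {
                let (lower, _) = iter.size_hint();
                vec.reserve(lower.saturating_add(1));
                vec.push(item);
            }
        }
    }
}
```

The item that did not fit is carried out of the closure through `ControlFlow::Break`, so nothing is lost when the loop pauses to reserve more space.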

@joboet
Member

joboet commented Apr 3, 2025

@rustbot author
while CI isn't passing

@rustbot added the S-waiting-on-author label and removed the S-waiting-on-review label Apr 3, 2025

@rustbot
Collaborator

rustbot commented Apr 3, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@Dylan-DPC
Member

@ChayimFriedman2 any updates on this? thanks

@ChayimFriedman2
Contributor Author

@Dylan-DPC Thanks for reminding me, I'll return to this soon.

@ChayimFriedman2
Contributor Author

@joboet Your version was the fastest, so I switched to it.

One slight disadvantage is that user-defined iterators currently cannot override try_for_each() on stable.
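To illustrate the limitation: overriding `try_for_each()` or `try_fold()` requires naming the unstable `Try` trait in the method signature, whereas `fold()` can be overridden on stable. The sketch below uses a made-up `Countdown` iterator that counts its `next()` calls; it relies on the current standard library behavior that the default `for_each()` forwards to `fold()` while the default `try_for_each()` ends up calling `next()` repeatedly.

```rust
use std::cell::Cell;

/// Hypothetical iterator that records how often `next()` is called.
struct Countdown<'a> {
    remaining: u32,
    next_calls: &'a Cell<u32>,
}

impl Iterator for Countdown<'_> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.next_calls.set(self.next_calls.get() + 1);
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.remaining)
    }

    // A specialized `fold` is allowed on stable; it bypasses `next()`.
    fn fold<B, F: FnMut(B, u32) -> B>(self, init: B, mut f: F) -> B {
        let mut acc = init;
        for value in (0..self.remaining).rev() {
            acc = f(acc, value);
        }
        acc
    }
}
```

With this iterator, `for_each()` benefits from the custom `fold()`, but `try_for_each()` cannot be given an equivalent fast path on stable, so an `extend` built on `try_for_each()` falls back to the `next()`-based default for such types.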

@rustbot
Collaborator

rustbot commented Sep 4, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@ChayimFriedman2
Contributor Author

@rustbot ready

@rustbot added the S-waiting-on-review label and removed the S-waiting-on-author label Sep 4, 2025

@joboet
Member

joboet commented Sep 8, 2025

Let's see what impact this has on compiler performance...
@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Sep 8, 2025
Use internal iteration in `Vec::extend_desugared()`
@rustbot added the S-waiting-on-perf label Sep 8, 2025

@rust-bors

rust-bors bot commented Sep 8, 2025

☀️ Try build successful (CI)
Build commit: d5302c6 (d5302c6e5566a5c29c551b4c02a4e6077858b5f1, parent: beeb8e3af54295ba494c250e84ecda4c2c5d85ff)

@rust-timer
Collaborator

Finished benchmarking commit (d5302c6): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 7.0% | [0.2%, 260.8%] | 49 |
| Regressions ❌ (secondary) | 4.2% | [4.2%, 4.2%] | 1 |
| Improvements ✅ (primary) | -0.3% | [-0.6%, -0.1%] | 33 |
| Improvements ✅ (secondary) | -0.7% | [-2.9%, -0.0%] | 29 |
| All ❌✅ (primary) | 4.1% | [-0.6%, 260.8%] | 82 |

Max RSS (memory usage)

Results (primary 3.5%, secondary 3.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 3.5% | [0.8%, 14.1%] | 9 |
| Regressions ❌ (secondary) | 4.5% | [2.4%, 5.1%] | 8 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.1% | [-2.3%, -2.0%] | 2 |
| All ❌✅ (primary) | 3.5% | [0.8%, 14.1%] | 9 |

Cycles

Results (primary 14.3%, secondary 3.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 14.3% | [1.5%, 228.0%] | 21 |
| Regressions ❌ (secondary) | 3.8% | [2.2%, 7.8%] | 5 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 14.3% | [1.5%, 228.0%] | 21 |

Binary size

Results (primary 0.9%, secondary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 0.9% | [0.0%, 4.5%] | 82 |
| Regressions ❌ (secondary) | 0.4% | [0.0%, 1.3%] | 82 |
| Improvements ✅ (primary) | -0.1% | [-0.1%, -0.1%] | 1 |
| Improvements ✅ (secondary) | -0.4% | [-0.4%, -0.4%] | 3 |
| All ❌✅ (primary) | 0.9% | [-0.1%, 4.5%] | 83 |

Bootstrap: 466.876s -> 477.217s (2.21%)
Artifact size: 387.41 MiB -> 389.59 MiB (0.56%)

@rustbot added the perf-regression label and removed the S-waiting-on-perf label Sep 8, 2025

@ChayimFriedman2
Contributor Author

Wow, that doesn't look good. Maybe try with the other way?

@joboet
Member

joboet commented Sep 11, 2025

It's worth a shot...
@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Sep 11, 2025
Use internal iteration in `Vec::extend_desugared()`
@rustbot added the S-waiting-on-perf label Sep 11, 2025

@rust-bors

rust-bors bot commented Sep 11, 2025

☀️ Try build successful (CI)
Build commit: 8c0340d (8c0340dde69e632fafbeea36c3df20c1c0f06904, parent: 76c5ed2847cdb26ef2822a3a165d710f6b772217)

@rust-timer
Collaborator

Finished benchmarking commit (8c0340d): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 8.1% | [0.2%, 258.4%] | 39 |
| Regressions ❌ (secondary) | 3.1% | [3.1%, 3.1%] | 1 |
| Improvements ✅ (primary) | -0.2% | [-0.7%, -0.1%] | 46 |
| Improvements ✅ (secondary) | -0.3% | [-1.1%, -0.1%] | 31 |
| All ❌✅ (primary) | 3.6% | [-0.7%, 258.4%] | 85 |

Max RSS (memory usage)

Results (primary 3.8%, secondary -1.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 3.8% | [0.7%, 12.0%] | 12 |
| Regressions ❌ (secondary) | 2.5% | [2.5%, 2.5%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -5.2% | [-5.2%, -5.2%] | 1 |
| All ❌✅ (primary) | 3.8% | [0.7%, 12.0%] | 12 |

Cycles

Results (primary 17.5%, secondary 1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 17.5% | [2.2%, 225.6%] | 16 |
| Regressions ❌ (secondary) | 2.8% | [1.8%, 3.4%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.0% | [-4.9%, -3.2%] | 2 |
| All ❌✅ (primary) | 17.5% | [2.2%, 225.6%] | 16 |

Binary size

Results (primary 0.8%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 0.9% | [0.0%, 3.7%] | 78 |
| Regressions ❌ (secondary) | 0.3% | [0.0%, 0.9%] | 85 |
| Improvements ✅ (primary) | -0.4% | [-1.5%, -0.1%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.8% | [-1.5%, 3.7%] | 84 |

Bootstrap: 466.951s -> 474.49s (1.61%)
Artifact size: 387.77 MiB -> 387.73 MiB (-0.01%)

@rustbot removed the S-waiting-on-perf label Sep 11, 2025

@ChayimFriedman2
Contributor Author

Okay, not worth it.

@rustbot removed the S-waiting-on-review label Sep 11, 2025
Labels

perf-regression Performance regression. T-libs Relevant to the library team, which will review and decide on the PR/issue.

7 participants