@nnethercote
Contributor

@nnethercote nnethercote commented Oct 26, 2025

Because #148054 was a slight perf regression.

The regression seemingly arose because this iterator structure:

slice_iter.chain(Option_iter.chain(Option_iter))

changed to this:

slice_iter.chain(Option_iter).chain(Option_iter)

The commit also tweaks the `slice_iter` part, changing `into_iter` to `iter` and using `[]` instead of `(&[])`, for conciseness and consistency.
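
For readers who want to see the two shapes side by side, here is a minimal sketch. The function names `nested` and `flat` and the `u32` element type are assumptions made for illustration; the compiler code touched by this PR is not reproduced here. Both functions yield the same items in the same order, and only the nesting of the `Chain` type differs.

```rust
use std::iter::Chain;
use std::{option, slice};

/// `slice_iter.chain(Option_iter.chain(Option_iter))`:
/// the two `Option` iterators are chained together first.
/// (`nested` is a hypothetical name used only in this sketch.)
pub fn nested<'a>(
    xs: &'a [u32],
    a: Option<&'a u32>,
    b: Option<&'a u32>,
) -> Chain<slice::Iter<'a, u32>, Chain<option::IntoIter<&'a u32>, option::IntoIter<&'a u32>>> {
    xs.iter().chain(a.into_iter().chain(b))
}

/// `slice_iter.chain(Option_iter).chain(Option_iter)`:
/// the slice iterator ends up inside the inner `Chain`.
/// (`flat` is a hypothetical name used only in this sketch.)
pub fn flat<'a>(
    xs: &'a [u32],
    a: Option<&'a u32>,
    b: Option<&'a u32>,
) -> Chain<Chain<slice::Iter<'a, u32>, option::IntoIter<&'a u32>>, option::IntoIter<&'a u32>> {
    xs.iter().chain(a).chain(b)
}
```

The explicit return types are spelled out only to make the nesting visible; in ordinary code `impl Iterator<Item = &'a u32>` would hide the difference.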

r? @saethlin

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 26, 2025
@nnethercote
Contributor Author

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Oct 26, 2025
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 26, 2025
@rust-bors

rust-bors bot commented Oct 27, 2025

☀️ Try build successful (CI)
Build commit: 07d7953 (07d7953ec5ca596313c79b067e4a60e37ef77a99, parent: f977dfc388ea39c9886b7f8c49abce26e6918df6)

@rust-timer
Collaborator

Finished benchmarking commit (07d7953): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | -0.2% | [-0.4%, -0.1%] | 26 |
| Improvements ✅ (secondary) | -0.2% | [-0.4%, -0.0%] | 19 |
| All ❌✅ (primary) | -0.2% | [-0.4%, -0.1%] | 26 |

Max RSS (memory usage)

Results (primary -0.5%, secondary 0.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [1.3%, 1.3%] | 1 |
| Regressions ❌ (secondary) | 3.2% | [3.2%, 3.2%] | 1 |
| Improvements ✅ (primary) | -1.4% | [-2.0%, -0.8%] | 2 |
| Improvements ✅ (secondary) | -1.7% | [-1.7%, -1.7%] | 1 |
| All ❌✅ (primary) | -0.5% | [-2.0%, 1.3%] | 3 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 473.751s -> 477.517s (0.79%)
Artifact size: 390.43 MiB -> 390.50 MiB (0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 27, 2025
@nnethercote nnethercote marked this pull request as ready for review October 27, 2025 07:09
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 27, 2025
@nnethercote
Contributor Author

cc @Zalathar, @cuviper

@nnethercote
Contributor Author

The perf regression is fixed.

@Kobzol
Member

Kobzol commented Oct 27, 2025

The bootstrap regression looks weird, let's retry to see if it was noise or not.

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Oct 27, 2025
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 27, 2025
@rust-bors

rust-bors bot commented Oct 27, 2025

☀️ Try build successful (CI)
Build commit: fc81109 (fc811093150698364a6f00d2f99302f972a01248, parent: 23fced0fcc5e0ec260d25f04a8b78b269e5e90f0)

@rust-timer
Collaborator

Finished benchmarking commit (fc81109): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.4%, -0.1%] | 26 |
| Improvements ✅ (secondary) | -0.2% | [-0.4%, -0.0%] | 17 |
| All ❌✅ (primary) | -0.2% | [-0.4%, -0.1%] | 26 |

Max RSS (memory usage)

Results (primary 0.1%, secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [0.7%, 3.7%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.2% | [-4.2%, -4.2%] | 1 |
| Improvements ✅ (secondary) | -2.8% | [-4.1%, -1.5%] | 2 |
| All ❌✅ (primary) | 0.1% | [-4.2%, 3.7%] | 3 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 475.144s -> 474.464s (-0.14%)
Artifact size: 390.52 MiB -> 390.50 MiB (-0.00%)

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Oct 27, 2025
@Kobzol
Member

Kobzol commented Oct 27, 2025

Ok, looks like it was just noise after all.

@rustbot
Collaborator

rustbot commented Oct 28, 2025

⚠️ Warning ⚠️

  • There are issue links (such as #123) in the commit messages of the following commits.
    Please move them to the PR description, to avoid spamming the issues with references to the commit, and so this bot can automatically canonicalize them to avoid issues with subtree.

@saethlin
Member

@bors r+

Do we have any idea why this version is better? The code here isn't particularly hot, so I fear that if you extract this situation to a standalone microbenchmark the difference will be massive.

@bors
Collaborator

bors commented Oct 31, 2025

📌 Commit d18b0b4 has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 31, 2025
@nnethercote
Contributor Author

> any idea why

Not really. I tried https://godbolt.org/z/YKoYx7sno but it wasn't very enlightening -- the version that's better in the compiler generates slightly longer code.

I do know that chained iterators are slow in general, and the perf book recommends avoiding them if possible.
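
As a rough illustration of what "avoiding them" can look like when the caller only needs to visit each item, the sketch below handles the slice and the two optional values separately instead of building a `Chain`. The function names, the `u32` element type, and the callback-based signature are all assumptions for illustration; this is not the change made in this PR, which keeps a chained iterator.

```rust
/// Visits every item without constructing a `Chain`.
/// (`visit_all` is a hypothetical helper, not code from the compiler.)
pub fn visit_all(xs: &[u32], o1: Option<u32>, o2: Option<u32>, mut f: impl FnMut(u32)) {
    // The slice is walked by a plain loop with no per-item `Chain` checks.
    for &x in xs {
        f(x);
    }
    // Each optional value is handled by one explicit test.
    if let Some(x) = o1 {
        f(x);
    }
    if let Some(x) = o2 {
        f(x);
    }
}

/// The chained equivalent, for comparison; it visits the same items in the same order.
pub fn visit_chained(xs: &[u32], o1: Option<u32>, o2: Option<u32>, f: impl FnMut(u32)) {
    xs.iter().copied().chain(o1).chain(o2).for_each(f);
}
```

This only works when the call site can be restructured around a callback or a loop body; when an actual iterator value has to be handed to other code, the chained form may be the practical choice.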

@cuviper
Member

cuviper commented Oct 31, 2025

slice_iter.chain(Option_iter.chain(Option_iter))
//         ^outer_chain      ^inner_chain
// -> Chain<Slice, Chain<Opt, Opt>>

In this version, `Iterator::next` is something like:

if let Some(slice_iter) = &mut outer_chain.a {
    if let Some(item) = slice_iter.next() {
        return Some(item);
    }
    outer_chain.a = None;
}
if let Some(inner_chain) = &mut outer_chain.b {
    if let Some(opt1) = &mut inner_chain.a {
        if let Some(item) = opt1.next() {
            return Some(item);
        }
        inner_chain.a = None;
    }
    if let Some(opt2) = &mut inner_chain.b {
        if let Some(item) = opt2.next() {
            return Some(item);
        }
    }
}
None

Notice that the slice iterator is only one branch away, which is advantageous if that ends up providing multiple items. Compare that to the other version where it has to check two `Chain` layers to get to the slice.

slice_iter.chain(Option_iter).chain(Option_iter)
//         ^inner_chain       ^outer_chain
// -> Chain<Chain<Slice, Opt>, Opt>
if let Some(inner_chain) = &mut outer_chain.a {
    if let Some(slice_iter) = &mut inner_chain.a {
        if let Some(item) = slice_iter.next() {
            return Some(item);
        }
        inner_chain.a = None;
    }
    if let Some(opt1) = &mut inner_chain.b {
        if let Some(item) = opt1.next() {
            return Some(item);
        }
    }
    outer_chain.a = None;
}
if let Some(opt2) = &mut outer_chain.b {
    if let Some(item) = opt2.next() {
        return Some(item);
    }
}
None
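
The difference between the two listings can be made concrete with a toy model. The sketch below is an assumption-laden stand-in, not the standard library's `Chain`: `MiniChain`, `drain`, `nested_shape`, and `flat_shape` are hypothetical names, and the shared counter records how many times any layer's `next` is entered rather than measuring real CPU branches.

```rust
use std::cell::Cell;

/// Toy stand-in for `std::iter::Chain` (hypothetical, simplified).
/// `entered` counts how many times any layer's `next` runs.
pub struct MiniChain<'c, A, B> {
    a: Option<A>,
    b: Option<B>,
    entered: &'c Cell<usize>,
}

impl<'c, A, B> MiniChain<'c, A, B> {
    pub fn new(a: A, b: B, entered: &'c Cell<usize>) -> Self {
        MiniChain { a: Some(a), b: Some(b), entered }
    }
}

impl<'c, A, B> Iterator for MiniChain<'c, A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<A::Item> {
        self.entered.set(self.entered.get() + 1);
        if let Some(a) = self.a.as_mut() {
            match a.next() {
                Some(item) => return Some(item),
                // First half is exhausted: drop it so later calls skip it.
                None => self.a = None,
            }
        }
        // The second half is not cleared, mirroring the listings above.
        self.b.as_mut()?.next()
    }
}

/// Drives the iterator with repeated `next` calls (external iteration).
fn drain<I: Iterator<Item = u32>>(mut it: I) -> Vec<u32> {
    let mut out = Vec::new();
    while let Some(x) = it.next() {
        out.push(x);
    }
    out
}

/// Shape `slice.chain(opt.chain(opt))`: returns the items and the counter.
pub fn nested_shape(xs: &[u32], o1: Option<u32>, o2: Option<u32>) -> (Vec<u32>, usize) {
    let entered = Cell::new(0);
    let inner = MiniChain::new(o1.into_iter(), o2.into_iter(), &entered);
    let outer = MiniChain::new(xs.iter().copied(), inner, &entered);
    let items = drain(outer);
    (items, entered.get())
}

/// Shape `slice.chain(opt).chain(opt)`: returns the items and the counter.
pub fn flat_shape(xs: &[u32], o1: Option<u32>, o2: Option<u32>) -> (Vec<u32>, usize) {
    let entered = Cell::new(0);
    let inner = MiniChain::new(xs.iter().copied(), o1.into_iter(), &entered);
    let outer = MiniChain::new(inner, o2.into_iter(), &entered);
    let items = drain(outer);
    (items, entered.get())
}
```

With a three-element slice and two `Some` values, the counter ends at 9 for the nested shape and 11 for the flat one. Every slice item costs one layer in the nested shape and two in the flat shape, so the gap grows with the length of the slice.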

However, the structure matters less if it's consumed by inner iteration -- `fold`, `for_each`, etc. Your godbolt link with `-O` squashes both versions to a constant if you change to `iter.for_each(|i| n += i)`, or just `iter.sum()`.
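
To make the distinction between the two consumption styles concrete, here is a small sketch. The function names and the `u32` inputs are assumptions for illustration; this is not the code behind the godbolt link. The first function pulls items one `next` call at a time, while the other two hand the whole traversal to the iterator. `Chain` provides its own `fold`, which the default `for_each` and the integer `sum` are built on, so each half is consumed in turn.

```rust
/// External iteration: every item comes from a separate `next` call,
/// so each call has to work out which part of the chain is still live.
pub fn sum_with_next(xs: &[u32], o1: Option<u32>, o2: Option<u32>) -> u32 {
    let mut iter = xs.iter().copied().chain(o1).chain(o2);
    let mut n = 0;
    while let Some(i) = iter.next() {
        n += i;
    }
    n
}

/// Internal iteration via `for_each`: the iterator drives the loop itself.
pub fn sum_with_for_each(xs: &[u32], o1: Option<u32>, o2: Option<u32>) -> u32 {
    let iter = xs.iter().copied().chain(o1).chain(o2);
    let mut n = 0;
    iter.for_each(|i| n += i);
    n
}

/// Internal iteration via `sum`.
pub fn sum_direct(xs: &[u32], o1: Option<u32>, o2: Option<u32>) -> u32 {
    xs.iter().copied().chain(o1).chain(o2).sum()
}
```

All three return the same value; what may differ is the code the optimizer can produce for each, which is the point being made above.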

@nnethercote
Contributor Author

I fiddled around with a custom iterator but I couldn't get an outcome better than the chained one.
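
The custom iterator that was tried is not shown in this thread. Purely as an illustration of what such an iterator could look like, here is a hypothetical sketch: the name `SliceThenTwo` and its field layout are assumptions, and nothing here is claimed to match the experiment described above or to perform better than `Chain`.

```rust
use std::slice;

/// Hypothetical hand-written replacement for
/// `slice_iter.chain(Option_iter.chain(Option_iter))`.
pub struct SliceThenTwo<'a, T> {
    slice: slice::Iter<'a, T>,
    first: Option<&'a T>,
    second: Option<&'a T>,
}

impl<'a, T> SliceThenTwo<'a, T> {
    pub fn new(xs: &'a [T], first: Option<&'a T>, second: Option<&'a T>) -> Self {
        SliceThenTwo { slice: xs.iter(), first, second }
    }
}

impl<'a, T> Iterator for SliceThenTwo<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        // `slice::Iter` is fused, so calling it again after `None` is fine.
        if let Some(item) = self.slice.next() {
            return Some(item);
        }
        // `take` leaves `None` behind, so each extra item is yielded at most once.
        self.first.take().or_else(|| self.second.take())
    }
}
```

Like the nested `Chain`, this keeps the slice one check away on every `next` call, which may be why a hand-written version has little room to beat it.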

@bors
Collaborator

bors commented Oct 31, 2025

⌛ Testing commit d18b0b4 with merge 23c7bad...

@bors
Collaborator

bors commented Oct 31, 2025

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing 23c7bad to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 31, 2025
@bors bors merged commit 23c7bad into rust-lang:master Oct 31, 2025
12 checks passed
@rustbot rustbot added this to the 1.93.0 milestone Oct 31, 2025
@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 647f153 (parent) -> 23c7bad (this PR)

Test differences

4 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 23c7bad921fb7163de37ea680bed317deaa03fda --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-x86_64-apple: 7882.6s -> 6390.2s (-18.9%)
  2. dist-aarch64-apple: 8666.0s -> 7094.0s (-18.1%)
  3. x86_64-msvc-ext3: 5879.9s -> 6674.7s (+13.5%)
  4. dist-apple-various: 3683.6s -> 3196.8s (-13.2%)
  5. x86_64-rust-for-linux: 2953.2s -> 2599.9s (-12.0%)
  6. armhf-gnu: 5576.3s -> 4968.5s (-10.9%)
  7. tidy: 196.8s -> 178.8s (-9.1%)
  8. dist-s390x-linux: 5125.7s -> 5565.0s (+8.6%)
  9. i686-msvc-2: 7987.9s -> 7304.5s (-8.6%)
  10. aarch64-msvc-2: 5209.0s -> 5638.9s (+8.3%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance that executed the job, system noise, invalidated caches, etc. The table above is provided mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (23c7bad): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.4%, -0.1%] | 19 |
| Improvements ✅ (secondary) | -0.2% | [-0.4%, -0.0%] | 21 |
| All ❌✅ (primary) | -0.2% | [-0.4%, -0.1%] | 19 |

Max RSS (memory usage)

Results (primary 1.7%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.7% | [1.7%, 1.7%] | 1 |
| Regressions ❌ (secondary) | 2.1% | [2.1%, 2.1%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.7% | [1.7%, 1.7%] | 1 |

Cycles

Results (secondary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.0%, 3.2%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.9% | [-3.9%, -3.9%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 474.981s -> 473.971s (-0.21%)
Artifact size: 390.81 MiB -> 390.89 MiB (0.02%)
