-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Optimize std::str::Chars::next
and std::str::Chars::next_back
#142038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ping @scottmcm ? |
So you know, you can make diff views in godbolt: https://godbolt.org/z/Thn1bf9qG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general structure here does make sense to me, but overall I feel like it removed a bunch of helpers and constants unnecessarily. Not having utf8_first_byte
, sure, but this ends up repeating the X << 6 | (Y & 0x3F)
in a bunch of places, so keeping the utf8_acc_cont_byte
to do that would make sense to me. The standard library is always compiled with optimizations, and the MIR inliner will inline it, so there's no reason to avoid the function call. Having the u32::from
in there would also make the two functions more similar, since now the forward one is using as u32
in a different line instead with no obvious reason whey they should differ.
26b614c
to
54a699b
Compare
I could not get LLVM to produce the |
r? @scottmcm |
Requested reviewer is already assigned to this pull request. Please choose another assignee. |
r? libs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an implementation standpoint this looks good, but the unsafe
ty isn't encapsulated correctly. This should be a pretty easy fix.
Could you update the godbolt links in the top post after this change?
library/core/src/str/validations.rs
Outdated
// SAFETY: `bytes` produces a UTF-8-like string | ||
let mut next_byte = || unsafe { | ||
let b = *bytes.next().unwrap_unchecked(); | ||
assume(utf8_is_cont_byte(b)); | ||
b | ||
}; | ||
|
||
// SAFETY: `bytes` produces a UTF-8-like string | ||
let combine = |c: u32, b: u8| unsafe { disjoint_bitor(c << 6, u32::from(b & CONT_MASK)) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These preconditions don't match up; by this API it is "safe" but completely unsound to call next_byte()
5 times on a 4 byte codepoint. And the safety comments don't cover the precondition.
Instead, this could probably be a function:
#[inline]
unsafe fn advance_mask<'a, I: Iterator<Item = &'a u8>>(bytes: &mut I) -> u32 {
// next_byte followed by combine
}
to ensure that unsafety is encapsulated. Applies to both the forward and backward version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closures doesn't escape the body of the function, so we shouldn't need to worry about them being used unsoundly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential use isn't really an issue; the problem is that the calls to next_byte()
look innocently safe, but that isn't accurate.
I'll rerun after the above change but let's get a baseline @bors2 try |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Optimize `std::str::Chars::next` and `std::str::Chars::next_back`
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (38eb248): comparison URL. Overall result: ❌ regressions - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 4.8%, secondary 2.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary 3.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 0.0%, secondary -0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 468.177s -> 466.627s (-0.33%) |
Well, those results are a bit interesting. From the first godbolt link in the top post I'm not really sure why we're showing regressions; the prelude is identical, the fastest path starting at Cc the asm analyzer expert @hanna-kruppe who could probably provide some more insight. |
At least one benchmark that's gotten slower (unicode-normalization) uses |
54a699b
to
3069ce8
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
3069ce8
to
652813e
Compare
Since the perf job doesn't really show anything useful here, do you have local benchmarks indicating an improvement? |
There are only 0x10FFFF possible codepoints, so we can exhaustively test all of them.
By reordering some operations, we can expose some opportunites for CSE. Also convert the series of nested `if` branches to early return, which IMO makes the code clearer. Comparison of assembly before and after for `next_code_point`: https://godbolt.org/z/9Te84YzhK Comparison of assembly before and after for `next_code_point_reverse`: https://godbolt.org/z/fTx1a7oz1
652813e
to
43c4909
Compare
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
Benchmark resultsVery strange, I need to investigate why the regression is so large on x86 AArch64 (Apple M3)
x86_64 (AMD Ryzen 9 9950X)
|
Before/after for
next
: https://godbolt.org/z/Yb9TGc4vaBefore/after for
next_back
: https://godbolt.org/z/v6x7GWsj1std::sys_common::wtf8::Wtf8CodePoints
will also benefit from this, since it uses the samenext_code_point
andnext_code_point_reverse
functions internally.I also added tests for all codepoints in the range
0..=char::MAX
(including surrogats that can only appear in WTF-8), so the new implementations have been exhaustively tested