Improve TLS codegen by marking the panic/init path as cold #143511
rustbot has assigned @workingjubilee.
Not sure if this will show up at all on perf but 🤷

@bors2 try @rust-timer queue

Do you have any local benchmarks?
Improve TLS codegen by marking the panic/init path as cold This is an extension of the performance improvements seen from <#141685>. I noticed that the non-`const` TLS still didn't have the `#[cold]` attribute for the uninit/panic path, and I also realized that neither implementation should have the initialization or panic path inlined, ever. These paths are taken either only once per thread (`init`) or never (`panic`, in a well-behaving Rust program), thus they don't deserve to litter the code generated each time you access a thread-local variable. So in addition to `#[cold]` I added the more aggressive `#[inline(never)]` to both cold paths as well.
@compiler-errors No, I don't have any local benchmarks. But I look at assembly output a lot, and trust me when I say these code paths should never get inlined. Could you restart the benchmark with my second commit included?

@bors2 try @rust-timer queue
Finished benchmarking commit (8b17150): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage): Results (primary 5.4%, secondary 2.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles: Results (primary 2.6%, secondary -2.8%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: Results (primary 0.0%, secondary 0.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 459.09s -> 461.518s (0.53%)
|
I removed some Codegen is still nicer just due to the addition of |
|
@bors2 try @rust-timer queue |
Finished benchmarking commit (9782d0a): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage): This benchmark run did not return any relevant results for this metric.

Cycles: This benchmark run did not return any relevant results for this metric.

Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 461.809s -> 462.209s (0.09%)
    }

    #[cold]
    unsafe fn initialize(&self) -> *const T {
Should this be marked as `#[inline(never)]`? I've noticed `destructors::register` calls in the hot path before, for `const` thread-locals with destructors.
I had that in an earlier PR, see #143511 (comment).
    #[derive(Clone, Copy)]
    enum State {
        Initial,
        Uninitialized,
`Uninitialized` isn't a great name, the value itself is very much initialised. It's just the destructor that's not registered yet.
    /// The resulting pointer may not be used after reentrant initialization
    /// or thread destruction has occurred.
    #[inline]
    pub fn get(&'static self, i: Option<&mut Option<T>>, f: impl FnOnce() -> T) -> *const T {
While you're at it, I think it might be beneficial to inline the `ptr.addr() == 1` case into this function, as that might yield more optimized `LocalKey::with`s.
        if let State::Alive = self.state.get() {
            self.val.get()
        } else {
            unsafe { self.get_or_init_slow() }
I don't think this is beneficial – the returned pointer is later compared against null in `LocalKey::with` anyway, so the optimiser should be able to merge the state comparison into that.
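To make the shape of that argument concrete, here is a heavily simplified model of the pattern being discussed. It is not the standard library's code: `Slot`, its `State` variants, `destroy`, and `try_with` are hypothetical names, and the stored value is a fixed `u32`. The inlined `get` checks the state once and otherwise defers to a `#[cold]` slow path, which returns null when the slot is no longer usable; the caller then only has to test the pointer.

```rust
use std::cell::{Cell, UnsafeCell};
use std::ptr;

// Hypothetical state machine; the variant names are illustrative only.
#[derive(Clone, Copy)]
enum State {
    Uninit,
    Alive,
    Destroyed,
}

pub struct Slot {
    state: Cell<State>,
    val: UnsafeCell<u32>,
}

impl Slot {
    pub const fn new() -> Self {
        Slot { state: Cell::new(State::Uninit), val: UnsafeCell::new(0) }
    }

    /// Hot path: one state check, then hand out the pointer.
    #[inline]
    pub fn get(&self) -> *const u32 {
        if let State::Alive = self.state.get() {
            self.val.get()
        } else {
            self.get_or_init_slow()
        }
    }

    /// Cold path: first-time initialization, or null once destroyed.
    #[cold]
    fn get_or_init_slow(&self) -> *const u32 {
        match self.state.get() {
            State::Uninit => {
                // SAFETY: no reference into `val` exists before it is alive.
                unsafe { *self.val.get() = 7 };
                self.state.set(State::Alive);
                self.val.get()
            }
            State::Alive => self.val.get(),
            State::Destroyed => ptr::null(),
        }
    }

    /// Stand-in for thread teardown in this model.
    pub fn destroy(&self) {
        self.state.set(State::Destroyed);
    }

    /// Caller side: the only check needed here is the null test.
    pub fn try_with<R>(&self, f: impl FnOnce(&u32) -> R) -> Option<R> {
        let p = self.get();
        if p.is_null() {
            None
        } else {
            // SAFETY: a non-null pointer from `get` points at the live value.
            Some(f(unsafe { &*p }))
        }
    }
}
```

Whether the optimiser actually folds the state check and the null check together depends on the compiler version and surrounding code, so the generated assembly is worth inspecting rather than assuming.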
whoops, didn't mean for this to slip, but I espy a nominee

r? @joboet
This is an extension of the performance improvements seen from #141685. I noticed that the non-`const` TLS still didn't have the `#[cold]` attribute for the uninit/panic path, and I also realized that neither implementation should have the initialization or panic path inlined, ever.

These paths are taken either only once per thread (`init`) or never (`panic`, in a well-behaving Rust program), thus they don't deserve to litter the code generated each time you access a thread-local variable. So in addition to `#[cold]` I added the more aggressive `#[inline(never)]` to both cold paths as well.
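As a rough illustration of the hot/cold split described above, the sketch below uses a hypothetical `LazySlot` type (not part of the standard library, and not the code changed in this PR) whose accessor is small and inlinable while the one-time initialization is kept out of line with both attributes. The `init_calls` counter and the constant `42` exist only to make the behavior observable.

```rust
use std::cell::Cell;

/// Hypothetical lazily initialized slot, standing in for the storage
/// behind a thread-local variable.
pub struct LazySlot {
    value: Cell<Option<u64>>,
    init_calls: Cell<u32>,
}

impl LazySlot {
    pub const fn new() -> Self {
        LazySlot { value: Cell::new(None), init_calls: Cell::new(0) }
    }

    /// Hot path: a single check, cheap to inline at every access site.
    #[inline]
    pub fn get(&self) -> u64 {
        match self.value.get() {
            Some(v) => v,
            None => self.initialize(),
        }
    }

    /// Cold path: runs at most once per slot, so it is hinted as unlikely
    /// and kept out of the callers' generated code.
    #[cold]
    #[inline(never)]
    fn initialize(&self) -> u64 {
        self.init_calls.set(self.init_calls.get() + 1);
        let v = 42; // placeholder for a real initializer
        self.value.set(Some(v));
        v
    }

    /// How many times the cold path has run (for demonstration only).
    pub fn init_calls(&self) -> u32 {
        self.init_calls.get()
    }
}
```

Calling `get` twice on a fresh slot should run `initialize` only on the first call. `#[cold]` is only a hint that the call is unlikely, whereas `#[inline(never)]` asks more strongly that the body not be duplicated into callers.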