-
Notifications
You must be signed in to change notification settings - Fork 15.1k
Optimize std::__tree::__assign_multi to insert the provided range at the end of the tree every time #131030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…the end of the tree every time This improves performance for the copy-assignment operators of associative containers such as std::map
|
@llvm/pr-subscribers-libcxx Author: None (higher-performance) ChangesThis improves performance for the copy-assignment operators of associative containers such as Note that this optimization already exists in other places, e.g.: llvm-project/libcxx/include/map Lines 1217 to 1218 in 15e6bb6
Full diff: https://github.com/llvm/llvm-project/pull/131030.diff 1 Files Affected:
diff --git a/libcxx/include/__tree b/libcxx/include/__tree
index c627641d5d86f..c07de5c95b0dc 100644
--- a/libcxx/include/__tree
+++ b/libcxx/include/__tree
@@ -1432,8 +1432,9 @@ void __tree<_Tp, _Compare, _Allocator>::__assign_multi(_InputIterator __first, _
__cache.__advance();
}
}
+ const_iterator __e = end();
for (; __first != __last; ++__first)
- __insert_multi(_NodeTypes::__get_value(*__first));
+ __insert_multi(__e, _NodeTypes::__get_value(*__first));
}
template <class _Tp, class _Compare, class _Allocator>
|
|
@philnik777 would you mind reviewing this? It's a fairly trivial PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide some benchmark numbers for this change?
|
@philnik777 sure, added a benchmark in the PR description. |
|
@higher-performance We have benchmarks for |
|
You can also find documentation on it at https://libcxx.llvm.org/TestingLibcxx.html#benchmarks. |
|
@philnik777 thanks, I wasn't aware. But I'm not using CMake, I have a different build system. Setting this up is quite a headache. The change is incredibly trivial and it's a clear benchmark win -- any chance you could either just accept it or run the benchmarks on your end? |
|
@higher-performance I can't see much of a difference. These are the numbers from our benchmarks: |
|
@philnik777 thank you, that's strange. The assignment operator should be faster. Are you able to reproduce the timing on the code I posted, or do you not see any difference there either? |
|
With your code I see a bigger difference (about 8%). I'm not sure what the difference between the libc++ and your benchmark is. They look very similar. |
|
If the |
|
@philnik777 Just a friendly bump - did you have a chance to try with a larger size? |
|
@philnik777 just checking in. Do you see any reason not to merge this? It's a strict efficiency gain for us -- would love to merge it if that is all right with you. |
|
@higher-performance My main concern is that this adds a hint for ever range we're assigning from, so the hint could be completely bogus when you assign from something other than a container of the same type. Given the relatively small performance improvement and the extremely large numbers required to achieve that I'm not sure this is a net win in the end. |
|
@philnik777 oh. That's because this is an algorithmic improvement, which you can't assess by relying on microbenchmarks. They miss the bigger picture and give you highly misleading results. To elaborate: The number of comparisons is being cut down significantly, and comparisons can be expensive. Not merely because of the container size (side note: 2^24 is really not "extremely large" for a tree of To illustrate, just try running this: On my machine it's 1.65 ms vs. 5.81 ms, which is a rather catastrophic > 3x performance hit. |
|
@philnik777 where do you stand on this? Do you still see a downside to merging it? |
|
@philnik777 just one last bump -- given we haven't had any further concerns the past couple of weeks, and that this optimization already exists everywhere else I've seen in the codebase, I plan to merge this soon (probably in the next couple of days). If you still see any concrete reasons not to, please let me know. So far as I can tell, it seems to be a clear win for some usages, and I believe we haven't seen any cases where it's detrimental. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philnik777 just one last bump -- given we haven't had any further concerns the past couple of weeks, and that this optimization already exists everywhere else I've seen in the codebase, I plan to merge this soon (probably in the next couple of days).
Please do not merge libc++ patches without approval from @llvm/reviewers-libcxx, as that is a general requirement for libc++ patches to be landed. I realize that this patch has been open for a while. Feel free to ping in the libc++ channel on Discord if patches get stale. There it's much more likely to be seen by someone.
If you still see any concrete reasons not to, please let me know. So far as I can tell, it seems to be a clear win for some usages, and I believe we haven't seen any cases where it's detrimental.
Given that you haven't provided any benchmarks for non-optimal cases that's not exactly surprising. How does the performance look if you e.g. assign a range that contains random elements or is sorted inversely?
Edit: I've raised this in my previous comment already, so there has been review feedback that hasn't been addressed.
Huh, I didn't realize. Is this documented anywhere? I was just going by the official LLVM Developer Policy, which appeared to specifically give permission to do this:
That sounds impossible -- am I perhaps not understanding what you mean? This is in The comparators are identical and the target of the assignment is empty at the type of the assignment. How could the inputs have any ordering other than the correctly-sorted one?
I'm not seeing the feedback you're referring to unfortunately. The only previous feedback I see on my side is the objection that the performance improvement was small (not negative), and requires large numbers (both of which I responded to), as well as the one asking me to run your existing benchmarks -- which you yourself did, and said you didn't see much difference. I'm not seeing anything about benchmarking with random elements, inversion, or other feedback that I didn't address. Did you perhaps type more comments and forget to publish them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not merge libc++ patches without approval from @llvm/reviewers-libcxx, as that is a general requirement for libc++ patches to be landed.
Huh, I didn't realize. Is this documented anywhere? I was just going by the official LLVM Developer Policy, which appeared to specifically give permission to do this:
_2. You are allowed to commit patches without approval which you think are obvious.
I'm not sure it's documented anywhere. We should certainly do that if it's not.
Given that you haven't provided any benchmarks for non-optimal cases that's not exactly surprising. How does the performance look if you e.g. assign a range that contains random elements or is sorted inversely?
That sounds impossible -- am I perhaps not understanding what you mean? This is in
__assign_multi. So far as I'm aware, it is only used in__tree::operator=, like this:template <class _Tp, class _Compare, class _Allocator> __tree<_Tp, _Compare, _Allocator>& __tree<_Tp, _Compare, _Allocator>::operator=(const __tree& __t) { if (this != std::addressof(__t)) { value_comp() = __t.value_comp(); __copy_assign_alloc(__t); __assign_multi(__t.begin(), __t.end()); } return *this; }The comparators are identical and the target of the assignment is empty at the type of the assignment. How could the inputs have any ordering other than the correctly-sorted one?
Never mind. I was absolutely sure that it's used in map(InputIterator, InputIterator), but it's not. That does indeed make it impossible.
Edit: I've raised this in my previous comment already, so there has been review feedback that hasn't been addressed.
I'm not seeing the feedback you're referring to unfortunately. The only previous feedback I see on my side is the objection that the performance improvement was small (not negative), and requires large numbers (both of which I responded to), as well as the one asking me to run your existing benchmarks -- which you yourself did, and said you didn't see much difference. I'm not seeing anything about benchmarking with random elements, inversion, or other feedback that I didn't address. Did you perhaps type more comments and forget to publish them?
I'm talking about
so the hint could be completely bogus when you assign from something other than a container of the same type
that might not have been obvious though.
|
So, we do have https://libcxx.llvm.org/Contributing.html#the-review-process, but that's... a bit outdated, let's say. |
|
Ah gotcha! And yeah, it would definitely be great to update any relevant policies. Thanks, I appreciate it! |
…the end of the tree every time (llvm#131030) This improves performance for the copy-assignment operators of associative containers such as `std::map`. This optimization already exists in other places in the codebase, and seems to have been missed here.
This improves performance for the copy-assignment operators of associative containers such as
std::map.Note that this optimization already exists in other places, e.g.:
llvm-project/libcxx/include/map
Lines 1217 to 1218 in 15e6bb6
As an example, this makes the following code 15% faster: