-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[libc++] Added segmented iterator for std::fill #104680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-libcxx Author: NoumanAmir-10xe (NoumanAmir657) ChangesFor the issue #102817. Full diff: https://github.com/llvm/llvm-project/pull/104680.diff 1 Files Affected:
diff --git a/libcxx/include/__algorithm/fill.h b/libcxx/include/__algorithm/fill.h
index 1ce3eadb013d05..3ba37e7260a4a2 100644
--- a/libcxx/include/__algorithm/fill.h
+++ b/libcxx/include/__algorithm/fill.h
@@ -21,25 +21,66 @@ _LIBCPP_BEGIN_NAMESPACE_STD
// fill isn't specialized for std::memset, because the compiler already optimizes the loop to a call to std::memset.
-template <class _ForwardIterator, class _Tp>
+template <
+ class _ForwardIterator,
+ class _Tp,
+ __enable_if_t<
+ is_same<typename iterator_traits<_ForwardIterator>::iterator_category, forward_iterator_tag>::value ||
+ is_same<typename iterator_traits<_ForwardIterator>::iterator_category, bidirectional_iterator_tag>::value,
+ int> = 0>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
-__fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value, forward_iterator_tag) {
+__fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
for (; __first != __last; ++__first)
*__first = __value;
}
-template <class _RandomAccessIterator, class _Tp>
+template <class _RandomAccessIterator,
+ class _Tp,
+ __enable_if_t<(is_same<typename iterator_traits<_RandomAccessIterator>::iterator_category,
+ random_access_iterator_tag>::value ||
+ is_same<typename iterator_traits<_RandomAccessIterator>::iterator_category,
+ contiguous_iterator_tag>::value) &&
+ !__is_segmented_iterator<_RandomAccessIterator>::value,
+ int> = 0>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
-__fill(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp& __value, random_access_iterator_tag) {
+__fill(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp& __value) {
std::fill_n(__first, __last - __first, __value);
}
+template <class _SegmentedIterator,
+ class _Tp,
+ __enable_if_t<__is_segmented_iterator<_SegmentedIterator>::value, int> = 0>
+inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
+__fill(_SegmentedIterator __first, _SegmentedIterator __last, const _Tp& __value) {
+ using _Traits = __segmented_iterator_traits<_SegmentedIterator>;
+
+ auto __sfirst = _Traits::__segment(__first);
+ auto __slast = _Traits::__segment(__last);
+
+ // We are in a single segment, so we might not be at the beginning or end
+ if (__sfirst == __slast) {
+ __fill(_Traits::__local(__first), _Traits::__local(__last), __value);
+ return;
+ }
+
+ // We have more than one segment. Iterate over the first segment, since we might not start at the beginning
+ __fill(_Traits::__local(__first), _Traits::__end(__sfirst), __value);
+ ++__sfirst;
+ // iterate over the segments which are guaranteed to be completely in the range
+ while (__sfirst != __slast) {
+ __fill(_Traits::__begin(__sfirst), _Traits::__end(__sfirst), __value);
+ ++__sfirst;
+ }
+ // iterate over the last segment
+ __fill(_Traits::__begin(__sfirst), _Traits::__local(__last), __value);
+}
+
template <class _ForwardIterator, class _Tp>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
- std::__fill(__first, __last, __value, typename iterator_traits<_ForwardIterator>::iterator_category());
+ std::__fill(__first, __last, __value);
}
_LIBCPP_END_NAMESPACE_STD
-#endif // _LIBCPP___ALGORITHM_FILL_H
+#endif // _LIBCPP___ALGORITHM_FILL_H
\ No newline at end of file
|
libcxx/include/__algorithm/fill.h
Outdated
| __enable_if_t< | ||
| is_same<typename iterator_traits<_ForwardIterator>::iterator_category, forward_iterator_tag>::value || | ||
| is_same<typename iterator_traits<_ForwardIterator>::iterator_category, bidirectional_iterator_tag>::value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. Users are allowed to add their own iterator categories that derive from standard ones. You should use the __has_x_iterator_category utilities instead. In this case it could simply be !__has_random_access_iterator_category<_ForwardIterator>::value. Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Now uses __has_x_iterator_category.
| std::fill_n(__first, __last - __first, __value); | ||
| } | ||
|
|
||
| template <class _SegmentedIterator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new optimization should come with some tests to make sure it's correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests.
libcxx/include/__algorithm/fill.h
Outdated
| using _Traits = __segmented_iterator_traits<_SegmentedIterator>; | ||
|
|
||
| auto __sfirst = _Traits::__segment(__first); | ||
| auto __slast = _Traits::__segment(__last); | ||
|
|
||
| // We are in a single segment, so we might not be at the beginning or end | ||
| if (__sfirst == __slast) { | ||
| __fill(_Traits::__local(__first), _Traits::__local(__last), __value); | ||
| return; | ||
| } | ||
|
|
||
| // We have more than one segment. Iterate over the first segment, since we might not start at the beginning | ||
| __fill(_Traits::__local(__first), _Traits::__end(__sfirst), __value); | ||
| ++__sfirst; | ||
| // iterate over the segments which are guaranteed to be completely in the range | ||
| while (__sfirst != __slast) { | ||
| __fill(_Traits::__begin(__sfirst), _Traits::__end(__sfirst), __value); | ||
| ++__sfirst; | ||
| } | ||
| // iterate over the last segment | ||
| __fill(_Traits::__begin(__sfirst), _Traits::__local(__last), __value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't you using __for_each_segment here? More generally, could we simply call std::for_each to get the optimization instead? That would avoid a bunch of code here.
libcxx/include/__algorithm/fill.h
Outdated
| // fill isn't specialized for std::memset, because the compiler already optimizes the loop to a call to std::memset. | ||
|
|
||
| template <class _ForwardIterator, class _Tp> | ||
| template < |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not attached: Please provide a small benchmark to show the improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark Old New
bm_deque_fill/1 2.33 3.75 ns
bm_deque_fill/2 2.82 4.01 ns
bm_deque_fill/3 3.26 4.01 ns
bm_deque_fill/4 3.60 3.34 ns
bm_deque_fill/5 3.92 3.64 ns
bm_deque_fill/6 4.48 3.64 ns
bm_deque_fill/7 4.83 3.79 ns
bm_deque_fill/8 5.46 3.28 ns
bm_deque_fill/16 9.44 3.25 ns
bm_deque_fill/64 33.3 3.14 ns
bm_deque_fill/512 200 7.20 ns
bm_deque_fill/4096 2041 39.7 ns
bm_deque_fill/32768 16345 288 ns
bm_deque_fill/262144 130562 2340 ns
bm_deque_fill/1048576 522694 12346 ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for such wrong formatting ...
libcxx/include/__algorithm/fill.h
Outdated
|
|
||
| template <class _ForwardIterator, class _Tp> | ||
| template < | ||
| class _ForwardIterator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not attached: Please add a release note for this improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added release note.
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
This is failing two of the tests while building. @philnik777 can you tell why this must be the case? |
You need to update the transitive includes. |
Can you tell where I have to do the updates? and how is that breaking the test cases? |
Updated. |
fe24a1b to
e4db365
Compare
| // constexpr OutputIterator // constexpr after C++17 | ||
| // fill_n(Iter first, Size n, const T& value); | ||
|
|
||
| #include <algorithm> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you format this in a separate patch? It's really not obvious what you've changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay sure.
libcxx/include/__algorithm/fill.h
Outdated
| __enable_if_t<__is_segmented_iterator<_SegmentedIterator>::value, int> = 0> | ||
| inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void | ||
| __fill(_SegmentedIterator __first, _SegmentedIterator __last, const _Tp& __value) { | ||
| std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to call for_each unconditionally. No need to only do it for segmented iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philnik777 Would not that lead to ambiguous call? I am assuming here you mean that I have to remove the condition __is_segmented_iterator.
Or do you mean that I just forward the call to fill to for_each and for_each can deal with SFINAE?
This would not really work because of __bit_iterator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the latter case as mentioned above but it leads to these test not compiling count.pass.cpp and ranges.count.pass.cpp.
# | In file included from /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:19:
# | In file included from /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/algorithm:1835:
# | In file included from /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:12:
# | In file included from /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill_n.h:12:
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/for_each.h:34:5: error: no matching function for call to object of type '(lambda at /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:34)'
# | 34 | __f(*__first);
# | | ^~~
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:8: note: in instantiation of function template specialization 'std::for_each<std::__bit_iterator<std::vector<bool>, false>, (lambda at /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:34)>' requested here
# | 58 | std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
# | | ^
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:45:14: note: in instantiation of function template specialization 'std::fill<std::__bit_iterator<std::vector<bool>, false>, bool>' requested here
# | 45 | std::fill(vec.begin(), vec.end(), false);
# | | ^
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:34: note: candidate function not viable: no known conversion from 'reference' (aka 'std::__bit_reference<std::vector<bool>>') to 'bool &' for 1st argument
# | 58 | std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
# | | ^ ~~~~~~~~~~
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:59:17: error: static assertion expression is not an integral constant expression
# | 59 | static_assert(test());
# | | ^~~~~~
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:3: note: subexpression not valid in a constant expression
# | 58 | std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
# | | ^
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:45:9: note: in call to 'fill<std::__bit_iterator<std::vector<bool>, false>, bool>({&{*new unsigned long[5]#0}[0], 0}, {&{*new unsigned long[5]#0}[5], 0}, false)'
# | 45 | std::fill(vec.begin(), vec.end(), false);
# | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:59:17: note: in call to 'test()'
# | 59 | static_assert(test());
# | | ^~~~~~
# | 2 errors generated.
# `-----------------------------
# error: command failed with exit status: 1
It seems like this is occurring because bool vectors are treated differently.
|
Please make sure the CI is green. If you have any problems feel free to ask. |
9e4aa8a to
a178a00
Compare
The CI is only failing for stuff that is not linked to this PR I'm assuming. Can you look at it and tell if that is the case? |
It looks like some problems with the CI are related to this patch but should be working in trunk. Could you rebase (or merge) on top of the current trunk to check? |
I did rebase it before my last commit today. Maybe I made some sort of mistake while rebasing (not sure about that). |
Should I just move this to a new PR and delete this one? |
|
It looks like the problem is that AppleClang doesn't yet support lambdas in C++03. I guess the simplest fix would be to make it a function object instead (i.e. make a class with an overloaded |
Hmm but is not the 105888 using lambdas too and the CI is passing for it? |
378c79b to
65f3b99
Compare
…08631) Swift ClangImporter now supports concurrency annotations on imported declarations and their parameters/results, to make it possible to use imported APIs in Swift safely there has to be a way to annotate individual parameters and result types with relevant attributes that indicate that e.g. a block is called on a particular actor or it accepts a `Sendable` parameter. To faciliate that `SwiftAttr` is switched from `InheritableAttr` which is a declaration attribute to `DeclOrTypeAttr`. To support this attribute in type context we need access to its "Attribute" argument which requires `AttributedType` to be extended to include `Attr *` when available instead of just `attr::Kind` otherwise it won't be possible to determine what attribute should be imported.
…vm#114239) This PR fixes a bug in the `RescaleConverter` that allows non-integer types, which leads to a crash. Fixes llvm#61383.
…14036) This patch unifies the handling of errors passed through the OpenMPIRBuilder and removes some redundant error messages through the introduction of a custom `ErrorInfo` subclass. Additionally, the current list of operations and clauses unsupported by the MLIR to LLVM IR translation pass is added to a new Lit test to check they are being reported to the user.
Identified with misc-include-cleaner.
We have helpers to emit warnings and errors. Do the same thing for notes to they stand out more.
…1717) This PR fixes part of llvm#92433. It specifically adds the 4 cases mentioned in llvm#92433 (comment). I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb I've also added 8 negative tests, because we want to make sure we do not optimise if the relevant flags are not relevant because the optimisation wouldn't be sound. Alive proof that the optimisation is invalid: https://alive2.llvm.org/ce/z/NvNjTD I did have to make the integer types `i4` to make Alive not timeout and to fit them all on one page.
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB. I used Slack for benchmarking, which loads 902 images. ``` Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 1.225 s ± 0.003 s [User: 3.977 s, System: 1.521 s] Range (min … max): 1.220 s … 1.229 s 10 runs Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack Time (mean ± σ): 3.253 s ± 0.037 s [User: 3.013 s, System: 0.248 s] Range (min … max): 3.211 s … 3.310 s 10 runs ``` We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
…of extern template (llvm#112568) This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and LLVM plugins on window.
Provide computeCost implementations for all remaining sub-classes of VPSingleDefRecipe. This pushes one of the last uses of getLegacyCost directly to its user: VPReplicateRecipe.
Identified with misc-include-cleaner.
C's Annex F specifies that log1p +/-0.0 returns the input value; however, this behavior is optional and host C libraries may behave differently. This change applies the Annex F behavior to constant folding by LLVM.
…lvm#114676) This PR implements destructuring of `StringMapEntry<T>` where `T` is a move-only type. Also adds the non-const version.
…nctions (llvm#113300) I have commented out the test for `neg_zero`(creal) because : 1. real(neg_zero + 0.0i) equals zero. 2. real(neg_zero - 0.0i) equals neg_zero. I am not sure if this is the intended behaviour. [EDIT] I have updated tests for `neg_zero` (creal) to be : ``` EXPECT_FP_EQ(func(CFPT(neg_zero - zero * 1.0i)), neg_zero); EXPECT_FP_EQ(func(CFPT(neg_zero + zero * 1.0i)), zero); ``` because all three [gcc, clang and GNU MPC] also give the same result. https://godbolt.org/z/hxhcn6aof and it seems that it is indeed the correct behaviour since Imaginary types are not supported yet, refer llvm#113671
…4-64 (llvm#114696) Temporarily Fixes buildbot errors due to llvm#113300
Adapted from AArch64/machine-outliner-leaf-descendants.ll Towards: llvm#114437
For the issue #102817.
@philnik777