Skip to content

Conversation

@NoumanAmir657
Copy link
Contributor

For the issue #102817.
@philnik777

@NoumanAmir657 NoumanAmir657 requested a review from a team as a code owner August 17, 2024 18:23
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 17, 2024

@llvm/pr-subscribers-libcxx

Author: NoumanAmir-10xe (NoumanAmir657)

Changes

For the issue #102817.
@philnik777


Full diff: https://github.com/llvm/llvm-project/pull/104680.diff

1 Files Affected:

  • (modified) libcxx/include/__algorithm/fill.h (+47-6)
diff --git a/libcxx/include/__algorithm/fill.h b/libcxx/include/__algorithm/fill.h
index 1ce3eadb013d05..3ba37e7260a4a2 100644
--- a/libcxx/include/__algorithm/fill.h
+++ b/libcxx/include/__algorithm/fill.h
@@ -21,25 +21,66 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 
 // fill isn't specialized for std::memset, because the compiler already optimizes the loop to a call to std::memset.
 
-template <class _ForwardIterator, class _Tp>
+template <
+    class _ForwardIterator,
+    class _Tp,
+    __enable_if_t<
+        is_same<typename iterator_traits<_ForwardIterator>::iterator_category, forward_iterator_tag>::value ||
+            is_same<typename iterator_traits<_ForwardIterator>::iterator_category, bidirectional_iterator_tag>::value,
+        int> = 0>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
-__fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value, forward_iterator_tag) {
+__fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
   for (; __first != __last; ++__first)
     *__first = __value;
 }
 
-template <class _RandomAccessIterator, class _Tp>
+template <class _RandomAccessIterator,
+          class _Tp,
+          __enable_if_t<(is_same<typename iterator_traits<_RandomAccessIterator>::iterator_category,
+                                 random_access_iterator_tag>::value ||
+                         is_same<typename iterator_traits<_RandomAccessIterator>::iterator_category,
+                                 contiguous_iterator_tag>::value) &&
+                            !__is_segmented_iterator<_RandomAccessIterator>::value,
+                        int> = 0>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
-__fill(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp& __value, random_access_iterator_tag) {
+__fill(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp& __value) {
   std::fill_n(__first, __last - __first, __value);
 }
 
+template <class _SegmentedIterator,
+          class _Tp,
+          __enable_if_t<__is_segmented_iterator<_SegmentedIterator>::value, int> = 0>
+inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
+__fill(_SegmentedIterator __first, _SegmentedIterator __last, const _Tp& __value) {
+  using _Traits = __segmented_iterator_traits<_SegmentedIterator>;
+
+  auto __sfirst = _Traits::__segment(__first);
+  auto __slast  = _Traits::__segment(__last);
+
+  // We are in a single segment, so we might not be at the beginning or end
+  if (__sfirst == __slast) {
+    __fill(_Traits::__local(__first), _Traits::__local(__last), __value);
+    return;
+  }
+
+  // We have more than one segment. Iterate over the first segment, since we might not start at the beginning
+  __fill(_Traits::__local(__first), _Traits::__end(__sfirst), __value);
+  ++__sfirst;
+  // iterate over the segments which are guaranteed to be completely in the range
+  while (__sfirst != __slast) {
+    __fill(_Traits::__begin(__sfirst), _Traits::__end(__sfirst), __value);
+    ++__sfirst;
+  }
+  // iterate over the last segment
+  __fill(_Traits::__begin(__sfirst), _Traits::__local(__last), __value);
+}
+
 template <class _ForwardIterator, class _Tp>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
 fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
-  std::__fill(__first, __last, __value, typename iterator_traits<_ForwardIterator>::iterator_category());
+  std::__fill(__first, __last, __value);
 }
 
 _LIBCPP_END_NAMESPACE_STD
 
-#endif // _LIBCPP___ALGORITHM_FILL_H
+#endif // _LIBCPP___ALGORITHM_FILL_H
\ No newline at end of file

Comment on lines 27 to 29
__enable_if_t<
is_same<typename iterator_traits<_ForwardIterator>::iterator_category, forward_iterator_tag>::value ||
is_same<typename iterator_traits<_ForwardIterator>::iterator_category, bidirectional_iterator_tag>::value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. Users are allowed to add their own iterator categories that derive from standard ones. You should use the __has_x_iterator_category utilities instead. In this case it could simply be !__has_random_access_iterator_category<_ForwardIterator>::value. Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.
Now uses __has_x_iterator_category.

std::fill_n(__first, __last - __first, __value);
}

template <class _SegmentedIterator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new optimization should come with some tests to make sure it's correct.

Copy link
Contributor Author

@NoumanAmir657 NoumanAmir657 Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests.

Comment on lines 55 to 75
using _Traits = __segmented_iterator_traits<_SegmentedIterator>;

auto __sfirst = _Traits::__segment(__first);
auto __slast = _Traits::__segment(__last);

// We are in a single segment, so we might not be at the beginning or end
if (__sfirst == __slast) {
__fill(_Traits::__local(__first), _Traits::__local(__last), __value);
return;
}

// We have more than one segment. Iterate over the first segment, since we might not start at the beginning
__fill(_Traits::__local(__first), _Traits::__end(__sfirst), __value);
++__sfirst;
// iterate over the segments which are guaranteed to be completely in the range
while (__sfirst != __slast) {
__fill(_Traits::__begin(__sfirst), _Traits::__end(__sfirst), __value);
++__sfirst;
}
// iterate over the last segment
__fill(_Traits::__begin(__sfirst), _Traits::__local(__last), __value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't you using __for_each_segment here? More generally, could we simply call std::for_each to get the optimization instead? That would avoid a bunch of code here.

// fill isn't specialized for std::memset, because the compiler already optimizes the loop to a call to std::memset.

template <class _ForwardIterator, class _Tp>
template <
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not attached: Please provide a small benchmark to show the improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the benchmark.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark               Old                     New

bm_deque_fill/1                   2.33             3.75 ns
bm_deque_fill/2                   2.82             4.01 ns
bm_deque_fill/3                   3.26             4.01 ns
bm_deque_fill/4                   3.60             3.34 ns
bm_deque_fill/5                   3.92             3.64 ns
bm_deque_fill/6                   4.48             3.64 ns
bm_deque_fill/7                   4.83             3.79 ns
bm_deque_fill/8                   5.46             3.28 ns
bm_deque_fill/16                 9.44             3.25 ns
bm_deque_fill/64                 33.3             3.14 ns
bm_deque_fill/512               200              7.20 ns
bm_deque_fill/4096             2041            39.7 ns
bm_deque_fill/32768           16345          288 ns
bm_deque_fill/262144         130562        2340 ns
bm_deque_fill/1048576       522694        12346 ns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, for such wrong formatting ...


template <class _ForwardIterator, class _Tp>
template <
class _ForwardIterator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not attached: Please add a release note for this improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added release note.

@github-actions
Copy link

github-actions bot commented Aug 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@NoumanAmir657
Copy link
Contributor Author

NoumanAmir657 commented Aug 17, 2024

This is failing two of the tests while building.

FAIL: llvm-libc++-shared.cfg.in :: /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/transitive_includes.gen.py/bitset.sh.cpp (915 of 9739)
******************** TEST 'llvm-libc++-shared.cfg.in :: /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/transitive_includes.gen.py/bitset.sh.cpp' FAILED 

@philnik777 can you tell why this must be the case?

@Zingam
Copy link
Contributor

Zingam commented Aug 18, 2024

This is failing two of the tests while building.

FAIL: llvm-libc++-shared.cfg.in :: /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/transitive_includes.gen.py/bitset.sh.cpp (915 of 9739)
******************** TEST 'llvm-libc++-shared.cfg.in :: /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/transitive_includes.gen.py/bitset.sh.cpp' FAILED 

@philnik777 can you tell why this must be the case?

You need to update the transitive includes.

@NoumanAmir657
Copy link
Contributor Author

NoumanAmir657 commented Aug 18, 2024

You need to update the transitive includes.

Can you tell where I have to do the updates? and how is that breaking the test cases?
Do I make changes in libcxx/test/libcxx/transitive_includes/cxx03.csv?

@NoumanAmir657
Copy link
Contributor Author

You need to update the transitive includes.

Can you tell where I have to do the updates? and how is that breaking the test cases? Do I make changes in libcxx/test/libcxx/transitive_includes/cxx03.csv?

Updated.

@NoumanAmir657 NoumanAmir657 changed the title [libcxx] Added segment iterator for fill [libc++] Added segment iterator for std::fill Aug 28, 2024
@NoumanAmir657 NoumanAmir657 changed the title [libc++] Added segment iterator for std::fill [libc++] Added segmented iterator for std::fill Aug 28, 2024
@NoumanAmir657 NoumanAmir657 force-pushed the fill_segmented branch 2 times, most recently from fe24a1b to e4db365 Compare September 2, 2024 07:28
// constexpr OutputIterator // constexpr after C++17
// fill_n(Iter first, Size n, const T& value);

#include <algorithm>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you format this in a separate patch? It's really not obvious what you've changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sure.

__enable_if_t<__is_segmented_iterator<_SegmentedIterator>::value, int> = 0>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
__fill(_SegmentedIterator __first, _SegmentedIterator __last, const _Tp& __value) {
std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to call for_each unconditionally. No need to only do it for segmented iterators.

Copy link
Contributor Author

@NoumanAmir657 NoumanAmir657 Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philnik777 Would not that lead to ambiguous call? I am assuming here you mean that I have to remove the condition __is_segmented_iterator.

Or do you mean that I just forward the call to fill to for_each and for_each can deal with SFINAE?
This would not really work because of __bit_iterator.

Copy link
Contributor Author

@NoumanAmir657 NoumanAmir657 Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philnik777

I tried the latter case as mentioned above but it leads to these test not compiling count.pass.cpp and ranges.count.pass.cpp.

# | In file included from /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:19:
# | In file included from /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/algorithm:1835:
# | In file included from /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:12:
# | In file included from /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill_n.h:12:
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/for_each.h:34:5: error: no matching function for call to object of type '(lambda at /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:34)'
# |    34 |     __f(*__first);
# |       |     ^~~
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:8: note: in instantiation of function template specialization 'std::for_each<std::__bit_iterator<std::vector<bool>, false>, (lambda at /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:34)>' requested here
# |    58 |   std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
# |       |        ^
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:45:14: note: in instantiation of function template specialization 'std::fill<std::__bit_iterator<std::vector<bool>, false>, bool>' requested here
# |    45 |         std::fill(vec.begin(), vec.end(), false);
# |       |              ^
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:34: note: candidate function not viable: no known conversion from 'reference' (aka 'std::__bit_reference<std::vector<bool>>') to 'bool &' for 1st argument
# |    58 |   std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
# |       |                                  ^         ~~~~~~~~~~
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:59:17: error: static assertion expression is not an integral constant expression
# |    59 |   static_assert(test());
# |       |                 ^~~~~~
# | /home/nouman-10x/con_llvm_project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/fill.h:58:3: note: subexpression not valid in a constant expression
# |    58 |   std::for_each(__first, __last, [__value](_Tp& __val) { __val = __value; });
# |       |   ^
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:45:9: note: in call to 'fill<std::__bit_iterator<std::vector<bool>, false>, bool>({&{*new unsigned long[5]#0}[0], 0}, {&{*new unsigned long[5]#0}[5], 0}, false)'
# |    45 |         std::fill(vec.begin(), vec.end(), false);
# |       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | /home/nouman-10x/con_llvm_project/libcxx/test/std/algorithms/alg.nonmodifying/alg.count/count.pass.cpp:59:17: note: in call to 'test()'
# |    59 |   static_assert(test());
# |       |                 ^~~~~~
# | 2 errors generated.
# `-----------------------------
# error: command failed with exit status: 1

It seems like this is occurring because bool vectors are treated differently.

@philnik777
Copy link
Contributor

Please make sure the CI is green. If you have any problems feel free to ask.

@NoumanAmir657
Copy link
Contributor Author

NoumanAmir657 commented Oct 6, 2024

Please make sure the CI is green. If you have any problems feel free to ask.

The CI is only failing for stuff that is not linked to this PR I'm assuming. Can you look at it and tell if that is the case?

@philnik777
Copy link
Contributor

Please make sure the CI is green. If you have any problems feel free to ask.

The CI is only failing for stuff that is not linked to this PR I'm assuming. Can you look at it and tell if that is the case?

It looks like some problems with the CI are related to this patch but should be working in trunk. Could you rebase (or merge) on top of the current trunk to check?

@NoumanAmir657
Copy link
Contributor Author

Please make sure the CI is green. If you have any problems feel free to ask.

The CI is only failing for stuff that is not linked to this PR I'm assuming. Can you look at it and tell if that is the case?

It looks like some problems with the CI are related to this patch but should be working in trunk. Could you rebase (or merge) on top of the current trunk to check?

I did rebase it before my last commit today. Maybe I made some sort of mistake while rebasing (not sure about that).

@NoumanAmir657
Copy link
Contributor Author

Please make sure the CI is green. If you have any problems feel free to ask.

The CI is only failing for stuff that is not linked to this PR I'm assuming. Can you look at it and tell if that is the case?

It looks like some problems with the CI are related to this patch but should be working in trunk. Could you rebase (or merge) on top of the current trunk to check?

I did rebase it before my last commit today. Maybe I made some sort of mistake while rebasing (not sure about that).

Should I just move this to a new PR and delete this one?

@philnik777
Copy link
Contributor

It looks like the problem is that AppleClang doesn't yet support lambdas in C++03. I guess the simplest fix would be to make it a function object instead (i.e. make a class with an overloaded operator()).

@philnik777 philnik777 self-assigned this Oct 19, 2024
@NoumanAmir657
Copy link
Contributor Author

It looks like the problem is that AppleClang doesn't yet support lambdas in C++03. I guess the simplest fix would be to make it a function object instead (i.e. make a class with an overloaded operator()).

Hmm but is not the 105888 using lambdas too and the CI is passing for it?

xedin and others added 3 commits November 4, 2024 09:40
…08631)

Swift ClangImporter now supports concurrency annotations on imported
declarations and their parameters/results, to make it possible to use
imported APIs in Swift safely there has to be a way to annotate
individual parameters and result types with relevant attributes that
indicate that e.g. a block is called on a particular actor or it accepts
a `Sendable` parameter.

To faciliate that `SwiftAttr` is switched from `InheritableAttr` which
is a declaration attribute to `DeclOrTypeAttr`. To support this
attribute in type context we need access to its "Attribute" argument
which requires `AttributedType` to be extended to include `Attr *` when
available instead of just `attr::Kind` otherwise it won't be possible to
determine what attribute should be imported.
…vm#114239)

This PR fixes a bug in the `RescaleConverter` that allows non-integer
types, which leads to a crash.
Fixes llvm#61383.
…14036)

This patch unifies the handling of errors passed through the
OpenMPIRBuilder and removes some redundant error messages through the
introduction of a custom `ErrorInfo` subclass.

Additionally, the current list of operations and clauses unsupported by
the MLIR to LLVM IR translation pass is added to a new Lit test to check
they are being reported to the user.
kazutakahirata and others added 17 commits November 4, 2024 09:42
Identified with misc-include-cleaner.
We have helpers to emit warnings and errors. Do the same thing for notes
to they stand out more.
…1717)

This PR fixes part of llvm#92433.

It specifically adds the 4 cases mentioned in
llvm#92433 (comment).

I've added 8 positive tests, 4 of which are mentioned in the comment
above and 4 which are their commutative equivalents. Alive proof:
https://alive2.llvm.org/ce/z/z6eFTb
I've also added 8 negative tests, because we want to make sure we do not
optimise if the relevant flags are not relevant because the optimisation
wouldn't be sound. Alive proof that the optimisation is invalid:
https://alive2.llvm.org/ce/z/NvNjTD
I did have to make the integer types `i4` to make Alive not timeout and
to fit them all on one page.
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
…of extern template (llvm#112568)

This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and LLVM
plugins on window.
Provide computeCost implementations for all remaining sub-classes of
VPSingleDefRecipe. This pushes one of the last uses of getLegacyCost
directly to its user: VPReplicateRecipe.
Identified with misc-include-cleaner.
C's Annex F specifies that log1p +/-0.0 returns the input value;
however, this behavior is optional and host C libraries may behave
differently. This change applies the Annex F behavior to constant
folding by LLVM.
…lvm#114676)

This PR implements destructuring of `StringMapEntry<T>` where `T` is a
move-only type. Also adds the non-const version.
…nctions (llvm#113300)

I have commented out the test for `neg_zero`(creal) because : 

1. real(neg_zero + 0.0i) equals zero. 
2. real(neg_zero - 0.0i) equals neg_zero.

I am not sure if this is the intended behaviour. 

[EDIT]
I have updated tests for `neg_zero` (creal) to be : 

```
    EXPECT_FP_EQ(func(CFPT(neg_zero - zero * 1.0i)), neg_zero);
    EXPECT_FP_EQ(func(CFPT(neg_zero + zero * 1.0i)), zero);
```

because all three [gcc, clang and GNU MPC] also give the same result. 
https://godbolt.org/z/hxhcn6aof
and it seems that it is indeed the correct behaviour since Imaginary
types are not supported yet, refer llvm#113671
Adapted from AArch64/machine-outliner-leaf-descendants.ll 

Towards: llvm#114437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.