Skip to content

Commit d0cee69

Browse files
winner245ldionnefrederick-vs-ja
authored
[libc++] Optimize std::{,ranges}::{fill,fill_n} for segmented iterators (#132665)
This patch optimizes `std::fill`, `std::fill_n`, `std::ranges::fill`, and `std::ranges::fill_n` for segmented iterators, achieving substantial performance improvements. Specifically, for `deque<int>` iterators, the performance improvements are above 10x for all these algorithms. The optimization also enables filling segmented memory of `deque<int>` to approach the performance of filling contiguous memory of `vector<int>`. Benchmark results comparing the before and after implementations are provided below. For additional context, we’ve included `vector<int>` results, which remain unchanged, as this patch specifically targets segmented iterators and leaves non-segmented iterator behavior untouched. Fixes two subtasks outlined in #102817. #### `fill_n` ``` ----------------------------------------------------------------------------- Benchmark Before After Speedup ----------------------------------------------------------------------------- std::fill_n(deque<int>)/32 11.4 ns 2.28 ns 5.0x std::fill_n(deque<int>)/50 19.7 ns 3.40 ns 5.8x std::fill_n(deque<int>)/1024 391 ns 37.3 ns 10.5x std::fill_n(deque<int>)/8192 3174 ns 301 ns 10.5x std::fill_n(deque<int>)/65536 26504 ns 2951 ns 9.0x std::fill_n(deque<int>)/1048576 407960 ns 80658 ns 5.1x rng::fill_n(deque<int>)/32 14.3 ns 2.15 ns 6.6x rng::fill_n(deque<int>)/50 20.2 ns 3.22 ns 6.3x rng::fill_n(deque<int>)/1024 381 ns 37.8 ns 10.1x rng::fill_n(deque<int>)/8192 3101 ns 294 ns 10.5x rng::fill_n(deque<int>)/65536 25098 ns 2926 ns 8.6x rng::fill_n(deque<int>)/1048576 394342 ns 78874 ns 5.0x std::fill_n(vector<int>)/32 1.76 ns 1.72 ns 1.0x std::fill_n(vector<int>)/50 3.00 ns 2.73 ns 1.1x std::fill_n(vector<int>)/1024 38.4 ns 37.9 ns 1.0x std::fill_n(vector<int>)/8192 258 ns 252 ns 1.0x std::fill_n(vector<int>)/65536 2993 ns 2889 ns 1.0x std::fill_n(vector<int>)/1048576 80328 ns 80468 ns 1.0x rng::fill_n(vector<int>)/32 1.99 ns 1.35 ns 1.5x rng::fill_n(vector<int>)/50 2.66 ns 2.12 ns 1.3x rng::fill_n(vector<int>)/1024 37.7 ns 35.8 ns 1.1x rng::fill_n(vector<int>)/8192 253 ns 250 ns 1.0x rng::fill_n(vector<int>)/65536 2922 ns 2930 ns 1.0x rng::fill_n(vector<int>)/1048576 79739 ns 79742 ns 1.0x ``` #### `fill` ``` -------------------------------------------------------------------------- Benchmark Before After Speedup -------------------------------------------------------------------------- std::fill(deque<int>)/32 13.7 ns 2.45 ns 5.6x std::fill(deque<int>)/50 21.7 ns 4.57 ns 4.7x std::fill(deque<int>)/1024 367 ns 38.5 ns 9.5x std::fill(deque<int>)/8192 2896 ns 247 ns 11.7x std::fill(deque<int>)/65536 23723 ns 2907 ns 8.2x std::fill(deque<int>)/1048576 379043 ns 79885 ns 4.7x rng::fill(deque<int>)/32 13.6 ns 2.70 ns 5.0x rng::fill(deque<int>)/50 23.4 ns 3.94 ns 5.9x rng::fill(deque<int>)/1024 377 ns 37.9 ns 9.9x rng::fill(deque<int>)/8192 2914 ns 286 ns 10.2x rng::fill(deque<int>)/65536 23612 ns 2939 ns 8.0x rng::fill(deque<int>)/1048576 379841 ns 80079 ns 4.7x std::fill(vector<int>)/32 1.99 ns 1.79 ns 1.1x std::fill(vector<int>)/50 3.05 ns 3.06 ns 1.0x std::fill(vector<int>)/1024 37.6 ns 38.0 ns 1.0x std::fill(vector<int>)/8192 255 ns 257 ns 1.0x std::fill(vector<int>)/65536 2966 ns 2981 ns 1.0x std::fill(vector<int>)/1048576 78300 ns 80348 ns 1.0x rng::fill(vector<int>)/32 1.77 ns 1.75 ns 1.0x rng::fill(vector<int>)/50 4.85 ns 2.31 ns 2.1x rng::fill(vector<int>)/1024 39.6 ns 36.1 ns 1.1x rng::fill(vector<int>)/8192 238 ns 251 ns 0.9x rng::fill(vector<int>)/65536 2941 ns 2918 ns 1.0x rng::fill(vector<int>)/1048576 80497 ns 80442 ns 1.0x ``` --------- Co-authored-by: Louis Dionne <[email protected]> Co-authored-by: A. Jiang <[email protected]>
1 parent 65c895d commit d0cee69

File tree

8 files changed

+166
-24
lines changed

8 files changed

+166
-24
lines changed

libcxx/docs/ReleaseNotes/22.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,9 @@ Improvements and New Features
7171
- The ``std::distance`` and ``std::ranges::distance`` algorithms have been optimized for segmented iterators (e.g.,
7272
``std::join_view`` iterators), reducing the complexity from ``O(n)`` to ``O(n / segment_size)``. Benchmarks show
7373
performance improvements of over 1600x in favorable cases with large segment sizes (e.g., 1024).
74+
- The ``std::{fill, fill_n}`` and ``std::ranges::{fill, fill_n}`` algorithms have been optimized for segmented iterators,
75+
resulting in a performance improvement of at least 10x for ``std::deque<int>`` iterators and
76+
``std::join_view<std::vector<std::vector<int>>>`` iterators.
7477

7578
Deprecations and Removals
7679
-------------------------

libcxx/include/__algorithm/fill.h

Lines changed: 28 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,11 @@
1010
#define _LIBCPP___ALGORITHM_FILL_H
1111

1212
#include <__algorithm/fill_n.h>
13+
#include <__algorithm/for_each_segment.h>
1314
#include <__config>
1415
#include <__iterator/iterator_traits.h>
16+
#include <__iterator/segmented_iterator.h>
17+
#include <__type_traits/enable_if.h>
1518

1619
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
1720
# pragma GCC system_header
@@ -21,23 +24,40 @@ _LIBCPP_BEGIN_NAMESPACE_STD
2124

2225
// fill isn't specialized for std::memset, because the compiler already optimizes the loop to a call to std::memset.
2326

24-
template <class _ForwardIterator, class _Tp>
25-
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
26-
__fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value, forward_iterator_tag) {
27+
template <class _ForwardIterator, class _Sentinel, class _Tp>
28+
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
29+
__fill(_ForwardIterator __first, _Sentinel __last, const _Tp& __value) {
2730
for (; __first != __last; ++__first)
2831
*__first = __value;
32+
return __first;
2933
}
3034

31-
template <class _RandomAccessIterator, class _Tp>
32-
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
33-
__fill(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp& __value, random_access_iterator_tag) {
34-
std::fill_n(__first, __last - __first, __value);
35+
template <class _RandomAccessIterator,
36+
class _Tp,
37+
__enable_if_t<__has_random_access_iterator_category<_RandomAccessIterator>::value &&
38+
!__is_segmented_iterator_v<_RandomAccessIterator>,
39+
int> = 0>
40+
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _RandomAccessIterator
41+
__fill(_RandomAccessIterator __first, _RandomAccessIterator __last, const _Tp& __value) {
42+
return std::__fill_n(__first, __last - __first, __value);
43+
}
44+
45+
#ifndef _LIBCPP_CXX03_LANG
46+
template <class _SegmentedIterator, class _Tp, __enable_if_t<__is_segmented_iterator_v<_SegmentedIterator>, int> = 0>
47+
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
48+
_SegmentedIterator __fill(_SegmentedIterator __first, _SegmentedIterator __last, const _Tp& __value) {
49+
using __local_iterator_t = typename __segmented_iterator_traits<_SegmentedIterator>::__local_iterator;
50+
std::__for_each_segment(__first, __last, [&](__local_iterator_t __lfirst, __local_iterator_t __llast) {
51+
std::__fill(__lfirst, __llast, __value);
52+
});
53+
return __last;
3554
}
55+
#endif // !_LIBCPP_CXX03_LANG
3656

3757
template <class _ForwardIterator, class _Tp>
3858
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
3959
fill(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
40-
std::__fill(__first, __last, __value, typename iterator_traits<_ForwardIterator>::iterator_category());
60+
std::__fill(__first, __last, __value);
4161
}
4262

4363
_LIBCPP_END_NAMESPACE_STD

libcxx/include/__algorithm/fill_n.h

Lines changed: 38 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -9,10 +9,17 @@
99
#ifndef _LIBCPP___ALGORITHM_FILL_N_H
1010
#define _LIBCPP___ALGORITHM_FILL_N_H
1111

12+
#include <__algorithm/for_each_n_segment.h>
1213
#include <__algorithm/min.h>
1314
#include <__config>
1415
#include <__fwd/bit_reference.h>
16+
#include <__iterator/iterator_traits.h>
17+
#include <__iterator/segmented_iterator.h>
1518
#include <__memory/pointer_traits.h>
19+
#include <__type_traits/conjunction.h>
20+
#include <__type_traits/enable_if.h>
21+
#include <__type_traits/integral_constant.h>
22+
#include <__type_traits/negation.h>
1623
#include <__utility/convert_to_integral.h>
1724

1825
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -26,9 +33,38 @@ _LIBCPP_BEGIN_NAMESPACE_STD
2633

2734
// fill_n isn't specialized for std::memset, because the compiler already optimizes the loop to a call to std::memset.
2835

29-
template <class _OutputIterator, class _Size, class _Tp>
36+
template <class _OutputIterator,
37+
class _Size,
38+
class _Tp
39+
#ifndef _LIBCPP_CXX03_LANG
40+
,
41+
__enable_if_t<!_And<_BoolConstant<__is_segmented_iterator_v<_OutputIterator>>,
42+
__has_random_access_local_iterator<_OutputIterator>>::value,
43+
int> = 0
44+
#endif
45+
>
3046
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _OutputIterator
31-
__fill_n(_OutputIterator __first, _Size __n, const _Tp& __value);
47+
__fill_n(_OutputIterator __first, _Size __n, const _Tp& __value) {
48+
for (; __n > 0; ++__first, (void)--__n)
49+
*__first = __value;
50+
return __first;
51+
}
52+
53+
#ifndef _LIBCPP_CXX03_LANG
54+
template < class _OutputIterator,
55+
class _Size,
56+
class _Tp,
57+
__enable_if_t<_And<_BoolConstant<__is_segmented_iterator_v<_OutputIterator>>,
58+
__has_random_access_local_iterator<_OutputIterator>>::value,
59+
int> = 0>
60+
inline _LIBCPP_HIDE_FROM_ABI
61+
_LIBCPP_CONSTEXPR_SINCE_CXX14 _OutputIterator __fill_n(_OutputIterator __first, _Size __n, const _Tp& __value) {
62+
using __local_iterator_t = typename __segmented_iterator_traits<_OutputIterator>::__local_iterator;
63+
return std::__for_each_n_segment(__first, __n, [&](__local_iterator_t __lfirst, __local_iterator_t __llast) {
64+
std::__fill_n(__lfirst, __llast - __lfirst, __value);
65+
});
66+
}
67+
#endif // !_LIBCPP_CXX03_LANG
3268

3369
template <bool _FillVal, class _Cp>
3470
_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
@@ -68,14 +104,6 @@ __fill_n(__bit_iterator<_Cp, false> __first, _Size __n, const bool& __value) {
68104
return __first + __n;
69105
}
70106

71-
template <class _OutputIterator, class _Size, class _Tp>
72-
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _OutputIterator
73-
__fill_n(_OutputIterator __first, _Size __n, const _Tp& __value) {
74-
for (; __n > 0; ++__first, (void)--__n)
75-
*__first = __value;
76-
return __first;
77-
}
78-
79107
template <class _OutputIterator, class _Size, class _Tp>
80108
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _OutputIterator
81109
fill_n(_OutputIterator __first, _Size __n, const _Tp& __value) {

libcxx/include/__algorithm/ranges_fill.h

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -9,12 +9,14 @@
99
#ifndef _LIBCPP___ALGORITHM_RANGES_FILL_H
1010
#define _LIBCPP___ALGORITHM_RANGES_FILL_H
1111

12-
#include <__algorithm/ranges_fill_n.h>
12+
#include <__algorithm/fill.h>
13+
#include <__algorithm/fill_n.h>
1314
#include <__config>
1415
#include <__iterator/concepts.h>
1516
#include <__ranges/access.h>
1617
#include <__ranges/concepts.h>
1718
#include <__ranges/dangling.h>
19+
#include <__utility/move.h>
1820

1921
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
2022
# pragma GCC system_header
@@ -31,12 +33,11 @@ namespace ranges {
3133
struct __fill {
3234
template <class _Type, output_iterator<const _Type&> _Iter, sentinel_for<_Iter> _Sent>
3335
_LIBCPP_HIDE_FROM_ABI constexpr _Iter operator()(_Iter __first, _Sent __last, const _Type& __value) const {
34-
if constexpr (random_access_iterator<_Iter> && sized_sentinel_for<_Sent, _Iter>) {
35-
return ranges::fill_n(__first, __last - __first, __value);
36+
if constexpr (sized_sentinel_for<_Sent, _Iter>) {
37+
auto __n = __last - __first;
38+
return std::__fill_n(std::move(__first), __n, __value);
3639
} else {
37-
for (; __first != __last; ++__first)
38-
*__first = __value;
39-
return __first;
40+
return std::__fill(std::move(__first), std::move(__last), __value);
4041
}
4142
}
4243

libcxx/test/std/algorithms/alg.modifying.operations/alg.fill/fill.pass.cpp

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@
1717
#include <array>
1818
#include <cassert>
1919
#include <cstddef>
20+
#include <deque>
21+
#include <ranges>
2022
#include <vector>
2123

2224
#include "sized_allocator.h"
@@ -93,6 +95,13 @@ TEST_CONSTEXPR_CXX20 bool test_vector_bool(std::size_t N) {
9395
return true;
9496
}
9597

98+
/*TEST_CONSTEXPR_CXX26*/ void test_deque() { // TODO: Mark as TEST_CONSTEXPR_CXX26 once std::deque is constexpr
99+
std::deque<int> in(20);
100+
std::deque<int> expected(in.size(), 42);
101+
std::fill(in.begin(), in.end(), 42);
102+
assert(in == expected);
103+
}
104+
96105
TEST_CONSTEXPR_CXX20 bool test() {
97106
types::for_each(types::forward_iterator_list<char*>(), Test<char>());
98107
types::for_each(types::forward_iterator_list<int*>(), Test<int>());
@@ -138,6 +147,20 @@ TEST_CONSTEXPR_CXX20 bool test() {
138147
}
139148
}
140149

150+
if (!TEST_IS_CONSTANT_EVALUATED) // TODO: Use TEST_STD_AT_LEAST_26_OR_RUNTIME_EVALUATED when std::deque is made constexpr
151+
test_deque();
152+
153+
#if TEST_STD_VER >= 20
154+
{ // Verify that join_view of vectors work properly.
155+
std::vector<std::vector<int>> v{{1, 2}, {1, 2, 3}, {}, {3, 4, 5}, {6}, {7, 8, 9, 6}, {0, 1, 2, 3, 0, 1, 2}};
156+
auto jv = std::ranges::join_view(v);
157+
std::fill(jv.begin(), jv.end(), 42);
158+
for (const auto& vec : v)
159+
for (auto n : vec)
160+
assert(n == 42);
161+
}
162+
#endif
163+
141164
return true;
142165
}
143166

libcxx/test/std/algorithms/alg.modifying.operations/alg.fill/fill_n.pass.cpp

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@
1717
#include <array>
1818
#include <cassert>
1919
#include <cstddef>
20+
#include <deque>
21+
#include <ranges>
2022
#include <vector>
2123

2224
#include "sized_allocator.h"
@@ -126,6 +128,13 @@ TEST_CONSTEXPR_CXX20 bool test_vector_bool(std::size_t N) {
126128
return true;
127129
}
128130

131+
/*TEST_CONSTEXPR_CXX26*/ void test_deque() { // TODO: Mark as TEST_CONSTEXPR_CXX26 once std::deque is constexpr
132+
std::deque<int> in(20);
133+
std::deque<int> expected(in.size(), 42);
134+
std::fill_n(in.begin(), in.size(), 42);
135+
assert(in == expected);
136+
}
137+
129138
TEST_CONSTEXPR_CXX20 bool test() {
130139
types::for_each(types::forward_iterator_list<char*>(), Test<char>());
131140
types::for_each(types::forward_iterator_list<int*>(), Test<int>());
@@ -221,6 +230,20 @@ TEST_CONSTEXPR_CXX20 bool test() {
221230
}
222231
}
223232

233+
if (!TEST_IS_CONSTANT_EVALUATED) // TODO: Use TEST_STD_AT_LEAST_26_OR_RUNTIME_EVALUATED when std::deque is made constexpr
234+
test_deque();
235+
236+
#if TEST_STD_VER >= 20
237+
{
238+
std::vector<std::vector<int>> v{{1, 2}, {1, 2, 3}, {}, {3, 4, 5}, {6}, {7, 8, 9, 6}, {0, 1, 2, 3, 0, 1, 2}};
239+
auto jv = std::ranges::join_view(v);
240+
std::fill_n(jv.begin(), std::distance(jv.begin(), jv.end()), 42);
241+
for (const auto& vec : v)
242+
for (auto n : vec)
243+
assert(n == 42);
244+
}
245+
#endif
246+
224247
return true;
225248
}
226249

libcxx/test/std/algorithms/alg.modifying.operations/alg.fill/ranges.fill.pass.cpp

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@
1818
#include <algorithm>
1919
#include <array>
2020
#include <cassert>
21+
#include <deque>
2122
#include <ranges>
2223
#include <string>
2324
#include <vector>
@@ -128,6 +129,13 @@ constexpr bool test_vector_bool(std::size_t N) {
128129
}
129130
#endif
130131

132+
/*TEST_CONSTEXPR_CXX26*/ void test_deque() { // TODO: Mark as TEST_CONSTEXPR_CXX26 once std::deque is constexpr
133+
std::deque<int> in(20);
134+
std::deque<int> expected(in.size(), 42);
135+
std::ranges::fill(in, 42);
136+
assert(in == expected);
137+
}
138+
131139
constexpr bool test() {
132140
test_iterators<cpp17_output_iterator<int*>, sentinel_wrapper<cpp17_output_iterator<int*>>>();
133141
test_iterators<cpp20_output_iterator<int*>, sentinel_wrapper<cpp20_output_iterator<int*>>>();
@@ -227,6 +235,20 @@ constexpr bool test() {
227235
}
228236
#endif
229237

238+
if (!TEST_IS_CONSTANT_EVALUATED) // TODO: Use TEST_STD_AT_LEAST_26_OR_RUNTIME_EVALUATED when std::deque is made constexpr
239+
test_deque();
240+
241+
#if TEST_STD_VER >= 20
242+
{
243+
std::vector<std::vector<int>> v{{1, 2}, {1, 2, 3}, {}, {3, 4, 5}, {6}, {7, 8, 9, 6}, {0, 1, 2, 3, 0, 1, 2}};
244+
auto jv = std::ranges::join_view(v);
245+
std::ranges::fill(jv, 42);
246+
for (const auto& vec : v)
247+
for (auto n : vec)
248+
assert(n == 42);
249+
}
250+
#endif
251+
230252
return true;
231253
}
232254

libcxx/test/std/algorithms/alg.modifying.operations/alg.fill/ranges.fill_n.pass.cpp

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
#include <algorithm>
1717
#include <array>
1818
#include <cassert>
19+
#include <deque>
1920
#include <ranges>
2021
#include <string>
2122
#include <vector>
@@ -101,6 +102,13 @@ constexpr bool test_vector_bool(std::size_t N) {
101102
}
102103
#endif
103104

105+
/*TEST_CONSTEXPR_CXX26*/ void test_deque() { // TODO: Mark as TEST_CONSTEXPR_CXX26 once std::deque is constexpr
106+
std::deque<int> in(20);
107+
std::deque<int> expected(in.size(), 42);
108+
std::ranges::fill_n(std::ranges::begin(in), std::ranges::size(in), 42);
109+
assert(in == expected);
110+
}
111+
104112
constexpr bool test() {
105113
test_iterators<cpp17_output_iterator<int*>, sentinel_wrapper<cpp17_output_iterator<int*>>>();
106114
test_iterators<cpp20_output_iterator<int*>, sentinel_wrapper<cpp20_output_iterator<int*>>>();
@@ -175,6 +183,20 @@ constexpr bool test() {
175183
}
176184
#endif
177185

186+
if (!TEST_IS_CONSTANT_EVALUATED) // TODO: Use TEST_STD_AT_LEAST_26_OR_RUNTIME_EVALUATED when std::deque is made constexpr
187+
test_deque();
188+
189+
#if TEST_STD_VER >= 20
190+
{
191+
std::vector<std::vector<int>> v{{1, 2}, {1, 2, 3}, {}, {3, 4, 5}, {6}, {7, 8, 9, 6}, {0, 1, 2, 3, 0, 1, 2}};
192+
auto jv = std::ranges::join_view(v);
193+
std::ranges::fill_n(std::ranges::begin(jv), std::ranges::distance(jv), 42);
194+
for (const auto& vec : v)
195+
for (auto n : vec)
196+
assert(n == 42);
197+
}
198+
#endif
199+
178200
return true;
179201
}
180202

0 commit comments

Comments
 (0)