-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[libc++] Extend the scope of radix sorting inside std::stable_sort to floating-point types #129452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Extend the scope of radix sorting inside std::stable_sort to floating-point types #129452
Conversation
|
@llvm/pr-subscribers-libcxx Author: Дмитрий Изволов (izvolov) ChangesThese changes speed up Why does this worth doing? Comparison for only Full diff: https://github.com/llvm/llvm-project/pull/129452.diff 3 Files Affected:
diff --git a/libcxx/include/__algorithm/radix_sort.h b/libcxx/include/__algorithm/radix_sort.h
index f6d9fb1ad7ca9..f957b37f3628e 100644
--- a/libcxx/include/__algorithm/radix_sort.h
+++ b/libcxx/include/__algorithm/radix_sort.h
@@ -29,9 +29,11 @@
#include <__algorithm/for_each.h>
#include <__algorithm/move.h>
+#include <__bit/bit_cast.h>
#include <__bit/bit_log2.h>
#include <__bit/countl.h>
#include <__config>
+#include <__cstddef/size_t.h>
#include <__functional/identity.h>
#include <__iterator/access.h>
#include <__iterator/distance.h>
@@ -44,9 +46,12 @@
#include <__type_traits/enable_if.h>
#include <__type_traits/invoke.h>
#include <__type_traits/is_assignable.h>
+#include <__type_traits/is_enum.h>
+#include <__type_traits/is_floating_point.h>
#include <__type_traits/is_integral.h>
#include <__type_traits/is_unsigned.h>
#include <__type_traits/make_unsigned.h>
+#include <__type_traits/underlying_type.h>
#include <__utility/forward.h>
#include <__utility/integer_sequence.h>
#include <__utility/move.h>
@@ -298,6 +303,94 @@ _LIBCPP_HIDE_FROM_ABI constexpr auto __shift_to_unsigned(_Ip __n) {
return static_cast<make_unsigned_t<_Ip> >(__n ^ __min_value);
}
+template <size_t _Size>
+struct __unsigned_integer_of_size {};
+
+template <>
+struct __unsigned_integer_of_size<1> {
+ using type = uint8_t;
+};
+
+template <>
+struct __unsigned_integer_of_size<2> {
+ using type = uint16_t;
+};
+
+template <>
+struct __unsigned_integer_of_size<4> {
+ using type = uint32_t;
+};
+
+template <>
+struct __unsigned_integer_of_size<8> {
+ using type = uint64_t;
+};
+
+template <>
+struct __unsigned_integer_of_size<16> {
+# if _LIBCPP_HAS_INT128
+ using type = __int128;
+# endif
+};
+
+template <size_t _Size>
+using __unsigned_integer_of_size_t = typename __unsigned_integer_of_size<_Size>::type;
+
+template <class _Sc>
+using __unsigned_representation_for_t = __unsigned_integer_of_size_t<sizeof(_Sc)>;
+
+// Represent a scalar type as an ordered integer
+
+// The function is defined for ordered scalar types: integers, floating-point numbers, pointers, and enums.
+// Returns an integer representation such that for any `x` and `y` such that `x < y`, the expression
+// `__to_ordered_integral(x) < __to_ordered_integral(y)` is true, where `x`, `y` are values of the `Scalar` type.
+// __unsigned_representation_for_t<_Scalar> __to_ordered_integral(_Scalar);
+
+template <class _Integral, enable_if_t< is_integral_v<_Integral>, int> = 0>
+constexpr auto __to_ordered_integral(_Integral __n) {
+ return __n;
+}
+
+// An overload for floating-point numbers
+
+// From the IEEE 754 standard, we know that:
+// 1. The bit representation of positive floats directly reflects their order:
+// When comparing floats by magnitude, the number with the larger exponent is greater, and if the exponents are
+// equal, the one with the larger mantissa is greater.
+// 2. The bit representation of negative floats reflects their reverse order (for the same reasons).
+// 3. The most significant bit (sign bit) is zero for positive floats and one for negative floats. Therefore, in the raw
+// bit representation, any negative number will be greater than any positive number.
+
+// The only exception from this rule is `NaN`, which is unordered by definition.
+
+// Based on the above, to obtain correctly ordered integral representation of floating-point numbers, we need to:
+// 1. Invert the bit representation (including the sign bit) of negative floats to switch from reverse order to direct
+// order;
+// 2. Invert the sign bit for positive floats.
+
+// Thus, in final integral representation, we have reversed the order for negative floats and made all negative floats
+// smaller than all positive numbers (by inverting the sign bit).
+template <class _Floating, enable_if_t< is_floating_point_v<_Floating>, int> = 0>
+constexpr auto __to_ordered_integral(_Floating __f) {
+ using __integral_type = __unsigned_representation_for_t<_Floating>;
+ constexpr auto __bit_count = std::numeric_limits<__integral_type>::digits;
+ constexpr auto __sign_bit_mask = static_cast<__integral_type>(__integral_type{1} << (__bit_count - 1));
+
+ const auto __u = std::__bit_cast<__integral_type>(__f);
+
+ return static_cast<__integral_type>(__u & __sign_bit_mask ? ~__u : __u ^ __sign_bit_mask);
+}
+
+template <class _Enum, enable_if_t< is_enum_v<_Enum>, int> = 0>
+constexpr auto __to_ordered_integral(_Enum __e) {
+ return static_cast<std::underlying_type_t<_Enum>>(__e);
+}
+
+template <class _Pointer>
+constexpr auto __to_ordered_integral(_Pointer* __ptr) {
+ return std::__bit_cast<__unsigned_representation_for_t<_Pointer*>>(__ptr);
+}
+
struct __low_byte_fn {
template <class _Ip>
_LIBCPP_HIDE_FROM_ABI constexpr uint8_t operator()(_Ip __integer) const {
@@ -314,7 +407,9 @@ __radix_sort(_RandomAccessIterator1 __first,
_RandomAccessIterator2 __buffer,
_Map __map,
_Radix __radix) {
- auto __map_to_unsigned = [__map = std::move(__map)](const auto& __x) { return std::__shift_to_unsigned(__map(__x)); };
+ auto __map_to_unsigned = [__map = std::move(__map)](const auto& __x) {
+ return std::__shift_to_unsigned(__map(__to_ordered_integral(__x)));
+ };
std::__radix_sort_impl(__first, __last, __buffer, __map_to_unsigned, __radix);
}
diff --git a/libcxx/include/__algorithm/stable_sort.h b/libcxx/include/__algorithm/stable_sort.h
index 76d9e5557008f..2eb62b5230b32 100644
--- a/libcxx/include/__algorithm/stable_sort.h
+++ b/libcxx/include/__algorithm/stable_sort.h
@@ -25,9 +25,10 @@
#include <__memory/unique_temporary_buffer.h>
#include <__type_traits/desugars_to.h>
#include <__type_traits/enable_if.h>
+#include <__type_traits/invoke.h>
#include <__type_traits/is_constant_evaluated.h>
-#include <__type_traits/is_integral.h>
#include <__type_traits/is_same.h>
+#include <__type_traits/is_scalar.h>
#include <__type_traits/is_trivially_assignable.h>
#include <__type_traits/remove_cvref.h>
#include <__utility/move.h>
@@ -202,7 +203,7 @@ struct __stable_sort_switch {
#if _LIBCPP_STD_VER >= 17
template <class _Tp>
_LIBCPP_HIDE_FROM_ABI constexpr unsigned __radix_sort_min_bound() {
- static_assert(is_integral<_Tp>::value);
+ static_assert(is_scalar<_Tp>::value);
if constexpr (sizeof(_Tp) == 1) {
return 1 << 8;
}
@@ -212,13 +213,14 @@ _LIBCPP_HIDE_FROM_ABI constexpr unsigned __radix_sort_min_bound() {
template <class _Tp>
_LIBCPP_HIDE_FROM_ABI constexpr unsigned __radix_sort_max_bound() {
- static_assert(is_integral<_Tp>::value);
+ static_assert(is_scalar<_Tp>::value);
if constexpr (sizeof(_Tp) >= 8) {
return 1 << 15;
}
return 1 << 16;
}
+
#endif // _LIBCPP_STD_VER >= 17
template <class _AlgPolicy, class _Compare, class _RandomAccessIterator>
@@ -246,12 +248,12 @@ _LIBCPP_CONSTEXPR_SINCE_CXX26 void __stable_sort(
}
#if _LIBCPP_STD_VER >= 17
- constexpr auto __default_comp =
- __desugars_to_v<__totally_ordered_less_tag, __remove_cvref_t<_Compare>, value_type, value_type >;
- constexpr auto __integral_value =
- is_integral_v<value_type > && is_same_v< value_type&, __iter_reference<_RandomAccessIterator>>;
- constexpr auto __allowed_radix_sort = __default_comp && __integral_value;
- if constexpr (__allowed_radix_sort) {
+ constexpr auto __default_comp = __desugars_to_v<__less_tag, __remove_cvref_t<_Compare>, value_type, value_type >;
+ constexpr auto __scalar_value =
+ is_scalar_v<value_type > && is_same_v< value_type&, __iter_reference<_RandomAccessIterator>>;
+ // There are non-comparable scalars (std::nullptr_t, pointers to members), so we need to exclude them.
+ constexpr auto __comparable_value = is_invocable_r_v<bool, _Compare, value_type, value_type>;
+ if constexpr (__default_comp && __scalar_value && __comparable_value) {
if (__len <= __buff_size && __len >= static_cast<difference_type>(std::__radix_sort_min_bound<value_type>()) &&
__len <= static_cast<difference_type>(std::__radix_sort_max_bound<value_type>())) {
if (__libcpp_is_constant_evaluated()) {
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.sort/stable.sort/stable_sort.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.sort/stable.sort/stable_sort.pass.cpp
index 4ee1d795a23b2..7fda6c3d7b966 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.sort/stable.sort/stable_sort.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.sort/stable.sort/stable_sort.pass.cpp
@@ -68,6 +68,13 @@ TEST_CONSTEXPR_CXX26 std::vector<T> generate_sawtooth(int N, int M) {
if (++x == M)
x = 0;
}
+
+ if constexpr (std::is_signed_v<T>) {
+ for (auto& a : v) {
+ a -= (M / 2);
+ }
+ }
+
return v;
}
@@ -193,12 +200,38 @@ TEST_CONSTEXPR_CXX26 bool test() {
return true;
}
+template <class T>
+bool test_floating_special_values() {
+ static_assert(std::is_floating_point_v<T>);
+
+ auto v = generate_sawtooth<T>(1024, 512);
+ v.insert(v.end(), 256, T{0.0});
+ v.insert(v.end(), 256, T{-0.0});
+ v.insert(v.end(), 256, std::numeric_limits<T>::infinity());
+ v.insert(v.end(), 256, -std::numeric_limits<T>::infinity());
+
+ std::mt19937 randomness;
+ std::shuffle(v.begin(), v.end(), randomness);
+
+ std::stable_sort(v.begin(), v.end());
+ assert(std::is_sorted(v.begin(), v.end()));
+
+ return true;
+}
+
+template <class T>
+bool test_floating() {
+ return test<T>() && test_floating_special_values<T>();
+}
+
int main(int, char**) {
test<int>();
- test<float>();
+ test_floating<float>();
+ test_floating<double>();
#if TEST_STD_VER >= 26
static_assert(test<int>());
static_assert(test<float>());
+ static_assert(test<double>());
// test constexprness of radix sort branch
static_assert(test<char>());
#endif
|
803b546 to
d06a762
Compare
| // From the IEEE 754 standard, we know that: | ||
| // 1. The bit representation of positive floats directly reflects their order: | ||
| // When comparing floats by magnitude, the number with the larger exponent is greater, and if the exponents are | ||
| // equal, the one with the larger mantissa is greater. | ||
| // 2. The bit representation of negative floats reflects their reverse order (for the same reasons). | ||
| // 3. The most significant bit (sign bit) is zero for positive floats and one for negative floats. Therefore, in the raw | ||
| // bit representation, any negative number will be greater than any positive number. | ||
|
|
||
| // The only exception from this rule is `NaN`, which is unordered by definition. | ||
|
|
||
| // Based on the above, to obtain correctly ordered integral representation of floating-point numbers, we need to: | ||
| // 1. Invert the bit representation (including the sign bit) of negative floats to switch from reverse order to direct | ||
| // order; | ||
| // 2. Invert the sign bit for positive floats. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just signed integer comparison? Also, this doesn't exclude non-IEEE floating point types or types with padding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because bit representation of positive floats is ascending, but bit representation of negative floats is descending (the larger integer, the smaller float), so we need to reverse the negative ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is also the case with two's complement integers? 0b1111 == -1, 0b1110 == -2 and so on. It's not clear to me though what you mean by "larger" and "smaller", since you can interpret these words in very different ways in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the strait order:
x = 0b1110 = -2
y = 0b1110 + 1 = 0b1111 = -1
x < y
But:
0b1111 represents a float which is bigger than 0b1110 by magnitude, but smaller in absolute values.
So bitcast<float>(x) > bitcast<float>(y).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that floating-point values in C++ are not required to be IEEE-754. Whether or not they are is determined by numeric_limits<T>::is_iec559.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
determined by numeric_limits::is_iec559
Wow, thanks.
| template <class _Enum, enable_if_t< is_enum<_Enum>::value, int> = 0> | ||
| _LIBCPP_HIDE_FROM_ABI constexpr auto __to_ordered_integral(_Enum __e) { | ||
| return static_cast<std::underlying_type_t<_Enum>>(__e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization is incorrect. You're allowed to define your own comparison operators for enums. Please add a regression test to make sure people don't try to add this again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've missed that, thanks.
So we can allow only arithmetic types and pointers, am I right?
Do we also need to exclude pointers to functions?
| return static_cast<std::underlying_type_t<_Enum>>(__e); | ||
| } | ||
|
|
||
| template <class _Pointer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| template <class _Pointer> | |
| template <class _Tp> |
since it's not a pointer but the pointee type.
|
|
||
| template <class _Pointer> | ||
| _LIBCPP_HIDE_FROM_ABI constexpr auto __to_ordered_integral(_Pointer* __ptr) { | ||
| return std::__bit_cast<__unsigned_representation_for_t<_Pointer*>>(__ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use uintptr_t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, such cast is not constant-evaluation-compatible. The constexpr should be dropped and we should make sure that the cast is not performed in constant evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that, despite compilers allow this function to be constexpr, the standard forbids it?
Also why to use uintptr_t since it is defined as optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that, despite compilers allow this function to be
constexpr, the standard forbids it?
The function can be marked constexpr, but you can't evaluate it during constant evaluation.
Also why to use
uintptr_tsince it is defined as optional?
It's defined as such, but reality disagrees.
|
|
|
|
I think I've addressed all the issues, can you take a look, please? |
|
ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| struct __unsigned_integer_of_size {}; | |
| struct __unsigned_integer_of_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| template <> | |
| struct __unsigned_integer_of_size<16> { | |
| # if _LIBCPP_HAS_INT128 | |
| using type = __int128; | |
| # endif | |
| }; | |
| template <> | |
| # if _LIBCPP_HAS_INT128 | |
| struct __unsigned_integer_of_size<16> { | |
| using type = __int128; | |
| }; | |
| # endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
4e339ee to
0a0513d
Compare
philnik777
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks almost good. Just a few nits.
libcxx/docs/ReleaseNotes/21.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - The ``std::stable_sort`` algorithm uses radix sort for floating-point types now, which can improve the performance | |
| up to 10 times, depending on type of sorted elements and the initial state of the sorted array. | |
| - The ``std::stable_sort`` algorithm uses radix sort for floating-point types now, which can improve the performance | |
| up to 10x, depending on type of sorted elements and the initial state of the sorted array. |
Since we're using it in the other release notes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| template <class _Tp, class = void> | |
| struct __is_ordered_integer_representable : false_type {}; | |
| template <class _Tp> | |
| struct __is_ordered_integer_representable<_Tp, __void_t<decltype(std::__to_ordered_integral(std::declval<_Tp>()))>> | |
| : true_type {}; | |
| template <class _Tp, class = void> | |
| inline const bool __is_ordered_integer_representable_v = false; | |
| template <class _Tp> | |
| inline const bool __is_ordered_integer_representable_v<_Tp, __void_t<decltype(std::__to_ordered_integral(std::declval<_Tp>()))>> | |
| = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
a095229 to
6ec578c
Compare
6ec578c to
3964314
Compare
3964314 to
80afedd
Compare
|
@philnik777 ping |
frederick-vs-ja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm merging this.
Thanks! |
These changes speed up
std::stable_sortin the case of sorting floating-point types.This applies only to IEEE 754 floats.
The speedup is similar to that achieved for integers in PR #104683 (see benchmarks below).
Why does this worth doing?
Previously,
std::stable_sorthad almost no chance of beatingstd::sort.Now there are cases when
std::stable_sortis preferrable, and the difference is significant.Comparison for only
std::stable_sort: