kisuhorikka
Contributor

@kisuhorikka kisuhorikka commented Jul 30, 2025

Fix #65136

| Benchmark | Baseline | Candidate | Difference | % Difference |
| --- | --- | --- | --- | --- |
| BM_CmpEqual_int_int | 0.46 | 0.46 | -0.00 | -0.62 |
| BM_CmpEqual_int_schar | 0.45 | 0.45 | -0.00 | -0.40 |
| BM_CmpEqual_int_short | 0.45 | 0.45 | 0.00 | 0.34 |
| BM_CmpEqual_int_uchar | 0.78 | 0.44 | -0.34 | -43.18 |
| BM_CmpEqual_int_uint | 0.90 | 0.66 | -0.24 | -26.84 |
| BM_CmpEqual_int_ushort | 0.78 | 0.45 | -0.33 | -42.20 |
| BM_CmpEqual_schar_int | 0.45 | 0.45 | -0.00 | -0.77 |
| BM_CmpEqual_schar_schar | 0.54 | 0.57 | 0.03 | 5.64 |
| BM_CmpEqual_schar_short | 0.92 | 0.88 | -0.04 | -4.80 |
| BM_CmpEqual_schar_uchar | 1.84 | 0.66 | -1.18 | -64.16 |
| BM_CmpEqual_schar_uint | 0.78 | 0.66 | -0.12 | -15.18 |
| BM_CmpEqual_schar_ushort | 1.01 | 0.66 | -0.35 | -34.53 |
| BM_CmpEqual_short_int | 0.45 | 0.45 | 0.00 | 0.03 |
| BM_CmpEqual_short_schar | 0.89 | 0.88 | -0.01 | -0.80 |
| BM_CmpEqual_short_short | 0.47 | 0.46 | -0.01 | -1.28 |
| BM_CmpEqual_short_uchar | 1.11 | 0.66 | -0.45 | -40.63 |
| BM_CmpEqual_short_uint | 0.77 | 0.66 | -0.12 | -14.88 |
| BM_CmpEqual_short_ushort | 1.76 | 0.66 | -1.10 | -62.64 |
| BM_CmpEqual_uchar_int | 0.79 | 0.44 | -0.35 | -44.06 |
| BM_CmpEqual_uchar_schar | 1.76 | 0.66 | -1.11 | -62.68 |
| BM_CmpEqual_uchar_short | 1.11 | 0.66 | -0.45 | -40.33 |
| BM_CmpEqual_uchar_uchar | 0.57 | 0.51 | -0.06 | -10.61 |
| BM_CmpEqual_uchar_uint | 0.45 | 0.44 | -0.01 | -1.74 |
| BM_CmpEqual_uchar_ushort | 0.77 | 0.77 | -0.00 | -0.64 |
| BM_CmpEqual_uint_int | 0.88 | 0.66 | -0.23 | -25.69 |
| BM_CmpEqual_uint_schar | 0.77 | 0.66 | -0.11 | -14.85 |
| BM_CmpEqual_uint_short | 0.77 | 0.66 | -0.11 | -14.56 |
| BM_CmpEqual_uint_uchar | 0.44 | 0.44 | -0.00 | -0.57 |
| BM_CmpEqual_uint_uint | 0.47 | 0.51 | 0.04 | 8.62 |
| BM_CmpEqual_uint_ushort | 0.45 | 0.44 | -0.00 | -0.47 |
| BM_CmpEqual_ushort_int | 0.77 | 0.45 | -0.33 | -42.02 |
| BM_CmpEqual_ushort_schar | 1.02 | 0.66 | -0.36 | -35.30 |
| BM_CmpEqual_ushort_short | 1.76 | 0.66 | -1.10 | -62.60 |
| BM_CmpEqual_ushort_uchar | 0.78 | 0.77 | -0.01 | -1.84 |
| BM_CmpEqual_ushort_uint | 0.45 | 0.45 | 0.00 | 0.24 |
| BM_CmpEqual_ushort_ushort | 0.46 | 0.51 | 0.05 | 11.00 |
| BM_CmpLess_int_int | 0.67 | 0.66 | -0.01 | -0.99 |
| BM_CmpLess_int_schar | 0.66 | 0.66 | -0.01 | -0.86 |
| BM_CmpLess_int_short | 0.66 | 0.66 | -0.00 | -0.57 |
| BM_CmpLess_int_uchar | 0.88 | 0.66 | -0.23 | -25.48 |
| BM_CmpLess_int_uint | 1.76 | 0.66 | -1.11 | -62.68 |
| BM_CmpLess_int_ushort | 0.89 | 0.66 | -0.23 | -25.50 |
| BM_CmpLess_schar_int | 0.66 | 0.66 | -0.00 | -0.44 |
| BM_CmpLess_schar_schar | 0.66 | 0.66 | -0.00 | -0.40 |
| BM_CmpLess_schar_short | 0.88 | 0.88 | -0.00 | -0.50 |
| BM_CmpLess_schar_uchar | 1.10 | 0.71 | -0.39 | -35.24 |
| BM_CmpLess_schar_uint | 0.89 | 0.66 | -0.23 | -25.66 |
| BM_CmpLess_schar_ushort | 0.99 | 0.77 | -0.22 | -22.49 |
| BM_CmpLess_short_int | 0.66 | 0.66 | -0.00 | -0.35 |
| BM_CmpLess_short_schar | 0.89 | 0.88 | -0.00 | -0.48 |
| BM_CmpLess_short_short | 0.66 | 0.66 | -0.00 | -0.34 |
| BM_CmpLess_short_uchar | 1.10 | 0.71 | -0.39 | -35.36 |
| BM_CmpLess_short_uint | 0.88 | 0.66 | -0.22 | -25.39 |
| BM_CmpLess_short_ushort | 1.77 | 0.77 | -1.00 | -56.42 |
| BM_CmpLess_uchar_int | 0.97 | 0.66 | -0.31 | -31.95 |
| BM_CmpLess_uchar_schar | 1.11 | 0.66 | -0.44 | -40.17 |
| BM_CmpLess_uchar_short | 1.19 | 0.66 | -0.53 | -44.59 |
| BM_CmpLess_uchar_uchar | 0.66 | 0.66 | -0.00 | -0.67 |
| BM_CmpLess_uchar_uint | 0.67 | 0.66 | -0.01 | -1.19 |
| BM_CmpLess_uchar_ushort | 0.77 | 0.77 | -0.00 | -0.40 |
| BM_CmpLess_uint_int | 1.76 | 0.66 | -1.10 | -62.59 |
| BM_CmpLess_uint_schar | 0.89 | 0.66 | -0.23 | -25.99 |
| BM_CmpLess_uint_short | 0.88 | 0.66 | -0.22 | -25.41 |
| BM_CmpLess_uint_uchar | 0.66 | 0.66 | -0.01 | -0.81 |
| BM_CmpLess_uint_uint | 0.66 | 0.66 | -0.00 | -0.71 |
| BM_CmpLess_uint_ushort | 0.66 | 0.66 | -0.00 | -0.29 |
| BM_CmpLess_ushort_int | 0.98 | 0.66 | -0.32 | -33.00 |
| BM_CmpLess_ushort_schar | 1.29 | 0.77 | -0.52 | -40.56 |
| BM_CmpLess_ushort_short | 1.77 | 0.77 | -1.00 | -56.55 |
| BM_CmpLess_ushort_uchar | 0.77 | 0.77 | -0.01 | -0.72 |
| BM_CmpLess_ushort_uint | 0.66 | 0.66 | -0.00 | -0.46 |
| BM_CmpLess_ushort_ushort | 0.66 | 0.66 | -0.00 | -0.71 |

@kisuhorikka kisuhorikka requested a review from a team as a code owner July 30, 2025 13:23

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the libc++ label Jul 30, 2025
@llvmbot
Member

llvmbot commented Jul 30, 2025

@llvm/pr-subscribers-libcxx

Author: None (kisuhorikka)

Changes

Fix #65136


Full diff: https://github.com/llvm/llvm-project/pull/151332.diff

1 Files Affected:

  • (modified) libcxx/include/__utility/cmp.h (+14-2)
diff --git a/libcxx/include/__utility/cmp.h b/libcxx/include/__utility/cmp.h
index 14dc0c154c040..4c29f09628095 100644
--- a/libcxx/include/__utility/cmp.h
+++ b/libcxx/include/__utility/cmp.h
@@ -28,7 +28,13 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 
 template <__signed_or_unsigned_integer _Tp, __signed_or_unsigned_integer _Up>
 _LIBCPP_HIDE_FROM_ABI constexpr bool cmp_equal(_Tp __t, _Up __u) noexcept {
-  if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
+  if constexpr (sizeof(_Tp) < sizeof(int) && sizeof(_Up) < sizeof(int)) {
+    __builtin_assume(__t < numeric_limits<int>::max() && __u < numeric_limits<int>::max());
+    return static_cast<int>(__t) == static_cast<int>(__u);
+  } else if constexpr (sizeof(_Tp) < sizeof(long long) && sizeof(_Up) < sizeof(long long)) {
+    __builtin_assume(__t < numeric_limits<long long>::max() && __u < numeric_limits<long long>::max());
+    return static_cast<long long>(__t) == static_cast<long long>(__u);
+  } else if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
     return __t == __u;
   else if constexpr (is_signed_v<_Tp>)
     return __t < 0 ? false : make_unsigned_t<_Tp>(__t) == __u;
@@ -43,7 +49,13 @@ _LIBCPP_HIDE_FROM_ABI constexpr bool cmp_not_equal(_Tp __t, _Up __u) noexcept {
 
 template <__signed_or_unsigned_integer _Tp, __signed_or_unsigned_integer _Up>
 _LIBCPP_HIDE_FROM_ABI constexpr bool cmp_less(_Tp __t, _Up __u) noexcept {
-  if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
+  if constexpr (sizeof(_Tp) < sizeof(int) && sizeof(_Up) < sizeof(int)) {
+    __builtin_assume(__t < numeric_limits<int>::max() && __u < numeric_limits<int>::max());
+    return static_cast<int>(__t) < static_cast<int>(__u);
+  } else if constexpr (sizeof(_Tp) < sizeof(long long) && sizeof(_Up) < sizeof(long long)) {
+    __builtin_assume(__t < numeric_limits<long long>::max() && __u < numeric_limits<long long>::max());
+    return static_cast<long long>(__t) < static_cast<long long>(__u);
+  } else if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
     return __t < __u;
   else if constexpr (is_signed_v<_Tp>)
     return __t < 0 ? true : make_unsigned_t<_Tp>(__t) < __u;
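
The widening idea in this diff can be illustrated with a minimal standalone sketch. This is not the libc++ code: the names reference_equal, widened_equal and widened_less are hypothetical, reference_equal mirrors the sign-checking branches that the diff leaves unchanged, and the sketch assumes a platform where the char and short types are narrower than int.

```cpp
#include <cassert>
#include <type_traits>

// Reference behaviour (hypothetical name), following the unchanged branches:
// same signedness compares directly; otherwise a negative value never equals
// an unsigned one, and a non-negative value is compared as unsigned.
template <class T, class U>
constexpr bool reference_equal(T t, U u) noexcept {
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return t == u;
  else if constexpr (std::is_signed_v<T>)
    return t < 0 ? false : std::make_unsigned_t<T>(t) == u;
  else
    return u < 0 ? false : t == std::make_unsigned_t<U>(u);
}

// Widened comparison (hypothetical name): every value of a type narrower
// than int is representable in int, so no sign check is needed.
template <class T, class U>
constexpr bool widened_equal(T t, U u) noexcept {
  static_assert(std::is_integral_v<T> && std::is_integral_v<U>, "integers only");
  static_assert(sizeof(T) < sizeof(int) && sizeof(U) < sizeof(int),
                "both operands must be narrower than int");
  return static_cast<int>(t) == static_cast<int>(u);
}

template <class T, class U>
constexpr bool widened_less(T t, U u) noexcept {
  static_assert(std::is_integral_v<T> && std::is_integral_v<U>, "integers only");
  static_assert(sizeof(T) < sizeof(int) && sizeof(U) < sizeof(int),
                "both operands must be narrower than int");
  return static_cast<int>(t) < static_cast<int>(u);
}

// -1 as signed char must not compare equal to 255 as unsigned char.
static_assert(!widened_equal(static_cast<signed char>(-1), static_cast<unsigned char>(255)),
              "-1 is not 255");
static_assert(widened_less(static_cast<short>(-5), static_cast<unsigned short>(3)), "-5 < 3");
static_assert(reference_equal(static_cast<signed char>(-1), static_cast<unsigned char>(255)) ==
                  widened_equal(static_cast<signed char>(-1), static_cast<unsigned char>(255)),
              "both strategies agree");
```

The widened form replaces a branch on the sign with a plain conversion, which is presumably where the measured improvement for mixed-sign cases comes from.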

@kisuhorikka
Contributor Author

libcxx seems unhappy with __builtin_assume. Should we keep it for performance, or remove it? @ldionne

@H-G-Hristov
Contributor

> libcxx seems unhappy with __builtin_assume. Should we keep it for performance, or remove it? @ldionne

Have you read this discussion: #91612?
You should probably use the macro _LIBCPP_ASSUME instead: https://github.com/llvm/llvm-project/blob/098426c4585ed99a6c7db771359fcd01d10ded44/libcxx/include/__assert#L26

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from cedea5b to b2409ee Compare July 31, 2025 12:02
@kisuhorikka
Contributor Author

The checks for the last commit failed with 'interrupted by user', and I found no obvious errors.

The discussion points out that the CI is not done running yet and that the job needs to be restarted (which will happen automatically).

So I will try to commit again if the CI doesn't restart automatically within a few hours.

Contributor

Why did you add these in the first place?

Contributor Author

An earlier benchmark showed slower results when converting parameters from short to long long; performance was better when the parameters were converted to int. https://quick-bench.com/q/q87HT8ePhR1AqI_Gy3Zj6rbElp8

However, those results were from Clang 15 (which I unfortunately overlooked), and the difference seems to have vanished on Clang 17. https://quick-bench.com/q/q0Q-bgFsVQyqe8QiDTiPHkawIu4

Although there is little difference in speed, the assembly for int is shorter than for long long, and _LIBCPP_ASSUME seems to have no effect now. https://godbolt.org/z/a5s3zx64W
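
Since the choice between int and long long only affects the generated code, not the result, a small exhaustive check can confirm that both widenings agree for narrow operands. The sketch below is only an illustration: less_via and int_and_long_long_agree are hypothetical names, and it assumes short is narrower than int.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: compare after converting both operands to Wide.
template <class Wide, class T, class U>
constexpr bool less_via(T t, U u) noexcept {
  return static_cast<Wide>(t) < static_cast<Wide>(u);
}

static_assert(sizeof(short) < sizeof(int), "the loop below relies on int holding every short");

// Walks every (short, unsigned char) pair and reports whether widening to
// int and widening to long long always give the same answer.
inline bool int_and_long_long_agree() {
  for (int s = std::numeric_limits<short>::min(); s <= std::numeric_limits<short>::max(); ++s) {
    for (int c = 0; c <= std::numeric_limits<unsigned char>::max(); ++c) {
      const short t = static_cast<short>(s);
      const unsigned char u = static_cast<unsigned char>(c);
      if (less_via<int>(t, u) != less_via<long long>(t, u))
        return false;
    }
  }
  return true;
}
```

A check like this says nothing about speed; it only shows that the two candidate widenings are interchangeable for correctness, so the decision can rest on the assembly and the benchmarks.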

Contributor

On Clang 21 (or 22?), _LIBCPP_ASSUME doesn't affect the generated assembly. I'd like to remove _LIBCPP_ASSUME here.

@H-G-Hristov
Contributor

> The checks for the last commit failed with 'interrupted by user', and I found no obvious errors.
>
> The discussion points out that the CI is not done running yet and that the job needs to be restarted (which will happen automatically).
>
> So I will try to commit again if the CI doesn't restart automatically within a few hours.

Issues like that can happen. Just restart the failed job if you have access, or rebase to restart the CI. Sometimes somebody can restart the job for you. I have the same issue, specifically with buildkite/libcxx-ci/aarch64.

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch 2 times, most recently from 680faaa to 00b0d72 Compare August 3, 2025 02:09
@kisuhorikka
Contributor Author

Hi @philnik777, three CI checks have been waiting for their status to be reported for 3 days. As in similar cases, the CI may need to be started manually for a first PR. Could you please start it? Thank you for your help!

Contributor

@frederick-vs-ja frederick-vs-ja Aug 8, 2025

This doesn't cover comparison between int and unsigned short yet.

I wonder whether we can/should do the following (ditto cmp_less).

template <__signed_or_unsigned_integer _Tp, __signed_integer _Ip>
inline constexpr bool __comparison_can_promote_to =
    __signed_integer<_Tp> ? sizeof(_Tp) <= sizeof(_Ip) : sizeof(_Tp) < sizeof(_Ip);

template <__signed_or_unsigned_integer _Tp, __signed_or_unsigned_integer _Up>
_LIBCPP_HIDE_FROM_ABI constexpr bool cmp_equal(_Tp __t, _Up __u) noexcept {
  if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
    return __t == __u;
  else if constexpr (__comparison_can_promote_to<_Tp, int> && __comparison_can_promote_to<_Up, int>)
    return static_cast<int>(__t) == static_cast<int>(__u);
  else if constexpr (__comparison_can_promote_to<_Tp, long long> && __comparison_can_promote_to<_Up, long long>)
    return static_cast<long long>(__t) == static_cast<long long>(__u);
  else if constexpr (is_signed_v<_Tp>)
    return __t < 0 ? false : make_unsigned_t<_Tp>(__t) == __u;
  else
    return __u < 0 ? false : __t == make_unsigned_t<_Up>(__u);
}
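
To see which branch the suggested cmp_equal would pick for a given pair of types, the trait can be mirrored outside libc++. The following is a rough sketch under assumptions: can_promote_to stands in for the proposed __comparison_can_promote_to, Path and chosen_path are hypothetical names, and the asserted results assume short < int < long long in size, as on mainstream platforms.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the proposed trait: a signed T fits in I when it is no wider
// than I; an unsigned T needs I to be strictly wider.
template <class T, class I>
inline constexpr bool can_promote_to =
    std::is_signed_v<T> ? sizeof(T) <= sizeof(I) : sizeof(T) < sizeof(I);

// Hypothetical labels for the four branches of the suggested cmp_equal.
enum class Path { direct, via_int, via_long_long, sign_check };

// Reports which branch the suggested cmp_equal would select for (T, U).
template <class T, class U>
constexpr Path chosen_path() noexcept {
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return Path::direct;
  else if constexpr (can_promote_to<T, int> && can_promote_to<U, int>)
    return Path::via_int;
  else if constexpr (can_promote_to<T, long long> && can_promote_to<U, long long>)
    return Path::via_long_long;
  else
    return Path::sign_check;
}

// int vs unsigned short: the case the earlier revision did not cover.
static_assert(chosen_path<int, unsigned short>() == Path::via_int, "fits in int");
// int vs unsigned int needs the wider type.
static_assert(chosen_path<int, unsigned int>() == Path::via_long_long, "fits in long long");
// Nothing wider is available for the widest pair, so the sign check remains.
static_assert(chosen_path<long long, unsigned long long>() == Path::sign_check, "fallback");
```

Checking the same-signedness case first keeps the plain comparison for pairs such as short and int, and only mixed-sign pairs go through a conversion.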

Contributor

This starts to feel like it should be a compiler builtin.

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch 5 times, most recently from 534afd4 to f39c2b6 Compare August 10, 2025 08:54
@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from f39c2b6 to f421f9f Compare August 13, 2025 14:38
@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from f421f9f to a888172 Compare August 14, 2025 14:00
@kisuhorikka
Contributor Author

ping

@kisuhorikka
Contributor Author

ping

Contributor

@philnik777 philnik777 left a comment

Could we get some benchmarks for this change in libcxx/test/benchmarks/?

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from a888172 to 09796b7 Compare September 1, 2025 12:24
@kisuhorikka
Contributor Author

> Could we get some benchmarks for this change in libcxx/test/benchmarks/?

The original benchmarks, such as cmp_less.pass.cpp, cover all comparisons between signed and unsigned types, with max/min and positive/negative values and their combinations. I think no cases are left out.

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from 09796b7 to b51c731 Compare September 2, 2025 05:25
@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch 3 times, most recently from 79bed33 to 1acfba7 Compare September 3, 2025 04:56
@philnik777
Contributor

> > Could we get some benchmarks for this change in libcxx/test/benchmarks/?
>
> The original benchmarks, such as cmp_less.pass.cpp, cover all comparisons between signed and unsigned types, with max/min and positive/negative values and their combinations. I think no cases are left out.

That's a conformance test, not a benchmark.
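
The distinction is that a conformance test checks results, while a benchmark measures time per call. As a rough illustration only, here is a hand-rolled timing loop; it is not the benchmark added to this PR, and less_int_uint and ns_per_compare are hypothetical names. A real libc++ benchmark would use the Google Benchmark harness, which also provides benchmark::DoNotOptimize to stop the compiler from removing or moving the measured work.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical function under test: the sign-checking form of a mixed
// int/unsigned "less than", as in the unchanged branch of cmp_less.
constexpr bool less_int_uint(int t, unsigned u) noexcept {
  return t < 0 ? true : static_cast<unsigned>(t) < u;
}

// Returns the average wall-clock nanoseconds per comparison over n inputs.
inline double ns_per_compare(std::size_t n) {
  std::vector<int> lhs(n);
  std::vector<unsigned> rhs(n);
  for (std::size_t i = 0; i < n; ++i) {
    lhs[i] = static_cast<int>(i % 1000) - 500;        // mix of negative and positive
    rhs[i] = static_cast<unsigned>((i * 7) % 1000);
  }

  std::size_t hits = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n; ++i)
    hits += less_int_uint(lhs[i], rhs[i]) ? 1 : 0;
  const auto stop = std::chrono::steady_clock::now();

  // Storing the result in a volatile makes the loop's output observable;
  // this is a weaker guarantee than a real benchmark harness gives.
  volatile std::size_t sink = hits;
  (void)sink;

  return std::chrono::duration<double, std::nano>(stop - start).count() /
         static_cast<double>(n);
}
```

Numbers from a loop like this are noisy and include load and loop overhead, so they are only useful for comparing two implementations under identical conditions.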

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from 1acfba7 to 1d74e7a Compare September 6, 2025 12:48
@frederick-vs-ja
Contributor

Then you can post the benchmark results in the description of this PR, which will be generally used as the squashed commit message.

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from 1d74e7a to 631b182 Compare September 7, 2025 01:29

github-actions bot commented Sep 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from 631b182 to e31deed Compare September 9, 2025 13:41
@kisuhorikka
Contributor Author

kisuhorikka commented Sep 16, 2025

Benchmark

Previous:

# .---command stdout------------
# | --------------------------------------------------------------------
# | Benchmark                          Time             CPU   Iterations
# | --------------------------------------------------------------------
# | BM_CmpEqual_schar_schar        0.220 ns        0.220 ns   3177494752
# | BM_CmpEqual_schar_uchar        0.331 ns        0.331 ns   2110362727
# | BM_CmpEqual_schar_short        0.279 ns        0.279 ns   2514663759
# | BM_CmpEqual_schar_ushort       0.440 ns        0.440 ns   1575565826
# | BM_CmpEqual_schar_int          0.221 ns        0.221 ns   3181969803
# | BM_CmpEqual_schar_uint         0.390 ns        0.390 ns   1816995929
# | BM_CmpEqual_uchar_schar        0.331 ns        0.331 ns   2115649335
# | BM_CmpEqual_uchar_uchar        0.221 ns        0.221 ns   3184200746
# | BM_CmpEqual_uchar_short        0.442 ns        0.442 ns   1589321810
# | BM_CmpEqual_uchar_ushort       0.309 ns        0.309 ns   2021463425
# | BM_CmpEqual_uchar_int          0.391 ns        0.391 ns   1784422719
# | BM_CmpEqual_uchar_uint         0.223 ns        0.223 ns   3172128497
# | BM_CmpEqual_short_schar        0.666 ns        0.666 ns   1061302500
# | BM_CmpEqual_short_uchar        0.387 ns        0.387 ns   1798205992
# | BM_CmpEqual_short_short        0.220 ns        0.220 ns   3151357745
# | BM_CmpEqual_short_ushort       0.440 ns        0.440 ns   1590645438
# | BM_CmpEqual_short_int          0.222 ns        0.222 ns   3130886416
# | BM_CmpEqual_short_uint         0.386 ns        0.386 ns   1789004566
# | BM_CmpEqual_ushort_schar       0.879 ns        0.879 ns    797628809
# | BM_CmpEqual_ushort_uchar       0.318 ns        0.318 ns   2124274656
# | BM_CmpEqual_ushort_short       0.442 ns        0.442 ns   1586989623
# | BM_CmpEqual_ushort_ushort      0.220 ns        0.220 ns   3185261261
# | BM_CmpEqual_ushort_int         0.388 ns        0.388 ns   1817087337
# | BM_CmpEqual_ushort_uint        0.222 ns        0.222 ns   3177555592
# | BM_CmpEqual_int_schar          0.220 ns        0.220 ns   3166528498
# | BM_CmpEqual_int_uchar          0.387 ns        0.387 ns   1813404703
# | BM_CmpEqual_int_short          0.221 ns        0.221 ns   3175979418
# | BM_CmpEqual_int_ushort         0.391 ns        0.391 ns   1815091806
# | BM_CmpEqual_int_int            0.221 ns        0.221 ns   3153454379
# | BM_CmpEqual_int_uint           0.444 ns        0.444 ns   1587217170
# | BM_CmpEqual_uint_schar         0.389 ns        0.389 ns   1805536199
# | BM_CmpEqual_uint_uchar         0.224 ns        0.224 ns   2989165156
# | BM_CmpEqual_uint_short         0.386 ns        0.386 ns   1819370240
# | BM_CmpEqual_uint_ushort        0.222 ns        0.222 ns   3166484295
# | BM_CmpEqual_uint_int           0.444 ns        0.444 ns   1590870523
# | BM_CmpEqual_uint_uint          0.222 ns        0.222 ns   3185152124
# | BM_CmpLess_schar_schar         0.222 ns        0.222 ns   3152212984
# | BM_CmpLess_schar_uchar         0.332 ns        0.332 ns   2109812040
# | BM_CmpLess_schar_short         0.279 ns        0.279 ns   2492153277
# | BM_CmpLess_schar_ushort        0.387 ns        0.387 ns   1795110195
# | BM_CmpLess_schar_int           0.222 ns        0.222 ns   3192497973
# | BM_CmpLess_schar_uint          0.331 ns        0.331 ns   2123653138
# | BM_CmpLess_uchar_schar         0.331 ns        0.331 ns   2123138786
# | BM_CmpLess_uchar_uchar         0.220 ns        0.220 ns   3164676524
# | BM_CmpLess_uchar_short         0.441 ns        0.441 ns   1592313841
# | BM_CmpLess_uchar_ushort        0.220 ns        0.220 ns   3183835199
# | BM_CmpLess_uchar_int           0.387 ns        0.387 ns   1809244269
# | BM_CmpLess_uchar_uint          0.223 ns        0.223 ns   3148934956
# | BM_CmpLess_short_schar         0.663 ns        0.663 ns   1050432573
# | BM_CmpLess_short_uchar         0.369 ns        0.369 ns   1908637743
# | BM_CmpLess_short_short         0.221 ns        0.221 ns   3116176255
# | BM_CmpLess_short_ushort        0.441 ns        0.441 ns   1592596128
# | BM_CmpLess_short_int           0.221 ns        0.221 ns   3187138545
# | BM_CmpLess_short_uint          0.336 ns        0.336 ns   2099495940
# | BM_CmpLess_ushort_schar        0.885 ns        0.885 ns    796785213
# | BM_CmpLess_ushort_uchar        0.221 ns        0.221 ns   3151738761
# | BM_CmpLess_ushort_short        0.440 ns        0.440 ns   1591517988
# | BM_CmpLess_ushort_ushort       0.220 ns        0.220 ns   3186446945
# | BM_CmpLess_ushort_int          0.390 ns        0.390 ns   1800763667
# | BM_CmpLess_ushort_uint         0.223 ns        0.223 ns   3119223477
# | BM_CmpLess_int_schar           0.221 ns        0.221 ns   3186819504
# | BM_CmpLess_int_uchar           0.331 ns        0.331 ns   2118205598
# | BM_CmpLess_int_short           0.220 ns        0.220 ns   3169547437
# | BM_CmpLess_int_ushort          0.332 ns        0.332 ns   2057865920
# | BM_CmpLess_int_int             0.221 ns        0.221 ns   3173979131
# | BM_CmpLess_int_uint            0.440 ns        0.440 ns   1589171707
# | BM_CmpLess_uint_schar          0.386 ns        0.386 ns   1815552403
# | BM_CmpLess_uint_uchar          0.220 ns        0.220 ns   3181579607
# | BM_CmpLess_uint_short          0.386 ns        0.386 ns   1815228192
# | BM_CmpLess_uint_ushort         0.220 ns        0.220 ns   3158796956
# | BM_CmpLess_uint_int            0.445 ns        0.445 ns   1594482892
# | BM_CmpLess_uint_uint           0.221 ns        0.221 ns   3179936252
# `-----------------------------
# .---command stderr------------
# | 2025-09-16T19:43:01+08:00
# | Running /home/kisuhorikka/Source/llvm-project/build/libcxx/test/benchmarks/utility/Output/cmp.bench.cpp.dir/t.tmp.exe
# | Run on (16 X 2304 MHz CPU s)
# | CPU Caches:
# |   L1 Data 48 KiB (x8)
# |   L1 Instruction 32 KiB (x8)
# |   L2 Unified 1280 KiB (x8)
# |   L3 Unified 24576 KiB (x1)
# | Load Average: 0.00, 0.18, 0.25
# `-----------------------------

After:

| --------------------------------------------------------------------
| Benchmark                          Time             CPU   Iterations
| --------------------------------------------------------------------
| BM_CmpEqual_schar_schar        0.221 ns        0.221 ns   3170066917
| BM_CmpEqual_schar_uchar        0.232 ns        0.232 ns   3025025997
| BM_CmpEqual_schar_short        0.280 ns        0.280 ns   2507488723
| BM_CmpEqual_schar_ushort       0.361 ns        0.361 ns   2086768552
| BM_CmpEqual_schar_int          0.220 ns        0.220 ns   3123871432
| BM_CmpEqual_schar_uint         0.233 ns        0.233 ns   3023305113
| BM_CmpEqual_uchar_schar        0.221 ns        0.221 ns   3180711887
| BM_CmpEqual_uchar_uchar        0.220 ns        0.220 ns   3177773411
| BM_CmpEqual_uchar_short        0.221 ns        0.221 ns   3178208863
| BM_CmpEqual_uchar_ushort       0.295 ns        0.295 ns   2331090257
| BM_CmpEqual_uchar_int          0.221 ns        0.221 ns   3156844998
| BM_CmpEqual_uchar_uint         0.220 ns        0.220 ns   3169867220
| BM_CmpEqual_short_schar        0.662 ns        0.662 ns   1057277878
| BM_CmpEqual_short_uchar        0.232 ns        0.232 ns   3019119818
| BM_CmpEqual_short_short        0.222 ns        0.222 ns   3171932178
| BM_CmpEqual_short_ushort       0.282 ns        0.282 ns   2495630490
| BM_CmpEqual_short_int          0.221 ns        0.221 ns   3165298446
| BM_CmpEqual_short_uint         0.368 ns        0.368 ns   2151247764
| BM_CmpEqual_ushort_schar       0.280 ns        0.280 ns   2503964124
| BM_CmpEqual_ushort_uchar       0.221 ns        0.221 ns   3164632430
| BM_CmpEqual_ushort_short       0.281 ns        0.281 ns   2502613193
| BM_CmpEqual_ushort_ushort      0.221 ns        0.221 ns   3164295092
| BM_CmpEqual_ushort_int         0.221 ns        0.221 ns   3170047435
| BM_CmpEqual_ushort_uint        0.222 ns        0.222 ns   3172795286
| BM_CmpEqual_int_schar          0.221 ns        0.221 ns   3170067649
| BM_CmpEqual_int_uchar          0.221 ns        0.221 ns   3162764124
| BM_CmpEqual_int_short          0.222 ns        0.222 ns   3149384461
| BM_CmpEqual_int_ushort         0.221 ns        0.221 ns   3170863296
| BM_CmpEqual_int_int            0.221 ns        0.221 ns   3173623511
| BM_CmpEqual_int_uint           0.349 ns        0.349 ns   2074548798
| BM_CmpEqual_uint_schar         0.221 ns        0.221 ns   3172609898
| BM_CmpEqual_uint_uchar         0.221 ns        0.221 ns   3167865061
| BM_CmpEqual_uint_short         0.221 ns        0.221 ns   3166421901
| BM_CmpEqual_uint_ushort        0.221 ns        0.221 ns   3154601627
| BM_CmpEqual_uint_int           0.222 ns        0.222 ns   3170922460
| BM_CmpEqual_uint_uint          0.221 ns        0.221 ns   3178919527
| BM_CmpLess_schar_schar         0.220 ns        0.220 ns   3166946919
| BM_CmpLess_schar_uchar         0.328 ns        0.328 ns   1989530669
| BM_CmpLess_schar_short         0.280 ns        0.280 ns   2481635006
| BM_CmpLess_schar_ushort        0.280 ns        0.280 ns   2494961754
| BM_CmpLess_schar_int           0.221 ns        0.221 ns   3170387379
| BM_CmpLess_schar_uint          0.232 ns        0.232 ns   3020929987
| BM_CmpLess_uchar_schar         0.339 ns        0.339 ns   2301816456
| BM_CmpLess_uchar_uchar         0.221 ns        0.221 ns   3168381752
| BM_CmpLess_uchar_short         0.221 ns        0.221 ns   3172130050
| BM_CmpLess_uchar_ushort        0.226 ns        0.226 ns   3177268420
| BM_CmpLess_uchar_int           0.221 ns        0.221 ns   3171207495
| BM_CmpLess_uchar_uint          0.221 ns        0.221 ns   3175834030
| BM_CmpLess_short_schar         0.663 ns        0.663 ns   1047532663
| BM_CmpLess_short_uchar         0.301 ns        0.301 ns   2210991543
| BM_CmpLess_short_short         0.220 ns        0.220 ns   3178165314
| BM_CmpLess_short_ushort        0.280 ns        0.280 ns   2502057388
| BM_CmpLess_short_int           0.223 ns        0.223 ns   3163647242
| BM_CmpLess_short_uint          0.233 ns        0.233 ns   3015274874
| BM_CmpLess_ushort_schar        0.280 ns        0.280 ns   2494740988
| BM_CmpLess_ushort_uchar        0.316 ns        0.316 ns   2124921107
| BM_CmpLess_ushort_short        0.280 ns        0.280 ns   2503617728
| BM_CmpLess_ushort_ushort       0.221 ns        0.221 ns   3166649427
| BM_CmpLess_ushort_int          0.221 ns        0.221 ns   3109921684
| BM_CmpLess_ushort_uint         0.221 ns        0.221 ns   3173252017
| BM_CmpLess_int_schar           0.222 ns        0.222 ns   3164772816
| BM_CmpLess_int_uchar           0.221 ns        0.221 ns   3141461765
| BM_CmpLess_int_short           0.221 ns        0.221 ns   3165358991
| BM_CmpLess_int_ushort          0.221 ns        0.221 ns   3172701309
| BM_CmpLess_int_int             0.222 ns        0.222 ns   3152883866
| BM_CmpLess_int_uint            0.232 ns        0.232 ns   2991786154
| BM_CmpLess_uint_schar          0.221 ns        0.221 ns   3174540447
| BM_CmpLess_uint_uchar          0.221 ns        0.220 ns   3177358990
| BM_CmpLess_uint_short          0.300 ns        0.300 ns   2362252788
| BM_CmpLess_uint_ushort         0.221 ns        0.221 ns   3178372133
| BM_CmpLess_uint_int            0.221 ns        0.221 ns   3177162541
| BM_CmpLess_uint_uint           0.221 ns        0.221 ns   3177473622
`-----------------------------
.---command stderr------------
| 2025-09-09T20:03:38+08:00
| Running /home/kisuhorikka/Source/llvm-project/build/libcxx/test/benchmarks/utility/Output/cmp.bench.cpp.dir/t.tmp.exe
| Run on (16 X 2304 MHz CPU s)
| CPU Caches:
|   L1 Data 48 KiB (x8)
|   L1 Instruction 32 KiB (x8)
|   L2 Unified 1280 KiB (x8)
|   L3 Unified 24576 KiB (x1)
| Load Average: 0.12, 0.25, 0.30
`-----------------------------

@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch 2 times, most recently from 452ecc7 to c6aec70 Compare October 12, 2025 02:40
@kisuhorikka kisuhorikka force-pushed the improve_integer_comparison branch from c6aec70 to 4afea89 Compare October 12, 2025 03:03
@frederick-vs-ja
Contributor

Can we show that the regression for the BM_CmpLess_ushort_uchar case is not real? Assembly didn't change for that case, so the performance should be unchanged.

@kisuhorikka
Contributor Author

> Can we show that the regression for the BM_CmpLess_ushort_uchar case is not real? Assembly didn't change for that case, so the performance should be unchanged.

Each comparison compiles to only a few instructions, so its benchmark can easily be disturbed by cache misses. The new commit therefore adds more test cases to average out the impact of cache misses.

Updated benchmark:

Benchmark Pre Time Pre CPU Pre Iterations After Time After CPU After Iterations
BM_CmpEqual_schar_schar 0.565 ns 0.565 ns 1233039780 0.474 ns 0.474 ns 1468418942
BM_CmpEqual_schar_uchar 1.75 ns 1.75 ns 400155443 0.699 ns 0.699 ns 997185643
BM_CmpEqual_schar_short 0.878 ns 0.878 ns 800160398 0.934 ns 0.934 ns 749695517
BM_CmpEqual_schar_ushort 1.04 ns 1.04 ns 683038714 0.661 ns 0.661 ns 1057169539
BM_CmpEqual_schar_int 0.444 ns 0.444 ns 1568209863 0.446 ns 0.446 ns 1570215890
BM_CmpEqual_schar_uint 0.765 ns 0.765 ns 915786318 0.658 ns 0.658 ns 1067457472
BM_CmpEqual_uchar_schar 1.75 ns 1.75 ns 400115640 0.656 ns 0.656 ns 1066708506
BM_CmpEqual_uchar_uchar 0.458 ns 0.458 ns 1528459955 0.565 ns 0.565 ns 1238801105
BM_CmpEqual_uchar_short 1.09 ns 1.09 ns 640197875 0.657 ns 0.657 ns 1047385030
BM_CmpEqual_uchar_ushort 0.765 ns 0.765 ns 913656892 0.765 ns 0.765 ns 914754774
BM_CmpEqual_uchar_int 0.765 ns 0.765 ns 911019869 0.438 ns 0.438 ns 1599120034
BM_CmpEqual_uchar_uint 0.438 ns 0.438 ns 1599326064 0.437 ns 0.437 ns 1600127121
BM_CmpEqual_short_schar 0.879 ns 0.879 ns 796392841 0.927 ns 0.927 ns 796470208
BM_CmpEqual_short_uchar 1.09 ns 1.09 ns 640033800 0.687 ns 0.687 ns 1014051054
BM_CmpEqual_short_short 0.459 ns 0.459 ns 1525846444 0.459 ns 0.459 ns 1527749243
BM_CmpEqual_short_ushort 1.75 ns 1.75 ns 400188894 0.656 ns 0.656 ns 1067239715
BM_CmpEqual_short_int 0.448 ns 0.448 ns 1572556343 0.444 ns 0.444 ns 1575796330
BM_CmpEqual_short_uint 0.765 ns 0.765 ns 914780248 0.656 ns 0.656 ns 1059166493
BM_CmpEqual_ushort_schar 0.991 ns 0.991 ns 705007570 0.657 ns 0.657 ns 1065794851
BM_CmpEqual_ushort_uchar 0.767 ns 0.767 ns 913270819 0.792 ns 0.792 ns 914912270
BM_CmpEqual_ushort_short 1.75 ns 1.75 ns 400160762 0.675 ns 0.675 ns 826779785
BM_CmpEqual_ushort_ushort 0.461 ns 0.461 ns 1524731230 0.458 ns 0.458 ns 1520422303
BM_CmpEqual_ushort_int 0.766 ns 0.766 ns 913543043 0.450 ns 0.450 ns 1556490937
BM_CmpEqual_ushort_uint 0.444 ns 0.444 ns 1575337932 0.448 ns 0.449 ns 1559408306
BM_CmpEqual_int_schar 0.450 ns 0.450 ns 1555542389 0.439 ns 0.439 ns 1589627414
BM_CmpEqual_int_uchar 0.766 ns 0.766 ns 914219323 0.441 ns 0.441 ns 1591568358
BM_CmpEqual_int_short 0.446 ns 0.446 ns 1555307308 0.445 ns 0.445 ns 1574117270
BM_CmpEqual_int_ushort 0.766 ns 0.766 ns 912586292 0.447 ns 0.447 ns 1572599409
BM_CmpEqual_int_int 0.458 ns 0.458 ns 1527788155 0.458 ns 0.458 ns 1527290043
BM_CmpEqual_int_uint 0.876 ns 0.876 ns 799678831 0.656 ns 0.656 ns 1066967531
BM_CmpEqual_uint_schar 0.766 ns 0.766 ns 914258190 0.656 ns 0.656 ns 1066473247
BM_CmpEqual_uint_uchar 0.441 ns 0.441 ns 1584339677 0.441 ns 0.441 ns 1590570574
BM_CmpEqual_uint_short 0.766 ns 0.766 ns 912997947 0.656 ns 0.656 ns 1066021076
BM_CmpEqual_uint_ushort 0.444 ns 0.444 ns 1576572759 0.447 ns 0.447 ns 1575352333
BM_CmpEqual_uint_int 0.876 ns 0.876 ns 799382283 0.656 ns 0.656 ns 1067275969
BM_CmpEqual_uint_uint 0.459 ns 0.459 ns 1525103200 0.458 ns 0.458 ns 1526065716
BM_CmpLess_schar_schar 0.657 ns 0.657 ns 1066096863 0.656 ns 0.656 ns 1066888645
BM_CmpLess_schar_uchar 1.09 ns 1.09 ns 639049433 0.711 ns 0.711 ns 984500890
BM_CmpLess_schar_short 0.876 ns 0.876 ns 798632631 0.875 ns 0.875 ns 799695677
BM_CmpLess_schar_ushort 0.985 ns 0.985 ns 710587065 0.766 ns 0.766 ns 911917914
BM_CmpLess_schar_int 0.657 ns 0.657 ns 1065800676 0.656 ns 0.656 ns 1066724225
BM_CmpLess_schar_uint 0.876 ns 0.876 ns 799193308 0.656 ns 0.656 ns 1066925623
BM_CmpLess_uchar_schar 1.09 ns 1.09 ns 639360593 0.656 ns 0.656 ns 1067359715
BM_CmpLess_uchar_uchar 0.657 ns 0.657 ns 1064608553 0.656 ns 0.656 ns 1066927054
BM_CmpLess_uchar_short 1.21 ns 1.21 ns 587066740 0.656 ns 0.656 ns 1060213733
BM_CmpLess_uchar_ushort 0.766 ns 0.766 ns 912133787 0.766 ns 0.766 ns 915267718
BM_CmpLess_uchar_int 0.918 ns 0.918 ns 765842907 0.656 ns 0.656 ns 1066515884
BM_CmpLess_uchar_uint 0.657 ns 0.657 ns 1066164185 0.656 ns 0.656 ns 1066312026
BM_CmpLess_short_schar 0.880 ns 0.880 ns 795944920 0.882 ns 0.882 ns 796635875
BM_CmpLess_short_uchar 1.10 ns 1.10 ns 639911831 0.711 ns 0.711 ns 984710970
BM_CmpLess_short_short 0.657 ns 0.657 ns 1065421183 0.656 ns 0.656 ns 1067412074
BM_CmpLess_short_ushort 1.75 ns 1.75 ns 400147821 0.765 ns 0.765 ns 914291793
BM_CmpLess_short_int 0.656 ns 0.656 ns 1066208047 0.656 ns 0.656 ns 1067475427
BM_CmpLess_short_uint 0.875 ns 0.875 ns 799147834 0.656 ns 0.656 ns 1067376739
BM_CmpLess_ushort_schar 1.24 ns 1.24 ns 550669777 0.765 ns 0.765 ns 914655591
BM_CmpLess_ushort_uchar 0.766 ns 0.766 ns 913008081 0.766 ns 0.766 ns 914354468
BM_CmpLess_ushort_short 1.75 ns 1.75 ns 399870485 0.765 ns 0.765 ns 914008345
BM_CmpLess_ushort_ushort 0.656 ns 0.656 ns 1066632307 0.656 ns 0.656 ns 1066777173
BM_CmpLess_ushort_int 0.910 ns 0.910 ns 775972168 0.656 ns 0.656 ns 1067070080
BM_CmpLess_ushort_uint 0.657 ns 0.657 ns 1066934713 0.656 ns 0.656 ns 1067076782
BM_CmpLess_int_schar 0.657 ns 0.657 ns 1066399193 0.656 ns 0.656 ns 1067115416
BM_CmpLess_int_uchar 0.876 ns 0.876 ns 800107882 0.656 ns 0.656 ns 1067440494
BM_CmpLess_int_short 0.657 ns 0.657 ns 1066094298 0.656 ns 0.656 ns 1067358217
BM_CmpLess_int_ushort 0.876 ns 0.876 ns 800038603 0.657 ns 0.657 ns 1066994675
BM_CmpLess_int_int 0.656 ns 0.656 ns 1066439046 0.656 ns 0.656 ns 1067275611
BM_CmpLess_int_uint 1.75 ns 1.75 ns 400174197 0.656 ns 0.656 ns 1065684240
BM_CmpLess_uint_schar 0.875 ns 0.875 ns 799707846 0.656 ns 0.656 ns 1067283748
BM_CmpLess_uint_uchar 0.658 ns 0.658 ns 1066549911 0.660 ns 0.660 ns 1066946373
BM_CmpLess_uint_short 0.876 ns 0.876 ns 787226463 0.659 ns 0.659 ns 1057090897
BM_CmpLess_uint_ushort 0.656 ns 0.656 ns 1066207543 0.657 ns 0.657 ns 1066428372
BM_CmpLess_uint_int 1.75 ns 1.75 ns 400133388 0.657 ns 0.657 ns 1066849311
BM_CmpLess_uint_uint 0.656 ns 0.656 ns 1066344448 0.659 ns 0.659 ns 1067416566
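
For readers unfamiliar with what these benchmarks exercise: the rows that improve above are the mixed-signedness pairs (for example `int_uint` or `uchar_schar`), where `std::cmp_less` and `std::cmp_equal` must not fall back on the usual arithmetic conversions. The following is a minimal sketch of the textbook, branch-based formulation of that contract. It is not the code from this PR, and the names `sketch_cmp_equal` and `sketch_cmp_less` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical names: a textbook formulation of sign-safe integer comparison,
// shown only to illustrate the semantics. Not the implementation in this PR.
template <class T, class U>
constexpr bool sketch_cmp_equal(T t, U u) noexcept {
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return t == u;  // same signedness: the built-in comparison is already correct
  else if constexpr (std::is_signed_v<T>)
    return t >= 0 && std::make_unsigned_t<T>(t) == u;  // negative t never equals unsigned u
  else
    return u >= 0 && std::make_unsigned_t<U>(u) == t;  // negative u never equals unsigned t
}

template <class T, class U>
constexpr bool sketch_cmp_less(T t, U u) noexcept {
  if constexpr (std::is_signed_v<T> == std::is_signed_v<U>)
    return t < u;
  else if constexpr (std::is_signed_v<T>)
    return t < 0 || std::make_unsigned_t<T>(t) < u;  // negative t is below every unsigned u
  else
    return u >= 0 && t < std::make_unsigned_t<U>(u);  // unsigned t is never below negative u
}
```

With this sketch, `sketch_cmp_less(-1, 1u)` is true, whereas the built-in `-1 < 1u` converts `-1` to a large unsigned value first. The extra sign check on mixed-signedness pairs is one plausible reason those rows are the ones that move in the table.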

Contributor

@frederick-vs-ja frederick-vs-ja left a comment

LGTM with PR message updated to mention new benchmark results (before/after).

@philnik777
Contributor

Could you provide a table with the relative speedups? You should be able to use libcxx/utils/compare-benchmarks for that. (See https://libcxx.llvm.org/TestingLibcxx.html#benchmarks)

@kisuhorikka
Contributor Author

> Could you provide a table with the relative speedups? You should be able to use libcxx/utils/compare-benchmarks for that. (See https://libcxx.llvm.org/TestingLibcxx.html#benchmarks)

Benchmark Baseline Candidate Difference % Difference
BM_CmpEqual_int_int 0.46 0.46 -0.00 -0.62
BM_CmpEqual_int_schar 0.45 0.45 -0.00 -0.40
BM_CmpEqual_int_short 0.45 0.45 0.00 0.34
BM_CmpEqual_int_uchar 0.78 0.44 -0.34 -43.18
BM_CmpEqual_int_uint 0.90 0.66 -0.24 -26.84
BM_CmpEqual_int_ushort 0.78 0.45 -0.33 -42.20
BM_CmpEqual_schar_int 0.45 0.45 -0.00 -0.77
BM_CmpEqual_schar_schar 0.54 0.57 0.03 5.64
BM_CmpEqual_schar_short 0.92 0.88 -0.04 -4.80
BM_CmpEqual_schar_uchar 1.84 0.66 -1.18 -64.16
BM_CmpEqual_schar_uint 0.78 0.66 -0.12 -15.18
BM_CmpEqual_schar_ushort 1.01 0.66 -0.35 -34.53
BM_CmpEqual_short_int 0.45 0.45 0.00 0.03
BM_CmpEqual_short_schar 0.89 0.88 -0.01 -0.80
BM_CmpEqual_short_short 0.47 0.46 -0.01 -1.28
BM_CmpEqual_short_uchar 1.11 0.66 -0.45 -40.63
BM_CmpEqual_short_uint 0.77 0.66 -0.12 -14.88
BM_CmpEqual_short_ushort 1.76 0.66 -1.10 -62.64
BM_CmpEqual_uchar_int 0.79 0.44 -0.35 -44.06
BM_CmpEqual_uchar_schar 1.76 0.66 -1.11 -62.68
BM_CmpEqual_uchar_short 1.11 0.66 -0.45 -40.33
BM_CmpEqual_uchar_uchar 0.57 0.51 -0.06 -10.61
BM_CmpEqual_uchar_uint 0.45 0.44 -0.01 -1.74
BM_CmpEqual_uchar_ushort 0.77 0.77 -0.00 -0.64
BM_CmpEqual_uint_int 0.88 0.66 -0.23 -25.69
BM_CmpEqual_uint_schar 0.77 0.66 -0.11 -14.85
BM_CmpEqual_uint_short 0.77 0.66 -0.11 -14.56
BM_CmpEqual_uint_uchar 0.44 0.44 -0.00 -0.57
BM_CmpEqual_uint_uint 0.47 0.51 0.04 8.62
BM_CmpEqual_uint_ushort 0.45 0.44 -0.00 -0.47
BM_CmpEqual_ushort_int 0.77 0.45 -0.33 -42.02
BM_CmpEqual_ushort_schar 1.02 0.66 -0.36 -35.30
BM_CmpEqual_ushort_short 1.76 0.66 -1.10 -62.60
BM_CmpEqual_ushort_uchar 0.78 0.77 -0.01 -1.84
BM_CmpEqual_ushort_uint 0.45 0.45 0.00 0.24
BM_CmpEqual_ushort_ushort 0.46 0.51 0.05 11.00
BM_CmpLess_int_int 0.67 0.66 -0.01 -0.99
BM_CmpLess_int_schar 0.66 0.66 -0.01 -0.86
BM_CmpLess_int_short 0.66 0.66 -0.00 -0.57
BM_CmpLess_int_uchar 0.88 0.66 -0.23 -25.48
BM_CmpLess_int_uint 1.76 0.66 -1.11 -62.68
BM_CmpLess_int_ushort 0.89 0.66 -0.23 -25.50
BM_CmpLess_schar_int 0.66 0.66 -0.00 -0.44
BM_CmpLess_schar_schar 0.66 0.66 -0.00 -0.40
BM_CmpLess_schar_short 0.88 0.88 -0.00 -0.50
BM_CmpLess_schar_uchar 1.10 0.71 -0.39 -35.24
BM_CmpLess_schar_uint 0.89 0.66 -0.23 -25.66
BM_CmpLess_schar_ushort 0.99 0.77 -0.22 -22.49
BM_CmpLess_short_int 0.66 0.66 -0.00 -0.35
BM_CmpLess_short_schar 0.89 0.88 -0.00 -0.48
BM_CmpLess_short_short 0.66 0.66 -0.00 -0.34
BM_CmpLess_short_uchar 1.10 0.71 -0.39 -35.36
BM_CmpLess_short_uint 0.88 0.66 -0.22 -25.39
BM_CmpLess_short_ushort 1.77 0.77 -1.00 -56.42
BM_CmpLess_uchar_int 0.97 0.66 -0.31 -31.95
BM_CmpLess_uchar_schar 1.11 0.66 -0.44 -40.17
BM_CmpLess_uchar_short 1.19 0.66 -0.53 -44.59
BM_CmpLess_uchar_uchar 0.66 0.66 -0.00 -0.67
BM_CmpLess_uchar_uint 0.67 0.66 -0.01 -1.19
BM_CmpLess_uchar_ushort 0.77 0.77 -0.00 -0.40
BM_CmpLess_uint_int 1.76 0.66 -1.10 -62.59
BM_CmpLess_uint_schar 0.89 0.66 -0.23 -25.99
BM_CmpLess_uint_short 0.88 0.66 -0.22 -25.41
BM_CmpLess_uint_uchar 0.66 0.66 -0.01 -0.81
BM_CmpLess_uint_uint 0.66 0.66 -0.00 -0.71
BM_CmpLess_uint_ushort 0.66 0.66 -0.00 -0.29
BM_CmpLess_ushort_int 0.98 0.66 -0.32 -33.00
BM_CmpLess_ushort_schar 1.29 0.77 -0.52 -40.56
BM_CmpLess_ushort_short 1.77 0.77 -1.00 -56.55
BM_CmpLess_ushort_uchar 0.77 0.77 -0.01 -0.72
BM_CmpLess_ushort_uint 0.66 0.66 -0.00 -0.46
BM_CmpLess_ushort_ushort 0.66 0.66 -0.00 -0.71
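
As a reading aid for the table above, the last two columns appear to be the absolute change (candidate minus baseline) and that change relative to the baseline. This is a sketch under that assumption; `percent_difference` is a hypothetical helper and not part of `compare-benchmarks`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change of the candidate versus the baseline,
// in percent. Negative values mean the candidate is faster.
double percent_difference(double baseline, double candidate) {
  return (candidate - baseline) / baseline * 100.0;
}
```

For `BM_CmpLess_int_uint`, the rounded values 1.76 and 0.66 give about -62.5, while the table reports -62.68, presumably because the tool works from the unrounded timings.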

@frederick-vs-ja frederick-vs-ja merged commit 95eb2db into llvm:main Oct 16, 2025
78 checks passed
@kisuhorikka Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@ldionne
Member

ldionne commented Oct 17, 2025

@frederick-vs-ja When we merge commits like this, let's try not to include the full benchmark output in the commit message, or if we do, let's make sure it is aligned properly. The commit message ended up very long and difficult to read because GitHub misaligned the table when squashing.

Labels

libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.

Development

Successfully merging this pull request may close these issues.

[libc++] std::cmp_less and other integer comparison functions could be improved

7 participants