Speed up compilation of common uses of std::visit() #164196

higher-performance · 2025-10-20T02:00:58Z

std::visit on my machine costs roughly 10 milliseconds per unique invocation to compile, measurable as follows:

#include <variant>

int main(int argc, char* argv[]) {
  std::variant<char, unsigned char, int> v;
  int n = 0;
#define X(V) \
  ++n;       \
  std::visit([](int) {}, V)
#ifdef NEW_VERSION
  // clang-format off
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
// clang-format on
#else
  (void)v;
#endif
#undef X

  return n;
}

This PR hard-codes the common cases, which speeds up their compilation by roughly 8x.
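For readers who want the idea in isolation, here is a minimal sketch of switch-based dispatch for a single variant, written only against the public std::variant API. The names `sketch::visit_one`, `SKETCH_CASE`, `Demo`, and `describe` are hypothetical, and the limit of 4 alternatives is arbitrary; this is not the libc++ change itself, which uses internal accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

namespace sketch {

// Hypothetical stand-in for a hard-coded fast path; not libc++'s code.
template <class Visitor, class Variant>
decltype(auto) visit_one(Visitor&& vis, Variant&& var) {
  using V = std::remove_cv_t<std::remove_reference_t<Variant>>;
  constexpr std::size_t N = std::variant_size_v<V>;
  static_assert(N <= 4, "this sketch only hard-codes up to 4 alternatives");

// One case per possible index. Cases at or past the variant's size are
// discarded at compile time by `if constexpr`, so std::get<I> is never
// instantiated for an index the variant does not have.
#define SKETCH_CASE(I)                                \
  case I:                                             \
    if constexpr (N > I) {                            \
      return std::forward<Visitor>(vis)(              \
          std::get<I>(std::forward<Variant>(var)));   \
    } else {                                          \
      break;                                          \
    }
  switch (var.index()) {
    SKETCH_CASE(0)
    SKETCH_CASE(1)
    SKETCH_CASE(2)
    SKETCH_CASE(3)
    default:
      break;
  }
#undef SKETCH_CASE
  // Reached only when no case returned, e.g. for a valueless variant.
  throw std::bad_variant_access();
}

}  // namespace sketch

// Usage: same variant type as the measurement program above.
using Demo = std::variant<char, unsigned char, int>;

inline std::string describe(const Demo& v) {
  return sketch::visit_one([](int x) { return std::to_string(x); }, v);
}
```

The point of the shape is that the compiler sees one plain switch per call site rather than instantiating generic dispatch machinery; whether that matches the final form of the patch is not something this sketch claims.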

llvmbot · 2025-10-20T02:01:33Z

@llvm/pr-subscribers-libcxx

Author: None (higher-performance)

Changes

std::visit on my machine costs roughly 10 milliseconds per unique invocation to compile, measurable as follows:

#include <variant>

int main(int argc, char* argv[]) {
  std::variant<char, unsigned char, int> v;
  int n = 0;
#define X(V) \
  ++n;       \
  std::visit([](int) {}, V)
#ifdef NEW_VERSION
  // clang-format off
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
// clang-format on
#else
  (void)v;
#endif
#undef X

  return n;
}

This PR hard-codes the common cases, which speeds up their compilation by roughly 8x.

Full diff: https://github.com/llvm/llvm-project/pull/164196.diff

1 Files Affected:

(modified) libcxx/include/variant (+42-5)

diff --git a/libcxx/include/variant b/libcxx/include/variant
index 9beef146f203c..ef5bca4c2fda0 100644
--- a/libcxx/include/variant
+++ b/libcxx/include/variant
@@ -1578,11 +1578,48 @@ _LIBCPP_HIDE_FROM_ABI constexpr void __throw_if_valueless(_Vs&&... __vs) {
   }
 }
 
-template < class _Visitor, class... _Vs, typename>
-_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor, _Vs&&... __vs) {
-  using __variant_detail::__visitation::__variant;
-  std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
-  return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
+template <class _Visitor, class... _Vs, typename>
+_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor,
+                                                     _Vs&&... __vs) {
+#define _XDispatchIndex(_I)                                              \
+  case _I:                                                               \
+    if constexpr (__variant_size::value > _I) {                          \
+      return __visitor(                                                  \
+          __variant::__get_alt<_I>(std::forward<_Vs>(__vs)...).__value); \
+    }                                                                    \
+    [[__fallthrough__]]
+#define _XDispatchMax 7 // Speed up compilation for the common cases
+  if constexpr (sizeof...(_Vs) == 1) {
+    if constexpr (variant_size<__remove_cvref_t<_Vs>...>::value <=
+                  _XDispatchMax) {
+      using __variant_detail::__access::__variant;
+      using __variant_size = variant_size<__remove_cvref_t<_Vs>...>;
+      const size_t __indexes[] = {__vs.index()...};
+      switch (__indexes[0]) {
+        _XDispatchIndex(_XDispatchMax - 7);
+        _XDispatchIndex(_XDispatchMax - 6);
+        _XDispatchIndex(_XDispatchMax - 5);
+        _XDispatchIndex(_XDispatchMax - 4);
+        _XDispatchIndex(_XDispatchMax - 3);
+        _XDispatchIndex(_XDispatchMax - 2);
+        _XDispatchIndex(_XDispatchMax - 1);
+        _XDispatchIndex(_XDispatchMax - 0);
+        default:
+          __throw_bad_variant_access();
+      }
+    } else {
+      static_assert(
+          variant_size<__remove_cvref_t<_Vs>...>::value > _XDispatchMax,
+          "forgot to add dispatch case");
+    }
+  } else {
+    using __variant_detail::__visitation::__variant;
+    std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
+    return __variant::__visit_value(std::forward<_Visitor>(__visitor),
+                                    std::forward<_Vs>(__vs)...);
+  }
+#undef _XDispatchMax
+#undef _XDispatchIndex
 }
 
 #    if _LIBCPP_STD_VER >= 20

github-actions · 2025-10-20T02:02:37Z

✅ With the latest revision this PR passed the C/C++ code formatter.

higher-performance · 2025-11-12T22:07:00Z

Could someone please take a look at this? @philnik777 or anybody else involved?

philnik777

IIUC this basically fixes #62648. I'm not sure how to proceed here. Previously there were problems with this approach generating a lot of code, which is why it was reverted. This patch is a lot more conservative at least. I guess I'd really like if we had reflection here, since that would allow us to generate the perfect switch/case for any variant. I think I'd be fine with this as a temporary solution. I'd really like you to check compile time and code generation overhead of this though, since, as mentioned, this was a big problem previously.
Please run the variant benchmarks as well and share the results.

higher-performance · 2025-11-28T11:13:18Z

> IIUC this basically fixes #62648. I'm not sure how to proceed here. Previously there were problems with this approach generating a lot of code, which is why it was reverted. This patch is a lot more conservative at least. I guess I'd really like if we had reflection here, since that would allow us to generate the perfect switch/case for any variant. I think I'd be fine with this as a temporary solution. I'd really like you to check compile time and code generation overhead of this though, since, as mentioned, this was a big problem previously. Please run the variant benchmarks as well and share the results.

Sounds good, thanks.
Re: #62648, note that this only special-cases the (common) case of calling std::visit on a single variant with up to 8 types. Invocations with 2+ variants or more than 8 types keep whatever problems or other characteristics they had before.
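To make that scoping concrete, here is a small sketch of the compile-time test described in this comment. The names `sketch::takes_fast_path`, `kMaxHardCoded`, `Small`, and `Large` are hypothetical, and the limit of 8 is taken from the wording of this comment as an assumption, not read out of the patch.

```cpp
#include <cstddef>
#include <type_traits>
#include <variant>

namespace sketch {

// Assumed limit, mirroring "up to 8 types" in the comment above.
inline constexpr std::size_t kMaxHardCoded = 8;

// True only for a single variant whose alternative count is within the limit.
// Everything else would go through the pre-existing generic visitation.
template <class... Vs>
constexpr bool takes_fast_path() {
  if constexpr (sizeof...(Vs) == 1) {
    return (... && (std::variant_size_v<
                        std::remove_cv_t<std::remove_reference_t<Vs>>> <=
                    kMaxHardCoded));
  } else {
    return false;
  }
}

using Small = std::variant<char, unsigned char, int>;  // 3 alternatives
using Large = std::variant<char, short, int, long, long long, float, double,
                           long double, bool>;         // 9 alternatives

static_assert(takes_fast_path<Small&>());           // one small variant
static_assert(!takes_fast_path<Small&, Small&>());  // two variants: generic path
static_assert(!takes_fast_path<Large&>());          // too many alternatives

}  // namespace sketch
```

Under these assumptions, only the first form would see any change in behavior or compile time; the other two are untouched by construction.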

higher-performance · 2025-11-28T16:32:06Z

Done.

First, re: the runtime benchmarks: I had to run them somewhat ad hoc via Google Benchmark since I don't have the official setup handy, but they actually indicate a speedup for fewer than 8 elements:

Before:

Benchmark                       Time(ns)      CPU(ns)  Iterations
BM_Visit<1, 1>_mean              2.13           2.13    25000000  
BM_Visit<1, 2>_mean              3.22           3.22    25000000  
BM_Visit<1, 3>_mean              3.20           3.20    25000000  
BM_Visit<1, 4>_mean              3.21           3.21    25000000  
BM_Visit<1, 5>_mean              3.21           3.20    25000000  
BM_Visit<1, 6>_mean              3.22           3.22    25000000  
BM_Visit<1, 7>_mean              3.20           3.20    25000000  
BM_Visit<1, 8>_mean              3.21           3.21    25000000

After:

Benchmark                       Time(ns)      CPU(ns)  Iterations
BM_Visit<1, 1>_mean              2.19           2.19    25000000  
BM_Visit<1, 2>_mean              2.20           2.20    25000000  
BM_Visit<1, 3>_mean              2.18           2.18    25000000  
BM_Visit<1, 4>_mean              2.18           2.18    25000000  
BM_Visit<1, 5>_mean              2.22           2.22    25000000  
BM_Visit<1, 6>_mean              2.19           2.19    25000000  
BM_Visit<1, 7>_mean              2.19           2.19    25000000  
BM_Visit<1, 8>_mean              3.27           3.27    25000000

As for compile-time benchmarking, I also tested it like this:

#include <variant>

int main(int argc, char* argv[]) {
  std::variant<char, unsigned char, int> v;
  v.emplace<0>(3);
  int n = 0;
  unsigned int r = 1;
#define X(V) \
  ++n;       \
  std::visit([&](int x) { r *= x; }, V)
  (void)--n, X(v);
#ifdef NEW_VERSION
  // clang-format off
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
// clang-format on
#else
  (void)v;
#endif
#undef X

  return r % 1000 == 1 ? -1 : n;
}

Under -O3 I got:

Baseline: only 1 variant call: 5216 bytes
64 extra calls (new implementation): 5216 bytes, +0.1 ms
64 extra calls (old implementation): 54104 bytes, +0.43 ms

My setup/system is a bit different from last time, so it's not quite 8x here, but still, it's a huge win.
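Spelling out the arithmetic behind those three lines, as a sketch that takes the reported figures at face value (byte counts as output size, ms values as added time). The constant names are hypothetical.

```cpp
// Figures copied from the results above.
constexpr long kBaselineBytes = 5216;   // only 1 visit call
constexpr long kNewImplBytes = 5216;    // 64 extra calls, new implementation
constexpr long kOldImplBytes = 54104;   // 64 extra calls, old implementation
constexpr int kExtraCalls = 64;
constexpr double kNewExtraMs = 0.1;
constexpr double kOldExtraMs = 0.43;

// Size growth caused by the 64 extra calls.
constexpr long kNewGrowthBytes = kNewImplBytes - kBaselineBytes;
constexpr long kOldGrowthBytes = kOldImplBytes - kBaselineBytes;
static_assert(kNewGrowthBytes == 0);
static_assert(kOldGrowthBytes == 48888);

// Average growth per extra call with the old implementation: 763.875 bytes.
constexpr double kOldBytesPerCall =
    static_cast<double>(kOldGrowthBytes) / kExtraCalls;

// Ratio of added time, old over new: about 4.3x rather than 8x.
constexpr double kTimeRatio = kOldExtraMs / kNewExtraMs;
```

So on this setup the old implementation adds roughly 764 bytes per extra call while the new one adds none, and the time ratio comes out near 4.3x.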

tl;dr: it's a strict win on every axis I measure. @philnik777

higher-performance requested a review from philnik777 October 20, 2025 02:00

higher-performance requested a review from a team as a code owner October 20, 2025 02:00

llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Oct 20, 2025

higher-performance force-pushed the variant-compile-speedup branch from 6cfb3c0 to cc3f6cd Compare October 20, 2025 02:05

frederick-vs-ja reviewed Oct 20, 2025

higher-performance marked this pull request as draft October 20, 2025 03:11

higher-performance force-pushed the variant-compile-speedup branch 8 times, most recently from 350f45c to 7c3c8e6 Compare October 20, 2025 06:55

higher-performance marked this pull request as ready for review October 20, 2025 14:49

philnik777 reviewed Nov 28, 2025

higher-performance force-pushed the variant-compile-speedup branch 2 times, most recently from ed6da80 to 186c480 Compare November 28, 2025 15:52

higher-performance changed the title from "Speed up compilation of common uses of std::visit() by ~8x" to "Speed up compilation of common uses of std::visit()" Nov 28, 2025

higher-performance force-pushed the variant-compile-speedup branch from 186c480 to cb49b31 Compare November 28, 2025 15:59

higher-performance force-pushed the variant-compile-speedup branch 5 times, most recently from ab3fa02 to d4b0d1e Compare November 29, 2025 16:21

Speed up compilation of std::visit() by hard-coding the most common cases (5d47da1)

higher-performance force-pushed the variant-compile-speedup branch from d4b0d1e to 5d47da1 Compare November 29, 2025 16:22
