@higher-performance
Contributor

std::visit on my machine costs roughly 10 milliseconds per unique invocation to compile, measurable as follows:

#include <variant>

int main(int argc, char* argv[]) {
  std::variant<char, unsigned char, int> v;
  int n = 0;
#define X(V) \
  ++n;       \
  std::visit([](int) {}, V)
#ifdef NEW_VERSION
  // clang-format off
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
  X(v); X(v); X(v); X(v); X(v); X(v); X(v); X(v);
// clang-format on
#else
  (void)v;
#endif
#undef X

  return n;
}

This PR hard-codes the common cases, which speeds up their compilation by roughly 8x.
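To illustrate the general idea of hard-coding the dispatch, here is a minimal standalone sketch that visits a single variant by switching on its active index. It is not the libc++ change itself: `fast_visit` and `fast_visit_demo` are hypothetical names, the limit of three alternatives is an arbitrary choice for brevity, and it uses only the public `std::get` and `index()` API rather than libc++ internals.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical helper (not libc++ code): visit one variant by switching on
// its active index, hard-coded here for at most three alternatives.
template <class Visitor, class Variant>
constexpr decltype(auto) fast_visit(Visitor&& vis, Variant&& var) {
  constexpr std::size_t n = std::variant_size_v<std::decay_t<Variant>>;
  static_assert(n <= 3, "this sketch only hard-codes three cases");
  switch (var.index()) {
    case 0:  // every well-formed variant has alternative 0
      return std::forward<Visitor>(vis)(std::get<0>(std::forward<Variant>(var)));
    case 1:
      if constexpr (n > 1) {
        return std::forward<Visitor>(vis)(std::get<1>(std::forward<Variant>(var)));
      } else {
        break;  // no alternative 1; the std::get<1> branch is never instantiated
      }
    case 2:
      if constexpr (n > 2) {
        return std::forward<Visitor>(vis)(std::get<2>(std::forward<Variant>(var)));
      } else {
        break;
      }
  }
  // Reached when no hard-coded case applies, e.g. for a valueless variant.
  throw std::bad_variant_access();
}

// Usage: every alternative converts to int, so one visitor covers all three.
inline void fast_visit_demo() {
  std::variant<char, unsigned char, int> v = 42;
  assert(fast_visit([](int x) { return x; }, v) == 42);
}
```

As with `std::visit`, the visitor must return the same type for every alternative, because the return type is deduced from all of the `return` statements.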

@higher-performance higher-performance requested a review from a team as a code owner October 20, 2025 02:00
@llvmbot llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Oct 20, 2025
@llvmbot
Member

llvmbot commented Oct 20, 2025

@llvm/pr-subscribers-libcxx

Author: None (higher-performance)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/164196.diff

1 Files Affected:

  • (modified) libcxx/include/variant (+42-5)
diff --git a/libcxx/include/variant b/libcxx/include/variant
index 9beef146f203c..ef5bca4c2fda0 100644
--- a/libcxx/include/variant
+++ b/libcxx/include/variant
@@ -1578,11 +1578,48 @@ _LIBCPP_HIDE_FROM_ABI constexpr void __throw_if_valueless(_Vs&&... __vs) {
   }
 }
 
-template < class _Visitor, class... _Vs, typename>
-_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor, _Vs&&... __vs) {
-  using __variant_detail::__visitation::__variant;
-  std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
-  return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
+template <class _Visitor, class... _Vs, typename>
+_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor,
+                                                     _Vs&&... __vs) {
+#define _XDispatchIndex(_I)                                              \
+  case _I:                                                               \
+    if constexpr (__variant_size::value > _I) {                          \
+      return __visitor(                                                  \
+          __variant::__get_alt<_I>(std::forward<_Vs>(__vs)...).__value); \
+    }                                                                    \
+    [[__fallthrough__]]
+#define _XDispatchMax 7 // Speed up compilation for the common cases
+  if constexpr (sizeof...(_Vs) == 1) {
+    if constexpr (variant_size<__remove_cvref_t<_Vs>...>::value <=
+                  _XDispatchMax) {
+      using __variant_detail::__access::__variant;
+      using __variant_size = variant_size<__remove_cvref_t<_Vs>...>;
+      const size_t __indexes[] = {__vs.index()...};
+      switch (__indexes[0]) {
+        _XDispatchIndex(_XDispatchMax - 7);
+        _XDispatchIndex(_XDispatchMax - 6);
+        _XDispatchIndex(_XDispatchMax - 5);
+        _XDispatchIndex(_XDispatchMax - 4);
+        _XDispatchIndex(_XDispatchMax - 3);
+        _XDispatchIndex(_XDispatchMax - 2);
+        _XDispatchIndex(_XDispatchMax - 1);
+        _XDispatchIndex(_XDispatchMax - 0);
+        default:
+          __throw_bad_variant_access();
+      }
+    } else {
+      static_assert(
+          variant_size<__remove_cvref_t<_Vs>...>::value > _XDispatchMax,
+          "forgot to add dispatch case");
+    }
+  } else {
+    using __variant_detail::__visitation::__variant;
+    std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
+    return __variant::__visit_value(std::forward<_Visitor>(__visitor),
+                                    std::forward<_Vs>(__vs)...);
+  }
+#undef _XDispatchMax
+#undef _XDispatchIndex
 }
 
 #    if _LIBCPP_STD_VER >= 20

@github-actions

github-actions bot commented Oct 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

using __variant_detail::__visitation::__variant;
std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
# define _XDispatchIndex(_I) \
Contributor

@frederick-vs-ja frederick-vs-ja Oct 20, 2025

It looks like we can use the same technique for the visit<R> overload added in C++20.
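As a rough sketch of what that could look like outside of libc++, the switch-based dispatch can be given an explicit result type. `fast_visit_r` and `invoke_as` are hypothetical names, the three-alternative limit is arbitrary, and only the public `std::get` and `index()` API is assumed. The visitor's result is implicitly converted to `R`, or discarded when `R` is `void`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical helper: call the visitor and convert its result to R,
// discarding the result entirely when R is void.
template <class R, class Visitor, class Alt>
constexpr R invoke_as(Visitor&& vis, Alt&& alt) {
  if constexpr (std::is_void_v<R>) {
    std::forward<Visitor>(vis)(std::forward<Alt>(alt));
  } else {
    return std::forward<Visitor>(vis)(std::forward<Alt>(alt));
  }
}

// Hypothetical visit<R>-style counterpart of a switch-based visit, hard-coded
// for at most three alternatives.
template <class R, class Visitor, class Variant>
constexpr R fast_visit_r(Visitor&& vis, Variant&& var) {
  constexpr std::size_t n = std::variant_size_v<std::decay_t<Variant>>;
  static_assert(n <= 3, "this sketch only hard-codes three cases");
  switch (var.index()) {
    case 0:
      return invoke_as<R>(std::forward<Visitor>(vis),
                          std::get<0>(std::forward<Variant>(var)));
    case 1:
      if constexpr (n > 1) {
        return invoke_as<R>(std::forward<Visitor>(vis),
                            std::get<1>(std::forward<Variant>(var)));
      } else {
        break;
      }
    case 2:
      if constexpr (n > 2) {
        return invoke_as<R>(std::forward<Visitor>(vis),
                            std::get<2>(std::forward<Variant>(var)));
      } else {
        break;
      }
  }
  // Reached when no hard-coded case applies, e.g. for a valueless variant.
  throw std::bad_variant_access();
}

// Usage: the visitor returns int, which is converted to long.
inline void fast_visit_r_demo() {
  std::variant<char, unsigned char, int> v = 42;
  assert(fast_visit_r<long>([](int x) { return x; }, v) == 42L);
}
```

Unlike the deduced-return form, the alternatives may yield different types here as long as each converts to `R`, which is one source of the extra test cases mentioned in the reply.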

Contributor Author

Yeah, I deliberately avoided doing more work to make it easier to review and get feedback first. It adds edge cases to testing so I'm not sure how folks feel about it.

Contributor Author

@philnik777 is this approach something you'd be okay with?

@higher-performance higher-performance marked this pull request as draft October 20, 2025 03:11
@higher-performance higher-performance force-pushed the variant-compile-speedup branch 7 times, most recently from 59fa7b6 to 350f45c Compare October 20, 2025 06:39
@higher-performance higher-performance marked this pull request as ready for review October 20, 2025 14:49