Skip to content

Conversation

@cjdb
Copy link
Contributor

@cjdb cjdb commented Aug 26, 2025

tl;dr We can significantly improve the runtime performance of std::vector by changing its representation from three pointers to one pointer and two integers. This document explains the details of this change, along with the justifications for making it. See the RFC for more information.

This commit changes std::vector's representation in a similar manner to __split_buffer in #139632. We introduce a layout type that tracks implementation details that differ between the pointer-based vector and size-based vector representations. vector does not parameterise its layout type, unlike __split_buffer. __split_buffer is used by both vector and deque, and being able to take an ABI break on vector doesn't automatically mean that users can also tolerate a break on deque. The most convenient way to work around this problem is to parameterise the layout type. Users of std::vector should not depend on its layout, and libc++ has no internal reason to toggle it, so we keep the representation concrete.

To use this feature, you'll need to configure libc++ so that it's built with -DLIBCXX_UNSTABLE_ABI=Yes.

@cjdb cjdb requested review from a team and JDevlieghere as code owners August 26, 2025 00:17
@cjdb cjdb added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 26, 2025
@llvmbot llvmbot added the lldb label Aug 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-libcxx

Author: Christopher Di Bella (cjdb)

Changes

tl;dr We can significantly improve the runtime performance of std::vector by changing its representation from three pointers to one pointer and two integers. This document explains the details of this change, along with the justifications for making it. See the RFC for more information.

This commit changes std::vector's representation in a similar manner to __split_buffer in #139632. We introduce a layout type that tracks implementation details that differ between the pointer-based vector and size-based vector representations. vector does not parameterise its layout type, unlike __split_buffer. __split_buffer is used by both vector and deque, and being able to take an ABI break on vector doesn't automatically mean that users can also tolerate a break on deque. The most convenient way to work around this problem is to parameterise the layout type. Users of std::vector should not depend on its layout, and libc++ has no internal reason to toggle it, so we keep the representation concrete.

To use this feature, you'll need to configure libc++ so that it's built with -DLIBCXX_UNSTABLE_ABI=Yes.


Patch is 113.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155330.diff

6 Files Affected:

  • (modified) libcxx/include/__split_buffer (+479-215)
  • (modified) libcxx/include/__vector/vector.h (+561-275)
  • (modified) libcxx/include/deque (+11-23)
  • (modified) libcxx/test/std/algorithms/alg.modifying.operations/alg.swap/ranges.swap_ranges.pass.cpp (+26-19)
  • (modified) libcxx/test/tools/clang_tidy_checks/CMakeLists.txt (+1)
  • (modified) lldb/examples/synthetic/libcxx.py (+49-12)
diff --git a/libcxx/include/__split_buffer b/libcxx/include/__split_buffer
index 21e58f4abc6b3..04bbfd6c67670 100644
--- a/libcxx/include/__split_buffer
+++ b/libcxx/include/__split_buffer
@@ -13,10 +13,12 @@
 #include <__algorithm/max.h>
 #include <__algorithm/move.h>
 #include <__algorithm/move_backward.h>
+#include <__assert>
 #include <__config>
 #include <__iterator/distance.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/move_iterator.h>
+#include <__memory/addressof.h>
 #include <__memory/allocate_at_least.h>
 #include <__memory/allocator.h>
 #include <__memory/allocator_traits.h>
@@ -45,13 +47,149 @@ _LIBCPP_PUSH_MACROS
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-// __split_buffer allocates a contiguous chunk of memory and stores objects in the range [__begin_, __end_).
-// It has uninitialized memory in the ranges  [__first_, __begin_) and [__end_, __cap_). That allows
-// it to grow both in the front and back without having to move the data.
+template <class _SplitBuffer, class _Tp, class _Allocator>
+class __split_buffer_pointer_layout {
+protected:
+  using value_type                     = _Tp;
+  using allocator_type                 = _Allocator;
+  using __alloc_rr _LIBCPP_NODEBUG     = __libcpp_remove_reference_t<allocator_type>;
+  using __alloc_traits _LIBCPP_NODEBUG = allocator_traits<__alloc_rr>;
+  using reference                      = value_type&;
+  using const_reference                = const value_type&;
+  using size_type                      = typename __alloc_traits::size_type;
+  using difference_type                = typename __alloc_traits::difference_type;
+  using pointer                        = typename __alloc_traits::pointer;
+  using const_pointer                  = typename __alloc_traits::const_pointer;
+  using iterator                       = pointer;
+  using const_iterator                 = const_pointer;
 
-template <class _Tp, class _Allocator = allocator<_Tp> >
-struct __split_buffer {
 public:
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __split_buffer_pointer_layout() = default;
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20
+  _LIBCPP_HIDE_FROM_ABI explicit __split_buffer_pointer_layout(const allocator_type& __alloc)
+      : __alloc_(__alloc) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer __data() _NOEXCEPT { return __data_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_pointer __data() const _NOEXCEPT { return __data_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer begin() _NOEXCEPT { return __begin_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_pointer begin() const _NOEXCEPT { return __begin_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer end() _NOEXCEPT { return __end_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer end() const _NOEXCEPT { return __end_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type size() const _NOEXCEPT {
+    return static_cast<size_type>(__end_ - __begin_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool empty() const _NOEXCEPT { return __begin_ == __end_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type capacity() const _NOEXCEPT {
+    return static_cast<size_type>(__cap_ - __data_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI allocator_type& __get_allocator() _NOEXCEPT { return __alloc_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI allocator_type const& __get_allocator() const _NOEXCEPT {
+    return __alloc_;
+  }
+
+  // Returns the sentinel object directly. Should be used in conjunction with automatic type deduction,
+  // not explicit types.
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer __sentinel() const _NOEXCEPT { return __end_; }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer __raw_capacity() const _NOEXCEPT { return __cap_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_data(pointer __new_first) _NOEXCEPT {
+    __data_ = __new_first;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
+  __set_valid_range(pointer __new_begin, pointer __new_end) _NOEXCEPT {
+    __begin_ = __new_begin;
+    __end_   = __new_end;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
+  __set_valid_range(pointer __new_begin, size_type __new_size) _NOEXCEPT {
+    __begin_ = __new_begin;
+    __end_   = __begin_ + __new_size;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_sentinel(pointer __new_end) _NOEXCEPT {
+    _LIBCPP_ASSERT_INTERNAL(__data_ <= __new_end, "__new_end cannot precede __data_");
+    __end_ = __new_end;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_sentinel(size_type __new_size) _NOEXCEPT {
+    __end_ = __begin_ + __new_size;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_capacity(size_type __new_capacity) _NOEXCEPT {
+    __cap_ = __data_ + __new_capacity;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_capacity(pointer __new_capacity) _NOEXCEPT {
+    __cap_ = __new_capacity;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __front_spare() const _NOEXCEPT {
+    return static_cast<size_type>(__begin_ - __data_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __back_spare() const _NOEXCEPT {
+    return static_cast<size_type>(__cap_ - __end_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI reference back() _NOEXCEPT { return *(__end_ - 1); }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_reference back() const _NOEXCEPT { return *(__end_ - 1); }
+
+  template <class _OtherLayout>
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __swap_without_allocator(_OtherLayout& __other) _NOEXCEPT {
+    std::swap(__data_, __other.__data_);
+    std::swap(__begin_, __other.__begin_);
+    std::swap(__cap_, __other.__cap_);
+    std::swap(__end_, __other.__end_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void swap(__split_buffer_pointer_layout& __other) _NOEXCEPT {
+    __swap_without_allocator(__other);
+    std::__swap_allocator(__alloc_, __other.__alloc_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __reset() _NOEXCEPT {
+    __data_  = nullptr;
+    __begin_ = nullptr;
+    __end_   = nullptr;
+    __cap_   = nullptr;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
+  __copy_without_alloc(__split_buffer_pointer_layout const& __other)
+      _NOEXCEPT_(is_nothrow_copy_assignable<pointer>::value) {
+    __data_  = __other.__data_;
+    __begin_ = __other.__begin_;
+    __cap_   = __other.__cap_;
+    __end_   = __other.__end_;
+  }
+
+private:
+  pointer __data_  = nullptr;
+  pointer __begin_ = nullptr;
+  pointer __end_   = nullptr;
+  _LIBCPP_COMPRESSED_PAIR(pointer, __cap_ = nullptr, allocator_type, __alloc_);
+
+  template <class, class, class>
+  friend class __split_buffer_pointer_layout;
+};
+
+template <class _SplitBuffer, class _Tp, class _Allocator>
+class __split_buffer_size_layout {
+protected:
   using value_type                     = _Tp;
   using allocator_type                 = _Allocator;
   using __alloc_rr _LIBCPP_NODEBUG     = __libcpp_remove_reference_t<allocator_type>;
@@ -65,6 +203,170 @@ public:
   using iterator                       = pointer;
   using const_iterator                 = const_pointer;
 
+public:
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __split_buffer_size_layout() = default;
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __split_buffer_size_layout(const allocator_type& __alloc)
+      : __alloc_(__alloc) {}
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer __data() _NOEXCEPT { return __data_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_pointer __data() const _NOEXCEPT { return __data_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer begin() _NOEXCEPT { return __begin_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_pointer begin() const _NOEXCEPT { return __begin_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer end() _NOEXCEPT { return __begin_ + __size_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI pointer end() const _NOEXCEPT { return __begin_ + __size_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type size() const _NOEXCEPT { return __size_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool empty() const _NOEXCEPT { return __size_ == 0; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type capacity() const _NOEXCEPT { return __cap_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI allocator_type& __get_allocator() _NOEXCEPT { return __alloc_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI allocator_type const& __get_allocator() const _NOEXCEPT {
+    return __alloc_;
+  }
+
+  // Returns the sentinel object directly. Should be used in conjunction with automatic type deduction,
+  // not explicit types.
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __sentinel() const _NOEXCEPT { return __size_; }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __raw_capacity() const _NOEXCEPT { return __cap_; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_data(pointer __new_first) _NOEXCEPT {
+    __data_ = __new_first;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
+  __set_valid_range(pointer __new_begin, pointer __new_end) _NOEXCEPT {
+    // Size-based __split_buffers track their size directly: we need to explicitly update the size
+    // when the front is adjusted.
+    __size_ -= __new_begin - __begin_;
+    __begin_ = __new_begin;
+    __set_sentinel(__new_end);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
+  __set_valid_range(pointer __new_begin, size_type __new_size) _NOEXCEPT {
+    // Size-based __split_buffers track their size directly: we need to explicitly update the size
+    // when the front is adjusted.
+    __size_ -= __new_begin - __begin_;
+    __begin_ = __new_begin;
+    __set_sentinel(__new_size);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_sentinel(pointer __new_end) _NOEXCEPT {
+    _LIBCPP_ASSERT_INTERNAL(__data_ <= __new_end, "__new_end cannot precede __data_");
+    __size_ += __new_end - end();
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_sentinel(size_type __new_size) _NOEXCEPT {
+    __size_ = __new_size;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_capacity(size_type __new_capacity) _NOEXCEPT {
+    __cap_ = __new_capacity;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_capacity(pointer __new_capacity) _NOEXCEPT {
+    __cap_ = __new_capacity - __begin_;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __front_spare() const _NOEXCEPT {
+    return static_cast<size_type>(__begin_ - __data_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __back_spare() const _NOEXCEPT {
+    // `__cap_ - __end_` tells us the total number of spares when in size-mode. We need to remove
+    // the __front_spare from the count.
+    return __cap_ - __size_ - __front_spare();
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI reference back() _NOEXCEPT { return __begin_[__size_ - 1]; }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_reference back() const _NOEXCEPT {
+    return __begin_[__size_ - 1];
+  }
+
+  template <class _Data2>
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __swap_without_allocator(_Data2& __other) _NOEXCEPT {
+    std::swap(__data_, __other.__data_);
+    std::swap(__begin_, __other.__begin_);
+    std::swap(__cap_, __other.__cap_);
+    std::swap(__size_, __other.__size_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void swap(__split_buffer_size_layout& __other) _NOEXCEPT {
+    __swap_without_allocator(__other);
+    std::__swap_allocator(__alloc_, __other.__alloc_);
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __reset() _NOEXCEPT {
+    __data_  = nullptr;
+    __begin_ = nullptr;
+    __size_  = 0;
+    __cap_   = 0;
+  }
+
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
+  __copy_without_alloc(__split_buffer_size_layout const& __other)
+      _NOEXCEPT_(is_nothrow_copy_assignable<pointer>::value) {
+    __data_  = __other.__data_;
+    __begin_ = __other.__begin_;
+    __cap_   = __other.__cap_;
+    __size_  = __other.__size_;
+  }
+
+private:
+  pointer __data_   = nullptr;
+  pointer __begin_  = nullptr;
+  size_type __size_ = 0;
+  size_type __cap_  = 0;
+  _LIBCPP_NO_UNIQUE_ADDRESS allocator_type __alloc_;
+
+  template <class, class, class>
+  friend class __split_buffer_size_layout;
+};
+
+// __split_buffer allocates a contiguous chunk of memory and stores objects in the range [__begin_, __end_).
+// It has uninitialized memory in the ranges  [__data_, __begin_) and [__end_, __cap_). That allows
+// it to grow both in the front and back without having to move the data.
+
+template <class _Tp,
+          class _Allocator                             = allocator<_Tp>,
+          template <class, class, class> class _Layout = __split_buffer_pointer_layout>
+struct __split_buffer : _Layout<__split_buffer<_Tp, _Allocator, _Layout>, _Tp, _Allocator> {
+  using __base_type _LIBCPP_NODEBUG = _Layout<__split_buffer<_Tp, _Allocator, _Layout>, _Tp, _Allocator>;
+  using __base_type::__back_spare;
+  using __base_type::__copy_without_alloc;
+  using __base_type::__data;
+  using __base_type::__front_spare;
+  using __base_type::__get_allocator;
+  using __base_type::__reset;
+  using __base_type::__set_capacity;
+  using __base_type::__set_data;
+  using __base_type::__set_sentinel;
+  using __base_type::__set_valid_range;
+  using __base_type::__swap_without_allocator;
+
+  using typename __base_type::__alloc_rr;
+  using typename __base_type::__alloc_traits;
+  using typename __base_type::allocator_type;
+  using typename __base_type::const_iterator;
+  using typename __base_type::const_pointer;
+  using typename __base_type::const_reference;
+  using typename __base_type::difference_type;
+  using typename __base_type::iterator;
+  using typename __base_type::pointer;
+  using typename __base_type::reference;
+  using typename __base_type::size_type;
+  using typename __base_type::value_type;
+
   // A __split_buffer contains the following members which may be trivially relocatable:
   // - pointer: may be trivially relocatable, so it's checked
   // - allocator_type: may be trivially relocatable, so it's checked
@@ -78,23 +380,15 @@ public:
                       __split_buffer,
                       void>;
 
-  pointer __first_;
-  pointer __begin_;
-  pointer __end_;
-  _LIBCPP_COMPRESSED_PAIR(pointer, __cap_, allocator_type, __alloc_);
-
   __split_buffer(const __split_buffer&)            = delete;
   __split_buffer& operator=(const __split_buffer&) = delete;
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __split_buffer()
-      _NOEXCEPT_(is_nothrow_default_constructible<allocator_type>::value)
-      : __first_(nullptr), __begin_(nullptr), __end_(nullptr), __cap_(nullptr) {}
+  _LIBCPP_HIDE_FROM_ABI __split_buffer() = default;
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __split_buffer(__alloc_rr& __a)
-      : __first_(nullptr), __begin_(nullptr), __end_(nullptr), __cap_(nullptr), __alloc_(__a) {}
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __split_buffer(__alloc_rr& __a) : __base_type(__a) {}
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI explicit __split_buffer(const __alloc_rr& __a)
-      : __first_(nullptr), __begin_(nullptr), __end_(nullptr), __cap_(nullptr), __alloc_(__a) {}
+      : __base_type(__a) {}
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI
   __split_buffer(size_type __cap, size_type __start, __alloc_rr& __a);
@@ -111,36 +405,16 @@ public:
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI ~__split_buffer();
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI iterator begin() _NOEXCEPT { return __begin_; }
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_iterator begin() const _NOEXCEPT { return __begin_; }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI iterator end() _NOEXCEPT { return __end_; }
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_iterator end() const _NOEXCEPT { return __end_; }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void clear() _NOEXCEPT { __destruct_at_end(__begin_); }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type size() const {
-    return static_cast<size_type>(__end_ - __begin_);
-  }
+  using __base_type::back;
+  using __base_type::begin;
+  using __base_type::capacity;
+  using __base_type::empty;
+  using __base_type::end;
+  using __base_type::size;
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool empty() const { return __end_ == __begin_; }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type capacity() const {
-    return static_cast<size_type>(__cap_ - __first_);
-  }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __front_spare() const {
-    return static_cast<size_type>(__begin_ - __first_);
-  }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __back_spare() const {
-    return static_cast<size_type>(__cap_ - __end_);
-  }
-
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI reference front() { return *__begin_; }
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_reference front() const { return *__begin_; }
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI reference back() { return *(__end_ - 1); }
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_reference back() const { return *(__end_ - 1); }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void clear() _NOEXCEPT { __destruct_at_end(begin()); }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI reference front() { return *begin(); }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI const_reference front() const { return *begin(); }
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void shrink_to_fit() _NOEXCEPT;
 
@@ -149,8 +423,8 @@ public:
   template <class... _Args>
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void emplace_back(_Args&&... __args);
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void pop_front() { __destruct_at_begin(__begin_ + 1); }
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void pop_back() { __destruct_at_end(__end_ - 1); }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void pop_front() { __destruct_at_begin(begin() + 1); }
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void pop_back() { __destruct_at_end(end() - 1); }
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __construct_at_end(size_type __n);
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __construct_at_end(size_type __n, const_reference __x);
@@ -184,242 +458,232 @@ public:
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void swap(__split_buffer& __x)
       _NOEXCEPT_(!__alloc_traits::propagate_on_container_swap::value || __is_nothrow_swappable_v<__alloc_rr>);
 
-  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool __invariants() const;
+  _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool __invariants() const {
+    if (__data() == nullptr) {
+      if (begin() != nullptr)
+        return false;
+
+      if (!empty())
+        return false;
+
+      if (capacity() != 0)
+        return false;
+
+      return true;
+    } else {
+      if (begin() < __data())
+        return false;
+
+      if (capacity() < size())
+        return false;
+
+      if (end() < begin())
+        return false;
+
+      return true;
+    }
+  }
 
 private:
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __move_assign_alloc(__split_buffer& __c, true_type)
       _NOEXCEPT_(is_nothrow_move_assignable<allocator_type>::value) {
-    __alloc_ = std::move(__c.__alloc_);
+    __get_allocator() = std::move(__c.__get_allocator());
   }
 
   _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __move_assign...
[truncated]

@cjdb cjdb force-pushed the size-based-vector-part2 branch from 877d560 to ccc8d09 Compare August 26, 2025 00:18
@github-actions
Copy link

github-actions bot commented Aug 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@cjdb
Copy link
Contributor Author

cjdb commented Aug 26, 2025

The LLDB and GDB pretty-printers will require some tweaking again, but those changes haven't been applied yet.

@cjdb
Copy link
Contributor Author

cjdb commented Aug 27, 2025

After having attempted to build all of our code with this change, I've observed a few things. None of them are really surprising, but putting some countermeasures in place to nudge users in the right direction would be worthwhile.

  1. Users take the address of pointer-to-member functions for things in the standard library. I've reached out to Aaron to see if it's worth adding warning for detecting this.
  2. Users do have stray calls to things like v.size(), v.begin(), and v.end(). Ensuring that we liberally apply [[nodiscard]] may result in breakage, but definitely catches things that are unintended.
    a. The stray calls to v.end() seem to be a case of adapting v.erase(std::remove(v.begin(), v.end(), value), v.end()) to std::erase(v, value), v.end(). Based on the formatting, it wasn't clear that the , v.end() was out of place.
    b. v.size() was in if (v.size(), 0). That should've been a v.empty().
    c. v.begin(); was a complete statement.

@cjdb
Copy link
Contributor Author

cjdb commented Sep 12, 2025

Ping

@cjdb cjdb force-pushed the size-based-vector-part2 branch 4 times, most recently from d2b9ec3 to 1e7c6fe Compare September 18, 2025 20:19
@cjdb
Copy link
Contributor Author

cjdb commented Oct 20, 2025

@ldionne Would you mind reviewing this, please?

@cjdb cjdb requested a review from ldionne October 20, 2025 17:09
@cjdb cjdb requested review from jyknight and rnk November 3, 2025 19:42
Comment on lines 430 to 433
using __base::back;
using __base::capacity;
using __base::empty;
using __base::size;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although users aren't allowed to take the address of member functions from standard library types, there seems to be an endless supply of build breaks when I do this. The problem is that &std::vector<T>::size no longer exists, since the function is a member of __vector_layout.

Due to the overwhelming number of build breaks, I'm going to defer doing this to a later date, when a library-wide policy can be enacted. I'll update the PR to have std::vector::size call __vector_layout::size, etc. in a little bit, so that this PR isn't blamed for breaking ~everyone.

@cjdb cjdb force-pushed the size-based-vector-part2 branch 2 times, most recently from c15c6d7 to f314efd Compare November 11, 2025 19:46
@cjdb
Copy link
Contributor Author

cjdb commented Nov 11, 2025

@jyknight @rnk, the libc++ maintainers have said that they're in favour of the overall direction of size-based vector, but it looks like they don't have the bandwidth to review this PR. Would you mind assisting with the review process so that we can unblock the merging of this patch, please?

Other std::vector PRs are being merged, which makes it difficult to maintain this patch, so I'd really like to see #155330 merged ASAP.

@cjdb cjdb force-pushed the size-based-vector-part2 branch from f314efd to c90af22 Compare November 14, 2025 23:27
**tl;dr** We can significantly improve the runtime performance of
`std::vector` by changing its representation from three pointers to one
pointer and two integers. This document explains the details of this
change, along with the justifications for making it. See the [RFC] for
more information.

This commit changes `std::vector`'s representation in a similar manner
to `__split_buffer` in llvm#139632. We introduce a layout type that tracks
implementation details that differ between the pointer-based vector
and size-based vector representations. `vector` does not parameterise
its layout type, unlike `__split_buffer`. `__split_buffer` is used by
both `vector` and `deque`, and being able to take an ABI break on
`vector` doesn't automatically mean that users can also tolerate a break
on `deque`. The most convenient way to work around this problem is to
parameterise the layout type. Users of `std::vector` should not depend
on its layout, and libc++ has no internal reason to toggle it, so we
keep the representation concrete.

To use this feature, you'll need to configure libc++ so that it's built
with `-DLIBCXX_UNSTABLE_ABI=Yes`.

[RFC]: https://discourse.llvm.org/t/adding-a-size-based-vector-to-libc-s-unstable-abi/86306
@cjdb
Copy link
Contributor Author

cjdb commented Dec 9, 2025

@ldionne @jyknight @rnk @philnik777 @mordante Ping. This PR has been open for 3.5 months now: would it be too much trouble to get some feedback in the next day or so, please?

Copy link
Member

@jyknight jyknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a first pass over this, and have some design questions and comments.

Since I wasn't involved in previous discussions nor have I reviewed much libc++ code in the past, I apologize if my comments end up contradicting commonly accepted libc++ practices, or retreading already discussed issues.


_LIBCPP_BEGIN_NAMESPACE_STD

#ifdef _LIBCPP_ABI_SIZE_BASED_VECTOR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say why you created a separate __vector_layout type, vs ifdefs in the body of class vector? It feels like this factoring triggers the need for a LOT more modifications in this file than would otherwise be necessary. And I'm not sure what actual advantage is seen from having it.

For example, none of the diffs relating to __alloc_ and __begin_ would be here, if not for the addition of the __vector_layout type.

I think it certainly makes sense to group the ifdef-conditionalized code together by introducing internal helper functions -- as you have done. But that same effect can still be had without also introducing a new base class type.

On a trivial note, I'm not sure what the usual policy in libc++ is on ifdef ordering, but I was initially confused that the "old" layout was not textually the first ifdef'd half.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say why you created a separate __vector_layout type, vs ifdefs in the body of class vector? It feels like this factoring triggers the need for a LOT more modifications in this file than would otherwise be necessary. And I'm not sure what actual advantage is seen from having it.

For example, none of the diffs relating to __alloc_ and __begin_ would be here, if not for the addition of the __vector_layout type.

I think it certainly makes sense to group the ifdef-conditionalized code together by introducing internal helper functions -- as you have done. But that same effect can still be had without also introducing a new base class type.

Good question! The goal of __vector_layout was derived from __split_buffer_(pointer|size)_layout, which needs to be its own class because vector and deque require different layouts (there's currently no evidence that changing deque's ABI to use __split_buffer_pointer_layout is beneficial, and it caused problems somewhere). We decided that it would be best from a software engineering standpoint to have a little redundancy in having a separate class, to keep __split_buffer's logic as easily readable as possible (the trade-off was a more difficult PR diff, but substantially less context switching in the logic, and complete encapsulation). The changes that I made to vector rhyme very heavily with the changes made in #139632, so I figured it would be best to keep the design as similar as possible.

That said, I'm interested to explore your suggestion for two reasons:

  1. You think it might reduce the diff, and if you're right, it'll probably help speed up reviewing this patch.
  2. It might help with the debugger pretty-printers, something I'm still working on.

On a trivial note, I'm not sure what the usual policy in libc++ is on ifdef ordering, but I was initially confused that the "old" layout was not textually the first ifdef'd half.

Interestingly, you're the first person to have brought this up (I expected it to have been flagged in the first patch). My rationale is that libc++ might want to support other layouts, and that ifdeffing is easier when done in a positive form, than a negative form.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also going to question the need for a helper type, but I was coming at it from a debug info size perspective. If eliminating the __compressed_pair instantiation was valuable, eliminating __vector_layout_type might also be worth doing, if it's not providing valuable functionality or abstraction.

self.iterator = self._VectorBoolIterator(begin, self.length, bits_per_word)
else:
end = self.val["__end_"]
for i in val.type.fields():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is presumably necessary only because of the refactoring into the __vector_layout base class, and could be reverted if that was.

Also doesn't look like it has been updated to work with the new size-based ABI.

test_vector_bool<64>();
test_vector_bool<199>();
test_vector_bool<256>();
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's just moving code around without making any changes? Can this file be reverted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were compile-time failures due to compiler limits on constexpr stack depths. We have two alternatives here:

  1. Increase the limit Clang supports, which is a single-line change at the top of the file. Pro: single-line diff. Con: might need to increase the limit every so often.
  2. Refactor the code so that it doesn't require that limit in the first place. Pro: won't run into this problem ever again. Con: this diff.

I could always split the refactoring into another PR, if it's too noisy for this one. That would also add documentation on why the change was necessary, which is probably a good enough reason for me to do it anyway.


_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
__set_valid_range(pointer __new_begin, pointer __new_end) _NOEXCEPT {
// Size-based __split_buffers track their size directly: we need to explicitly update the size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment says you need to update the size, but said update was deleted by the PR?

__split_buffer_pointer_layout<__split_buffer<value_type, __alloc_rr&, __split_buffer_pointer_layout>,
value_type,
__alloc_rr&>& __other) _NOEXCEPT {
__split_buffer_size_layout<__split_buffer<value_type, __alloc_rr&, __split_buffer_size_layout>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe can be a separate commit; I'm presuming this is simply a bugfix to the previous change adding size-based __split_buffer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was a bug-fix. We don't have a way to test that __split_buffer works with __split_buffer_size_layout without size-based vector (which is its only driver), so I'm kind of ambivalent on whether this happens in lock-step with vector or if it happens in a separate PR.

Again, the benefit of another commit is that it adds documentation, so I'm partial on that front.

Comment on lines +1729 to +1730
template <class _Allocator>
inline constexpr bool __format::__enable_insertable<vector<char, _Allocator>> = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated bugfix?

return __begin_;
}

[[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI size_type __raw_sentinel() const _NOEXCEPT {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this use of the word "sentinel", as meaning "one of end or size") to be confusing. Usually I think of sentinel as "in-band data indicating you've reached the end", not "ambiguously either a pointer to the end or offset of the end".

I also see this is the term which was already merged for the split_buffer patch, so maybe it's fine to leave alone.

I don't have an ideal alternative suggestion, but IMO it may be better to just use an somewhat awkward name such as __end_ptr_or_size, instead of the concise-but-confusing __raw_sentinel. And perhaps the same thing for __raw_capacity, to use __capacity_ptr_or_size.

return __cap_ - __size_;
}

[[__nodiscard__]] _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI bool __is_full() const _NOEXCEPT {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, also can be written outside the ifdef.

__size_ = __size;
}

_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void __set_sentinel(size_type __size) _NOEXCEPT {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels odd to me to have two __set_sentinel overloads, instead of e.g. __set_end_size(size_type) and __set_end_ptr(pointer), paired with a __set_sentinel (or, if renamed per comment above, __set_end_ptr_or_size function which only accepts the type which __raw_sentinel returns in the current #ifdef branch.

That is, in any code which is always passing a pointer or size could call the appropriate named setter directly; only code generically dealing with the unknown-typed sentinel uses set_sentinel.

And same comment for __set_capacity below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. lldb

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants