-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[libcxx] Recalculate union-member sizes in basic_string #168742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Fixes llvm#51158: instantiating a `basic_string` with a character type of 3, 5, or greater than 8 wouldn't work properly, because the two members of its internal union disagreed on the location of the bit flag indicating which member was live. This commit completely rewrites the calculations of how much to pad the two sub-structures so that their lengths match. Two new static_asserts check that, after all the calculation is done, the compiler agrees that the structures really did end up the same size. A new test instantiates `basic_string` with every size of character type up to 16, and four different alignment requirements. The new code now potentially has to pad _both_ sub-structures, depending on what is and is not divisible by the size of the character type. This also means that the `__padding` type must have a method to zero it out (otherwise the `__grow_by` method stops being constexpr, which an existing test depends on). I've also added some extra checks to the existing test `push_back.pass.cpp`, which _could_ have found this issue by checking that pushing huge characters on to a string updated the length in the appropriate way.
|
@llvm/pr-subscribers-libcxx Author: Simon Tatham (statham-arm) ChangesFixes #51158: instantiating a This commit completely rewrites the calculations of how much to pad the two sub-structures so that their lengths match. Two new static_asserts check that, after all the calculation is done, the compiler agrees that the structures really did end up the same size. A new test instantiates The new code now potentially has to pad both sub-structures, depending on what is and is not divisible by the size of the character type. This also means that the I've also added some extra checks to the existing test Full diff: https://github.com/llvm/llvm-project/pull/168742.diff 3 Files Affected:
diff --git a/libcxx/include/string b/libcxx/include/string
index 09fc6228c4fdb..50fcf5dc6fa44 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -719,10 +719,16 @@ struct __init_with_sentinel_tag {};
template <size_t _PaddingSize>
struct __padding {
char __padding_[_PaddingSize];
+ constexpr void clear() {
+ __padding __initialized = {0};
+ *this = __initialized;
+ }
};
template <>
-struct __padding<0> {};
+struct __padding<0> {
+ constexpr void clear() {}
+};
template <class _CharT, class _Traits, class _Allocator>
class basic_string {
@@ -819,6 +825,33 @@ private:
# ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
+ // The __long structure contains at least a `pointer` to an allocated buffer,
+ // and two `size_type` (the second one being a bitfield container). This
+ // struct, only used by `sizeof`, is the minimum that will fit those fields.
+ struct __long_min {
+ pointer __data_;
+ size_type __size_;
+ size_type __bitfield_container;
+ };
+
+ // The __short structure has a byte-sized bitfield container at the end, and
+ // the rest of it can be taken up by value_type. We want at least two
+ // value_type, or more if they will fit.
+ static constexpr size_t __fit_cap = (sizeof(__long_min) - 1) / sizeof(value_type);
+ static constexpr size_t __min_cap = __fit_cap > 2 ? __fit_cap : 2;
+
+ // Now we know how many value_type will fit, calculate how much space they
+ // take up in total; add one byte for the final bitfield container; and round
+ // up to the nearest multiple of the alignment. That's the total size of the
+ // structure.
+ static constexpr size_t __short_packed_size = sizeof(value_type) * __min_cap + 1;
+ union __union_alignment_check { __long_min __long_; value_type v; };
+ static constexpr size_t __union_alignment = _LIBCPP_ALIGNOF(__union_alignment_check);
+ static constexpr size_t __full_size = (__short_packed_size + __union_alignment - 1) & -__union_alignment;
+
+ // Now define both structures for real, with padding to ensure they are both
+ // exactly the calculated size.
+
struct __long {
__long() = default;
@@ -829,19 +862,23 @@ private:
pointer __data_;
size_type __size_;
+ _LIBCPP_NO_UNIQUE_ADDRESS __padding<__full_size - sizeof(__long_min)> __padding_;
size_type __cap_ : sizeof(size_type) * CHAR_BIT - 1;
size_type __is_long_ : 1;
};
- enum { __min_cap = (sizeof(__long) - 1) / sizeof(value_type) > 2 ? (sizeof(__long) - 1) / sizeof(value_type) : 2 };
-
struct __short {
value_type __data_[__min_cap];
- _LIBCPP_NO_UNIQUE_ADDRESS __padding<sizeof(value_type) - 1> __padding_;
+ _LIBCPP_NO_UNIQUE_ADDRESS __padding<__full_size - __short_packed_size> __padding_;
unsigned char __size_ : 7;
unsigned char __is_long_ : 1;
};
+ // Finally, check that we got it all right, and that both structures have
+ // exactly the expected size.
+ static_assert(sizeof(__long) == __full_size, "Miscalculated size for __long");
+ static_assert(sizeof(__short) == __full_size, "Miscalculated size for __short");
+
// The __endian_factor is required because the field we use to store the size
// has one fewer bit than it would if it were not a bitfield.
//
@@ -887,6 +924,13 @@ private:
};
size_type __size_;
pointer __data_;
+
+ // No padding is needed in this version of the structure, but we add a
+ // zero-sized __padding_ member anyway to match the
+ // `_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT` version, so that
+ // `__padding_.clear()` can be performed unconditionally in code common to
+ // both layouts.
+ _LIBCPP_NO_UNIQUE_ADDRESS __padding<0> __padding_;
};
enum { __min_cap = (sizeof(__long) - 1) / sizeof(value_type) > 2 ? (sizeof(__long) - 1) / sizeof(value_type) : 2 };
@@ -900,10 +944,10 @@ private:
value_type __data_[__min_cap];
};
-# endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
-
static_assert(sizeof(__short) == (sizeof(value_type) * (__min_cap + 1)), "__short has an unexpected size.");
+# endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT
+
union __rep {
__short __s;
__long __l;
@@ -2764,6 +2808,7 @@ _LIBCPP_DEPRECATED_("use __grow_by_without_replace") basic_string<_CharT, _Trait
// This is -1 to make sure the caller sets the size properly, since old versions of this function didn't set the size
// at all.
__buffer.__size_ = -1;
+ __buffer.__padding_.clear();
__reset_internal_buffer(__buffer);
}
diff --git a/libcxx/test/std/strings/basic.string/awkward-char-types.pass.cpp b/libcxx/test/std/strings/basic.string/awkward-char-types.pass.cpp
new file mode 100644
index 0000000000000..a4e258680cd2e
--- /dev/null
+++ b/libcxx/test/std/strings/basic.string/awkward-char-types.pass.cpp
@@ -0,0 +1,114 @@
+#include <stdint.h>
+#include <string>
+#include <algorithm>
+#include <cassert>
+
+#include "test_macros.h"
+
+template<typename Char>
+void test_string() {
+ // Make a test string.
+ std::basic_string<Char> s;
+ LIBCPP_ASSERT(s.size() == 0);
+
+ // Append enough chars to it that we must have switched over from a short
+ // string stored internally to a long one pointing to a dynamic buffer,
+ // causing a reallocation.
+ unsigned n = sizeof(s) / sizeof(Char) + 1;
+ for (unsigned i = 0; i < n; i++) {
+ s.push_back(Char::from_integer(i));
+ LIBCPP_ASSERT(s.size() == i + 1);
+ }
+
+ // Check that all the chars were correctly copied during the realloc.
+ for (unsigned i = 0; i < n; i++) {
+ LIBCPP_ASSERT(s[i] == Char::from_integer(i));
+ }
+}
+
+template<typename Integer, size_t N>
+struct TestChar {
+ Integer values[N];
+
+ static TestChar from_integer(unsigned index) {
+ TestChar ch;
+ for (size_t i = 0; i < N; i++)
+ ch.values[i] = index + i;
+ return ch;
+ }
+
+ bool operator==(const TestChar &other) const {
+ return 0 == memcmp(values, other.values, sizeof(values));
+ }
+ bool operator<(const TestChar &other) const {
+ return 0 < memcmp(values, other.values, sizeof(values));
+ }
+};
+
+template<typename Integer, size_t N>
+struct std::char_traits<TestChar<Integer, N>> {
+ using char_type = TestChar<Integer, N>;
+ using int_type = int;
+ using off_type = streamoff;
+ using pos_type = streampos;
+ using state_type = mbstate_t;
+
+ static TEST_CONSTEXPR_CXX20 void assign(char_type& c1, const char_type& c2) { c1 = c2; }
+ static bool eq(char_type c1, char_type c2);
+ static bool lt(char_type c1, char_type c2);
+
+ static int compare(const char_type* s1, const char_type* s2, std::size_t n);
+ static std::size_t length(const char_type* s);
+ static const char_type* find(const char_type* s, std::size_t n, const char_type& a);
+ static char_type* move(char_type* s1, const char_type* s2, std::size_t n);
+ static TEST_CONSTEXPR_CXX20 char_type* copy(char_type* s1, const char_type* s2, std::size_t n) {
+ std::copy_n(s2, n, s1);
+ return s1;
+ }
+ static TEST_CONSTEXPR_CXX20 char_type* assign(char_type* s, std::size_t n, char_type a) {
+ std::fill_n(s, n, a);
+ return s;
+ }
+
+ static int_type not_eof(int_type c);
+ static char_type to_char_type(int_type c);
+ static int_type to_int_type(char_type c);
+ static bool eq_int_type(int_type c1, int_type c2);
+ static int_type eof();
+};
+
+int main(int, char**) {
+ test_string<TestChar<uint8_t, 1>>();
+ test_string<TestChar<uint8_t, 2>>();
+ test_string<TestChar<uint8_t, 3>>();
+ test_string<TestChar<uint8_t, 4>>();
+ test_string<TestChar<uint8_t, 5>>();
+ test_string<TestChar<uint8_t, 6>>();
+ test_string<TestChar<uint8_t, 7>>();
+ test_string<TestChar<uint8_t, 8>>();
+ test_string<TestChar<uint8_t, 9>>();
+ test_string<TestChar<uint8_t, 10>>();
+ test_string<TestChar<uint8_t, 11>>();
+ test_string<TestChar<uint8_t, 12>>();
+ test_string<TestChar<uint8_t, 13>>();
+ test_string<TestChar<uint8_t, 14>>();
+ test_string<TestChar<uint8_t, 15>>();
+ test_string<TestChar<uint8_t, 16>>();
+
+ test_string<TestChar<uint16_t, 1>>();
+ test_string<TestChar<uint16_t, 2>>();
+ test_string<TestChar<uint16_t, 3>>();
+ test_string<TestChar<uint16_t, 4>>();
+ test_string<TestChar<uint16_t, 5>>();
+ test_string<TestChar<uint16_t, 6>>();
+ test_string<TestChar<uint16_t, 7>>();
+ test_string<TestChar<uint16_t, 8>>();
+
+ test_string<TestChar<uint32_t, 1>>();
+ test_string<TestChar<uint32_t, 2>>();
+ test_string<TestChar<uint32_t, 3>>();
+ test_string<TestChar<uint32_t, 4>>();
+
+ test_string<TestChar<uint64_t, 1>>();
+ test_string<TestChar<uint64_t, 2>>();
+}
diff --git a/libcxx/test/std/strings/basic.string/string.modifiers/string_append/push_back.pass.cpp b/libcxx/test/std/strings/basic.string/string.modifiers/string_append/push_back.pass.cpp
index 8c2324c9d1759..34669c28d624c 100644
--- a/libcxx/test/std/strings/basic.string/string.modifiers/string_append/push_back.pass.cpp
+++ b/libcxx/test/std/strings/basic.string/string.modifiers/string_append/push_back.pass.cpp
@@ -84,9 +84,13 @@ TEST_CONSTEXPR_CXX20 bool test() {
// https://llvm.org/PR31454
std::basic_string<VeryLarge> s;
VeryLarge vl = {};
+ LIBCPP_ASSERT(s.size() == 0);
s.push_back(vl);
+ LIBCPP_ASSERT(s.size() == 1);
s.push_back(vl);
+ LIBCPP_ASSERT(s.size() == 2);
s.push_back(vl);
+ LIBCPP_ASSERT(s.size() == 3);
}
return true;
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
| void test_string() { | ||
| // Make a test string. | ||
| std::basic_string<Char> s; | ||
| LIBCPP_ASSERT(s.size() == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not assert? I believe this is standard-guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other test I was modifying used LIBCPP_ASSERT, and I assumed that was local convention in libc++ tests. But now I look more closely, the purpose of LIBCPP_ASSERT seems to be to switch between runtime and static assertions, and this test isn't trying to do anything static at compile time.
One of the buildbot failures was objecting to a constexpr function returning void, I guess because that's from a later C++ version.
Another CI failure occurred because of compiling in C++03 mode, where 'constexpr' isn't even a keyword. Replaced with _LIBCPP_CONSTEXPR_SINCE_20 where feasible, and enum otherwise.
This one is particularly annoying because my local run of git-clang-format didn't pick it up and I had to round-trip through official CI to even find out there was a problem!
Spaced out all the >> in template suffixes so that C++03 won't parse them as a right shift operator.
This wasn't spotted until stage 3 of buildbots: having used `enum` to define all those constants, you now can't subtract one of them from another without provoking a warning about distinct integer types being used together. static_casting them all back to size_t makes the generic-abi-unstable build happy again.
Ah! Now I understand why my local run keeps missing these formatting changes. It's because by default it only does C++ formatting on files that end in `.cpp`, so it doesn't search `string` by default, which is where I keep making changes. It's not that CI is running a more up-to-date clang-format which disagrees on style. It's that it's using a longer command line which reminds it to check extensionless C++ standard headers.
The ASan failure in CI complained that a buffer had been allocated as 7 chars (in the case where sizeof(char_type)==5) and freed as only 6. That's because the test was running in a configuration with __endian_factor==2, and __align_allocation_size is supposed to always return an even number in that situation, and it wasn't. Fixed by introducing a new more careful code path for non-power-of-2 char sizes, and also adjusting the caller so that it actually remembers the value it got back from __align_allocation_size, to pass the same value to free later. (I'm also suspicious of the __allocate_at_least call, which isn't using the returned count field. That too might be odd, which would stop us from storing it in the __long size field. At the moment this isn't failing because __allocate_at_least never allocates more than it's asked to.)
Fixes #51158: instantiating a
basic_stringwith a character type of 3, 5, or greater than 8 wouldn't work properly, because the two members of its internal union disagreed on the location of the bit flag indicating which member was live.This commit completely rewrites the calculations of how much to pad the two sub-structures so that their lengths match. Two new static_asserts check that, after all the calculation is done, the compiler agrees that the structures really did end up the same size. A new test instantiates
basic_stringwith every size of character type up to 16, and four different alignment requirements.The new code now potentially has to pad both sub-structures, depending on what is and is not divisible by the size of the character type. This also means that the
__paddingtype must have a method to zero it out (otherwise the__grow_bymethod stops being constexpr, which an existing test depends on).I've also added some extra checks to the existing test
push_back.pass.cpp, which could have found this issue by checking that pushing huge characters on to a string updated the length in the appropriate way.