Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,22 @@ C++ Specific Potentially Breaking Changes
template <typename T>
void f();

- During constant evaluation, comparisons between different evaluations of the
same string literal are now correctly treated as non-constant, and comparisons
between string literals that cannot possibly overlap in memory are now treated
as constant.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you mention the CWG issue here as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


.. code-block:: c++

constexpr const char *f() { return "hello"; }
constexpr const char *g() { return "world"; }
// Used to evaluate to false, now error: non-constant comparison.
constexpr bool a = f() == f();
// Might evaluate to true or false, as before.
bool at_runtime() { return f() == f(); }
// Was error, now evaluates to false.
constexpr bool b = f() == g();

ABI Changes in This Version
---------------------------

Expand Down
11 changes: 11 additions & 0 deletions clang/include/clang/AST/ASTContext.h
Original file line number Diff line number Diff line change
Expand Up @@ -324,6 +324,13 @@ class ASTContext : public RefCountedBase<ASTContext> {
/// This is lazily created. This is intentionally not serialized.
mutable llvm::StringMap<StringLiteral *> StringLiteralCache;

/// The next string literal "version" to allocate during constant evaluation.
/// This is used to distinguish between repeated evaluations of the same
/// string literal.
///
/// TODO: Ensure version numbers don't collide when deserialized.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test case involving modules to make sure that behavior is reasonable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems troublesome. The modules case here is somewhat concerning to me. I wonder if we could figure out some way to encode/serialize the "how we got here" via source information to provide the number? I realize I'm being a little hand-wavy, but perhaps back to the 'line + col that caused this evaluation'?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not too hard to contrive a situation where there's an observable misbehavior. For example:

export module A;
export constexpr const char *f() { return "hello"; }
export module B;
import A;
export constexpr const char *p = f();
export module C;
import A;
export constexpr const char *q = f();
import B;
import C;
// Should be non-constant, but these both have version 0 and
// the same `StringLiteral`, so presumably passes.
static_assert(p == q);
// Might return false at runtime in practice.
bool g() { return p == q; }

It's a somewhat low-probability misbehavior at least -- you need the version numbers to collide across modules. (They can also collide with this patch after unsigned overflow, but I don't think we need to worry about that.)

Maybe this has too high a risk of collisions still, but we could initialize this counter to a hash of the current module name instead of 0. :)

If we want to do this properly, we could track the greatest version used in each module, and offset the versions in each module by the sum of the versions of earlier-loaded modules (and likewise offset the version for the current file by the sum across all modules). Alternatively we could try to recompute versions at import time, building a map of {module ID, imported version} -> new version. But it's not quite that simple, because given

export module A;
export constexpr const char *p = "foo";
export module B;
import A;
export constexpr const char *q = p;

we need the constant value of p imported from A to be the same as the constant value of q imported from B.

... So in the former case we would need to do proper remapping not just adding an offset, and in the latter case we'd need to track which module ID the imported version originally came from. Both options seem expensive.

Maybe we can think of a better way? (Initializing the counter to a hash is sounding a bit better now, huh?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can think of a better way? (Initializing the counter to a hash is sounding a bit better now, huh?)

Thats probably at least as good as my line/column/file info suggestion, but based on the above/thinking further, I think anything besides "start at the same number for all" is perfectly acceptable.

unsigned NextStringLiteralVersion = 0;

/// MD5 hash of CUID. It is calculated when first used and cached by this
/// data member.
mutable std::string CUIDHash;
Expand Down Expand Up @@ -3278,6 +3285,10 @@ class ASTContext : public RefCountedBase<ASTContext> {
/// PredefinedExpr to cache evaluated results.
StringLiteral *getPredefinedStringLiteralFromCache(StringRef Key) const;

/// Return the next version number to be used for a string literal evaluated
/// as part of constant evaluation.
unsigned getNextStringLiteralVersion() { return NextStringLiteralVersion++; }

/// Return a declaration for the global GUID object representing the given
/// GUID value.
MSGuidDecl *getMSGuidDecl(MSGuidDeclParts Parts) const;
Expand Down
3 changes: 3 additions & 0 deletions clang/include/clang/Basic/DiagnosticASTKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,9 @@ def note_constexpr_pointer_constant_comparison : Note<
"at runtime">;
def note_constexpr_literal_comparison : Note<
"comparison of addresses of literals has unspecified value">;
def note_constexpr_opaque_call_comparison : Note<
"comparison against opaque constant address '%0' can only be performed at "
"runtime">;
def note_constexpr_pointer_weak_comparison : Note<
"comparison against address of weak declaration '%0' can only be performed "
"at runtime">;
Expand Down
130 changes: 115 additions & 15 deletions clang/lib/AST/ExprConstant.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,8 +54,10 @@
#include "clang/Basic/DiagnosticSema.h"
#include "clang/Basic/TargetInfo.h"
#include "llvm/ADT/APFixedPoint.h"
#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/SaveAndRestore.h"
#include "llvm/Support/SipHash.h"
Expand Down Expand Up @@ -2061,15 +2063,21 @@ static bool EvaluateIgnoredValue(EvalInfo &Info, const Expr *E) {
return true;
}

/// Should this call expression be treated as a no-op?
static bool IsNoOpCall(const CallExpr *E) {
/// Should this call expression be treated as forming an opaque constant?
static bool IsOpaqueConstantCall(const CallExpr *E) {
unsigned Builtin = E->getBuiltinCallee();
return (Builtin == Builtin::BI__builtin___CFStringMakeConstantString ||
Builtin == Builtin::BI__builtin___NSStringMakeConstantString ||
Builtin == Builtin::BI__builtin_ptrauth_sign_constant ||
Comment on lines 2069 to 2071
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get test coverage for these cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Builtin == Builtin::BI__builtin_function_start);
}

static bool IsOpaqueConstantCall(const LValue &LVal) {
const auto *BaseExpr =
llvm::dyn_cast_if_present<CallExpr>(LVal.Base.dyn_cast<const Expr *>());
return BaseExpr && IsOpaqueConstantCall(BaseExpr);
}

static bool IsGlobalLValue(APValue::LValueBase B) {
// C++11 [expr.const]p3 An address constant expression is a prvalue core
// constant expression of pointer type that evaluates to...
Expand Down Expand Up @@ -2115,7 +2123,7 @@ static bool IsGlobalLValue(APValue::LValueBase B) {
case Expr::ObjCBoxedExprClass:
return cast<ObjCBoxedExpr>(E)->isExpressibleAsConstantInitializer();
case Expr::CallExprClass:
return IsNoOpCall(cast<CallExpr>(E));
return IsOpaqueConstantCall(cast<CallExpr>(E));
// For GCC compatibility, &&label has static storage duration.
case Expr::AddrLabelExprClass:
return true;
Expand All @@ -2142,11 +2150,91 @@ static const ValueDecl *GetLValueBaseDecl(const LValue &LVal) {
return LVal.Base.dyn_cast<const ValueDecl*>();
}

static bool IsLiteralLValue(const LValue &Value) {
if (Value.getLValueCallIndex())
// Information about an LValueBase that is some kind of string.
struct LValueBaseString {
std::string ObjCEncodeStorage;
StringRef Bytes;
int CharWidth;
};

// Gets the lvalue base of LVal as a string.
static bool GetLValueBaseAsString(const EvalInfo &Info, const LValue &LVal,
LValueBaseString &AsString) {
const auto *BaseExpr = LVal.Base.dyn_cast<const Expr *>();
if (!BaseExpr)
return false;

// For ObjCEncodeExpr, we need to compute and store the string.
if (const auto *EE = dyn_cast<ObjCEncodeExpr>(BaseExpr)) {
Info.Ctx.getObjCEncodingForType(EE->getEncodedType(),
AsString.ObjCEncodeStorage);
AsString.Bytes = AsString.ObjCEncodeStorage;
AsString.CharWidth = 1;
return true;
}

// Otherwise, we have a StringLiteral.
const auto *Lit = dyn_cast<StringLiteral>(BaseExpr);
if (const auto *PE = dyn_cast<PredefinedExpr>(BaseExpr))
Lit = PE->getFunctionName();

if (!Lit)
return false;
const Expr *E = Value.Base.dyn_cast<const Expr*>();
return E && !isa<MaterializeTemporaryExpr>(E);

AsString.Bytes = Lit->getBytes();
AsString.CharWidth = Lit->getCharByteWidth();
return true;
}

// Determine whether two string literals potentially overlap. This will be the
// case if they agree on the values of all the bytes on the overlapping region
// between them.
//
// The overlapping region is the portion of the two string literals that must
// overlap in memory if the pointers actually point to the same address at
// runtime. For example, if LHS is "abcdef" + 3 and RHS is "cdef\0gh" + 1 then
// the overlapping region is "cdef\0", which in this case does agree, so the
// strings are potentially overlapping. Conversely, for "foobar" + 3 versus
// "bazbar" + 3, the overlapping region contains all of both strings, so they
// are not potentially overlapping, even though they agree from the given
// addresses onwards.
//
// See open core issue CWG2765 which is discussing the desired rule here.
static bool ArePotentiallyOverlappingStringLiterals(const EvalInfo &Info,
const LValue &LHS,
const LValue &RHS) {
LValueBaseString LHSString, RHSString;
if (!GetLValueBaseAsString(Info, LHS, LHSString) ||
!GetLValueBaseAsString(Info, RHS, RHSString))
return false;

// This is the byte offset to the location of the first character of LHS
// within RHS. We don't need to look at the characters of one string that
// would appear before the start of the other string if they were merged.
CharUnits Offset = RHS.Offset - LHS.Offset;
if (Offset.isNegative())
LHSString.Bytes = LHSString.Bytes.drop_front(-Offset.getQuantity());
else
RHSString.Bytes = RHSString.Bytes.drop_front(Offset.getQuantity());

bool LHSIsLonger = LHSString.Bytes.size() > RHSString.Bytes.size();
StringRef Longer = LHSIsLonger ? LHSString.Bytes : RHSString.Bytes;
StringRef Shorter = LHSIsLonger ? RHSString.Bytes : LHSString.Bytes;
int ShorterCharWidth = (LHSIsLonger ? RHSString : LHSString).CharWidth;

// The null terminator isn't included in the string data, so check for it
// manually. If the longer string doesn't have a null terminator where the
// shorter string ends, they aren't potentially overlapping.
for (int nullByte : llvm::seq(ShorterCharWidth)) {
if (Shorter.size() + nullByte >= Longer.size())
break;
if (Longer[Shorter.size() + nullByte])
return false;
}

// Otherwise, they're potentially overlapping if and only if the overlapping
// region is the same.
return Shorter == Longer.take_front(Shorter.size());
}

static bool IsWeakLValue(const LValue &Value) {
Expand Down Expand Up @@ -8573,7 +8661,10 @@ class LValueExprEvaluator
bool VisitMaterializeTemporaryExpr(const MaterializeTemporaryExpr *E);
bool VisitCompoundLiteralExpr(const CompoundLiteralExpr *E);
bool VisitMemberExpr(const MemberExpr *E);
bool VisitStringLiteral(const StringLiteral *E) { return Success(E); }
bool VisitStringLiteral(const StringLiteral *E) {
uint64_t Version = Info.getASTContext().getNextStringLiteralVersion();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNextStringLiteralVersion() returns unsigned, not uint64_t?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. (Sorry, just some junk left behind from a previous approach.)

return Success(APValue::LValueBase(E, 0, static_cast<unsigned>(Version)));
}
bool VisitObjCEncodeExpr(const ObjCEncodeExpr *E) { return Success(E); }
bool VisitCXXTypeidExpr(const CXXTypeidExpr *E);
bool VisitCXXUuidofExpr(const CXXUuidofExpr *E);
Expand Down Expand Up @@ -9639,7 +9730,7 @@ static bool isOneByteCharacterType(QualType T) {

bool PointerExprEvaluator::VisitBuiltinCallExpr(const CallExpr *E,
unsigned BuiltinOp) {
if (IsNoOpCall(E))
if (IsOpaqueConstantCall(E))
return Success(E);

switch (BuiltinOp) {
Expand Down Expand Up @@ -13889,13 +13980,22 @@ EvaluateComparisonBinaryOperator(EvalInfo &Info, const BinaryOperator *E,
(!RHSValue.Base && !RHSValue.Offset.isZero()))
return DiagComparison(diag::note_constexpr_pointer_constant_comparison,
!RHSValue.Base);
// It's implementation-defined whether distinct literals will have
// distinct addresses. In clang, the result of such a comparison is
// unspecified, so it is not a constant expression. However, we do know
// that the address of a literal will be non-null.
if ((IsLiteralLValue(LHSValue) || IsLiteralLValue(RHSValue)) &&
LHSValue.Base && RHSValue.Base)
// C++2c [intro.object]/10:
// Two objects [...] may have the same address if [...] they are both
// potentially non-unique objects.
// C++2c [intro.object]/9:
// An object is potentially non-unique if it is a string literal object,
// the backing array of an initializer list, or a subobject thereof.
//
// This makes the comparison result unspecified, so it's not a constant
// expression.
//
// TODO: Do we need to handle the initializer list case here?
if (ArePotentiallyOverlappingStringLiterals(Info, LHSValue, RHSValue))
return DiagComparison(diag::note_constexpr_literal_comparison);
if (IsOpaqueConstantCall(LHSValue) || IsOpaqueConstantCall(RHSValue))
return DiagComparison(diag::note_constexpr_opaque_call_comparison,
!IsOpaqueConstantCall(LHSValue));
// We can't tell whether weak symbols will end up pointing to the same
// object.
if (IsWeakLValue(LHSValue) || IsWeakLValue(RHSValue))
Expand Down
3 changes: 2 additions & 1 deletion clang/test/AST/ByteCode/builtin-functions.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -966,7 +966,8 @@ namespace shufflevector {
namespace FunctionStart {
void a(void) {}
static_assert(__builtin_function_start(a) == a, ""); // both-error {{not an integral constant expression}} \
// both-note {{comparison of addresses of literals has unspecified value}}
// ref-note {{comparison against opaque constant address '&__builtin_function_start(a)'}} \
// expected-note {{comparison of addresses of literals has unspecified value}}
}

namespace BuiltinInImplicitCtor {
Expand Down
20 changes: 7 additions & 13 deletions clang/test/AST/ByteCode/cxx20.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ constexpr int f() {
static_assert(f());
#endif

/// Distinct literals have disctinct addresses.
/// Distinct literals have distinct addresses.
/// see https://github.com/llvm/llvm-project/issues/58754
constexpr auto foo(const char *p) { return p; }
constexpr auto p1 = "test1";
Expand All @@ -108,22 +108,16 @@ constexpr auto p2 = "test2";
constexpr bool b1 = foo(p1) == foo(p1);
static_assert(b1);

constexpr bool b2 = foo(p1) == foo(p2); // ref-error {{must be initialized by a constant expression}} \
// ref-note {{comparison of addresses of literals}} \
// ref-note {{declared here}}
static_assert(!b2); // ref-error {{not an integral constant expression}} \
// ref-note {{not a constant expression}}
constexpr bool b2 = foo(p1) == foo(p2);
static_assert(!b2);

constexpr auto name1() { return "name1"; }
constexpr auto name2() { return "name2"; }

constexpr auto b3 = name1() == name1();
static_assert(b3);
constexpr auto b4 = name1() == name2(); // ref-error {{must be initialized by a constant expression}} \
// ref-note {{has unspecified value}} \
// ref-note {{declared here}}
static_assert(!b4); // ref-error {{not an integral constant expression}} \
// ref-note {{not a constant expression}}
constexpr auto b3 = name1() == name1(); // ref-error {{must be initialized by a constant expression}} \
// ref-note {{comparison of addresses of literals}}
constexpr auto b4 = name1() == name2();
static_assert(!b4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just from looking at these few lines, I don't understand why b3 warns but b4 doesn't. They both compare the address of literals.

The bytecode interpreter simply creates global variables for string literals, so b3 here is simply true, like it was in the current interpreter before. Is this behavior wrong now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ model in general is that each evaluation of a string literal expression can produce a distinct object, and that these objects can fully or partially overlap each other in memory when they have suitable values; that's the model that this PR is implementing.

b3 is non-constant because each call to name1() can produce a distinct value, so the comparison result is unspecified. b4 is constant and false under this PR because the values returned by name1() and name2() cannot possibly be the same -- those two string literal evaluations can't produce the same value because the strings have different contents.


namespace UninitializedFields {
class A {
Expand Down
14 changes: 11 additions & 3 deletions clang/test/SemaCXX/builtins.cpp
Original file line number Diff line number Diff line change
@@ -1,13 +1,21 @@
// RUN: %clang_cc1 %s -fsyntax-only -verify -std=c++11 -fcxx-exceptions
// RUN: %clang_cc1 %s -fsyntax-only -verify -std=c++1z -fcxx-exceptions
// RUN: %clang_cc1 %s -fsyntax-only -verify -std=c++11 -fcxx-exceptions -fptrauth-intrinsics
// RUN: %clang_cc1 %s -fsyntax-only -verify -std=c++1z -fcxx-exceptions -fptrauth-intrinsics
typedef const struct __CFString * CFStringRef;
#define CFSTR __builtin___CFStringMakeConstantString
#define NSSTR __builtin___NSStringMakeConstantString

void f() {
#if !defined(__MVS__) && !defined(_AIX)
// Builtin function __builtin___CFStringMakeConstantString is currently
// unsupported on z/OS and AIX.
(void)CFStringRef(CFSTR("Hello"));

constexpr bool a = CFSTR("Hello") == CFSTR("Hello");
// expected-error@-1 {{constant expression}}
// expected-note@-2 {{comparison against opaque constant address '&__builtin___CFStringMakeConstantString("Hello")'}}
constexpr bool b = NSSTR("Hello") == NSSTR("Hello");
// expected-error@-1 {{constant expression}}
// expected-note@-2 {{comparison against opaque constant address '&__builtin___NSStringMakeConstantString("Hello")'}}
#endif
}

Expand Down Expand Up @@ -47,7 +55,7 @@ void a(void) {}
int n;
void *p = __builtin_function_start(n); // expected-error {{argument must be a function}}
static_assert(__builtin_function_start(a) == a, ""); // expected-error {{static assertion expression is not an integral constant expression}}
// expected-note@-1 {{comparison of addresses of literals has unspecified value}}
// expected-note@-1 {{comparison against opaque constant address '&__builtin_function_start(a)'}}
} // namespace function_start

void no_ms_builtins() {
Expand Down
41 changes: 32 additions & 9 deletions clang/test/SemaCXX/constant-expression-cxx11.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@
// RUN: %clang_cc1 -std=c++20 -isystem %S/Inputs -fsyntax-only -verify=expected,cxx11_20,cxx20_23,pre-cxx23 -triple x86_64-linux -Wno-string-plus-int -Wno-pointer-arith -Wno-zero-length-array -Wno-c99-designator -fcxx-exceptions -pedantic %s -Wno-comment -Wno-tautological-pointer-compare -Wno-bool-conversion
// RUN: %clang_cc1 -std=c++11 -isystem %S/Inputs -fsyntax-only -verify=expected,cxx11_20,cxx11,pre-cxx23 -triple x86_64-linux -Wno-string-plus-int -Wno-pointer-arith -Wno-zero-length-array -Wno-c99-designator -fcxx-exceptions -pedantic %s -Wno-comment -Wno-tautological-pointer-compare -Wno-bool-conversion

// This macro forces its argument to be constant-folded, even if it's not
// otherwise a constant expression.
#define fold(x) (__builtin_constant_p(x) ? (x) : (x))

namespace StaticAssertFoldTest {

int x;
Expand Down Expand Up @@ -358,11 +362,36 @@ struct Str {

extern char externalvar[];
constexpr bool constaddress = (void *)externalvar == (void *)0x4000UL; // expected-error {{must be initialized by a constant expression}} expected-note {{reinterpret_cast}}
constexpr bool litaddress = "foo" == "foo"; // expected-error {{must be initialized by a constant expression}}
// expected-note@-1 {{comparison of addresses of literals has unspecified value}}
// cxx20_23-warning@-2 {{comparison between two arrays is deprecated}}
static_assert(0 != "foo", "");

// OK: These string literals cannot possibly overlap.
static_assert(+"foo" != +"bar", "");
static_assert("xfoo" + 1 != "yfoo" + 1, "");
static_assert(+"foot" != +"foo", "");
static_assert(+"foo\0bar" != +"foo\0baz", "");

// These can't overlap because the null terminator for UTF-16 is two bytes wide.
static_assert(fold((const char*)u"A" != (const char*)"\0A\0x"), "");
static_assert(fold((const char*)u"A" != (const char*)"A\0\0x"), "");

constexpr const char *string = "hello";
constexpr const char *also_string = string;
static_assert(string == string, "");
static_assert(string == also_string, "");

// These strings may overlap, and so the result of the comparison is unknown.
constexpr bool may_overlap_1 = +"foo" == +"foo"; // expected-error {{}} expected-note {{addresses of literals}}
constexpr bool may_overlap_2 = +"foo" == +"foo\0bar"; // expected-error {{}} expected-note {{addresses of literals}}
constexpr bool may_overlap_3 = +"foo" == "bar\0foo" + 4; // expected-error {{}} expected-note {{addresses of literals}}
constexpr bool may_overlap_4 = "xfoo" + 1 == "xfoo" + 1; // expected-error {{}} expected-note {{addresses of literals}}

// These may overlap even though they have different encodings.
// One of these two comparisons is non-constant, but due to endianness we don't
// know which one.
constexpr bool may_overlap_different_encoding[] =
{fold((const char*)u"A" != (const char*)"xA\0\0\0x" + 1), fold((const char*)u"A" != (const char*)"x\0A\0\0x" + 1)};
// expected-error@-2 {{}} expected-note@-1 {{addresses of literals}}

}

namespace MaterializeTemporary {
Expand Down Expand Up @@ -1543,16 +1572,10 @@ namespace MutableMembers {

namespace Fold {

// This macro forces its argument to be constant-folded, even if it's not
// otherwise a constant expression.
#define fold(x) (__builtin_constant_p(x) ? (x) : (x))

constexpr int n = (long)(char*)123; // expected-error {{constant expression}} expected-note {{reinterpret_cast}}
constexpr int m = fold((long)(char*)123); // ok
static_assert(m == 123, "");

#undef fold

}

namespace DR1454 {
Expand Down
Loading