Multicharacter constants are not handled correctly in C or C++ before C++23 #156269

@Halalaluyafail3

The following programs demonstrate the issues with how Clang handles multicharacter constants.

Program 1:

int main(void){
    L'ab';
}

Program 2:

int main(void){
    u'ab';
}

In program 1, a wchar_t character constant (prefixed with L) contains multiple characters. This is valid in C, and valid in C++ before C++23. In program 2, a UTF-16 character constant (prefixed with u) contains multiple characters. This is valid in C before C23, and the same applies to UTF-32 character constants (prefixed with U).

It seems that in Clang version 14 all language modes operate using the C++23 rules, so that only character constants with no prefix can contain multiple characters. GCC handles these cases correctly, except that it doesn't reject program 2 in C23, for which I have made a bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121739.
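To make the expected behaviour easier to check at a glance, here is a minimal sketch that encodes the rules described above as a lookup function. The names `prefix`, `lang_mode`, `multichar_allowed` and `check_issue_programs` are hypothetical and exist only for illustration. They are not part of Clang or GCC. The u/U entries for C++ before C++23 are an assumption based on my reading of the C++ standards, since the report itself only discusses program 2 for C.

```c
#include <assert.h>
#include <stdbool.h>

/* Encoding prefix of a character constant (hypothetical names). */
enum prefix {
    PREFIX_NONE,   /* 'ab'  */
    PREFIX_WIDE,   /* L'ab' */
    PREFIX_UTF16,  /* u'ab' (exists since C11) */
    PREFIX_UTF32   /* U'ab' (exists since C11) */
};

/* Language modes distinguished in this report. */
enum lang_mode {
    MODE_C_BEFORE_C23,
    MODE_C23,
    MODE_CXX_BEFORE_CXX23,
    MODE_CXX23
};

/* Returns true if a multicharacter constant with prefix p is
   expected to be accepted in language mode m. */
bool multichar_allowed(enum prefix p, enum lang_mode m)
{
    switch (p) {
    case PREFIX_NONE:
        /* Unprefixed multicharacter constants are accepted everywhere. */
        return true;
    case PREFIX_WIDE:
        /* Valid in C, and in C++ before C++23. */
        return m != MODE_CXX23;
    case PREFIX_UTF16:
    case PREFIX_UTF32:
        /* Valid in C before C23. Assumption: never valid in C++. */
        return m == MODE_C_BEFORE_C23;
    }
    return false;
}

/* Expectations for the two programs in this report. */
void check_issue_programs(void)
{
    /* Program 1: L'ab' */
    assert(multichar_allowed(PREFIX_WIDE, MODE_C_BEFORE_C23));
    assert(multichar_allowed(PREFIX_WIDE, MODE_C23));
    assert(multichar_allowed(PREFIX_WIDE, MODE_CXX_BEFORE_CXX23));
    assert(!multichar_allowed(PREFIX_WIDE, MODE_CXX23));

    /* Program 2: u'ab' */
    assert(multichar_allowed(PREFIX_UTF16, MODE_C_BEFORE_C23));
    assert(!multichar_allowed(PREFIX_UTF16, MODE_C23));
}
```

In these terms, the behaviour reported for Clang corresponds to returning true only for `PREFIX_NONE` regardless of the language mode.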

    Labels

    clang:frontend (Language frontend issues, e.g. anything involving "Sema")
    question (A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!)
