
Characters that are not valid in identifiers should be allowed as preprocessing tokens #121605

@Halalaluyafail3

Description


C allows characters (to be clear, "character" here means a Unicode scalar value) and universal character names that are not valid at the start of an identifier to appear as preprocessing tokens on their own:

The categories of preprocessing tokens are: header names, identifiers, preprocessing numbers, character constants, string literals, punctuators, and both single universal character names as well as single non-white-space characters that do not lexically match the other preprocessing token categories.

ISO/IEC 9899:2024, Section 6.4 "Lexical elements", paragraph 3

However, Clang does not follow this rule for characters written directly in the source (as opposed to universal character names); e.g., the following is incorrectly rejected:

#define STR(X)#X
#define CAT(X,Y)X##Y
int main(){
    const char*CAT(z,̈)=STR(☺);
}

Clang does accept this when the combining diaeresis (U+0308) and the smiling face emoji (U+263A) are replaced by the universal character names \u0308 and \u263A, respectively.

Metadata


Labels: clang:frontend (Language frontend issues, e.g. anything involving "Sema"), duplicate (Resolved as duplicate)
