Skip to content

Conversation

tsfn
Copy link

@tsfn tsfn commented Sep 13, 2025

This is another attempt to fix #88896, after #93753 and #95798. This PR now follows the approach descripted in #95798 (comment)
LangOptions::AllowLiteralDigitSeparator is added (following RawStringLiterals). It will be enabled by default for CPlusPlus14 or C23, and Scanner will set it explicitly to relax checks when lexing quoted numbers during preprocessing.

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 13, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2025

@llvm/pr-subscribers-clang

Author: Yifan Fang (tsfn)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/158420.diff

4 Files Affected:

  • (modified) clang/include/clang/Basic/LangOptions.def (+2)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+1)
  • (modified) clang/lib/Lex/Lexer.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+16)
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 84f5ab3443a59..7418181472977 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -438,6 +438,8 @@ LANGOPT(CXXAssumptions, 1, 1, NotCompatible, "Enable or disable codegen and comp
 
 LANGOPT(RawStringLiterals, 1, 1, NotCompatible, "Enable or disable raw string literals")
 
+LANGOPT(ScanDeps, 1, 0, NotCompatible, "True if we are scanning by DependencyDirectivesScanner")
+
 ENUM_LANGOPT(StrictFlexArraysLevel, StrictFlexArraysLevelKind, 2,
              StrictFlexArraysLevelKind::Default, NotCompatible,
              "Rely on strict definition of flexible arrays")
diff --git a/clang/lib/Lex/DependencyDirectivesScanner.cpp b/clang/lib/Lex/DependencyDirectivesScanner.cpp
index eee57c786442a..81f7c2edb66cc 100644
--- a/clang/lib/Lex/DependencyDirectivesScanner.cpp
+++ b/clang/lib/Lex/DependencyDirectivesScanner.cpp
@@ -74,6 +74,7 @@ struct Scanner {
     LangOpts.ObjC = true;
     LangOpts.LineComment = true;
     LangOpts.RawStringLiterals = true;
+    LangOpts.ScanDeps = true;
     // FIXME: we do not enable C11 or C++11, so we are missing u/u8/U"".
     return LangOpts;
   }
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index b282a600c0e56..e7bf8afb1f835 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -2087,7 +2087,7 @@ bool Lexer::LexNumericConstant(Token &Result, const char *CurPtr) {
   }
 
   // If we have a digit separator, continue.
-  if (C == '\'' && (LangOpts.CPlusPlus14 || LangOpts.C23)) {
+  if (C == '\'' && (LangOpts.CPlusPlus14 || LangOpts.C23 || LangOpts.ScanDeps)) {
     auto [Next, NextSize] = getCharAndSizeNoWarn(CurPtr + Size, LangOpts);
     if (isAsciiIdentifierContinue(Next)) {
       if (!isLexingRawMode())
diff --git a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
index ddc87921ea084..dd8ef8daef816 100644
--- a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
+++ b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "clang/Lex/DependencyDirectivesScanner.h"
+#include "clang/Basic/TokenKinds.h"
 #include "llvm/ADT/SmallString.h"
 #include "gtest/gtest.h"
 
@@ -1000,6 +1001,21 @@ int z = 128'78;
   EXPECT_STREQ("#include <test.h>\n", Out.data());
 }
 
+TEST(MinimizeSourceToDependencyDirectivesTest, CharacterLiteralInPreprocessor) {
+  SmallVector<char, 128> Out;
+  SmallVector<dependency_directives_scan::Token, 8> Tokens;
+  SmallVector<Directive, 4> Directives;
+
+  StringRef Source = R"(
+    #if 1'2 == 12
+    #endif
+    )";
+  ASSERT_FALSE(minimizeSourceToDependencyDirectives(Source, Out, Tokens, Directives));
+  ASSERT_GE(Tokens.size(), 4u);
+  EXPECT_EQ(Tokens[2].Kind, tok::numeric_constant);
+  EXPECT_EQ(Tokens[3].Kind, tok::equalequal);
+}
+
 TEST(MinimizeSourceToDependencyDirectivesTest, PragmaOnce) {
   SmallVector<char, 128> Out;
   SmallVector<dependency_directives_scan::Token, 4> Tokens;

@tsfn tsfn changed the title [clang] Add LangOpts.ScanDeps to fix #88896 [clang] Add LangOpts.ScanDeps to fix (#88896) Sep 13, 2025
@tsfn tsfn changed the title [clang] Add LangOpts.ScanDeps to fix (#88896) [clang][clang-scan-deps] Add LangOpts.ScanDeps to fix (#88896) Sep 13, 2025
@cor3ntin
Copy link
Contributor

While I personally favors your approach, I don't think we ever came to a conclusion in #88896

@Sirraide @jansvoboda11 @nishithshah2211 @Bigcheese

(Regardless, please update the description of the PR so people understand what we are talking about here)

@jansvoboda11
Copy link
Contributor

I still think this is the wrong approach to the problem.

…and enable it by default for CPlusPlus14 or C23 and set it explicitly by Scanner
@tsfn tsfn changed the title [clang][clang-scan-deps] Add LangOpts.ScanDeps to fix (#88896) [clang][clang-scan-deps] Add LangOptions::AllowLiteralDigitSeparator to fix (#88896) Sep 16, 2025
@tsfn
Copy link
Author

tsfn commented Sep 16, 2025

@jansvoboda11 I changed LangOption to AllowLiteralDigitSeparator. What do you think about this?

@tsfn tsfn deleted the branch llvm:main September 16, 2025 14:51
@tsfn tsfn closed this Sep 16, 2025
@tsfn tsfn deleted the main branch September 16, 2025 14:51
@tsfn tsfn restored the main branch September 16, 2025 14:59
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, cheers!

@tsfn tsfn reopened this Sep 16, 2025
SmallVector<Directive, 4> Directives;

StringRef Source = R"(
#if 1'2 == 12
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder do we have the same problem w/ binary literals which was also brought in via C++14 e.g. 0b10 and then we have the z suffix which came in w/ C++23.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Lexer seems to be just leaving out these checks to Preprocessor and Sema. Maybe it's better to postpone the check for digit separators in this case, too?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious b/c this check was C++ version specific and was curious how other lexical features version related were handled.

CC @cor3ntin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang-scan-deps] Error if integer literal contains a single quote in #if directive

5 participants