-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[clang][clang-scan-deps] Add LangOptions::AllowLiteralDigitSeparator to fix (#88896) #158420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang Author: Yifan Fang (tsfn) ChangesFull diff: https://github.com/llvm/llvm-project/pull/158420.diff 4 Files Affected:
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 84f5ab3443a59..7418181472977 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -438,6 +438,8 @@ LANGOPT(CXXAssumptions, 1, 1, NotCompatible, "Enable or disable codegen and comp
LANGOPT(RawStringLiterals, 1, 1, NotCompatible, "Enable or disable raw string literals")
+LANGOPT(ScanDeps, 1, 0, NotCompatible, "True if we are scanning by DependencyDirectivesScanner")
+
ENUM_LANGOPT(StrictFlexArraysLevel, StrictFlexArraysLevelKind, 2,
StrictFlexArraysLevelKind::Default, NotCompatible,
"Rely on strict definition of flexible arrays")
diff --git a/clang/lib/Lex/DependencyDirectivesScanner.cpp b/clang/lib/Lex/DependencyDirectivesScanner.cpp
index eee57c786442a..81f7c2edb66cc 100644
--- a/clang/lib/Lex/DependencyDirectivesScanner.cpp
+++ b/clang/lib/Lex/DependencyDirectivesScanner.cpp
@@ -74,6 +74,7 @@ struct Scanner {
LangOpts.ObjC = true;
LangOpts.LineComment = true;
LangOpts.RawStringLiterals = true;
+ LangOpts.ScanDeps = true;
// FIXME: we do not enable C11 or C++11, so we are missing u/u8/U"".
return LangOpts;
}
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index b282a600c0e56..e7bf8afb1f835 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -2087,7 +2087,7 @@ bool Lexer::LexNumericConstant(Token &Result, const char *CurPtr) {
}
// If we have a digit separator, continue.
- if (C == '\'' && (LangOpts.CPlusPlus14 || LangOpts.C23)) {
+ if (C == '\'' && (LangOpts.CPlusPlus14 || LangOpts.C23 || LangOpts.ScanDeps)) {
auto [Next, NextSize] = getCharAndSizeNoWarn(CurPtr + Size, LangOpts);
if (isAsciiIdentifierContinue(Next)) {
if (!isLexingRawMode())
diff --git a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
index ddc87921ea084..dd8ef8daef816 100644
--- a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
+++ b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "clang/Lex/DependencyDirectivesScanner.h"
+#include "clang/Basic/TokenKinds.h"
#include "llvm/ADT/SmallString.h"
#include "gtest/gtest.h"
@@ -1000,6 +1001,21 @@ int z = 128'78;
EXPECT_STREQ("#include <test.h>\n", Out.data());
}
+TEST(MinimizeSourceToDependencyDirectivesTest, CharacterLiteralInPreprocessor) {
+ SmallVector<char, 128> Out;
+ SmallVector<dependency_directives_scan::Token, 8> Tokens;
+ SmallVector<Directive, 4> Directives;
+
+ StringRef Source = R"(
+ #if 1'2 == 12
+ #endif
+ )";
+ ASSERT_FALSE(minimizeSourceToDependencyDirectives(Source, Out, Tokens, Directives));
+ ASSERT_GE(Tokens.size(), 4u);
+ EXPECT_EQ(Tokens[2].Kind, tok::numeric_constant);
+ EXPECT_EQ(Tokens[3].Kind, tok::equalequal);
+}
+
TEST(MinimizeSourceToDependencyDirectivesTest, PragmaOnce) {
SmallVector<char, 128> Out;
SmallVector<dependency_directives_scan::Token, 4> Tokens;
|
While I personally favors your approach, I don't think we ever came to a conclusion in #88896 @Sirraide @jansvoboda11 @nishithshah2211 @Bigcheese (Regardless, please update the description of the PR so people understand what we are talking about here) |
I still think this is the wrong approach to the problem. |
…and enable it by default for CPlusPlus14 or C23 and set it explicitly by Scanner
@jansvoboda11 I changed LangOption to AllowLiteralDigitSeparator. What do you think about this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, cheers!
SmallVector<Directive, 4> Directives; | ||
|
||
StringRef Source = R"( | ||
#if 1'2 == 12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder do we have the same problem w/ binary literals which was also brought in via C++14 e.g. 0b10
and then we have the z
suffix which came in w/ C++23.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Lexer
seems to be just leaving out these checks to Preprocessor
and Sema
. Maybe it's better to postpone the check for digit separators in this case, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious b/c this check was C++ version specific and was curious how other lexical features version related were handled.
CC @cor3ntin
This is another attempt to fix #88896, after #93753 and #95798. This PR now follows the approach descripted in #95798 (comment)
LangOptions::AllowLiteralDigitSeparator
is added (followingRawStringLiterals
). It will be enabled by default forCPlusPlus14
orC23
, andScanner
will set it explicitly to relax checks when lexing quoted numbers during preprocessing.