Skip to content

Conversation

@naveen-seth
Copy link
Contributor

P2223R2 allows the line-continuation slash \ to be followed by additional whitespace. The Clang lexer already follows this behavior, also for versions prior to C++23. The dependency directive scanner however only implements it for #define directives (15d5f5d).

This fully implements P2223R2 for the dependency directive scanner (for any C++ standard) and aligns the dependency directive scanner's splicing behavior with that of the Clang lexer.

For example, the following code was previously not scanned correctly by clang-scan-deps but now works as expected:

import \<whitespace here>
A;

P2223R2 allows the line-continuation slash `\` to be followed by
additional whitespace. The Clang lexer already follows this behavior,
also for versions prior to C++23. The dependency directive scanner
however only implements it for `#define` directives (15d5f5d).

This fully implements P2223R2 for the dependency directive
scanner (for any C++ standard) and aligns the dependency directive
scanner's splicing behavior with that of the Clang lexer.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jun 12, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 12, 2025

@llvm/pr-subscribers-clang

Author: Naveen Seth Hanig (naveen-seth)

Changes

P2223R2 allows the line-continuation slash \ to be followed by additional whitespace. The Clang lexer already follows this behavior, also for versions prior to C++23. The dependency directive scanner however only implements it for #define directives (15d5f5d).

This fully implements P2223R2 for the dependency directive scanner (for any C++ standard) and aligns the dependency directive scanner's splicing behavior with that of the Clang lexer.

For example, the following code was previously not scanned correctly by clang-scan-deps but now works as expected:

import \&lt;whitespace here&gt;
A;

Full diff: https://github.com/llvm/llvm-project/pull/143950.diff

2 Files Affected:

  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+22-10)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+91)
diff --git a/clang/lib/Lex/DependencyDirectivesScanner.cpp b/clang/lib/Lex/DependencyDirectivesScanner.cpp
index 4606b85d42fe7..1b6b16c561141 100644
--- a/clang/lib/Lex/DependencyDirectivesScanner.cpp
+++ b/clang/lib/Lex/DependencyDirectivesScanner.cpp
@@ -323,10 +323,6 @@ static unsigned skipNewline(const char *&First, const char *End) {
   return Len;
 }
 
-static bool wasLineContinuation(const char *First, unsigned EOLLen) {
-  return *(First - (int)EOLLen - 1) == '\\';
-}
-
 static void skipToNewlineRaw(const char *&First, const char *const End) {
   for (;;) {
     if (First == End)
@@ -336,13 +332,16 @@ static void skipToNewlineRaw(const char *&First, const char *const End) {
     if (Len)
       return;
 
+    char LastNonWhitespace = ' ';
     do {
+      if (!isHorizontalWhitespace(*First))
+        LastNonWhitespace = *First;
       if (++First == End)
         return;
       Len = isEOL(First, End);
     } while (!Len);
 
-    if (First[-1] != '\\')
+    if (LastNonWhitespace != '\\')
       return;
 
     First += Len;
@@ -394,6 +393,7 @@ static bool isQuoteCppDigitSeparator(const char *const Start,
 }
 
 void Scanner::skipLine(const char *&First, const char *const End) {
+  char LastNonWhitespace = ' ';
   for (;;) {
     assert(First <= End);
     if (First == End)
@@ -419,6 +419,8 @@ void Scanner::skipLine(const char *&First, const char *const End) {
       // Iterate over comments correctly.
       if (*First != '/' || End - First < 2) {
         LastTokenPtr = First;
+        if (!isWhitespace(*First))
+          LastNonWhitespace = *First;
         ++First;
         continue;
       }
@@ -431,6 +433,8 @@ void Scanner::skipLine(const char *&First, const char *const End) {
 
       if (First[1] != '*') {
         LastTokenPtr = First;
+        if (!isWhitespace(*First))
+          LastNonWhitespace = *First;
         ++First;
         continue;
       }
@@ -442,8 +446,9 @@ void Scanner::skipLine(const char *&First, const char *const End) {
       return;
 
     // Skip over the newline.
-    unsigned Len = skipNewline(First, End);
-    if (!wasLineContinuation(First, Len)) // Continue past line-continuations.
+    skipNewline(First, End);
+
+    if (LastNonWhitespace != '\\')
       break;
   }
 }
@@ -468,9 +473,16 @@ static void skipWhitespace(const char *&First, const char *const End) {
     if (End - First < 2)
       return;
 
-    if (First[0] == '\\' && isVerticalWhitespace(First[1])) {
-      skipNewline(++First, End);
-      continue;
+    if (*First == '\\') {
+      const char *Ptr = First + 1;
+      while (Ptr < End && isHorizontalWhitespace(*Ptr))
+        ++Ptr;
+      if (Ptr != End && isVerticalWhitespace(*Ptr)) {
+        skipNewline(Ptr, End);
+        First = Ptr;
+        continue;
+      }
+      return;
     }
 
     // Check for a non-comment character.
diff --git a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
index 377c066f031d3..61f74929c1e98 100644
--- a/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
+++ b/clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
@@ -789,6 +789,97 @@ TEST(MinimizeSourceToDependencyDirectivesTest,
                Out.data());
 }
 
+TEST(MinimizeSourceToDependencyDirectivesTest,
+     WhitespaceAfterLineContinuationSlashLineComment) {
+  SmallVector<char, 128> Out;
+
+  ASSERT_FALSE(minimizeSourceToDependencyDirectives("// some comment \\  \n"
+                                                    "module A;\n",
+                                                    Out));
+  EXPECT_STREQ("", Out.data());
+}
+
+TEST(MinimizeSourceToDependencyDirectivesTest,
+     WhitespaceAfterLineContinuationSlashAllDirectives) {
+  SmallVector<char, 512> Out;
+  SmallVector<dependency_directives_scan::Token, 16> Tokens;
+  SmallVector<Directive, 16> Directives;
+
+  StringRef Input = "#define \\   \n"
+                    "A\n"
+                    "#undef\t\\   \n"
+                    "A\n"
+                    "#endif \\\t\t\n"
+                    "\n"
+                    "#if \\     \t\n"
+                    "A\n"
+                    "#ifdef\t\\   \n"
+                    "A\n"
+                    "#ifndef \\ \t\n"
+                    "A\n"
+                    "#elifdef \\  \n"
+                    "A\n"
+                    "#elifndef \\ \n"
+                    "A\n"
+                    "#elif \\\t\t \n"
+                    "A\n"
+                    "#else \\\t \t\n"
+                    "\n"
+                    "#include \\  \n"
+                    "<A>\n"
+                    "#include_next \\    \n"
+                    "<A>\n"
+                    "#__include_macros\\ \n"
+                    "<A>\n"
+                    "#import \\ \t\n"
+                    "<A>\n"
+                    "@import \\\t \n"
+                    "A;\n"
+                    "#pragma clang \\   \n"
+                    "module \\    \n"
+                    "import A\n"
+                    "#pragma \\   \n"
+                    "push_macro(A)\n"
+                    "#pragma \\\t \n"
+                    "pop_macro(A)\n"
+                    "#pragma \\   \n"
+                    "include_alias(<A>,\\ \n"
+                    "<B>)\n"
+                    "export \\    \n"
+                    "module m;\n"
+                    "import\t\\\t \n"
+                    "m;\n"
+                    "#pragma\t\\  \n"
+                    "clang\t\\  \t\n"
+                    "system_header\n";
+  ASSERT_FALSE(
+      minimizeSourceToDependencyDirectives(Input, Out, Tokens, Directives));
+
+  EXPECT_EQ(pp_define, Directives[0].Kind);
+  EXPECT_EQ(pp_undef, Directives[1].Kind);
+  EXPECT_EQ(pp_endif, Directives[2].Kind);
+  EXPECT_EQ(pp_if, Directives[3].Kind);
+  EXPECT_EQ(pp_ifdef, Directives[4].Kind);
+  EXPECT_EQ(pp_ifndef, Directives[5].Kind);
+  EXPECT_EQ(pp_elifdef, Directives[6].Kind);
+  EXPECT_EQ(pp_elifndef, Directives[7].Kind);
+  EXPECT_EQ(pp_elif, Directives[8].Kind);
+  EXPECT_EQ(pp_else, Directives[9].Kind);
+  EXPECT_EQ(pp_include, Directives[10].Kind);
+  EXPECT_EQ(pp_include_next, Directives[11].Kind);
+  EXPECT_EQ(pp___include_macros, Directives[12].Kind);
+  EXPECT_EQ(pp_import, Directives[13].Kind);
+  EXPECT_EQ(decl_at_import, Directives[14].Kind);
+  EXPECT_EQ(pp_pragma_import, Directives[15].Kind);
+  EXPECT_EQ(pp_pragma_push_macro, Directives[16].Kind);
+  EXPECT_EQ(pp_pragma_pop_macro, Directives[17].Kind);
+  EXPECT_EQ(pp_pragma_include_alias, Directives[18].Kind);
+  EXPECT_EQ(cxx_export_module_decl, Directives[19].Kind);
+  EXPECT_EQ(cxx_import_decl, Directives[20].Kind);
+  EXPECT_EQ(pp_pragma_system_header, Directives[21].Kind);
+  EXPECT_EQ(pp_eof, Directives[22].Kind);
+}
+
 TEST(MinimizeSourceToDependencyDirectivesTest, PoundWarningAndError) {
   SmallVector<char, 128> Out;
 

@naveen-seth
Copy link
Contributor Author

Hi @hyp, @Bigcheese, would you be available to review this?

Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks for the fix.

@Bigcheese Bigcheese merged commit 9d49b82 into llvm:main Jun 13, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 13, 2025

LLVM Buildbot has detected a new failure on builder ppc64le-flang-rhel-clang running on ppc64le-flang-rhel-test while building clang at step 6 "test-build-unified-tree-check-flang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/157/builds/30747

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-flang) failure: test (failure)
******************** TEST 'Flang :: Semantics/modfile75.F90' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -c -fhermetic-module-files -DWHICH=1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90 && /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -c -fhermetic-module-files -DWHICH=2 /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90 && /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -fc1 -fdebug-unparse /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90 | /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90 # RUN: at line 1
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -c -fhermetic-module-files -DWHICH=1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -c -fhermetic-module-files -DWHICH=2 /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/flang -fc1 -fdebug-unparse /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90
error: Semantic errors in /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90
/home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90:15:11: error: Must be a constant value
    integer(c_int) n
            ^^^^^
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/build/bin/FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-flang-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/flang/test/Semantics/modfile75.F90

--

********************


tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
llvm#143950)

P2223R2 allows the line-continuation slash `\` to be followed by
additional whitespace. The Clang lexer already follows this behavior,
also for versions prior to C++23. The dependency directive scanner
however only implements it for `#define` directives (15d5f5d).

This fully implements P2223R2 for the dependency directive scanner (for
any C++ standard) and aligns the dependency directive scanner's splicing
behavior with that of the Clang lexer.

For example, the following code was previously not scanned correctly by
`clang-scan-deps` but now works as expected:

```cpp
import \<whitespace here>
A;
```
@naveen-seth naveen-seth deleted the directive-scan-cxx23-line-splice branch June 18, 2025 12:01
akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
llvm#143950)

P2223R2 allows the line-continuation slash `\` to be followed by
additional whitespace. The Clang lexer already follows this behavior,
also for versions prior to C++23. The dependency directive scanner
however only implements it for `#define` directives (15d5f5d).

This fully implements P2223R2 for the dependency directive scanner (for
any C++ standard) and aligns the dependency directive scanner's splicing
behavior with that of the Clang lexer.

For example, the following code was previously not scanned correctly by
`clang-scan-deps` but now works as expected:

```cpp
import \<whitespace here>
A;
```
ziqingluo-90 added a commit to ziqingluo-90/apple-llvm-project that referenced this pull request Jul 3, 2025
`clang-scan-deps` threw "unterminated conditional directive" error
falsely on the following example:

```

 #if defined(__TEST_DUMMY2)
  #pragma GCC warning                                                          \
      "Hello!"
 #else
  #pragma GCC error                                                            \
      "World!"
 #endif // defined(__TEST_DUMMY2)

```

The issue comes from PR llvm#143950, where the flag `LastNonWhitespace`
does not correctly represent the state in the example above.  The PR
aimed to support that a line-continuation can be followed by
whitespaces.  This fix uses a more straightforward but less efficient
way to do the same thing---it looks back for a line-continuation and
skips any number of whitespaces before that.

rdar://153742186
ziqingluo-90 added a commit that referenced this pull request Jul 4, 2025
`clang-scan-deps` threw "unterminated conditional directive" error
falsely on the following example:

```
#ifndef __TEST
#define __TEST

#if defined(__TEST_DUMMY)
 #if defined(__TEST_DUMMY2)
  #pragma GCC warning                                                          \
      "Hello!"
 #else
  #pragma GCC error                                                            \
      "World!"
 #endif // defined(__TEST_DUMMY2)
#endif  // defined(__TEST_DUMMY)

#endif // #ifndef __TEST
```

The issue comes from PR #143950, where the flag `LastNonWhitespace` does
not correctly represent the state for the example above. The PR aimed to
support that a line-continuation can be followed by whitespaces.  
This commit fixes the issue by moving the `LastNonWhitespace` variable
to the inner loop so that it will be correctly reset.

rdar://153742186
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants