@yronglin yronglin commented Sep 4, 2024

This PR implements the following papers:
P1857R3 Modules Dependency Discovery.
P3034R1 Module Declarations Shouldn’t be Macros.
CWG2947.

At the start of phase 4, an import or module token is treated as starting a directive and is converted to its respective keyword iff the token:

  • After skipping horizontal whitespace, is
    • at the start of a logical line, or
    • preceded by an export at the start of the logical line.
  • Is followed by an identifier pp token (before macro expansion), or
    • <, ", or : (but not ::) pp tokens for import, or
    • ; for module

Otherwise the token is treated as an identifier.
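
The two conditions above can be sketched as a small classifier over the pp-tokens of one logical line. This is only an illustration of the rules as listed here, not the code in this patch: `Kind`, `PPToken`, `Introducer` and `classifyLine` are hypothetical names, and the token model is heavily simplified (no header-name lexing, no whitespace handling).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token model: one entry per preprocessing token on
// a single logical line, before any macro expansion.
enum class Kind { Identifier, Less, StringLiteral, Colon, ColonColon, Semi, Other };

struct PPToken {
  Kind kind;
  std::string text; // spelling; only inspected for identifiers
};

enum class Introducer { None, Module, Import };

// Applies the two listed conditions to the tokens of one logical line.
Introducer classifyLine(const std::vector<PPToken> &line) {
  std::size_t i = 0;
  // Condition 1: the token is first on the line, or follows a leading 'export'.
  if (i < line.size() && line[i].kind == Kind::Identifier &&
      line[i].text == "export")
    ++i;
  if (i >= line.size() || line[i].kind != Kind::Identifier)
    return Introducer::None;
  const bool isImport = line[i].text == "import";
  const bool isModule = line[i].text == "module";
  if (!isImport && !isModule)
    return Introducer::None;
  // Condition 2: look at the following token without expanding macros.
  if (i + 1 >= line.size())
    return Introducer::None;
  const Kind next = line[i + 1].kind;
  if (next == Kind::Identifier)
    return isImport ? Introducer::Import : Introducer::Module;
  if (isImport && (next == Kind::Less || next == Kind::StringLiteral ||
                   next == Kind::Colon))
    return Introducer::Import;
  if (isModule && next == Kind::Semi)
    return Introducer::Module;
  return Introducer::None; // otherwise: an ordinary identifier
}
```

Under this model, `export module m;` classifies as a module directive, while `struct import {};` and `EMPTY import foo;` do not start a directive, because `import` is neither first on the line nor preceded only by `export`.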

Additionally:

  • The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.
  • The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.
  • A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
  • Preprocessor conditionals shall not span a module declaration.

After this patch, we handle the C++ module-import and module-declaration as real pp-directives in the preprocessor. Additionally, we refactor module name lexing: the complex state machine is removed, and the full module name is read during module/import directive handling. In the future we could introduce a tok::annot_module_name token to avoid parsing the module name twice (once in the preprocessor and once in the parser), but that makes error recovery much more difficult (e.g. import a; import b; on the same line).
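
To illustrate what "read the full module name during directive handling" can look like without a state machine, here is a minimal sketch that consumes `identifier ('.' identifier)*` in a single loop. `Tok` and `readModuleName` are hypothetical stand-ins invented for this example; the patch's real entry point, LexModuleNameContinue, works on clang tokens and reports diagnostics.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexed token: an identifier spelling or a
// punctuator such as "." or ";".
struct Tok {
  bool isIdentifier;
  std::string spelling;
};

// Reads `identifier ('.' identifier)*` starting at `pos`. On success, advances
// `pos` past the name and returns its components; on failure, leaves `pos`
// untouched and returns std::nullopt.
std::optional<std::vector<std::string>>
readModuleName(const std::vector<Tok> &toks, std::size_t &pos) {
  std::vector<std::string> path;
  std::size_t i = pos;
  while (true) {
    if (i >= toks.size() || !toks[i].isIdentifier)
      return std::nullopt; // expected an identifier component here
    path.push_back(toks[i].spelling);
    ++i;
    // Continue only if a '.' follows; otherwise the name is complete.
    if (i < toks.size() && !toks[i].isIdentifier && toks[i].spelling == ".")
      ++i;
    else
      break;
  }
  pos = i;
  return path;
}
```

Because the whole name is consumed in one call, the caller can check for the terminating `;` immediately afterwards instead of tracking "expect identifier" versus "expect '.' or ';'" across separate lexer callbacks.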

This patch also introduces two new keywords, __preprocessed_module and __preprocessed_import. These two keywords are generated in -E mode. This is useful to avoid confusion with the module and import keywords in preprocessed output:

export module m;
struct import {};
#define EMPTY
EMPTY import foo;
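
One way to picture the idea: when writing preprocessed output, only tokens that the preprocessor actually recognized as directive introducers receive the reserved spelling, so an identifier that happens to be spelled import (as in the struct above) is printed unchanged. The sketch below is an assumption about the shape of that decision, not the code in PrintPreprocessedOutput.cpp; `Role` and `spellingForPreprocessedOutput` are hypothetical names.

```cpp
#include <string>

// Hypothetical classification recorded for a token spelled "module" or
// "import" while preprocessing.
enum class Role { OrdinaryIdentifier, DirectiveKeyword };

// Chooses the spelling written to preprocessed output. Only tokens that were
// recognized as directive introducers get the reserved spelling.
std::string spellingForPreprocessedOutput(const std::string &spelling,
                                          Role role) {
  if (role == Role::DirectiveKeyword &&
      (spelling == "module" || spelling == "import"))
    return "__preprocessed_" + spelling;
  return spelling;
}
```

A later compilation of the -E output can then tell a real directive from an ordinary identifier without re-applying the start-of-line rules to text that has already been macro-expanded.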

Fixes #54047

github-actions bot commented Sep 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

yronglin added a commit that referenced this pull request Apr 16, 2025
…tructures to `IdentifierLoc` (#135808)

I found this issue while working on
#107168.

Currently we have many similar data structures, such as:
 - `std::pair<IdentifierInfo *, SourceLocation>`.
 - Element type of `ModuleIdPath`.
 - `IdentifierLocPair`.
 - `IdentifierLoc`.
 
This PR unifies these data structures into `IdentifierLoc`, moves the
`IdentifierLoc` definition to SourceLocation.h, and deletes the other
similar data structures.

---------

Signed-off-by: yronglin <[email protected]>
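
For readers unfamiliar with the refactoring described in that commit, the following sketch contrasts a pair-based path element with a single named type. `IdentifierInfo` and `SourceLocation` here are toy stand-ins, and the constructor argument order is an assumption; only the accessor names getLoc() and getIdentifierInfo() are taken from the diff in this PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for clang's IdentifierInfo and SourceLocation.
struct IdentifierInfo { std::string name; };
struct SourceLocation { unsigned offset = 0; };

// One named type in place of the various pair-like spellings.
class IdentifierLoc {
  SourceLocation Loc;
  IdentifierInfo *II = nullptr;

public:
  IdentifierLoc() = default;
  IdentifierLoc(SourceLocation L, IdentifierInfo *I) : Loc(L), II(I) {}
  SourceLocation getLoc() const { return Loc; }
  IdentifierInfo *getIdentifierInfo() const { return II; }
};

// Before: call sites read .first / .second on a std::pair.
using OldPathElement = std::pair<IdentifierInfo *, SourceLocation>;

std::string joinOld(const std::vector<OldPathElement> &path) {
  std::string out;
  for (const OldPathElement &part : path) {
    if (!out.empty())
      out += '.';
    out += part.first->name;
  }
  return out;
}

// After: the same loop using named accessors.
std::string joinNew(const std::vector<IdentifierLoc> &path) {
  std::string out;
  for (const IdentifierLoc &part : path) {
    if (!out.empty())
      out += '.';
    out += part.getIdentifierInfo()->name;
  }
  return out;
}
```

The behavior is identical; the gain is that every module path, pragma argument, and similar (identifier, location) pair shares one type with self-describing accessors.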
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 16, 2025
…like data structures to `IdentifierLoc` (#135808)

yronglin added a commit that referenced this pull request Apr 17, 2025
… data structures to `IdentifierLoc` (#136077)

This PR relands #135808 and fixes
some missed changes in LLDB.
I found this issue while working on
#107168.

Currently we have many similar data structures, such as:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.

This PR unifies these data structures into IdentifierLoc, moves the
IdentifierLoc definition to SourceLocation.h, and deletes the other similar
data structures.

---------

Signed-off-by: yronglin <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 17, 2025
…` pair-like data structures to `IdentifierLoc` (#136077)

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
… data structures to `IdentifierLoc` (llvm#136077)

@yronglin yronglin force-pushed the modules_dependency_discovery branch from dbc377e to d300a2b Compare May 31, 2025 07:36
@yronglin yronglin marked this pull request as ready for review May 31, 2025 07:43
@llvmbot llvmbot added the clang, clang:frontend, clang:modules labels May 31, 2025
llvmbot commented May 31, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang-modules

Author: None (yronglin)

Changes

Implement P1857R3 Modules Dependency Discovery.

  • Handle C++ module and import directives like other preprocessor directives.

At the start of phase 4, an import or module token is treated as starting a directive and is converted to its respective keyword iff the token:

  • After skipping horizontal whitespace, is

    • ✅at the start of a logical line, or

    • ✅preceded by an export at the start of the logical line.

  • Is followed by an identifier pp token (before macro expansion), or

    • ✅<, ", or : (but not ::) pp tokens for import, or

    • ✅; for module

Otherwise the token is treated as an identifier.

Additionally:

  • ✅The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.

  • ✅The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.

  • ❌[TODO] A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)

  • ✅Preprocessor conditionals shall not span a module declaration.

More tests are needed.


Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff

43 Files Affected:

  • (modified) clang/examples/AnnotateFunctions/AnnotateFunctions.cpp (+1-1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+15-1)
  • (modified) clang/include/clang/Basic/DiagnosticParseKinds.td (+2-4)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+22-4)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+1-1)
  • (modified) clang/include/clang/Lex/CodeCompletionHandler.h (+8)
  • (modified) clang/include/clang/Lex/Lexer.h (+5-5)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+95-26)
  • (modified) clang/include/clang/Lex/Token.h (+7)
  • (modified) clang/include/clang/Lex/TokenLexer.h (+3-4)
  • (modified) clang/include/clang/Parse/Parser.h (+2)
  • (modified) clang/include/clang/Sema/Sema.h (+4-2)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+7-3)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+8-1)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+20-8)
  • (modified) clang/lib/Lex/Lexer.cpp (+46-18)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+264-5)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+15-17)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+171-188)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+5-3)
  • (modified) clang/lib/Lex/TokenLexer.cpp (+7-6)
  • (modified) clang/lib/Parse/Parser.cpp (+30-63)
  • (modified) clang/lib/Sema/SemaModule.cpp (+30-45)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+1-1)
  • (modified) clang/test/CXX/basic/basic.link/p1.cpp (+109-36)
  • (modified) clang/test/CXX/basic/basic.link/p3.cpp (+40-27)
  • (modified) clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp (+56-26)
  • (modified) clang/test/CXX/lex/lex.pptoken/p3-2a.cpp (+10-5)
  • (modified) clang/test/CXX/module/basic/basic.def.odr/p6.cppm (+134-40)
  • (modified) clang/test/CXX/module/basic/basic.link/module-declaration.cpp (+35-29)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm (+27-11)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm (+18-21)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p1.cpp (+30-14)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p5.cpp (+48-17)
  • (modified) clang/test/CXX/module/module.interface/p1.cpp (+24-18)
  • (modified) clang/test/CXX/module/module.interface/p2.cpp (+12-14)
  • (modified) clang/test/CXX/module/module.unit/p8.cpp (+28-20)
  • (modified) clang/test/Modules/pr121066.cpp (+1-2)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+5-5)
  • (modified) clang/unittests/Lex/ModuleDeclStateTest.cpp (+1-1)
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
index d872020c2d8a3..22a3eb97f938b 100644
--- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
+++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
@@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler {
     Token Tok;
     PP.LexUnexpandedToken(Tok);
     if (Tok.isNot(tok::eod))
-      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma";
+      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma";
 
     if (HandledDecl) {
       DiagnosticsEngine &D = PP.getDiagnostics();
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 723f5d48b4f5f..f975a63b369b5 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error<
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
+def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn<
+  "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>;
 
 def ext_pp_comma_expr : Extension<"comma operator in operand of #if">;
 def ext_pp_bad_vaargs_use : Extension<
@@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning<
 def ext_named_variadic_macro : Extension<
   "named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
 def err_embedded_directive : Error<
-  "embedding a #%0 directive within macro arguments is not supported">;
+  "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">;
 def ext_embedded_directive : Extension<
   "embedding a directive within macro arguments has undefined behavior">,
   InGroup<DiagGroup<"embedded-directive">>;
@@ -983,6 +985,18 @@ def warn_module_conflict : Warning<
   InGroup<ModuleConflict>;
 
 // C++20 modules
+def err_pp_expected_module_name_or_header_name : Error<
+  "expected module name or header name">;
+def err_pp_expected_semi_after_module_or_import : Error<
+  "'%select{module|import}0' directive must end with a ';' on the same line">;
+def err_module_decl_in_header : Error<
+  "module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl : Error<
+  "preprocessor conditionals shall not span a module declaration">;
+def err_pp_module_expected_ident : Error<
+  "expected a module name after '%select{module|import}0'">;
+def err_pp_unsupported_module_partition : Error<
+  "module partitions are only supported for C++20 onwards">;
 def err_header_import_semi_in_macro : Error<
   "semicolon terminating header import declaration cannot be produced "
   "by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 3aa36ad59d0b9..c06e2f090b429 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1760,8 +1760,8 @@ def ext_bit_int : Extension<
 } // end of Parse Issue category.
 
 let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
-  "module declaration can only appear at the top level">;
+def err_unexpected_module_import_decl : Error<
+  "%select{module|import}0 declaration can only appear at the top level">;
 def err_module_expected_ident : Error<
   "expected a module name after '%select{module|import}0'">;
 def err_attribute_not_module_attr : Error<
@@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error<
 def err_private_module_fragment_expected_semi : Error<
   "expected ';' after private module fragment declaration">;
 def err_missing_before_module_end : Error<"expected %0 at end of module">;
-def err_unsupported_module_partition : Error<
-  "module partitions are only supported for C++20 onwards">;
 def err_import_not_allowed_here : Error<
   "imports must immediately follow the module declaration">;
 def err_partition_import_outside_module : Error<
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54540193cfcc0..add6c6ac629a1 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsModulesImport : 1;
 
+  // True if this is the 'module' contextual keyword.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsModulesDecl : 1;
+
   // True if this is a mangled OpenMP variant name.
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsMangledOpenMPVariantName : 1;
@@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
         IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
         IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
         RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
-        IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
-        IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+        IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+        IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+        IsKeywordInCpp(false) {}
 
 public:
   IdentifierInfo(const IdentifierInfo &) = delete;
@@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
       RecomputeNeedsHandleIdentifier();
   }
 
+  /// Determine whether this is the contextual keyword \c module.
+  bool isModulesDeclaration() const { return IsModulesDecl; }
+
+  /// Set whether this identifier is the contextual keyword \c module.
+  void setModulesDeclaration(bool I) {
+    IsModulesDecl = I;
+    if (I)
+      NeedsHandleIdentifier = true;
+    else
+      RecomputeNeedsHandleIdentifier();
+  }
+
   /// Determine whether this is the mangled name of an OpenMP variant.
   bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; }
 
@@ -745,10 +762,11 @@ class IdentifierTable {
     // contents.
     II->Entry = &Entry;
 
-    // If this is the 'import' contextual keyword, mark it as such.
+    // If this is the 'import' or 'module' contextual keyword, mark it as such.
     if (Name == "import")
       II->setModulesImport(true);
-
+    else if (Name == "module")
+      II->setModulesDeclaration(true);
     return *II;
   }
 
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94e72fea56a68..7750c84dbef78 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,9 @@ PPKEYWORD(pragma)
 // C23 & C++26 #embed
 PPKEYWORD(embed)
 
+// C++20 Module Directive
+PPKEYWORD(module)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -1023,6 +1026,9 @@ ANNOTATION(module_include)
 ANNOTATION(module_begin)
 ANNOTATION(module_end)
 
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
 // Annotation for a header_name token that has been looked up and transformed
 // into the name of a header unit.
 ANNOTATION(header_unit)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 0ae490f0e8073..112d3b00160fd 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader {
   /// load it.
   ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
                                                  SourceLocation ImportLoc,
-                                                 SourceLocation ModuleNameLoc,
+                                                 SourceRange ModuleNameRange,
                                                  bool IsInclusionDirective);
 
   /// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
 #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
 
 class IdentifierInfo;
 class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
 
 /// Callback handler that receives notifications when performing code
 /// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
   /// file where we expect natural language, e.g., a comment, string, or
   /// \#error directive.
   virtual void CodeCompleteNaturalLanguage() { }
+
+  /// Callback invoked when performing code completion inside the module name
+  /// part of an import directive.
+  virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+                                        ModuleIdPath Path) {}
 };
 
 }
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
   //===--------------------------------------------------------------------===//
   // Context that changes as the file is lexed.
   // NOTE: any state that mutates when in raw mode must have save/restore code
-  // in Lexer::isNextPPTokenLParen.
+  // in Lexer::peekNextPPToken.
 
   // BufferPtr - Current pointer into the buffer.  This is the next character
   // to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
     BufferPtr = TokEnd;
   }
 
-  /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
-  /// tok::l_paren token, 0 if it is something else and 2 if there are no more
-  /// tokens in the buffer controlled by this lexer.
-  unsigned isNextPPTokenLParen();
+  /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+  /// buffer controlled by this lexer, otherwise return the next unexpanded
+  /// token.
+  std::optional<Token> peekNextPPToken();
 
   //===--------------------------------------------------------------------===//
   // Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index f2dfd3a349b8b..79a75a116c418 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
@@ -82,6 +83,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class ModuleNameLoc;
 
 namespace Builtin {
 class Context;
@@ -332,8 +334,9 @@ class Preprocessor {
   /// lexed, if any.
   SourceLocation ModuleImportLoc;
 
-  /// The import path for named module that we're currently processing.
-  SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+  /// The source location of the \c module contextual keyword we just
+  /// lexed, if any.
+  SourceLocation ModuleDeclLoc;
 
   llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
   unsigned CheckPointCounter = 0;
@@ -344,6 +347,21 @@ class Preprocessor {
   /// Whether the last token we lexed was an '@'.
   bool LastTokenWasAt = false;
 
+  /// Whether we're importing a standard C++20 named Modules.
+  bool ImportingCXXNamedModules = false;
+
+  /// Whether we're declaring a standard C++20 named Modules.
+  bool DeclaringCXXNamedModules = false;
+
+  struct ExportContextualKeywordInfo {
+    Token ExportTok;
+    bool TokAtPhysicalStartOfLine;
+  };
+
+  /// Whether the last token we lexed was an 'export' keyword.
+  std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword =
+      std::nullopt;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -547,12 +565,7 @@ class Preprocessor {
         reset();
     }
 
-    void handleIdentifier(IdentifierInfo *Identifier) {
-      if (isModuleCandidate() && Identifier)
-        Name += Identifier->getName().str();
-      else if (!isNamedModule())
-        reset();
-    }
+    void handleModuleName(ModuleNameLoc *Path);
 
     void handleColon() {
       if (isModuleCandidate())
@@ -561,13 +574,6 @@ class Preprocessor {
         reset();
     }
 
-    void handlePeriod() {
-      if (isModuleCandidate())
-        Name += ".";
-      else if (!isNamedModule())
-        reset();
-    }
-
     void handleSemi() {
       if (!Name.empty() && isModuleCandidate()) {
         if (State == InterfaceCandidate)
@@ -622,10 +628,6 @@ class Preprocessor {
 
   ModuleDeclSeq ModuleDeclState;
 
-  /// Whether the module import expects an identifier next. Otherwise,
-  /// it expects a '.' or ';'.
-  bool ModuleImportExpectsIdentifier = false;
-
   /// The identifier and source location of the currently-active
   /// \#pragma clang arc_cf_code_audited begin.
   IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1759,6 +1761,19 @@ class Preprocessor {
   /// Lex the parameters for an #embed directive, returns nullopt on error.
   std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
                                                              bool ForHasEmbed);
+  bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+                             SmallVectorImpl<IdentifierLoc> &Path,
+                             bool AllowMacroExpansion = true);
+  void HandleCXXImportDirective(Token Import);
+  void HandleCXXModuleDirective(Token Module);
+  
+  /// Callback invoked when the lexer sees one of export, import or module token
+  /// at the start of a line.
+  ///
+  /// This consumes the import, module directive, modifies the
+  /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+  /// read is the correct one.
+  bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine);
 
   bool LexAfterModuleImport(Token &Result);
   void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
@@ -2282,7 +2297,9 @@ class Preprocessor {
   /// Determine whether the next preprocessor token to be
   /// lexed is a '('.  If so, consume the token and return true, if not, this
   /// method should have no observable side-effect on the lexed tokens.
-  bool isNextPPTokenLParen();
+  bool isNextPPTokenLParen() {
+    return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+  }
 
 private:
   /// Identifiers used for SEH handling in Borland. These are only
@@ -2342,7 +2359,7 @@ class Preprocessor {
   ///
   /// \return The location of the end of the directive (the terminating
   /// newline).
-  SourceLocation CheckEndOfDirective(const char *DirType,
+  SourceLocation CheckEndOfDirective(StringRef DirType,
                                      bool EnableMacros = false);
 
   /// Read and discard all tokens remaining on the current line until
@@ -2424,11 +2441,12 @@ class Preprocessor {
   }
 
   /// If we're importing a standard C++20 Named Modules.
-  bool isInImportingCXXNamedModules() const {
-    // NamedModuleImportPath will be non-empty only if we're importing
-    // Standard C++ named modules.
-    return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
-           !IsAtImport;
+  bool isImportingCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules;
+  }
+
+  bool isDeclaringCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules;
   }
 
   /// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2661,6 +2679,10 @@ class Preprocessor {
 
   void removeCachedMacroExpandedTokensOfLastLexer();
 
+  /// Peek the next token. If so, return the token, if not, this
+  /// method should have no observable side-effect on the lexed tokens.
+  std::optional<Token> peekNextPPToken();
+
   /// After reading "MACRO(", this method is invoked to read all of the formal
   /// arguments specified for the macro invocation.  Returns null on error.
   MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
@@ -3078,6 +3100,53 @@ struct EmbedAnnotationData {
   StringRef FileName;
 };
 
+/// Represents module name annotation data.
+///
+///     module-name:
+///           module-name-qualifier[opt] identifier
+///
+///     partition-name: [C++20]
+///           : module-name-qualifier[opt] identifier
+///
+///     module-name-qualifier
+///           module-name-qualifier[opt] identifier .
+class ModuleNameLoc final
+    : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+  friend TrailingObjects;
+  unsigned NumIdentifierLocs;
+
+  unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+    return getNumIdentifierLocs();
+  }
+
+  ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+    (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>());
+  }
+
+public:
+  static std::string stringFromModuleIdPath(ModuleIdPath Path);
+  static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+  static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path);
+  unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+  ModuleIdPath getModuleIdPath() const {
+    return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()};
+  }
+
+  SourceLocation getBeginLoc() const {
+    return getModuleIdPath().front().getLoc();
+  }
+  SourceLocation getEndLoc() const {
+    auto &Last = getModuleIdPath().back();
+    return Last.getLoc().getLocWithOffset(
+        Last.getIdentifierInfo()->getLength());
+  }
+  SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+
+  std::string str() const;
+  void print(llvm::raw_ostream &OS) const;
+  void dump() const { print(llvm::errs()); }
+};
+
 /// Registry of pragma handlers added by plugins
 using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>;
 
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 4f29fb7d11415..8e81207ddf8d7 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -231,6 +231,9 @@ class Token {
     PtrData = const_cast<char*>(Ptr);
   }
 
+  template <class T> T getAnnotationValueAs() const {
+    return static_cast<T>(getAnnotationValue());
+  }
   void *getAnnotationValue() const {
     assert(isAnnotation() && "Used AnnotVal on non-annotation token");
     return PtrData;
@@ -289,6 +292,10 @@ class Token {
   /// Return the ObjC keyword kind.
   tok::ObjCKeywordKind getObjCKeywordID() const;
 
+  /// Return true if we have an C++20 Modules contextual keyword(export, import
+  /// or module).
+  bool isModuleContextualKeyword(bool AllowExport = true) const;
+
   bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
 
   /// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
   void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
             bool OwnsTokens, bool IsReinject);
 
-  /// If the next token lexed will pop this macro off the
-  /// expansion stack, return 2.  If the next unexpanded token is a '(', return
-  /// 1, otherwise return 0.
-  unsigned isNextTokenLParen() const;
+  /// If the next token lexed will pop this macro off the expansion stack,
+  /// return std::nullopt, otherwise return the next unexpanded token.
+  std::optional<Token> peekNextPPToken() const;
 
   /// Lex and return a token from this macro stream.
   bool Lex(Token &Tok);
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index c4bef4729fd36..a59a99bbac7c6 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler {
                                  unsigned ArgumentIndex) override;
   ...
[truncated]
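
The Lexer.h, TokenLexer.h and Preprocessor.h hunks above replace a tri-state `unsigned` result (0, 1, or 2) with `std::optional<Token>`, and rebuild isNextPPTokenLParen on top of the new peek function. The toy model below shows that pattern in isolation. `TokKind`, this `Token`, and `ToyLexer` are hypothetical stand-ins, not clang's classes; only the `value_or(Token{}).is(...)` idiom is copied from the diff. It assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

enum class TokKind { Unknown, LParen, Identifier };

// Toy token: a default-constructed token has kind Unknown.
struct Token {
  TokKind kind = TokKind::Unknown;
  bool is(TokKind k) const { return kind == k; }
};

// Toy lexer over a fixed token buffer.
class ToyLexer {
  std::vector<Token> buffer;
  std::size_t next = 0;

public:
  explicit ToyLexer(std::vector<Token> toks) : buffer(std::move(toks)) {}

  // Returns the next token without consuming it, or std::nullopt when the
  // buffer is exhausted (the old API signalled this case with the value 2).
  std::optional<Token> peekNextPPToken() const {
    if (next >= buffer.size())
      return std::nullopt;
    return buffer[next];
  }

  // The old query becomes a thin wrapper over the more general peek.
  bool isNextPPTokenLParen() const {
    return peekNextPPToken().value_or(Token{}).is(TokKind::LParen);
  }
};
```

Returning the token itself lets callers test for any kind of token, which the module directive handling needs when it checks what follows import or module, while "no more tokens" stays distinguishable from "some other token".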

llvmbot commented May 31, 2025

@llvm/pr-subscribers-clang

Author: None (yronglin)

Changes

Implement P1857R3 Modules Dependency Discovery.

  • Handle C++ module and import directives like other preprocessor directives.

At the start of phase 4, an import or module token is treated as starting a directive and is converted to its respective keyword iff the token:

  • After skipping horizontal whitespace, is

    • ✅at the start of a logical line, or

    • ✅preceded by an export at the start of the logical line.

  • Is followed by an identifier pp token (before macro expansion), or

    • ✅<, ", or : (but not ::) pp tokens for import, or

    • ✅; for module

Otherwise the token is treated as an identifier.

Additionally:

  • ✅The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.

  • ✅The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.

  • ❌[TODO] A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)

  • ✅Preprocessor conditionals shall not span a module declaration.

More tests are needed.


Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff

43 Files Affected:

  • (modified) clang/examples/AnnotateFunctions/AnnotateFunctions.cpp (+1-1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+15-1)
  • (modified) clang/include/clang/Basic/DiagnosticParseKinds.td (+2-4)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+22-4)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+1-1)
  • (modified) clang/include/clang/Lex/CodeCompletionHandler.h (+8)
  • (modified) clang/include/clang/Lex/Lexer.h (+5-5)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+95-26)
  • (modified) clang/include/clang/Lex/Token.h (+7)
  • (modified) clang/include/clang/Lex/TokenLexer.h (+3-4)
  • (modified) clang/include/clang/Parse/Parser.h (+2)
  • (modified) clang/include/clang/Sema/Sema.h (+4-2)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+7-3)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+8-1)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+20-8)
  • (modified) clang/lib/Lex/Lexer.cpp (+46-18)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+264-5)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+15-17)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+171-188)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+5-3)
  • (modified) clang/lib/Lex/TokenLexer.cpp (+7-6)
  • (modified) clang/lib/Parse/Parser.cpp (+30-63)
  • (modified) clang/lib/Sema/SemaModule.cpp (+30-45)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+1-1)
  • (modified) clang/test/CXX/basic/basic.link/p1.cpp (+109-36)
  • (modified) clang/test/CXX/basic/basic.link/p3.cpp (+40-27)
  • (modified) clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp (+56-26)
  • (modified) clang/test/CXX/lex/lex.pptoken/p3-2a.cpp (+10-5)
  • (modified) clang/test/CXX/module/basic/basic.def.odr/p6.cppm (+134-40)
  • (modified) clang/test/CXX/module/basic/basic.link/module-declaration.cpp (+35-29)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm (+27-11)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm (+18-21)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p1.cpp (+30-14)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p5.cpp (+48-17)
  • (modified) clang/test/CXX/module/module.interface/p1.cpp (+24-18)
  • (modified) clang/test/CXX/module/module.interface/p2.cpp (+12-14)
  • (modified) clang/test/CXX/module/module.unit/p8.cpp (+28-20)
  • (modified) clang/test/Modules/pr121066.cpp (+1-2)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+5-5)
  • (modified) clang/unittests/Lex/ModuleDeclStateTest.cpp (+1-1)
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
index d872020c2d8a3..22a3eb97f938b 100644
--- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
+++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
@@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler {
     Token Tok;
     PP.LexUnexpandedToken(Tok);
     if (Tok.isNot(tok::eod))
-      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma";
+      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma";
 
     if (HandledDecl) {
       DiagnosticsEngine &D = PP.getDiagnostics();
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 723f5d48b4f5f..f975a63b369b5 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error<
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
+def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn<
+  "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>;
 
 def ext_pp_comma_expr : Extension<"comma operator in operand of #if">;
 def ext_pp_bad_vaargs_use : Extension<
@@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning<
 def ext_named_variadic_macro : Extension<
   "named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
 def err_embedded_directive : Error<
-  "embedding a #%0 directive within macro arguments is not supported">;
+  "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">;
 def ext_embedded_directive : Extension<
   "embedding a directive within macro arguments has undefined behavior">,
   InGroup<DiagGroup<"embedded-directive">>;
@@ -983,6 +985,18 @@ def warn_module_conflict : Warning<
   InGroup<ModuleConflict>;
 
 // C++20 modules
+def err_pp_expected_module_name_or_header_name : Error<
+  "expected module name or header name">;
+def err_pp_expected_semi_after_module_or_import : Error<
+  "'%select{module|import}0' directive must end with a ';' on the same line">;
+def err_module_decl_in_header : Error<
+  "module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl : Error<
+  "preprocessor conditionals shall not span a module declaration">;
+def err_pp_module_expected_ident : Error<
+  "expected a module name after '%select{module|import}0'">;
+def err_pp_unsupported_module_partition : Error<
+  "module partitions are only supported for C++20 onwards">;
 def err_header_import_semi_in_macro : Error<
   "semicolon terminating header import declaration cannot be produced "
   "by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 3aa36ad59d0b9..c06e2f090b429 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1760,8 +1760,8 @@ def ext_bit_int : Extension<
 } // end of Parse Issue category.
 
 let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
-  "module declaration can only appear at the top level">;
+def err_unexpected_module_import_decl : Error<
+  "%select{module|import}0 declaration can only appear at the top level">;
 def err_module_expected_ident : Error<
   "expected a module name after '%select{module|import}0'">;
 def err_attribute_not_module_attr : Error<
@@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error<
 def err_private_module_fragment_expected_semi : Error<
   "expected ';' after private module fragment declaration">;
 def err_missing_before_module_end : Error<"expected %0 at end of module">;
-def err_unsupported_module_partition : Error<
-  "module partitions are only supported for C++20 onwards">;
 def err_import_not_allowed_here : Error<
   "imports must immediately follow the module declaration">;
 def err_partition_import_outside_module : Error<
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54540193cfcc0..add6c6ac629a1 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsModulesImport : 1;
 
+  // True if this is the 'module' contextual keyword.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsModulesDecl : 1;
+
   // True if this is a mangled OpenMP variant name.
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsMangledOpenMPVariantName : 1;
@@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
         IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
         IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
         RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
-        IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
-        IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+        IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+        IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+        IsKeywordInCpp(false) {}
 
 public:
   IdentifierInfo(const IdentifierInfo &) = delete;
@@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
       RecomputeNeedsHandleIdentifier();
   }
 
+  /// Determine whether this is the contextual keyword \c module.
+  bool isModulesDeclaration() const { return IsModulesDecl; }
+
+  /// Set whether this identifier is the contextual keyword \c module.
+  void setModulesDeclaration(bool I) {
+    IsModulesDecl = I;
+    if (I)
+      NeedsHandleIdentifier = true;
+    else
+      RecomputeNeedsHandleIdentifier();
+  }
+
   /// Determine whether this is the mangled name of an OpenMP variant.
   bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; }
 
@@ -745,10 +762,11 @@ class IdentifierTable {
     // contents.
     II->Entry = &Entry;
 
-    // If this is the 'import' contextual keyword, mark it as such.
+    // If this is the 'import' or 'module' contextual keyword, mark it as such.
     if (Name == "import")
       II->setModulesImport(true);
-
+    else if (Name == "module")
+      II->setModulesDeclaration(true);
     return *II;
   }
 
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94e72fea56a68..7750c84dbef78 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,9 @@ PPKEYWORD(pragma)
 // C23 & C++26 #embed
 PPKEYWORD(embed)
 
+// C++20 Module Directive
+PPKEYWORD(module)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -1023,6 +1026,9 @@ ANNOTATION(module_include)
 ANNOTATION(module_begin)
 ANNOTATION(module_end)
 
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
 // Annotation for a header_name token that has been looked up and transformed
 // into the name of a header unit.
 ANNOTATION(header_unit)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 0ae490f0e8073..112d3b00160fd 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader {
   /// load it.
   ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
                                                  SourceLocation ImportLoc,
-                                                 SourceLocation ModuleNameLoc,
+                                                 SourceRange ModuleNameRange,
                                                  bool IsInclusionDirective);
 
   /// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
 #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
 
 class IdentifierInfo;
 class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
 
 /// Callback handler that receives notifications when performing code
 /// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
   /// file where we expect natural language, e.g., a comment, string, or
   /// \#error directive.
   virtual void CodeCompleteNaturalLanguage() { }
+
+  /// Callback invoked when performing code completion inside the module name
+  /// part of an import directive.
+  virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+                                        ModuleIdPath Path) {}
 };
 
 }
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
   //===--------------------------------------------------------------------===//
   // Context that changes as the file is lexed.
   // NOTE: any state that mutates when in raw mode must have save/restore code
-  // in Lexer::isNextPPTokenLParen.
+  // in Lexer::peekNextPPToken.
 
   // BufferPtr - Current pointer into the buffer.  This is the next character
   // to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
     BufferPtr = TokEnd;
   }
 
-  /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
-  /// tok::l_paren token, 0 if it is something else and 2 if there are no more
-  /// tokens in the buffer controlled by this lexer.
-  unsigned isNextPPTokenLParen();
+  /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+  /// buffer controlled by this lexer, otherwise return the next unexpanded
+  /// token.
+  std::optional<Token> peekNextPPToken();
 
   //===--------------------------------------------------------------------===//
   // Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index f2dfd3a349b8b..79a75a116c418 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
@@ -82,6 +83,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class ModuleNameLoc;
 
 namespace Builtin {
 class Context;
@@ -332,8 +334,9 @@ class Preprocessor {
   /// lexed, if any.
   SourceLocation ModuleImportLoc;
 
-  /// The import path for named module that we're currently processing.
-  SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+  /// The source location of the \c module contextual keyword we just
+  /// lexed, if any.
+  SourceLocation ModuleDeclLoc;
 
   llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
   unsigned CheckPointCounter = 0;
@@ -344,6 +347,21 @@ class Preprocessor {
   /// Whether the last token we lexed was an '@'.
   bool LastTokenWasAt = false;
 
+  /// Whether we're importing a standard C++20 named Modules.
+  bool ImportingCXXNamedModules = false;
+
+  /// Whether we're declaring a standard C++20 named Modules.
+  bool DeclaringCXXNamedModules = false;
+
+  struct ExportContextualKeywordInfo {
+    Token ExportTok;
+    bool TokAtPhysicalStartOfLine;
+  };
+
+  /// Whether the last token we lexed was an 'export' keyword.
+  std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword =
+      std::nullopt;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -547,12 +565,7 @@ class Preprocessor {
         reset();
     }
 
-    void handleIdentifier(IdentifierInfo *Identifier) {
-      if (isModuleCandidate() && Identifier)
-        Name += Identifier->getName().str();
-      else if (!isNamedModule())
-        reset();
-    }
+    void handleModuleName(ModuleNameLoc *Path);
 
     void handleColon() {
       if (isModuleCandidate())
@@ -561,13 +574,6 @@ class Preprocessor {
         reset();
     }
 
-    void handlePeriod() {
-      if (isModuleCandidate())
-        Name += ".";
-      else if (!isNamedModule())
-        reset();
-    }
-
     void handleSemi() {
       if (!Name.empty() && isModuleCandidate()) {
         if (State == InterfaceCandidate)
@@ -622,10 +628,6 @@ class Preprocessor {
 
   ModuleDeclSeq ModuleDeclState;
 
-  /// Whether the module import expects an identifier next. Otherwise,
-  /// it expects a '.' or ';'.
-  bool ModuleImportExpectsIdentifier = false;
-
   /// The identifier and source location of the currently-active
   /// \#pragma clang arc_cf_code_audited begin.
   IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1759,6 +1761,19 @@ class Preprocessor {
   /// Lex the parameters for an #embed directive, returns nullopt on error.
   std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
                                                              bool ForHasEmbed);
+  bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+                             SmallVectorImpl<IdentifierLoc> &Path,
+                             bool AllowMacroExpansion = true);
+  void HandleCXXImportDirective(Token Import);
+  void HandleCXXModuleDirective(Token Module);
+  
+  /// Callback invoked when the lexer sees one of export, import or module token
+  /// at the start of a line.
+  ///
+  /// This consumes the import, module directive, modifies the
+  /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+  /// read is the correct one.
+  bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine);
 
   bool LexAfterModuleImport(Token &Result);
   void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
@@ -2282,7 +2297,9 @@ class Preprocessor {
   /// Determine whether the next preprocessor token to be
   /// lexed is a '('.  If so, consume the token and return true, if not, this
   /// method should have no observable side-effect on the lexed tokens.
-  bool isNextPPTokenLParen();
+  bool isNextPPTokenLParen() {
+    return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+  }
 
 private:
   /// Identifiers used for SEH handling in Borland. These are only
@@ -2342,7 +2359,7 @@ class Preprocessor {
   ///
   /// \return The location of the end of the directive (the terminating
   /// newline).
-  SourceLocation CheckEndOfDirective(const char *DirType,
+  SourceLocation CheckEndOfDirective(StringRef DirType,
                                      bool EnableMacros = false);
 
   /// Read and discard all tokens remaining on the current line until
@@ -2424,11 +2441,12 @@ class Preprocessor {
   }
 
   /// If we're importing a standard C++20 Named Modules.
-  bool isInImportingCXXNamedModules() const {
-    // NamedModuleImportPath will be non-empty only if we're importing
-    // Standard C++ named modules.
-    return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
-           !IsAtImport;
+  bool isImportingCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules;
+  }
+
+  bool isDeclaringCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules;
   }
 
   /// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2661,6 +2679,10 @@ class Preprocessor {
 
   void removeCachedMacroExpandedTokensOfLastLexer();
 
+  /// Peek the next token. If so, return the token, if not, this
+  /// method should have no observable side-effect on the lexed tokens.
+  std::optional<Token> peekNextPPToken();
+
   /// After reading "MACRO(", this method is invoked to read all of the formal
   /// arguments specified for the macro invocation.  Returns null on error.
   MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
@@ -3078,6 +3100,53 @@ struct EmbedAnnotationData {
   StringRef FileName;
 };
 
+/// Represents module name annotation data.
+///
+///     module-name:
+///           module-name-qualifier[opt] identifier
+///
+///     partition-name: [C++20]
+///           : module-name-qualifier[opt] identifier
+///
+///     module-name-qualifier
+///           module-name-qualifier[opt] identifier .
+class ModuleNameLoc final
+    : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+  friend TrailingObjects;
+  unsigned NumIdentifierLocs;
+
+  unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+    return getNumIdentifierLocs();
+  }
+
+  ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+    (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>());
+  }
+
+public:
+  static std::string stringFromModuleIdPath(ModuleIdPath Path);
+  static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+  static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path);
+  unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+  ModuleIdPath getModuleIdPath() const {
+    return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()};
+  }
+
+  SourceLocation getBeginLoc() const {
+    return getModuleIdPath().front().getLoc();
+  }
+  SourceLocation getEndLoc() const {
+    auto &Last = getModuleIdPath().back();
+    return Last.getLoc().getLocWithOffset(
+        Last.getIdentifierInfo()->getLength());
+  }
+  SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+
+  std::string str() const;
+  void print(llvm::raw_ostream &OS) const;
+  void dump() const { print(llvm::errs()); }
+};
+
 /// Registry of pragma handlers added by plugins
 using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>;
 
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 4f29fb7d11415..8e81207ddf8d7 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -231,6 +231,9 @@ class Token {
     PtrData = const_cast<char*>(Ptr);
   }
 
+  template <class T> T getAnnotationValueAs() const {
+    return static_cast<T>(getAnnotationValue());
+  }
   void *getAnnotationValue() const {
     assert(isAnnotation() && "Used AnnotVal on non-annotation token");
     return PtrData;
@@ -289,6 +292,10 @@ class Token {
   /// Return the ObjC keyword kind.
   tok::ObjCKeywordKind getObjCKeywordID() const;
 
+  /// Return true if we have an C++20 Modules contextual keyword(export, import
+  /// or module).
+  bool isModuleContextualKeyword(bool AllowExport = true) const;
+
   bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
 
   /// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
   void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
             bool OwnsTokens, bool IsReinject);
 
-  /// If the next token lexed will pop this macro off the
-  /// expansion stack, return 2.  If the next unexpanded token is a '(', return
-  /// 1, otherwise return 0.
-  unsigned isNextTokenLParen() const;
+  /// If the next token lexed will pop this macro off the expansion stack,
+  /// return std::nullopt, otherwise return the next unexpanded token.
+  std::optional<Token> peekNextPPToken() const;
 
   /// Lex and return a token from this macro stream.
   bool Lex(Token &Tok);
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index c4bef4729fd36..a59a99bbac7c6 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler {
                                  unsigned ArgumentIndex) override;
   ...
[truncated]
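One pattern visible in the diff above is the replacement of the tri-state `unsigned isNextPPTokenLParen()` with a `std::optional<Token> peekNextPPToken()`, from which the "is it a left paren" query can be derived. The following is a minimal standalone model of that shape, assuming a toy token buffer; `ToyLexer`, `Tok` and `TokKind` are hypothetical and are not Clang types.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

enum class TokKind { Identifier, LParen, Semi };

struct Tok {
  TokKind Kind;
};

// Hypothetical stand-in for a lexer over a fixed token buffer.
class ToyLexer {
  std::vector<Tok> Toks;
  std::size_t Pos = 0;

public:
  explicit ToyLexer(std::vector<Tok> T) : Toks(std::move(T)) {}

  // Returns the next token without consuming it, or std::nullopt when the
  // buffer is exhausted. This replaces a 0/1/2 integer encoding.
  std::optional<Tok> peekNextPPToken() const {
    if (Pos == Toks.size())
      return std::nullopt;
    return Toks[Pos];
  }

  // The specific query is now a thin wrapper over the general peek.
  bool isNextPPTokenLParen() const {
    std::optional<Tok> T = peekNextPPToken();
    return T && T->Kind == TokKind::LParen;
  }

  void consume() {
    if (Pos < Toks.size())
      ++Pos;
  }
};
```

The benefit of the general peek is that callers interested in other token kinds (for example, the token following `import` or `module`) can reuse it instead of adding another special-purpose query.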

@yronglin yronglin marked this pull request as draft May 31, 2025 08:02
@yronglin yronglin force-pushed the modules_dependency_discovery branch from d300a2b to 04ddbf6 Compare June 2, 2025 12:33
@yronglin yronglin marked this pull request as ready for review June 2, 2025 12:33
@llvmbot llvmbot added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Jun 2, 2025
@hubert-reinterpretcast
Collaborator

It is hard to believe it is by design to allow "export module m; int n;" while we reject others. Is it possible to adjust the wording for it?

Why is that hard to believe? The feature originally was not envisioned as a preprocessor directive, so the "line" aspect is not strongly enforced. The later adjustments made "minimal" changes, which can be seen as a sign of intent not to affect the validity of existing code if said code was not "problematic" within the scope of the problems that the papers/issues were trying to address.

@h-vetinari
Contributor

Why is that hard to believe?

Because there's no need to mix a completely new directive with fundamentally different instructions (and parsing) on the same line. All the complexity required to support this (much less the mental stumbling blocks for readers of the code) stand in contrast to the trivial alternative of "add a newline". The putative burden of doing so doesn't even come close to justifying the resulting complexity -- it's entirely self-defeating of the standard to allow this IMO.

@hubert-reinterpretcast
Collaborator

Why is that hard to believe?

Because there's no need to mix a completely new directive with fundamentally different instructions (and parsing) on the same line. All the complexity required to support this (much less the mental stumbling blocks for readers of the code) stand in contrast to the trivial alternative of "add a newline". The putative burden of doing so doesn't even come close to justifying the resulting complexity -- it's entirely self-defeating of the standard to allow this IMO.

I believe I have explained why this was not standardized as a "new directive" from a design viewpoint. The directives were more a means to an end.

@Bigcheese
Contributor

It is hard to believe it is by design to allow "export module m; int n;" while we reject others. Is it possible to adjust the wording for it?

The design was about what's required to be able to figure stuff out at the start of phase 4. export module m; int n; is easily handled by scanners.

If we really wanted we could fully duplicate the attribute grammar into the preprocessor to be able to fully restrict the line, but these weren't really intended to be preprocessor control lines. That was just a good way to restrict how they could be preprocessed.
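To illustrate why a line such as `export module m; int n;` is not a problem for a dependency scanner, here is a rough sketch of a scanner that only reads up to the end of the module name and ignores whatever follows the `;`. It is an assumption-laden toy (`scanModuleName` is a hypothetical helper, and it does not treat the private module fragment or attributes specially), not how Clang's dependency scanner is implemented.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: returns the module name declared on Line, if the line
// starts with [export] module <name> and has a ';' after the name.
std::optional<std::string> scanModuleName(const std::string &Line) {
  std::size_t I = 0;
  auto skipSpace = [&] {
    while (I < Line.size() && (Line[I] == ' ' || Line[I] == '\t'))
      ++I;
  };
  auto readWord = [&]() -> std::string {
    std::size_t Start = I;
    while (I < Line.size() &&
           (std::isalnum(static_cast<unsigned char>(Line[I])) ||
            Line[I] == '_'))
      ++I;
    return Line.substr(Start, I - Start);
  };

  skipSpace();
  std::string Word = readWord();
  if (Word == "export") {
    skipSpace();
    Word = readWord();
  }
  if (Word != "module")
    return std::nullopt;

  // Collect identifiers, '.' and ':' until something else is seen.
  std::string Name;
  for (;;) {
    skipSpace();
    if (I >= Line.size())
      return std::nullopt; // no terminating ';' on this line
    unsigned char C = static_cast<unsigned char>(Line[I]);
    if (std::isalpha(C) || C == '_') {
      Name += readWord();
    } else if (C == '.' || C == ':') {
      Name += static_cast<char>(C);
      ++I;
    } else {
      break;
    }
  }
  // Anything after the first ';' (such as "int n;") is irrelevant here.
  if (Name.empty() || Line.find(';', I) == std::string::npos)
    return std::nullopt;
  return Name;
}
```

In this model, `scanModuleName("export module m; int n;")` yields `m`, and the trailing declaration never needs to be understood by the scanner.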

@yronglin
Contributor Author

yronglin commented Sep 30, 2025

Thanks for the review!
The issues from comments were fixed.

I have added the examples from CWG2947 to clang/test/CXX/module/cpp.pre/p1.cpp. Except for the following pattern, all of them are supported by the current implementation:

  export module M [[
  attr1,
  attr2 ]] ;                 // OK in CWG2947, but Clang rejects it because the ; is not on the same line.

  export module M
  [[ attr1,
  attr2 ]] ;                 // Same as above

But this restriction can be easily removed.

What do you think?

(A question about CWG2947: it seems it has not yet been accepted?)

@Bigcheese
Contributor

What do you think?

I think we should implement CWG2947 as written and allow that. CWG2947 has not yet been accepted, but I don't believe there will be issues there.

@hubert-reinterpretcast
Collaborator

I don't believe there will be issues there.

I think the caveat is "unless the implementation attempt finds problems" (which is all the more reason to try it).

The way the Core Issue resolution is worded, implementing it may prove to be difficult (depending on how the existing preprocessor is structured) because, once we interpret it as performing the check only after macros have been replaced, the interpretation starts down the path where the check is performed only after further directives are processed (which actually tracks with the intent that programs that are valid pre-P1857 remain valid unless they cause "dependency discovery" difficulties). For example:
a.h:

;

a.cppm:

export module a
#include "a.h"

Anyhow, even without throwing CWG2947 into the mix, stuff like this is allowed:

export module x _Pragma("GCC warning \"Hi\"");

… with this change, we can correctly handle 'export module x BAR;', where BAR is a macro that expands to '.y'

Signed-off-by: yronglin <[email protected]>
@hubert-reinterpretcast
Collaborator

Anyhow, even without throwing CWG2947 into the mix, stuff like this is allowed:

export module x _Pragma("GCC warning \"Hi\"");

The above is working, but the seemingly simpler case is not:

export module x; _Pragma("GCC warning \"hi\"");

Gives:

<stdin>:1:18: warning: extra tokens at end of 'module' directive [-Wextra-tokens]
    1 | export module x; _Pragma("GCC warning \"hi\"");
      |                  ^
      |                  //
<stdin>:1:18: error: a type specifier is required for all declarations
1 warning and 1 error generated.

Collaborator

@hubert-reinterpretcast hubert-reinterpretcast left a comment

This needs tests for -E.

The line separation is not getting maintained:

export module m; int x;
extern "C++" int *y = &x;

Output:

# 1 "<stdin>"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 665 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "<stdin>" 2
export module m; int x;extern "C++" int *y = &x;

@yronglin
Contributor Author

I think we should implement CWG2947 as written and allow that. CWG2947 has not yet been accepted, but I don't believe there will be issues there.

I've temporarily added CWG2947 support in the patch.

This needs tests for -E.

The line separation is not getting maintained:

export module m; int x;
extern "C++" int *y = &x;

Output:

# 1 "<stdin>"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 665 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "<stdin>" 2
export module m; int x;extern "C++" int *y = &x;

Hmm yes, this is a bug in the current implementation. I'll fix this later.
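The expected -E behaviour in the report above can be modelled with a toy printer that starts a new output line whenever a token began a new source line. This is a sketch under assumptions: `OutTok` and `printPreprocessed` are hypothetical names, and the real logic in PrintPreprocessedOutput.cpp is considerably more involved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record carrying just enough layout information.
struct OutTok {
  std::string Text;
  bool AtStartOfLine;   // token was the first on its source line
  bool HasLeadingSpace; // token was preceded by horizontal whitespace
};

// Prints tokens while preserving line breaks between source lines.
std::string printPreprocessed(const std::vector<OutTok> &Toks) {
  std::string Out;
  for (std::size_t I = 0; I < Toks.size(); ++I) {
    const OutTok &T = Toks[I];
    if (I != 0 && T.AtStartOfLine)
      Out += '\n'; // keep the line separation after a module directive too
    else if (I != 0 && T.HasLeadingSpace)
      Out += ' ';
    Out += T.Text;
  }
  Out += '\n';
  return Out;
}
```

In terms of this model, the reported bug corresponds to the `extern` token losing its start-of-line property after the module directive was handled, so the two source lines were joined in the output.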

Comment on lines 996 to 999
def err_pp_module_decl_in_header
: Error<"module declaration must not come from an #include directive">;
def err_pp_cond_span_module_decl
: Error<"preprocessor conditionals shall not span a module declaration">;
Collaborator

These are both expressions of the fact that a module directive is neither a control-line nor a text-line. Which is to say that there should be a preprocessor diagnostic (effective with -E) that triggers whenever a module directive is encountered where a control-line or a text-line is required.

In particular, such a diagnostic should only refer to "module directive"s and not to "module declaration"s.

A diagnostic is required for the following:

module;
#if 0
export module m;
#endif
// expected-error@-2 {{}}
export module m2;

Contributor Author

Yes, the wording of these two diagnostics is not correct here; "module declaration" needs to be changed to "module directive". But in this code snippet, the module directive is in a skipped conditional block. IIUC, we should not touch the text in skipped block? Did you mean #if 1?

Collaborator

IIUC, we should not touch the text in skipped block? Did you mean #if 1?

I don't mean #if 1. There is actually no "skipped block" in the above because the module directive is not a text-line (https://wg21.link/cpp.pre#2) nor anything else that is allowed in a group (https://eel.is/c++draft/cpp.pre#nt:group-part).

Contributor Author

Thanks, got it.

Contributor Author

Done!

Contributor Author

Does this behavior make sense?
Sorry, I have another question: is it only effective in -E mode?

#if 0 // #1
export module m; // expected-error {{preprocessor conditionals shall not span a module declaration}}
#else
export module m; // expected-error {{preprocessor conditionals shall not span a module declaration}} \
                 // expected-error {{module declaration must occur at the start of the translation unit}} \
                 // expected-note@#1 {{add 'module;'}}
#endif

Collaborator

Yes, the behaviour makes sense (and the behaviour should be observed even when not using -E).

The wording of the message can use a little tweaking though. The wording currently focuses on the preprocessor conditionals, but the message points to the module "declaration".

Suggestion:

module directive lines are not allowed on lines controlled by preprocessor conditionals

Contributor Author

Fixed!

Signed-off-by: yronglin <[email protected]>
@yronglin
Contributor Author

Anyhow, even without throwing CWG2947 into the mix, stuff like this is allowed:

export module x _Pragma("GCC warning \"Hi\"");

The above is working, but the seemingly simpler case is not:

export module x; _Pragma("GCC warning \"hi\"");

Gives:

<stdin>:1:18: warning: extra tokens at end of 'module' directive [-Wextra-tokens]
    1 | export module x; _Pragma("GCC warning \"hi\"");
      |                  ^
      |                  //
<stdin>:1:18: error: a type specifier is required for all declarations
1 warning and 1 error generated.

Fixed.

@hubert-reinterpretcast
Collaborator

It seems that the code is currently checking for the module or import at the start of the line after macro expansion:

#define IMPORT import
template <typename T>
struct import;

extern
IMPORT<int> a;

is supposed to compile (based on the wording).

Accepted with IMPORT: https://godbolt.org/z/v16sEd5jK
Rejected with import (except for Clang, which is waiting for this PR): https://godbolt.org/z/Gz11axThf
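
To make the distinction concrete, here is a minimal sketch of the "before macro expansion" check as described in the PR summary: a line starts an import directive only if its raw first token (optionally after `export`) is `import`, followed by an identifier, `<`, `"`, or `:` (but not `::`). The helper names and the character-level scanning are assumptions for illustration, not the code in this PR.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Skip horizontal whitespace starting at position i.
static std::size_t skipSpace(const std::string &s, std::size_t i) {
  while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
  return i;
}

// Read an identifier starting at i; returns an empty string if there is none.
static std::string readIdentifier(const std::string &s, std::size_t i) {
  assert(i <= s.size());
  std::size_t j = i;
  if (j < s.size() &&
      (std::isalpha(static_cast<unsigned char>(s[j])) || s[j] == '_')) {
    ++j;
    while (j < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_'))
      ++j;
  }
  return s.substr(i, j - i);
}

// Hypothetical check on the raw (unexpanded) logical line. Macros are never
// expanded here, so a macro that expands to "import" cannot introduce a
// directive.
bool startsImportDirective(const std::string &rawLine) {
  std::size_t i = skipSpace(rawLine, 0);
  std::string first = readIdentifier(rawLine, i);
  if (first == "export") {
    i = skipSpace(rawLine, i + first.size());
    first = readIdentifier(rawLine, i);
  }
  if (first != "import") return false;
  i = skipSpace(rawLine, i + first.size());
  if (i >= rawLine.size()) return false;
  char c = rawLine[i];
  if (c == '<' || c == '"') return true;
  if (c == ':') return i + 1 >= rawLine.size() || rawLine[i + 1] != ':';
  return !readIdentifier(rawLine, i).empty();
}
```

Under this sketch, `IMPORT<int> a;` is not a directive because its raw first token is `IMPORT`, so the line goes on to ordinary macro expansion and parsing, while a literal `import<int> a;` at the start of a line is classified as a directive.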

@yronglin
Contributor Author

It seems that the code is currently checking for the module or import at the start of the line after macro expansion:

#define IMPORT import
template <typename T>
struct import;

extern
IMPORT<int> a;

is supposed to compile (based on the wording).

Accepted with IMPORT: https://godbolt.org/z/v16sEd5jK
Rejected with import (except for Clang, which is waiting for this PR): https://godbolt.org/z/Gz11axThf

Yeah, after this PR, the import<int> a; can be compiled.

Labels

clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category

Development

Successfully merging this pull request may close these issues.

Clang C++20 Feature: P1857R3 - Modules Dependency Discovery

9 participants