
Commit e6e874c

[clang] Allow trivial pp-directives before C++ module directive (#153641)
Consider the following code:

```cpp
# 1 __FILE__ 1 3
export module a;
```

According to the wording in [P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):

```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```

and the wording in [[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file):

```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```

`#` is the first pp-token in the translation unit, so clang rejected this code, but directives like this really should be exempted from the rule. The goal is to not allow any preprocessor conditionals or most state changes, and these directives are neither.

Here, "state change" means a change to most semantically observable preprocessor state, particularly anything that is order dependent. Global flags like being a system header/module shouldn't matter.

We should exempt a bunch of directives, even though doing so violates the current standard wording.

This patch introduces a `NoTrivialPPDirectiveTracer` to trace the **state changes** described above and proposes to exempt the following kinds of directives: `#line`, GNU line marker, `#ident`, `#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC error`, `#pragma GCC diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma warning`, `#pragma execution_character_set`, `#pragma clang assume_nonnull` and builtin macro expansion.

Fixes #145274

---------

Signed-off-by: yronglin <[email protected]>
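To make the exemption list concrete, here is a minimal, self-contained sketch that classifies a directive line by its spelling. It is not how clang implements the check (the patch works through preprocessor callbacks, not string matching). The names `kExemptPrefixes`, `isGnuLineMarker`, `isExemptDirective` and `demoExemptList` are hypothetical, and the sketch assumes directives are written in a canonical form with single spaces. Builtin macro expansion, the last item in the list, is not a directive and is not modeled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical table: the directive spellings listed in the commit message.
constexpr std::string_view kExemptPrefixes[] = {
    "#line",
    "#ident",
    "#pragma comment",
    "#pragma mark",
    "#pragma detect_mismatch",
    "#pragma clang __debug",
    "#pragma message",
    "#pragma GCC warning",
    "#pragma GCC error",
    "#pragma GCC diagnostic",
    "#pragma OPENCL EXTENSION",
    "#pragma warning",
    "#pragma execution_character_set",
    "#pragma clang assume_nonnull",
};

inline bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// A GNU line marker is '#', optional blanks, then a line number.
inline bool isGnuLineMarker(std::string_view Line) {
  if (Line.empty() || Line.front() != '#')
    return false;
  std::size_t I = Line.find_first_not_of(" \t", 1);
  return I != std::string_view::npos &&
         std::isdigit(static_cast<unsigned char>(Line[I]));
}

// True if the line starts with an exempt spelling that ends at a word
// boundary, so "#line 3" matches but "#linefoo" does not.
inline bool isExemptDirective(std::string_view Line) {
  if (isGnuLineMarker(Line))
    return true;
  for (std::string_view Prefix : kExemptPrefixes) {
    if (Line.substr(0, Prefix.size()) != Prefix)
      continue;
    if (Line.size() == Prefix.size() || !isIdentChar(Line[Prefix.size()]))
      return true;
  }
  return false;
}

// Usage: the line marker from the example above is exempt; a macro
// definition is not.
inline void demoExemptList() {
  assert(isExemptDirective("# 1 \"a.cppm\" 1 3"));
  assert(!isExemptDirective("#define X 1"));
}
```

Anything the table does not list, such as `#define`, `#include` or a conditional, falls through to `false`, which mirrors the intent that only order-independent directives may precede the module directive.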
1 parent 145e8aa commit e6e874c

14 files changed, +762 -101 lines changed


clang/include/clang/Lex/Lexer.h

Lines changed: 0 additions & 3 deletions
@@ -143,9 +143,6 @@ class Lexer : public PreprocessorLexer {
   /// True if this is the first time we're lexing the input file.
   bool IsFirstTimeLexingFile;
 
-  /// True if current lexing token is the first pp-token.
-  bool IsFirstPPToken;
-
   // NewLinePtr - A pointer to new line character '\n' being lexed. For '\r\n',
   // it also points to '\n.'
   const char *NewLinePtr;
clang/include/clang/Lex/NoTrivialPPDirectiveTracer.h

Lines changed: 310 additions & 0 deletions
@@ -0,0 +1,310 @@
+//===--- NoTrivialPPDirectiveTracer.h ---------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the NoTrivialPPDirectiveTracer interface.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_LEX_NO_TRIVIAL_PPDIRECTIVE_TRACER_H
+#define LLVM_CLANG_LEX_NO_TRIVIAL_PPDIRECTIVE_TRACER_H
+
+#include "clang/Lex/PPCallbacks.h"
+
+namespace clang {
+class Preprocessor;
+
+/// Consider the following code:
+///
+/// # 1 __FILE__ 1 3
+/// export module a;
+///
+/// According to the wording in
+/// [P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):
+///
+/// A module directive may only appear as the first preprocessing tokens in a
+/// file (excluding the global module fragment.)
+///
+/// and the wording in
+/// [[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file):
+/// module-file:
+/// pp-global-module-fragment[opt] pp-module group[opt]
+/// pp-private-module-fragment[opt]
+///
+/// `#` is the first pp-token in the translation unit, and it was rejected by
+/// clang, but they really should be exempted from this rule. The goal is to not
+/// allow any preprocessor conditionals or most state changes, but these don't
+/// fit that.
+///
+/// State change would mean most semantically observable preprocessor state,
+/// particularly anything that is order dependent. Global flags like being a
+/// system header/module shouldn't matter.
+///
+/// We should exempt a bunch of directives, even though it violates the current
+/// standard wording.
+///
+/// This class is used to trace 'no-trivial' pp-directives in the main file,
+/// which may change the preprocessing state.
+///
+/// FIXME: Once the wording of the standard is revised, we need to follow the
+/// wording of the standard. Currently this is just a workaround.
+class NoTrivialPPDirectiveTracer : public PPCallbacks {
+  Preprocessor &PP;
+
+  /// Whether we are in the main file. We only focus on the main file.
+  bool InMainFile = true;
+
+  /// Whether one or more conditional, include or other 'no-trivial'
+  /// pp-directives have been seen before.
+  bool SeenNoTrivialPPDirective = false;
+
+  void setSeenNoTrivialPPDirective();
+
+public:
+  NoTrivialPPDirectiveTracer(Preprocessor &P) : PP(P) {}
+
+  bool hasSeenNoTrivialPPDirective() const;
+
+  /// Callback invoked whenever the \p Lexer moves to a different file for
+  /// lexing. Unlike \p FileChanged line number directives and other related
+  /// pragmas do not trigger callbacks to \p LexedFileChanged.
+  ///
+  /// \param FID The \p FileID that the \p Lexer moved to.
+  ///
+  /// \param Reason Whether the \p Lexer entered a new file or exited one.
+  ///
+  /// \param FileType The \p CharacteristicKind of the file the \p Lexer moved
+  /// to.
+  ///
+  /// \param PrevFID The \p FileID the \p Lexer was using before the change.
+  ///
+  /// \param Loc The location where the \p Lexer entered a new file from or the
+  /// location that the \p Lexer moved into after exiting a file.
+  void LexedFileChanged(FileID FID, LexedFileChangeReason Reason,
+                        SrcMgr::CharacteristicKind FileType, FileID PrevFID,
+                        SourceLocation Loc) override;
+
+  /// Callback invoked whenever an embed directive has been processed,
+  /// regardless of whether the embed will actually find a file.
+  ///
+  /// \param HashLoc The location of the '#' that starts the embed directive.
+  ///
+  /// \param FileName The name of the file being included, as written in the
+  /// source code.
+  ///
+  /// \param IsAngled Whether the file name was enclosed in angle brackets;
+  /// otherwise, it was enclosed in quotes.
+  ///
+  /// \param File The actual file that may be included by this embed directive.
+  ///
+  /// \param Params The parameters used by the directive.
+  void EmbedDirective(SourceLocation HashLoc, StringRef FileName, bool IsAngled,
+                      OptionalFileEntryRef File,
+                      const LexEmbedParametersResult &Params) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Callback invoked whenever an inclusion directive of
+  /// any kind (\c \#include, \c \#import, etc.) has been processed, regardless
+  /// of whether the inclusion will actually result in an inclusion.
+  ///
+  /// \param HashLoc The location of the '#' that starts the inclusion
+  /// directive.
+  ///
+  /// \param IncludeTok The token that indicates the kind of inclusion
+  /// directive, e.g., 'include' or 'import'.
+  ///
+  /// \param FileName The name of the file being included, as written in the
+  /// source code.
+  ///
+  /// \param IsAngled Whether the file name was enclosed in angle brackets;
+  /// otherwise, it was enclosed in quotes.
+  ///
+  /// \param FilenameRange The character range of the quotes or angle brackets
+  /// for the written file name.
+  ///
+  /// \param File The actual file that may be included by this inclusion
+  /// directive.
+  ///
+  /// \param SearchPath Contains the search path which was used to find the file
+  /// in the file system. If the file was found via an absolute include path,
+  /// SearchPath will be empty. For framework includes, the SearchPath and
+  /// RelativePath will be split up. For example, if an include of "Some/Some.h"
+  /// is found via the framework path
+  /// "path/to/Frameworks/Some.framework/Headers/Some.h", SearchPath will be
+  /// "path/to/Frameworks/Some.framework/Headers" and RelativePath will be
+  /// "Some.h".
+  ///
+  /// \param RelativePath The path relative to SearchPath, at which the include
+  /// file was found. This is equal to FileName except for framework includes.
+  ///
+  /// \param SuggestedModule The module suggested for this header, if any.
+  ///
+  /// \param ModuleImported Whether this include was translated into import of
+  /// \p SuggestedModule.
+  ///
+  /// \param FileType The characteristic kind, indicates whether a file or
+  /// directory holds normal user code, system code, or system code which is
+  /// implicitly 'extern "C"' in C++ mode.
+  ///
+  void InclusionDirective(SourceLocation HashLoc, const Token &IncludeTok,
+                          StringRef FileName, bool IsAngled,
+                          CharSourceRange FilenameRange,
+                          OptionalFileEntryRef File, StringRef SearchPath,
+                          StringRef RelativePath, const Module *SuggestedModule,
+                          bool ModuleImported,
+                          SrcMgr::CharacteristicKind FileType) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Callback invoked whenever there was an explicit module-import
+  /// syntax.
+  ///
+  /// \param ImportLoc The location of import directive token.
+  ///
+  /// \param Path The identifiers (and their locations) of the module
+  /// "path", e.g., "std.vector" would be split into "std" and "vector".
+  ///
+  /// \param Imported The imported module; can be null if importing failed.
+  ///
+  void moduleImport(SourceLocation ImportLoc, ModuleIdPath Path,
+                    const Module *Imported) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Callback invoked when the end of the main file is reached.
+  ///
+  /// No subsequent callbacks will be made.
+  void EndOfMainFile() override { setSeenNoTrivialPPDirective(); }
+
+  /// Callback invoked when start reading any pragma directive.
+  void PragmaDirective(SourceLocation Loc,
+                       PragmaIntroducerKind Introducer) override {}
+
+  /// Called by Preprocessor::HandleMacroExpandedIdentifier when a
+  /// macro invocation is found.
+  void MacroExpands(const Token &MacroNameTok, const MacroDefinition &MD,
+                    SourceRange Range, const MacroArgs *Args) override;
+
+  /// Hook called whenever a macro definition is seen.
+  void MacroDefined(const Token &MacroNameTok,
+                    const MacroDirective *MD) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever a macro \#undef is seen.
+  /// \param MacroNameTok The active Token
+  /// \param MD A MacroDefinition for the named macro.
+  /// \param Undef New MacroDirective if the macro was defined, null otherwise.
+  ///
+  /// MD is released immediately following this callback.
+  void MacroUndefined(const Token &MacroNameTok, const MacroDefinition &MD,
+                      const MacroDirective *Undef) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever the 'defined' operator is seen.
+  /// \param MD The MacroDirective if the name was a macro, null otherwise.
+  void Defined(const Token &MacroNameTok, const MacroDefinition &MD,
+               SourceRange Range) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#if is seen.
+  /// \param Loc the source location of the directive.
+  /// \param ConditionRange The SourceRange of the expression being tested.
+  /// \param ConditionValue The evaluated value of the condition.
+  ///
+  // FIXME: better to pass in a list (or tree!) of Tokens.
+  void If(SourceLocation Loc, SourceRange ConditionRange,
+          ConditionValueKind ConditionValue) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#elif is seen.
+  /// \param Loc the source location of the directive.
+  /// \param ConditionRange The SourceRange of the expression being tested.
+  /// \param ConditionValue The evaluated value of the condition.
+  /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive.
+  // FIXME: better to pass in a list (or tree!) of Tokens.
+  void Elif(SourceLocation Loc, SourceRange ConditionRange,
+            ConditionValueKind ConditionValue, SourceLocation IfLoc) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#ifdef is seen.
+  /// \param Loc the source location of the directive.
+  /// \param MacroNameTok Information on the token being tested.
+  /// \param MD The MacroDefinition if the name was a macro, null otherwise.
+  void Ifdef(SourceLocation Loc, const Token &MacroNameTok,
+             const MacroDefinition &MD) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#elifdef branch is taken.
+  /// \param Loc the source location of the directive.
+  /// \param MacroNameTok Information on the token being tested.
+  /// \param MD The MacroDefinition if the name was a macro, null otherwise.
+  void Elifdef(SourceLocation Loc, const Token &MacroNameTok,
+               const MacroDefinition &MD) override {
+    setSeenNoTrivialPPDirective();
+  }
+  /// Hook called whenever an \#elifdef is skipped.
+  /// \param Loc the source location of the directive.
+  /// \param ConditionRange The SourceRange of the expression being tested.
+  /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive.
+  // FIXME: better to pass in a list (or tree!) of Tokens.
+  void Elifdef(SourceLocation Loc, SourceRange ConditionRange,
+               SourceLocation IfLoc) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#ifndef is seen.
+  /// \param Loc the source location of the directive.
+  /// \param MacroNameTok Information on the token being tested.
+  /// \param MD The MacroDefinition if the name was a macro, null otherwise.
+  void Ifndef(SourceLocation Loc, const Token &MacroNameTok,
+              const MacroDefinition &MD) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#elifndef branch is taken.
+  /// \param Loc the source location of the directive.
+  /// \param MacroNameTok Information on the token being tested.
+  /// \param MD The MacroDefinition if the name was a macro, null otherwise.
+  void Elifndef(SourceLocation Loc, const Token &MacroNameTok,
+                const MacroDefinition &MD) override {
+    setSeenNoTrivialPPDirective();
+  }
+  /// Hook called whenever an \#elifndef is skipped.
+  /// \param Loc the source location of the directive.
+  /// \param ConditionRange The SourceRange of the expression being tested.
+  /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive.
+  // FIXME: better to pass in a list (or tree!) of Tokens.
+  void Elifndef(SourceLocation Loc, SourceRange ConditionRange,
+                SourceLocation IfLoc) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#else is seen.
+  /// \param Loc the source location of the directive.
+  /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive.
+  void Else(SourceLocation Loc, SourceLocation IfLoc) override {
+    setSeenNoTrivialPPDirective();
+  }
+
+  /// Hook called whenever an \#endif is seen.
+  /// \param Loc the source location of the directive.
+  /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive.
+  void Endif(SourceLocation Loc, SourceLocation IfLoc) override {
+    setSeenNoTrivialPPDirective();
+  }
+};
+
+} // namespace clang
+
+#endif // LLVM_CLANG_LEX_NO_TRIVIAL_PPDIRECTIVE_TRACER_H
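
The header above only declares `setSeenNoTrivialPPDirective()` and `LexedFileChanged()`; their definitions are not shown in this part of the diff. The following self-contained sketch models the overall pattern under two stated assumptions: that the flag is only set while the main file is being lexed, and that once set it never resets. `MiniCallbacks`, `MiniTracer` and `demoTracer` are hypothetical stand-ins, not clang types.

```cpp
#include <cassert>

// Hypothetical stand-in for clang::PPCallbacks, reduced to three events.
struct MiniCallbacks {
  virtual ~MiniCallbacks() = default;
  virtual void fileChanged(bool /*NowInMainFile*/) {}
  virtual void macroDefined() {}
  virtual void pragmaDirective() {}
};

// Models the tracer: "non-trivial" events set a sticky flag, "trivial"
// events are overridden as no-ops.
class MiniTracer : public MiniCallbacks {
  bool InMainFile = true;
  bool Seen = false;

  // Assumption: events outside the main file are ignored.
  void setSeen() {
    if (InMainFile)
      Seen = true;
  }

public:
  bool hasSeen() const { return Seen; }

  void fileChanged(bool NowInMainFile) override { InMainFile = NowInMainFile; }
  void macroDefined() override { setSeen(); } // order-dependent state change
  void pragmaDirective() override {}          // trivial: deliberately empty
};

// Usage: drive the tracer through the base-class interface.
inline void demoTracer() {
  MiniTracer T;
  MiniCallbacks &CB = T;

  CB.pragmaDirective(); // a trivial directive leaves the flag clear
  assert(!T.hasSeen());

  CB.fileChanged(false);
  CB.macroDefined(); // ignored under the main-file assumption
  assert(!T.hasSeen());

  CB.fileChanged(true);
  CB.macroDefined(); // a state change in the main file
  assert(T.hasSeen());

  CB.pragmaDirective(); // the flag stays set afterwards
  assert(T.hasSeen());
}
```

The design choice this illustrates is that exemption is expressed by omission: a callback that does not call the setter, such as `PragmaDirective` in the real header, is what makes a directive "trivial".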

clang/include/clang/Lex/Preprocessor.h

Lines changed: 12 additions & 0 deletions
@@ -82,6 +82,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class NoTrivialPPDirectiveTracer;
 
 namespace Builtin {
 class Context;
@@ -353,6 +354,11 @@ class Preprocessor {
   /// First pp-token source location in current translation unit.
   SourceLocation FirstPPTokenLoc;
 
+  /// A preprocessor directive tracer to trace whether the preprocessing
+  /// state changed. These changes would mean most semantically observable
+  /// preprocessor state, particularly anything that is order dependent.
+  NoTrivialPPDirectiveTracer *DirTracer = nullptr;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -609,6 +615,8 @@ class Preprocessor {
       return State == NamedModuleImplementation && !getName().contains(':');
     }
 
+    bool isNotAModuleDecl() const { return State == NotAModuleDecl; }
+
     StringRef getName() const {
       assert(isNamedModule() && "Can't get name from a non named module");
       return Name;
@@ -3091,6 +3099,10 @@ class Preprocessor {
   bool setDeserializedSafeBufferOptOutMap(
       const SmallVectorImpl<SourceLocation> &SrcLocSeqs);
 
+  /// Whether we've seen pp-directives which may have changed the preprocessing
+  /// state.
+  bool hasSeenNoTrivialPPDirective() const;
+
 private:
   /// Helper functions to forward lexing to the actual lexer. They all share the
   /// same signature.
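
`Preprocessor::hasSeenNoTrivialPPDirective()` is only declared here, and `DirTracer` defaults to `nullptr`. One plausible reading is that the query forwards to the tracer and treats a missing tracer as "nothing seen". The sketch below models that reading with hypothetical types (`MiniTracer`, `MiniPreprocessor`, `setTracer`, `demoForwarding`); the real definition is not part of this excerpt and may differ.

```cpp
#include <cassert>

// Hypothetical tracer reduced to its query and a public flag.
struct MiniTracer {
  bool Seen = false;
  bool hasSeen() const { return Seen; }
};

// Hypothetical preprocessor holding a non-owning, possibly null tracer.
class MiniPreprocessor {
  MiniTracer *DirTracer = nullptr;

public:
  void setTracer(MiniTracer *T) { DirTracer = T; }

  // Assumption: without a tracer, report that nothing was seen.
  bool hasSeenNoTrivialPPDirective() const {
    return DirTracer && DirTracer->hasSeen();
  }
};

// Usage: the answer tracks the tracer once one is attached.
inline void demoForwarding() {
  MiniPreprocessor PP;
  assert(!PP.hasSeenNoTrivialPPDirective()); // no tracer installed

  MiniTracer T;
  PP.setTracer(&T);
  assert(!PP.hasSeenNoTrivialPPDirective());

  T.Seen = true;
  assert(PP.hasSeenNoTrivialPPDirective());
}
```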

clang/include/clang/Lex/Token.h

Lines changed: 9 additions & 8 deletions
@@ -86,12 +86,12 @@ class Token {
                                 // macro stringizing or charizing operator.
     CommaAfterElided = 0x200, // The comma following this token was elided (MS).
     IsEditorPlaceholder = 0x400, // This identifier is a placeholder.
-
-    IsReinjected = 0x800,  // A phase 4 token that was produced before and
-                           // re-added, e.g. via EnterTokenStream. Annotation
-                           // tokens are *not* reinjected.
-    FirstPPToken = 0x1000, // This token is the first pp token in the
-                           // translation unit.
+    IsReinjected = 0x800, // A phase 4 token that was produced before and
+                          // re-added, e.g. via EnterTokenStream. Annotation
+                          // tokens are *not* reinjected.
+    HasSeenNoTrivialPPDirective =
+        0x1000, // Whether we've seen any 'no-trivial' pp-directives before
+                // current position.
   };
 
   tok::TokenKind getKind() const { return Kind; }
@@ -321,8 +321,9 @@ class Token {
   /// lexer uses identifier tokens to represent placeholders.
   bool isEditorPlaceholder() const { return getFlag(IsEditorPlaceholder); }
 
-  /// Returns true if this token is the first pp-token.
-  bool isFirstPPToken() const { return getFlag(FirstPPToken); }
+  bool hasSeenNoTrivialPPDirective() const {
+    return getFlag(HasSeenNoTrivialPPDirective);
+  }
 };
 
 /// Information about the conditional stack (\#if directives)
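
The Token.h change reuses bit `0x1000`, previously `FirstPPToken`, for the new `HasSeenNoTrivialPPDirective` flag. As a rough illustration of how such a per-token bit flag behaves, here is a self-contained sketch. `MiniToken` and `demoTokenFlags` are hypothetical; only the three flag values are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical token carrying a small bit set of flags.
class MiniToken {
public:
  enum TokenFlags : std::uint16_t {
    IsEditorPlaceholder = 0x400,
    IsReinjected = 0x800,
    HasSeenNoTrivialPPDirective = 0x1000, // bit formerly used by FirstPPToken
  };

  void setFlag(TokenFlags F) {
    Flags = static_cast<std::uint16_t>(Flags | F);
  }
  void clearFlag(TokenFlags F) {
    Flags = static_cast<std::uint16_t>(Flags & ~F);
  }
  bool getFlag(TokenFlags F) const { return (Flags & F) != 0; }

  bool hasSeenNoTrivialPPDirective() const {
    return getFlag(HasSeenNoTrivialPPDirective);
  }

private:
  std::uint16_t Flags = 0;
};

// Usage: each flag occupies its own bit, so setting one leaves the
// others untouched.
inline void demoTokenFlags() {
  MiniToken Tok;
  assert(!Tok.hasSeenNoTrivialPPDirective());

  Tok.setFlag(MiniToken::HasSeenNoTrivialPPDirective);
  assert(Tok.hasSeenNoTrivialPPDirective());
  assert(!Tok.getFlag(MiniToken::IsReinjected));

  Tok.clearFlag(MiniToken::HasSeenNoTrivialPPDirective);
  assert(!Tok.hasSeenNoTrivialPPDirective());
}
```

Stamping the state onto the token means a later consumer can ask the token itself whether a non-trivial directive preceded it, without having to query the preprocessor at exactly the right moment.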
