Skip to content

Commit 2d9e11f

Browse files
committed
[clang-format] Stop crashing when the input contains ??/\n
In the debug build, this kind of input caused the assertion in the function `countLeadingWhitespace` to fail. The release build without assertions outputted `?` `?` `/` separated by spaces. ```C #define A ??/ int i; ``` This is because the code in `countLeadingWhitespace` assumed that the underlying lexer recognized the entire `??/` sequence as a single token. In fact, the lexer recognized it as 3 separate tokens. The flag to make the lexer recognize trigraphs was never enabled. This patch enables the flag in the underlying lexer. This way, the program now either turns the trigraph into a single `\` or removes it altogether if the line is short enough. There are operators like the `??=` in C#. So the flag is not enabled for all input languages. Instead the check for the token size is moved from the assert line into the if line. The problem was introduced by my own patch 370bee4 from about 3 years ago. I added code to count the number of characters in the escape sequence probably just because the block of code used to have a comment saying someone should add the feature. Maybe I forgot to enable assertions when I ran the code. I found the problem because reviewing pull request 145243 made me look at the code again.
1 parent 113ea3d commit 2d9e11f

File tree

3 files changed

+33
-5
lines changed

3 files changed

+33
-5
lines changed

clang/lib/Format/Format.cpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4052,6 +4052,7 @@ LangOptions getFormattingLangOpts(const FormatStyle &Style) {
40524052
// the sequence "<::" will be unconditionally treated as "[:".
40534053
// Cf. Lexer::LexTokenInternal.
40544054
LangOpts.Digraphs = SinceCpp11;
4055+
LangOpts.Trigraphs = Style.isCpp();
40554056

40564057
LangOpts.LineComment = 1;
40574058
LangOpts.Bool = 1;

clang/lib/Format/FormatTokenLexer.cpp

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1220,12 +1220,11 @@ static size_t countLeadingWhitespace(StringRef Text) {
12201220
break;
12211221
// Splice found, consume it.
12221222
Cur = Lookahead + 1;
1223-
} else if (Cur[0] == '?' && Cur[1] == '?' && Cur[2] == '/' &&
1224-
(Cur[3] == '\n' || Cur[3] == '\r')) {
1223+
} else if (End - Cur >= 4u && Cur[0] == '?' && Cur[1] == '?' &&
1224+
Cur[2] == '/' && (Cur[3] == '\n' || Cur[3] == '\r')) {
12251225
// Newlines can also be escaped by a '?' '?' '/' trigraph. By the way, the
12261226
// characters are quoted individually in this comment because if we write
12271227
// them together some compilers warn that we have a trigraph in the code.
1228-
assert(End - Cur >= 4);
12291228
Cur += 4;
12301229
} else {
12311230
break;
@@ -1300,8 +1299,11 @@ FormatToken *FormatTokenLexer::getNextToken() {
13001299
case '\\':
13011300
case '?':
13021301
case '/':
1303-
// The text was entirely whitespace when this loop was entered. Thus
1304-
// this has to be an escape sequence.
1302+
// The code preceding the loop and in the countLeadingWhitespace
1303+
// function guarantees that Text is entirely whitespace, not including
1304+
// comments but including escaped newlines which may be escaped with a
1305+
// trigraph. So if 1 of these characters show up, then it has to be in
1306+
// an escape sequence.
13051307
assert(Text.substr(i, 4) == "\?\?/\r" ||
13061308
Text.substr(i, 4) == "\?\?/\n" ||
13071309
(i >= 1 && (Text.substr(i - 1, 4) == "\?\?/\r" ||

clang/unittests/Format/FormatTest.cpp

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6681,6 +6681,31 @@ TEST_F(FormatTest, EscapedNewlines) {
66816681
" int x(int a);",
66826682
AlignLeft);
66836683

6684+
// Escaped with a trigraph.
6685+
verifyFormat("#define A \\\n"
6686+
" int i; \\\n"
6687+
" int j;",
6688+
"#define A \?\?/\n"
6689+
"int i;\?\?/\n"
6690+
" int j;",
6691+
Narrow);
6692+
verifyFormat("#define A \\\r\n"
6693+
" int i; \\\r\n"
6694+
" int j;",
6695+
"#define A \?\?/\r\n"
6696+
"int i;\?\?/\r\n"
6697+
" int j;",
6698+
Narrow);
6699+
verifyFormat("#define A int i;", "#define A \?\?/\n"
6700+
"int i;");
6701+
verifyFormat("#define A int i;", "#define A \?\?/\r\n"
6702+
"int i;");
6703+
// In a language that does not support the trigraph, the program should not
6704+
// crash.
6705+
verifyNoCrash("#define A \?\?/\n"
6706+
"int i;",
6707+
getGoogleStyle(FormatStyle::LK_CSharp));
6708+
66846709
// CRLF line endings
66856710
verifyFormat("#define A \\\r\n int i; \\\r\n int j;",
66866711
"#define A \\\r\nint i;\\\r\n int j;", Narrow);

0 commit comments

Comments
 (0)