Commit b136726

fix: accept unterminated block comments at EOF in scanner (#232)
The scanner previously rejected unterminated /* comments at EOF, which caused tree-sitter to parse the comment delimiters as operators (multiplicative_expression, spread_expression). Accepting them matches JetBrains PSI behavior, which recognizes an unclosed /* as a BLOCK_COMMENT token. Fixes 4 cross-validation fixtures (BlockCommentAtBeginningOfFile 1-4), improving the match rate from 97/124 (78.2%) to 101/126 (80.2%).
1 parent ec0da48 commit b136726
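
The percentages quoted in the commit message can be checked against the fractions it gives. The helper below is a minimal sketch for that arithmetic only; `match_rate` is a hypothetical name and is not part of the repository's cross-validation tooling.

```c
// Hypothetical helper, not taken from the repository: the share of
// fixtures whose tree-sitter tree matches the reference, as a percentage.
static double match_rate(int matched, int total) {
    return 100.0 * (double)matched / (double)total;
}
```

Calling `match_rate(97, 124)` and `match_rate(101, 126)` gives values that round to the 78.2% and 80.2% stated in the message, so each percentage is consistent with its own fraction.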

6 files changed, +66 -7 lines changed
src/scanner.c

Lines changed: 10 additions & 0 deletions
@@ -224,6 +224,16 @@ static bool scan_multiline_comment(TSLexer *lexer) {
         }
         break;
       case '\0':
+        // Accept unterminated block comments at EOF rather than rejecting them.
+        // This matches JetBrains PSI behavior which recognizes unclosed /* as a
+        // BLOCK_COMMENT token (plus an error element). Without this, the scanner
+        // returns false and tree-sitter tries to parse the comment delimiters
+        // as operators/expressions.
+        if (lexer->eof(lexer)) {
+          lexer->result_symbol = MULTILINE_COMMENT;
+          lexer->mark_end(lexer);
+          return true;
+        }
         return false;
       default:
         advance(lexer);
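
The effect of this hunk can be illustrated outside tree-sitter. The following is a minimal sketch, not the repository's scanner: `scan_block_comment` is a hypothetical function that indexes a plain buffer instead of using the `TSLexer` callbacks, and it assumes block comments nest, which the expected tree for BlockCommentAtBeginningOfFile4 suggests. The point it shows is the last branch: running out of input with the comment still open yields a comment token rather than a failure.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for the external scanner. Scans a block comment
// that starts at src[0] and stores the token length in *token_len.
// Assumption: comments nest, so depth is tracked.
static bool scan_block_comment(const char *src, size_t len, size_t *token_len) {
    if (len < 2 || src[0] != '/' || src[1] != '*') {
        return false;  // input does not start a block comment
    }
    size_t i = 2;
    int depth = 1;
    while (i < len) {
        if (i + 1 < len && src[i] == '/' && src[i + 1] == '*') {
            depth++;  // nested opener
            i += 2;
        } else if (i + 1 < len && src[i] == '*' && src[i + 1] == '/') {
            depth--;
            i += 2;
            if (depth == 0) {
                *token_len = i;  // properly terminated comment
                return true;
            }
        } else {
            i++;
        }
    }
    // End of input reached while the comment is still open: accept
    // everything up to EOF as one comment token instead of rejecting it.
    *token_len = len;
    return true;
}
```

Under these assumptions, each of the four fixture inputs is consumed as a single comment token that runs to the end of the file, which corresponds to the single `(multiline_comment)` node in the expected trees.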
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+==================
+BlockCommentAtBeginningOfFile1
+==================
+
+// COMPILATION_ERRORS
+
+/*
+---
+
+(source_file
+  (line_comment)
+  (multiline_comment))
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+==================
+BlockCommentAtBeginningOfFile2
+==================
+
+// COMPILATION_ERRORS
+
+/*
+/*
+---
+
+(source_file
+  (line_comment)
+  (multiline_comment))
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+==================
+BlockCommentAtBeginningOfFile3
+==================
+
+// COMPILATION_ERRORS
+
+/*
+
+fooo
+---
+
+(source_file
+  (line_comment)
+  (multiline_comment))
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+==================
+BlockCommentAtBeginningOfFile4
+==================
+
+// COMPILATION_ERRORS
+
+/*
+
+/*foo*/
+
+asdfas
+---
+
+(source_file
+  (line_comment)
+  (multiline_comment))

tools/cross-validation/excluded.txt

Lines changed: 1 addition & 7 deletions
@@ -20,13 +20,9 @@
 # 3. Run: npm test (to verify the new corpus test passes)
 
 # =============================================================================
-# MISMATCH: Grammar produces wrong AST structure (26 files)
+# MISMATCH: Grammar produces wrong AST structure (24 files)
 # =============================================================================
 
-# --- unterminated_block_comment (tree-sitter limitation) ---
-BlockCommentAtBeginningOfFile3
-BlockCommentAtBeginningOfFile4
-
 # --- duplicate_accessor (error recovery difference) ---
 DuplicateAccessor
 
@@ -88,8 +84,6 @@ AbsentInnerType
 AnnotatedIntersections
 AssertNotNull
 BackslashInString
-BlockCommentAtBeginningOfFile1
-BlockCommentAtBeginningOfFile2
 CallsInWhen
 CollectionLiterals
 CommentsBinding
