feat(typescript): upgrade to ANTLR v4 grammar with JSX/TSX support#40
- Update method calls and property accesses to match new grammar (e.g., `typeRef()` → `type_()`, `Identifier()` → `identifier()`)
- Refactor parameter and variable handling to support new AST structures
- Improve import parsing with new `importFromBlock` and namespace support
- Enhance arrow function and decorator detection with parent context traversal
- Add helper methods for position tracking and template string lexing
📝 Walkthrough
This pull request comprehensively updates the TypeScript ANTLR lexer and parser grammars to support extended type system features, JSX/TSX elements, bigint literals, and improved module import/export handling. Supporting listener and base classes are enhanced to correctly handle the new grammar rules and maintain null-safety.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR upgrades the TypeScript parser from an older ANTLR grammar to Andrew Leppard's ANTLR v4 grammar with comprehensive JSX/TSX support for React component parsing. The changes enable modern TypeScript syntax features and improve compatibility with JSX elements.
Changes:
- Upgraded grammar files (TypeScriptParser.g4 and TypeScriptLexer.g4) to ANTLR v4 with JSX/TSX support
- Updated listener implementations to adapt to new grammar rule names and structures
- Added JSX element parsing support with tag name matching and expression handling
- Enhanced TypeScript syntax support (definite assignment assertion, optional chaining, generic calls, etc.)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| TypeScriptParser.g4 | New ANTLR v4 grammar with JSX/TSX rules, updated type syntax, and modern TypeScript features |
| TypeScriptLexer.g4 | Updated lexer with new tokens (QuestionMarkDot, Power, NullCoalesce, etc.) and template string handling |
| TypeScriptFullIdentListener.kt | Adapted to new grammar with improved export handling, arrow function detection, and JSX support |
| TypeScriptAstListener.kt | Updated type reference handling and decorator processing for new grammar |
| TypeScriptParserBase.java | Added JSX tag name tracking and new predicate methods |
| TypeScriptLexerBase.java | Added StartTemplateString() helper method |
Comments suppressed due to low confidence (2)
chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4:1
- The JSX grammar rules (lines 1089-1091) don't support unclosed tags like `<Tag>`, to avoid conflicts with the TypeScript generic syntax `<T>`. This is mentioned in the comment at line 1086, but it should also be documented in the PR description or in a separate grammar documentation file, as it is a significant parsing limitation that developers should be aware of.
chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4:1
- The property declaration now supports both the definite assignment assertion `!` and the optional `?` markers (line 616). The grammar allows both simultaneously, as in `prop?!: Type`, which is semantically invalid in TypeScript. Consider adding a semantic predicate or validation to ensure only one modifier is present.
| private inline fun <reified T> findParentOfType(ctx: org.antlr.v4.runtime.ParserRuleContext?, maxDepth: Int): T? { | ||
| var current: org.antlr.v4.runtime.tree.ParseTree? = ctx?.parent | ||
| for (i in 0 until maxDepth) { | ||
| if (current is T) return current as T | ||
| current = (current as? org.antlr.v4.runtime.ParserRuleContext)?.parent | ||
| } | ||
| return null | ||
| } |
The findParentOfType function uses a magic number for maxDepth in multiple call sites (e.g., 5 in lines 582, 583, 584, 711). Consider extracting this as a named constant (e.g., MAX_PARENT_SEARCH_DEPTH) to improve maintainability and make the search depth configurable from a single location.
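A minimal sketch of the named-constant idea, written in Java against a hypothetical `TreeNode` stand-in rather than the ANTLR `ParserRuleContext` API. The names `TreeNode`, `ParentSearch`, and `findParentOfKind` are assumptions made for illustration and do not appear in the PR.

```java
import java.util.Optional;

/** Hypothetical stand-in for a parse-tree node; not the ANTLR ParserRuleContext API. */
class TreeNode {
    final String kind;
    final TreeNode parent;

    TreeNode(String kind, TreeNode parent) {
        this.kind = kind;
        this.parent = parent;
    }
}

final class ParentSearch {
    /** Single place to tune how far up the tree a lookup may walk. */
    static final int MAX_PARENT_SEARCH_DEPTH = 5;

    private ParentSearch() {}

    /** Returns the nearest ancestor of the given kind within the depth limit. */
    static Optional<TreeNode> findParentOfKind(TreeNode start, String kind) {
        TreeNode current = start == null ? null : start.parent;
        for (int i = 0; i < MAX_PARENT_SEARCH_DEPTH && current != null; i++) {
            if (current.kind.equals(kind)) {
                return Optional.of(current);
            }
            current = current.parent;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TreeNode decl = new TreeNode("VariableDeclaration", null);
        TreeNode init = new TreeNode("Initializer", decl);
        TreeNode arrow = new TreeNode("ArrowFunction", init);
        System.out.println(findParentOfKind(arrow, "VariableDeclaration").isPresent());
    }
}
```

With the limit in one constant, every call site shares the same bound, and an ancestor that sits more than `MAX_PARENT_SEARCH_DEPTH` levels up is simply reported as absent.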
| is TypeScriptParser.MemberDotExpressionContext -> { | ||
| when (val child = singleExpr.singleExpression().first()) { | ||
| when (val child = singleExpr.singleExpression()) { |
In the new grammar, MemberDotExpressionContext.singleExpression() returns a single SingleExpressionContext, not a list. The code correctly accesses it as a single element here, but line 925 in parseArguments also calls it.singleExpression() without .first(), which is inconsistent with the old grammar's list-based access pattern. Verify that all call sites have been updated consistently to match the new grammar's single-element return type.
| private fun hasExportPrefix(ctx: ParserRuleContext, maxDepth: Int): Boolean { | ||
| var current: ParserRuleContext? = ctx.parent as? ParserRuleContext | ||
| for (i in 0 until maxDepth) { | ||
| if (current != null && current.childCount > 0 && current.getChild(0).text == "export") { | ||
| return true | ||
| } | ||
| current = current?.parent as? ParserRuleContext | ||
| } | ||
| return false | ||
| } |
The hasExportPrefix function assumes that the 'export' keyword is always at getChild(0). This is fragile if the grammar changes or if there are decorators/comments before 'export'. Consider checking all children or using a more robust pattern-matching approach that looks for the Export token type rather than text matching.
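One way to read this suggestion, sketched in Java with a hypothetical node model: compare token types instead of text, and scan every direct child instead of only the first. `TokType`, `SyntaxNode`, and `ExportDetection` are illustrative names, not generated ANTLR classes; a real implementation would compare against the parser's generated token constants.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token types; a generated ANTLR parser exposes int constants instead. */
enum TokType { EXPORT, DECORATOR, CONST, IDENTIFIER }

/** Hypothetical parse-tree node; tokenType is null for rule (non-terminal) nodes. */
class SyntaxNode {
    final TokType tokenType;
    final List<SyntaxNode> children = new ArrayList<>();
    SyntaxNode parent;

    SyntaxNode(TokType tokenType) {
        this.tokenType = tokenType;
    }

    SyntaxNode add(SyntaxNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

final class ExportDetection {
    private ExportDetection() {}

    /** True if any ancestor within maxDepth has a direct child token of type EXPORT. */
    static boolean hasExportPrefix(SyntaxNode node, int maxDepth) {
        SyntaxNode current = node.parent;
        for (int i = 0; i < maxDepth && current != null; i++) {
            for (SyntaxNode child : current.children) {
                if (child.tokenType == TokType.EXPORT) {
                    return true;
                }
            }
            current = current.parent;
        }
        return false;
    }
}
```

Scanning all children tolerates a decorator placed before `export`, but it is also more permissive than a first-child check, so whether it is appropriate depends on where the grammar can emit an export token.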
| FilePath = filePath, | ||
| Name = grad.identifierName().text, | ||
| Parameters = this.buildArrowFunctionParameters(ctx.arrowFunctionParameters()), | ||
| Name = varDecl.identifierOrKeyWord()?.text ?: varDecl.getChild(0)?.text ?: "", |
The fallback varDecl.getChild(0)?.text ?: "" could capture unexpected tokens if identifierOrKeyWord() is null. Consider adding a comment explaining when this fallback is necessary, or logging a warning when the primary accessor fails, to help with debugging unexpected grammar structures.
| when (expr) { | ||
| is TypeScriptParser.ObjectLiteralExpressionContext -> { | ||
| val objectLiteral = parseObjectLiteral(expr.objectLiteral()) | ||
| return@map CodeProperty( | ||
| TypeValue = expr.text, | ||
| TypeType = "object", | ||
| ObjectValue = objectLiteral | ||
| ) | ||
| } | ||
| } | ||
|
|
||
| val typeValue: String = when (expr) { |
The processArgumentList function has a when expression at line 1234 that returns early for ObjectLiteralExpressionContext, followed by another when expression at line 1245 that processes other cases. This creates redundant control flow. Consider combining these into a single when expression to improve readability and avoid the early return pattern.
| fun processRef(typeRef: TypeScriptParser.TypeRefContext?): String? { | ||
| if (typeRef == null) return null | ||
| return typeRef.text | ||
| } |
The processRef function signature changed from returning String to String? (nullable). All call sites should null-check the result or use the Elvis operator. Review line 275 in TypeScriptFullIdentListener, where processRef is called with ?: "". That call is correct, but verify that all other call sites handle null properly.
| // Normalize: sometimes chained call parsing can emit an empty call name. | ||
| // If we already have a chain name (contains "->"), reuse it for blank entries. | ||
| val chainName = currentFunc.FunctionCalls.firstOrNull { it.FunctionName.contains("->") }?.FunctionName | ||
| if (!chainName.isNullOrBlank()) { | ||
| currentFunc.FunctionCalls = currentFunc.FunctionCalls.map { call -> | ||
| if (call.FunctionName.isBlank()) call.copy(FunctionName = chainName) else call | ||
| } | ||
| } |
The normalization logic at lines 674-679 that replaces blank function names with chain names appears to be a workaround for incomplete parsing. This could mask underlying issues where function names should have been captured correctly. Consider investigating why blank function names occur and fixing the root cause rather than normalizing after the fact.
| // Normalize: sometimes chained call parsing can emit an empty call name. | |
| // If we already have a chain name (contains "->"), reuse it for blank entries. | |
| val chainName = currentFunc.FunctionCalls.firstOrNull { it.FunctionName.contains("->") }?.FunctionName | |
| if (!chainName.isNullOrBlank()) { | |
| currentFunc.FunctionCalls = currentFunc.FunctionCalls.map { call -> | |
| if (call.FunctionName.isBlank()) call.copy(FunctionName = chainName) else call | |
| } | |
| } | |
| // Drop any function calls that do not have a valid name instead of | |
| // attempting to normalize them based on chained call parsing. | |
| currentFunc.FunctionCalls = currentFunc.FunctionCalls.filter { !it.FunctionName.isBlank() } |
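The difference between the two strategies is easier to see on plain data. The sketch below is Java, operates on a list of call names only, and uses hypothetical helper names (`CallNameCleanup`, `reuseChainName`, `dropBlank`); it is not the listener code from the PR.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class CallNameCleanup {
    private CallNameCleanup() {}

    /** Replaces blank names with the first chain name (one containing "->"), if any exists. */
    static List<String> reuseChainName(List<String> names) {
        Optional<String> chain = names.stream().filter(n -> n.contains("->")).findFirst();
        if (!chain.isPresent()) {
            return names;
        }
        return names.stream()
                .map(n -> n.trim().isEmpty() ? chain.get() : n)
                .collect(Collectors.toList());
    }

    /** Drops blank names instead of guessing what they should have been. */
    static List<String> dropBlank(List<String> names) {
        return names.stream()
                .filter(n -> !n.trim().isEmpty())
                .collect(Collectors.toList());
    }
}
```

Reusing the chain name keeps the number of recorded calls but may attribute a call to the wrong chain; dropping blanks never invents a name but loses the entry. Neither addresses why a blank name was emitted in the first place.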
| var left = buildCallChain(expr.singleExpression()) | ||
| if (left.isBlank()) { | ||
| // Best-effort: recover base identifier from raw text like `axios<Module[]>({..})` | ||
| val raw = expr.singleExpression()?.text ?: "" | ||
| left = Regex("^[A-Za-z_$][A-Za-z0-9_$]*").find(raw)?.value ?: "" | ||
| } | ||
| if (left.isBlank()) { | ||
| // Fallback: recover from the full member expression text | ||
| val rawAll = expr.text | ||
| left = Regex("^[A-Za-z_$][A-Za-z0-9_$]*").find(rawAll)?.value ?: "" | ||
| } |
The buildCallChain function has multiple fallback mechanisms with regex parsing (lines 874-883). This suggests the recursive parsing may not handle all cases correctly. The regex ^[A-Za-z_$][A-Za-z0-9_$]* is duplicated. Consider extracting it as a constant and documenting specific grammar patterns that trigger each fallback path.
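A small Java sketch of the extraction suggested here: the pattern is compiled once as a constant and both fallbacks go through the same helper. `IdentifierRecovery`, `leadingIdentifier`, and `recover` are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IdentifierRecovery {
    /** Compiled once and shared by every fallback path. */
    static final Pattern LEADING_IDENTIFIER = Pattern.compile("^[A-Za-z_$][A-Za-z0-9_$]*");

    private IdentifierRecovery() {}

    /** Returns the identifier at the start of raw, or "" when raw does not start with one. */
    static String leadingIdentifier(String raw) {
        if (raw == null) {
            return "";
        }
        Matcher m = LEADING_IDENTIFIER.matcher(raw);
        return m.find() ? m.group() : "";
    }

    /** Tries each candidate text in order and returns the first recovered identifier. */
    static String recover(String... candidates) {
        for (String candidate : candidates) {
            String id = leadingIdentifier(candidate);
            if (!id.isEmpty()) {
                return id;
            }
        }
        return "";
    }
}
```

A call such as `recover(innerText, fullText)` would then express the two fallbacks in one line, with the order of the arguments documenting which source is preferred.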
| protected boolean notOpenBraceAndNotFunction() { | ||
| int nextTokenType = _input.LT(1).getType(); | ||
| return nextTokenType != TypeScriptParser.OpenBrace && nextTokenType != TypeScriptParser.Function; | ||
| return nextTokenType != TypeScriptParser.OpenBrace && nextTokenType != TypeScriptParser.Function_; |
The token name changed from Function to Function_ (with underscore suffix). This is likely due to 'function' being a reserved word in some contexts. Verify this is intentional and consistent with the new grammar's token naming conventions.
| "get" | ||
| } else { | ||
| childCtx.setAccessor().identifierName()?.text ?: "set" | ||
| "set" |
The getter/setter accessor handling changed to always return "get" or "set" as the function name (lines 286-288), ignoring the identifierName(). The comment at line 284 states this matches old behavior, but this means accessor property names are lost. Consider whether this is the desired behavior or if the property name should be preserved in a different field (e.g., as a parameter or in additional metadata).
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master      #40      +/-   ##
============================================
- Coverage     76.07%   74.42%   -1.65%
- Complexity     1089     1127     +38
============================================
  Files            69       69
  Lines          4201     4333    +132
  Branches        764      828     +64
============================================
+ Hits           3196     3225     +29
- Misses          567      618     +51
- Partials        438      490     +52
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptAstListener.kt (1)
93-103: Potential NPE: force unwrap on a nullable parameter.
buildRestParameter accepts a nullable RestParameterContext? but immediately force-unwraps it with restCtx!! on line 95. While current callers always pass non-null values, this is fragile.
Suggested fix
     private fun buildRestParameter(restCtx: TypeScriptParser.RestParameterContext?): CodeProperty {
    +    if (restCtx == null) {
    +        return CodeProperty(TypeValue = "", TypeType = "")
    +    }
         var paramType = ""
    -    if (restCtx!!.typeAnnotation() != null) {
    +    if (restCtx.typeAnnotation() != null) {
             paramType = buildTypeAnnotation(restCtx.typeAnnotation())
         }
chapi-ast-typescript/src/main/java/chapi/ast/antlr/TypeScriptParserBase.java (1)
181-185: Fix case-insensitive JSX tag matching in popHtmlTagName.
JSX tag names are case-sensitive: <Div> (component) is distinct from <div> (HTML element). The current implementation uses equalsIgnoreCase, which incorrectly allows mismatched closing tags like <Div>...</div> to parse successfully. This violates JSX semantics and could lead to incorrect AST generation. Change to a case-sensitive comparison:
Suggested fix
    protected boolean popHtmlTagName(String tagName) {
        return tagName.equals(_tagNames.pop());
    }
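To make the behavioural difference concrete, here is a self-contained Java sketch of a tag-name stack with both comparison styles. `JsxTagStack` and its method names are hypothetical; only `equals` versus `equalsIgnoreCase` mirrors the comment above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class JsxTagStack {
    private final Deque<String> tagNames = new ArrayDeque<>();

    /** Records an opening tag name. */
    void push(String tagName) {
        tagNames.push(tagName);
    }

    /** Case-sensitive: "Div" is closed only by "Div". */
    boolean popStrict(String tagName) {
        return tagName.equals(tagNames.pop());
    }

    /** Case-insensitive variant, shown only for contrast. */
    boolean popIgnoringCase(String tagName) {
        return tagName.equalsIgnoreCase(tagNames.pop());
    }
}
```

After `push("Div")`, `popStrict("div")` reports a mismatch while `popIgnoringCase("div")` would accept it. Note that `ArrayDeque.pop()` throws `NoSuchElementException` on an empty stack, so a closing tag with no opener needs separate handling.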
🤖 Fix all issues with AI agents
In `@chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4`:
- Line 293: Create a new fragment HexDigitNoSeparator: [0-9a-fA-F] and replace
usage of HexDigit in the escape sequence rules so that HexEscapeSequence,
UnicodeEscapeSequence, and ExtendedUnicodeEscapeSequence use HexDigitNoSeparator
(leaving the existing HexDigit fragment with underscore for numeric literal
rules only); update the rules named HexEscapeSequence, UnicodeEscapeSequence,
and ExtendedUnicodeEscapeSequence to reference HexDigitNoSeparator instead of
HexDigit to prevent underscores in escape sequences.
In `@chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4`:
- Around line 1114-1117: The htmlTagName rule only allows identifierOrKeyWord
and dotted segments but lacks hyphenated segments required for custom elements;
update htmlTagName so each name segment is defined as identifierOrKeyWord ('-'
identifierOrKeyWord)* and allow repeated dotted segments, i.e. make the rule
match identifierOrKeyWord ('-' identifierOrKeyWord)* ('.' identifierOrKeyWord
('-' identifierOrKeyWord)*)* to mirror htmlAttributeName's hyphen handling and
support custom element names like my-component and dotted names like Form.Item.
- Around line 758-759: The two alternatives in the grammar that currently share
the same label `MemberDotExpression` must be given unique labels; locate the two
alternatives `singleExpression '!'? '.' '#'? identifierName typeGeneric? #
MemberDotExpression` and `singleExpression '?'? '.' '#'? identifierName
typeGeneric? # MemberDotExpression` and rename one (for example change the
non-null variant to `# MemberDotExpressionNonNull` and the optional-chaining
variant to `# MemberDotExpressionOptional` or similar) so each alternative has a
distinct label and ANTLR can generate code successfully.
- Around line 708-716: The SpreadOperator alternative in the propertyAssignment
rule currently allows an optional Ellipsis (Ellipsis?), which causes bare
expressions to be parsed as spreads and overlaps with PropertyShorthand; update
the grammar so spread properties require the literal Ellipsis token by changing
the alternative from "Ellipsis? singleExpression" to "Ellipsis singleExpression"
(i.e., make Ellipsis mandatory) inside the propertyAssignment rule
(SpreadOperator), ensuring the rule now enforces the "..." prefix for object
spread properties and prevents conflicts with identifierOrKeyWord /
PropertyShorthand.
- Around line 1119-1124: The htmlAttribute rule currently allows a spread-like
brace without the dots by using '{' Ellipsis? singleExpression '}' — change it
to require the Ellipsis token so spread attributes are only parsed as '{'
Ellipsis singleExpression '}' and update/confirm the Ellipsis lexer token is
defined as '...' (or equivalent) so JSX spreads like {...props} are enforced;
reference htmlAttribute and the Ellipsis token when making this change.
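The tag-name shape requested in the htmlTagName prompt above can be checked outside the grammar with a regular expression. This Java sketch is an approximation under stated assumptions: `SEGMENT` uses a simplified identifier pattern, whereas the grammar's `identifierOrKeyWord` is broader, and `TagNameShape` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class TagNameShape {
    // Simplified identifier with optional hyphenated parts; an assumption, not the grammar rule.
    private static final String SEGMENT =
            "[A-Za-z_$][A-Za-z0-9_$]*(?:-[A-Za-z_$][A-Za-z0-9_$]*)*";

    /** segment ('.' segment)*, where each segment may contain hyphens. */
    static final Pattern TAG_NAME = Pattern.compile(SEGMENT + "(?:\\." + SEGMENT + ")*");

    private TagNameShape() {}

    static boolean isTagName(String s) {
        return TAG_NAME.matcher(s).matches();
    }
}
```

Under this pattern both `my-component` and `Form.Item` are accepted, while a trailing hyphen or a leading dot is rejected.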
🧹 Nitpick comments (1)
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptFullIdentListener.kt (1)
76-85: Consider using token type check instead of string comparison.
hasExportPrefix checks getChild(0).text == "export", which is string-based. Using token type comparison would be more robust.
Alternative using token type
    private fun hasExportPrefix(ctx: ParserRuleContext, maxDepth: Int): Boolean {
        var current: ParserRuleContext? = ctx.parent as? ParserRuleContext
        for (i in 0 until maxDepth) {
            val firstChild = current?.getChild(0)
            if (firstChild is TerminalNodeImpl && firstChild.symbol.type == TypeScriptParser.Export) {
                return true
            }
            current = current?.parent as? ParserRuleContext
        }
        return false
    }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4
chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
chapi-ast-typescript/src/main/java/chapi/ast/antlr/TypeScriptLexerBase.java
chapi-ast-typescript/src/main/java/chapi/ast/antlr/TypeScriptParserBase.java
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptAstListener.kt
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptFullIdentListener.kt
🧰 Additional context used
🧬 Code graph analysis (1)
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptFullIdentListener.kt (3)
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptAstListener.kt (2)
buildPosition (33-41), buildAnnotation (129-144)
chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaAstListener.kt (2)
buildPosition (42-49), buildAnnotation (15-40)
chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonAstBaseListener.kt (2)
buildPosition (75-82), buildAnnotation (62-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: CodeQL analysis (java)
- GitHub Check: Agent
- GitHub Check: build (macos-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: build (windows-latest)
🔇 Additional comments (20)
chapi-ast-typescript/src/main/java/chapi/ast/antlr/TypeScriptParserBase.java (1)
65-103: LGTM! New predicate methods are well-structured.
The helper methods provide clear token exclusion checks for grammar parsing decisions. The naming is descriptive and the pattern is consistent.
chapi-ast-typescript/src/main/java/chapi/ast/antlr/TypeScriptLexerBase.java (1)
130-133: Empty hook method is acceptable but consider documenting intent.
The `StartTemplateString()` method is a no-op placeholder called from the lexer grammar. If this is intentional for future extension or debugging hooks, the current comment is sufficient. If actual logic is expected, this should be implemented.
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptAstListener.kt (4)
43-55: LGTM! Type annotation resolution updated for new grammar.
The change from `typeRef()` to `type_()` aligns with the updated parser grammar. The null-safety improvements with early returns are clean.
57-60: LGTM! Nullable handling for processRef is appropriate.
Returning `null` when input is `null` is the correct approach for this helper.
62-91: LGTM! Parameter building logic is clearer with explicit cases.
The refactored method properly handles rest-only, normal, and trailing rest parameter scenarios with appropriate null guards.
129-144: LGTM! Decorator annotation extraction updated for new grammar.
Using `identifier()` instead of `Identifier()` aligns with the updated parser naming. The safe access pattern with `?.text ?: ""` is appropriate.
chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4 (4)
33-45: LGTM! Grammar preamble and options are well-configured.
The ANTLR formatting directives, ERROR channel, and superClass specification follow best practices.
53-113: LGTM! Token definitions are comprehensive.
The punctuation and operator tokens cover TypeScript/JavaScript syntax including modern operators like `??`, `?.`, `**=`, and `??=`.
125-141: LGTM! Numeric literal support includes modern features.
BigInt literals (`123n`) and numeric separators are properly supported. The octal strict mode check is correctly implemented.
253-258: LGTM! Template string mode handling is correct.
The TEMPLATE mode with depth tracking for nested template expressions (`${...}`) is properly implemented with pushMode/popMode transitions.
chapi-ast-typescript/src/main/kotlin/chapi/ast/typescriptast/TypeScriptFullIdentListener.kt (10)
45-74: LGTM! Variable statement handling properly separates export vs non-export paths.
The logic correctly distinguishes between exported and non-exported variables, recording exports and fields appropriately.
192-242: LGTM! Class declaration handling properly adjusts position for decorators.
The position adjustment using the `Class()` token ensures consistent behavior with the old grammar, where decorators were separate. The decorator handling for both standalone and inline decorators is comprehensive.
471-525: LGTM! New import syntax handling is comprehensive.
The `enterImportFromBlock` method handles various import forms: named imports, namespace imports, and default imports. The null-safety checks are appropriate.
529-548: LGTM! Import alias declarations properly handle require() and namespace forms.
Both `import x = require("module")` and `import ns = namespace.path` patterns are correctly processed.
578-630: Arrow function detection uses appropriate parent traversal.
The `findParentOfType` helper provides clean detection of variable declarations and argument contexts. The fallback to mark unhandled cases as callbacks prevents stack mismatches.
632-640: LGTM! Return statement handling for JSX detection.
Setting `IsReturnHtml` for JSX files during return statement processing correctly identifies React component patterns.
740-747: LGTM! Generic parent traversal helper is useful.
The inline reified generic function provides clean type-safe parent context lookup.
804-816: LGTM! New expression types properly handled.
`GenericCallExpressionContext` and `OptionalCallExpressionContext` are correctly processed for the updated grammar.
1263-1293: No duplicate exports are created; the skip mechanism in `enterExportElementDirectly` correctly prevents double-counting.
The implementation correctly handles this: `enterVariableStatement` (lines 45-74) detects exported variables using `hasExportPrefix()` and records them as exports, while `enterExportElementDirectly` (line 1268-1269) explicitly skips `VariableStatementContext`. Test cases confirm the behavior: `export const baseURL = '/api'` produces exactly one export, not duplicates.
836-911: Call chain building logic correctly handles promise and member method notations.
The `buildCallChain` and `normalizeMemberCallName` methods properly handle both promise chains (preserving `->` notation for `.then()`, `.catch()`, `.finally()`) and simple member calls (using `.` notation like `axios.get`). Edge cases with generics, optional chaining, and raw expression recovery via regex fallbacks are working correctly per the test suite.
| fragment LineContinuation: '\\' [\r\n\u2028\u2029]+; | ||
|
|
||
| fragment HexDigit: [_0-9a-fA-F]; |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's search for HexDigit usage in the grammar file
rg -n 'HexDigit' chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4 -A 2 -B 2
Repository: phodal/chapi
Length of output: 1067
🏁 Script executed:
#!/bin/bash
# Let's verify ES2021 numeric separator specification
# Check if there's any documentation or comments in the grammar about this
grep -n -i "numeric separator\|escape.*sequence\|2021\|es2021" chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4 -i
Repository: phodal/chapi
Length of output: 709
🏁 Script executed:
#!/bin/bash
# Let's also check if there are any BigInt or hex literal specific patterns
# to understand the full scope of the issue
cat -n chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4 | sed -n '130,145p'
Repository: phodal/chapi
Length of output: 659
HexDigit fragment incorrectly allows underscores in escape sequences.
The HexDigit fragment includes _ which is correct for ES2021 numeric separators in hex literals (0x1_000_000), but this same fragment is incorrectly used in escape sequences where underscores are not permitted by the specification. This allows invalid syntax like \x1_FF and \u1_FFF to parse successfully.
Create a separate fragment HexDigitNoSeparator: [0-9a-fA-F] and use it in HexEscapeSequence (line 276), UnicodeEscapeSequence (lines 279-280), and ExtendedUnicodeEscapeSequence (line 283). Keep HexDigit with underscore for numeric literal rules only.
🤖 Prompt for AI Agents
In `@chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4` at line 293, Create a
new fragment HexDigitNoSeparator: [0-9a-fA-F] and replace usage of HexDigit in
the escape sequence rules so that HexEscapeSequence, UnicodeEscapeSequence, and
ExtendedUnicodeEscapeSequence use HexDigitNoSeparator (leaving the existing
HexDigit fragment with underscore for numeric literal rules only); update the
rules named HexEscapeSequence, UnicodeEscapeSequence, and
ExtendedUnicodeEscapeSequence to reference HexDigitNoSeparator instead of
HexDigit to prevent underscores in escape sequences.
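The effect of sharing one digit class between numeric literals and escape sequences can be reproduced with two regular expressions. This Java sketch assumes the escape rule has the shape `\x` followed by exactly two hex digits; the actual `HexEscapeSequence` rule is not shown in this thread, and `HexEscapeCheck` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class HexEscapeCheck {
    // Digit class that admits '_', mirroring the HexDigit fragment quoted above.
    static final Pattern LENIENT = Pattern.compile("\\\\x[_0-9a-fA-F]{2}");

    // Same assumed shape with a separator-free digit class.
    static final Pattern STRICT = Pattern.compile("\\\\x[0-9a-fA-F]{2}");

    private HexEscapeCheck() {}

    static boolean lenientAccepts(String escape) {
        return LENIENT.matcher(escape).matches();
    }

    static boolean strictAccepts(String escape) {
        return STRICT.matcher(escape).matches();
    }
}
```

For the input `\x1_`, the lenient pattern matches and the strict one does not, which is the distinction a separate `HexDigitNoSeparator` fragment would introduce in the lexer.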
| propertyAssignment | ||
| : propertyName (':' | '=') singleExpression # PropertyExpressionAssignment | ||
| | '[' singleExpression ']' ':' singleExpression # ComputedPropertyExpressionAssignment | ||
| | getAccessor # PropertyGetter | ||
| | setAccessor # PropertySetter | ||
| | generatorMethod # MethodProperty | ||
| | identifierOrKeyWord # PropertyShorthand | ||
| | Ellipsis? singleExpression # SpreadOperator | ||
| | restParameter # RestParameterInObject |
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and read the TypeScriptParser.g4 file
git ls-files | grep -i "typescriptparser.g4"
Repository: phodal/chapi
Length of output: 111
🏁 Script executed:
# Check the exact content around lines 708-716
cat -n chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | sed -n '700,730p'
Repository: phodal/chapi
Length of output: 1290
🏁 Script executed:
# Let's also look at the propertyName and identifierOrKeyWord rules to understand potential overlaps
cat -n chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | sed -n '1,100p'
Repository: phodal/chapi
Length of output: 3934
🏁 Script executed:
# Look for Ellipsis and related token definitions
rg -n "Ellipsis" chapi-ast-typescript/src/main/antlr/
Repository: phodal/chapi
Length of output: 738
🏁 Script executed:
# Check for related propertyAssignment rule context and other related rules
cat -n chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | sed -n '680,750p'
Repository: phodal/chapi
Length of output: 2233
🏁 Script executed:
# Check what singleExpression encompasses (look for the rule definition)
cat -n chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | sed -n '750,850p'
Repository: phodal/chapi
Length of output: 6869
🏁 Script executed:
# Look for test files or examples that use object spread syntax
find . -type f -name "*.ts" -o -name "*.tsx" | head -20
Repository: phodal/chapi
Length of output: 937
🏁 Script executed:
# Search for any ANTLR grammar conflict reports or test cases mentioning spread
rg -i "spread.*object|object.*spread" --type ts --type tsx -C 2 | head -40
Repository: phodal/chapi
Length of output: 83
🏁 Script executed:
# Check identifierOrKeyWord to understand what it matches
rg -n "identifierOrKeyWord" chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | head -5
Repository: phodal/chapi
Length of output: 370
🏁 Script executed:
# Look at the full singleExpression rule definition to understand scope
cat -n chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | sed -n '750,900p'
Repository: phodal/chapi
Length of output: 8117
🏁 Script executed:
# Look for test cases with object spread syntax to confirm the issue
cat -n chapi-ast-typescript/src/test/resources/grammar/*.ts | grep -A 2 -B 2 "\.\.\."
Repository: phodal/chapi
Length of output: 38
🏁 Script executed:
# Check if there's any documentation or comments explaining the rationale for optional Ellipsis
rg -B 3 -A 3 "Ellipsis\?" chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | head -50
Repository: phodal/chapi
Length of output: 862
Require ... for object spread properties to align with TypeScript specification.
The optional Ellipsis? in the SpreadOperator alternative allows bare expressions to match this rule, creating overlap with PropertyShorthand (which matches identifierOrKeyWord). In TypeScript, spread properties must use the ... prefix; bare identifiers should only match shorthand properties. Removing the ? makes the ... token mandatory for spread, preventing semantic mismatches.
Fix
- | Ellipsis? singleExpression # SpreadOperator
+ | Ellipsis singleExpression # SpreadOperator
🤖 Prompt for AI Agents
In `@chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4` around lines 708 -
716, The SpreadOperator alternative in the propertyAssignment rule currently
allows an optional Ellipsis (Ellipsis?), which causes bare expressions to be
parsed as spreads and overlaps with PropertyShorthand; update the grammar so
spread properties require the literal Ellipsis token by changing the alternative
from "Ellipsis? singleExpression" to "Ellipsis singleExpression" (i.e., make
Ellipsis mandatory) inside the propertyAssignment rule (SpreadOperator),
ensuring the rule now enforces the "..." prefix for object spread properties and
prevents conflicts with identifierOrKeyWord / PropertyShorthand.
| | singleExpression '!'? '.' '#'? identifierName typeGeneric? # MemberDotExpression | ||
| | singleExpression '?'? '.' '#'? identifierName typeGeneric? # MemberDotExpression |
🧩 Analysis chain
🏁 Script executed:
# First, check the exact content at lines 758-759 and surrounding context
sed -n '755,765p' chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 1016
🏁 Script executed:
# Search for other MemberDotExpression labels in the file to confirm duplication
rg 'MemberDotExpression' chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 231
🌐 Web query:
ANTLR duplicate alternative labels rule generation error unique labels per rule
💡 Result:
Cause
- ANTLR generates one Context class per alternative label (the `#label` syntax). If two alternatives in the same rule produce the same generated class name or conflict with an existing rule name, ANTLR reports a duplicate/conflict error. All alternatives in a rule must either be unlabeled or each alternative must have labels that produce unique context-class names (unless you intentionally reuse one label on multiple alts to share the same context). [1][2]
Fixes
- Give each alternative a distinct label (rename the `#label`s so they don't collide with each other or any rule name).
- Or remove labels entirely for that rule (all-or-none rule).
- If you meant multiple alternatives to share the same context/event, explicitly reuse the same `#label` on those alternatives (valid), but avoid reusing a label that matches an existing rule name. [1][2]
Example
- Bad (conflict):
    e : e '*' e # e // label 'e' conflicts with rule e → error.
- Good:
    e : e '*' e # Mult
      | e '+' e # Add
      | INT # Int
      ;
References
- ANTLR docs: Alternative Labels (parser-rules.md). [1]
- Discussion / examples on Stack Overflow (label conflicts / behavior). [2]
Sources
[1] ANTLR parser rules: Alternative Labels.
[2] Stack Overflow: examples and explanations of label conflicts.
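Rename the duplicate MemberDotExpression label so the two alternatives can be told apart.
Lines 758–759 both use # MemberDotExpression. As the query result above notes, ANTLR accepts a label reused across alternatives of the same rule (they then share one context class), so this should not break code generation by itself, but a listener or visitor can no longer distinguish the two forms by context type. The alternatives are semantically distinct (line 758: non-null assertion '!'?; line 759: optional chaining '?'?), so they should have separate labels rather than one being removed.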
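To make the distinction concrete, here is a small TypeScript sketch of the two source forms these alternatives are meant to cover. The `User` type and both function names are invented for illustration and do not come from the PR; the only point is that `!.` and `?.` behave differently at runtime, which is why a listener may want to tell them apart.

```typescript
// Hypothetical type and helpers, used only to contrast the two member-access forms.
interface User {
  name: string;
}

// Optional chaining: evaluates to undefined when `user` is undefined.
function nameIfPresent(user: User | undefined): string | undefined {
  return user?.name;
}

// Non-null assertion: `!` is erased at compile time, so this is a plain
// property access and throws a TypeError when `user` is undefined.
function nameAsserted(user: User | undefined): string {
  return user!.name;
}
```

With `undefined` as input, `nameIfPresent` returns `undefined`, while `nameAsserted` throws, because the assertion only silences the type checker.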
🛠️ Suggested fix (rename for clarity)
| singleExpression '!'? '.' '#'? identifierName typeGeneric? # MemberDotExpression
- | singleExpression '?'? '.' '#'? identifierName typeGeneric? # MemberDotExpression
+ | singleExpression '?'? '.' '#'? identifierName typeGeneric? # MemberOptionalChainExpression
🤖 Prompt for AI Agents
In `@chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4` around lines 758 -
759, The two alternatives in the grammar that currently share the same label
`MemberDotExpression` must be given unique labels; locate the two alternatives
`singleExpression '!'? '.' '#'? identifierName typeGeneric? #
MemberDotExpression` and `singleExpression '?'? '.' '#'? identifierName
typeGeneric? # MemberDotExpression` and rename one (for example change the
non-null variant to `# MemberDotExpressionNonNull` and the optional-chaining
variant to `# MemberDotExpressionOptional` or similar) so each alternative has a
distinct label and ANTLR can generate code successfully.
// Tag name: supports simple names and dotted names (e.g., Form.Item)
htmlTagName
- : TagName
- | keyword
- | Identifier
- | Identifier ('.' Identifier)* // bug fix: for <Form.Item></Form.Item>
+ : identifierOrKeyWord ('.' identifierOrKeyWord)*
;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the lexer file
lexer_file=$(find . -name "TypeScriptLexer.g4" -type f)
echo "Lexer file: $lexer_file"
# Find the parser file
parser_file=$(find . -name "TypeScriptParser.g4" -type f)
echo "Parser file: $parser_file"
# Check if they exist
if [[ -n "$lexer_file" && -f "$lexer_file" ]]; then
  echo -e "\n=== Lexer file found ==="
  wc -l "$lexer_file"
fi
if [[ -n "$parser_file" && -f "$parser_file" ]]; then
  echo -e "\n=== Parser file found ==="
  wc -l "$parser_file"
fi
Repository: phodal/chapi
Length of output: 372
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find TypeScriptLexer.g4
lexer_file=$(find . -name "TypeScriptLexer.g4" -type f | head -n 1)
if [[ -n "$lexer_file" ]]; then
  echo "=== Identifier definitions in TypeScriptLexer.g4 ==="
  rg -n "Identifier|IdentifierStart|IdentifierPart" "$lexer_file" -A 2
fi
Repository: phodal/chapi
Length of output: 581
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find TypeScriptParser.g4
parser_file=$(find . -name "TypeScriptParser.g4" -type f | head -n 1)
if [[ -n "$parser_file" ]]; then
  echo "=== htmlTagName and JSX tag related rules ==="
  rg -n "htmlTagName|jsxElementName|jsxTagName" "$parser_file" -B 2 -A 5
fi
Repository: phodal/chapi
Length of output: 1085
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for any rule that might handle hyphenated names or JSX tags
parser_file=$(find . -name "TypeScriptParser.g4" -type f | head -n 1)
if [[ -n "$parser_file" ]]; then
  echo "=== Looking for JSX and custom element handling ==="
  rg -n "jsx|JSX" "$parser_file" -i | head -n 50
fi
Repository: phodal/chapi
Length of output: 586
🏁 Script executed:
#!/bin/bash
set -euo pipefail
parser_file=$(find . -name "TypeScriptParser.g4" -type f | head -n 1)
if [[ -n "$parser_file" ]]; then
  echo "=== identifierOrKeyWord definition ==="
  rg -n "identifierOrKeyWord" "$parser_file" -B 1 -A 5 | head -n 40
fi
Repository: phodal/chapi
Length of output: 1171
🏁 Script executed:
#!/bin/bash
set -euo pipefail
parser_file=$(find . -name "TypeScriptParser.g4" -type f | head -n 1)
if [[ -n "$parser_file" ]]; then
  echo "=== Check for Identifier token usage in htmlTagName context ==="
  # Search for where Identifier is used in JSX/HTML context
  rg -n "Identifier" "$parser_file" | grep -E "(html|jsx|tag|element)" -i | head -n 20
fi
Repository: phodal/chapi
Length of output: 119
🏁 Script executed:
#!/bin/bash
set -euo pipefail
parser_file=$(find . -name "TypeScriptParser.g4" -type f | head -n 1)
if [[ -n "$parser_file" ]]; then
  echo "=== identifier rule definition ==="
  rg -n "^identifier" "$parser_file" -A 3
fi
Repository: phodal/chapi
Length of output: 409
Support hyphenated custom element names in JSX tag rules.
The htmlTagName rule only allows identifierOrKeyWord and dots (for dotted names like Form.Item), but doesn't support hyphens required for custom elements like <my-component />. This inconsistency exists because htmlAttributeName already supports hyphens via ('-' identifierOrKeyWord)* (line 1127), but htmlTagName doesn't. Add hyphen support to align with JSX custom element naming conventions:
🛠️ Proposed fix
htmlTagName
- : identifierOrKeyWord ('.' identifierOrKeyWord)*
+ : identifierOrKeyWord (('.' | '-') identifierOrKeyWord)*
;
🤖 Prompt for AI Agents
In `@chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4` around lines 1114 -
1117, The htmlTagName rule only allows identifierOrKeyWord and dotted segments
but lacks hyphenated segments required for custom elements; update htmlTagName
so each name segment is defined as identifierOrKeyWord ('-'
identifierOrKeyWord)* and allow repeated dotted segments, i.e. make the rule
match identifierOrKeyWord ('-' identifierOrKeyWord)* ('.' identifierOrKeyWord
('-' identifierOrKeyWord)*)* to mirror htmlAttributeName's hyphen handling and
support custom element names like my-component and dotted names like Form.Item.
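As a rough illustration of what a hyphen-aware tag-name rule accepts, the following TypeScript sketch mirrors its shape with a regular expression. It assumes `identifierOrKeyWord` can be approximated by an ASCII identifier pattern, which is narrower than the real lexer rule; `isTagName` and the constant names are hypothetical and not part of the grammar or of chapi.

```typescript
// Assumption: an ASCII-only stand-in for identifierOrKeyWord.
const SEGMENT = "[A-Za-z_$][A-Za-z0-9_$]*";

// One dotted part: identifierOrKeyWord ('-' identifierOrKeyWord)*
const HYPHENATED = `${SEGMENT}(?:-${SEGMENT})*`;

// Whole name: hyphenated part ('.' hyphenated part)*
const TAG_NAME = new RegExp(`^${HYPHENATED}(?:\\.${HYPHENATED})*$`);

// Returns true when `name` has the shape the hyphen-aware rule would accept.
function isTagName(name: string): boolean {
  return TAG_NAME.test(name);
}
```

Both rule shapes proposed in this comment accept the same set of names (segments separated by `.` or `-`); they differ only in how the parse tree groups the segments.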
// JSX attributes
htmlAttribute
: htmlAttributeName '=' htmlAttributeValue
| htmlAttributeName
| objectLiteral
| '{' Ellipsis? singleExpression '}' // Spread attribute: {...props}
;
🧩 Analysis chain
🏁 Script executed:
# First, let me check the file and lines in question
cd chapi-ast-typescript && head -c 500 src/main/antlr/TypeScriptParser.g4 && echo "..." && sed -n '1115,1130p' src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 914
🏁 Script executed:
# Get broader context around the htmlAttribute rule
sed -n '1100,1150p' src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 133
🏁 Script executed:
# Search for Ellipsis definition in the grammar
rg "Ellipsis\s*:" chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 -A 2
Repository: phodal/chapi
Length of output: 38
🏁 Script executed:
# Check for any JSX-related tests or examples
find chapi-ast-typescript -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.test.*" -o -name "*test*" \) | head -20
Repository: phodal/chapi
Length of output: 909
🏁 Script executed:
# Search for Ellipsis definition in the entire grammar file
rg "Ellipsis" chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | head -20
Repository: phodal/chapi
Length of output: 343
🏁 Script executed:
# Check the BadSmellThreshold.tsx file for JSX examples
cat chapi-ast-typescript/src/test/resources/realworld/BadSmellThreshold.tsx
Repository: phodal/chapi
Length of output: 3999
🏁 Script executed:
# Check for other TypeScript/JSX test files
find chapi-ast-typescript/src/test -name "*.tsx" -o -name "*.ts" | xargs grep -l "jsx\|JSX\|<" 2>/dev/null | head -5
Repository: phodal/chapi
Length of output: 361
🏁 Script executed:
# Search in the grammar for related constructs
rg "fragment\s+Ellipsis|token\s+Ellipsis|Ellipsis\s*=" chapi-ast-typescript/src/main/antlr/ -A 2
Repository: phodal/chapi
Length of output: 38
🏁 Script executed:
# Search for Ellipsis definition in the full grammar file with more context
grep -n "Ellipsis" chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 364
🏁 Script executed:
# Check the lexer section for token definitions
rg "fragment|token" chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4 | grep -i ellipsis
Repository: phodal/chapi
Length of output: 38
🏁 Script executed:
# Look at lexer file if it exists
ls chapi-ast-typescript/src/main/antlr/ | head -20
Repository: phodal/chapi
Length of output: 94
🏁 Script executed:
# Search for JSX test examples with spread attributes
find chapi-ast-typescript -type f -name "*.tsx" -o -name "*.ts" | xargs grep -l "{.*\.\..*}" 2>/dev/null | head -5
Repository: phodal/chapi
Length of output: 38
🏁 Script executed:
# Search all test resources for JSX with spread attributes
find chapi-ast-typescript/src/test/resources -type f \( -name "*.tsx" -o -name "*.ts" -o -name "*.jsx" -o -name "*.js" \) -exec grep -l "\.\.\..*}" {} \;
Repository: phodal/chapi
Length of output: 38
🏁 Script executed:
# Search for Ellipsis in the lexer file
rg "Ellipsis" chapi-ast-typescript/src/main/antlr/TypeScriptLexer.g4
Repository: phodal/chapi
Length of output: 91
🏁 Script executed:
# Get context around line 679 where Ellipsis is required (not optional)
sed -n '675,685p' chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 331
🏁 Script executed:
# Get context for all the Ellipsis? usages to understand the pattern
sed -n '695,720p' chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 1057
🏁 Script executed:
# Get context around the JSX spread attribute rule
sed -n '1115,1135p' chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4
Repository: phodal/chapi
Length of output: 530
Require ... for JSX spread attributes.
In JSX, spread attributes must include the ... operator: <Comp {...props} />. The grammar currently allows {expr} to parse as a spread attribute without ..., which is not valid JSX syntax.
🛠️ Suggested fix
- | '{' Ellipsis? singleExpression '}' // Spread attribute: {...props}
+ | '{' Ellipsis singleExpression '}' // Spread attribute: {...props}
🤖 Prompt for AI Agents
In `@chapi-ast-typescript/src/main/antlr/TypeScriptParser.g4` around lines 1119 -
1124, The htmlAttribute rule currently allows a spread-like brace without the
dots by using '{' Ellipsis? singleExpression '}' — change it to require the
Ellipsis token so spread attributes are only parsed as '{' Ellipsis
singleExpression '}' and update/confirm the Ellipsis lexer token is defined as
'...' (or equivalent) so JSX spreads like {...props} are enforced; reference
htmlAttribute and the Ellipsis token when making this change.
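The effect of making the ellipsis mandatory can be pictured with a string-level check. This TypeScript sketch is only an approximation of the rule `'{' Ellipsis singleExpression '}'`: it works on raw attribute text rather than tokens, it does not validate the inner expression, and `isSpreadAttribute` is a hypothetical helper that does not exist in the grammar or in chapi.

```typescript
// Hypothetical helper: accepts "{", optional whitespace, "...", then a
// non-empty expression text, then "}". The expression itself is not parsed.
function isSpreadAttribute(text: string): boolean {
  return /^\{\s*\.\.\.\s*\S[\s\S]*\}$/.test(text.trim());
}
```

Under this check `{...props}` counts as a spread attribute, while `{props}` and an empty `{...}` do not, which matches the intent of requiring the `Ellipsis` token.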
Summary
Key Changes
Grammar Updates (TypeScriptParser.g4)
JSX/TSX Support:
- `htmlElement`, `htmlContent`, `htmlTagName`, `htmlAttribute` rules for JSX parsing
- […] (`<></>`)
- `pushHtmlTagName` / `popHtmlTagName` predicates
- `{expression}` and spread attributes `{...props}`
TypeScript Syntax Improvements:
- Definite assignment assertion (`!`) in class properties: `prop!: Type`
- Optional chaining (`obj?.method()`)
- Generic calls (`func<T>(args)`)
Export Statement Fixes:
- `export default identifier;`
- […] `Export?` from `sourceElement` rule
- `ExportElementDirectly` for exported declarations
Compatibility Layer:
- […] (`typeRef` → `type_`, `parameterizedTypeRef`)
Listener Updates (TypeScriptFullIdentListener.kt)
- `enterExportElementDirectly`
- `ParenthesizedExpressionContext` handling for function calls vs JSX returns
- `IsReturnHtml` flag for React components
Parser Base Updates (TypeScriptParserBase.java)
- `pushHtmlTagName` / `popHtmlTagName` for JSX tag matching
- `notOpenBraceAndNotStatementKeyword` predicate
- `lineTerminatorAhead` and `closeBrace` helper methods
Test Plan
- […] (`!` assertion)
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Improvements