-
Notifications
You must be signed in to change notification settings - Fork 47
perf(java): optimize Java parser performance #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
确保解析器对 Java 24 新语法(如未命名模式变量和 lambda 参数)不会崩溃。更新了词法和语法文件以支持这些特性。
- Replace immutable List with MutableList and use add() instead of += to reduce object creation - Add import index maps (importsByClassName, importsByFullSource) for O(1) lookup instead of O(n) - Remove default error listeners to reduce I/O overhead - Reuse ParseTreeWalker instance to reduce object allocation - Optimize string concatenation using string templates - Optimize warpTargetFullType() to use index for fast import lookup - Apply same optimizations to JavaBasicIdentListener Performance improvements: - Reduced GC pressure by avoiding intermediate list object creation - Import lookups optimized from O(n) to O(1) - Reduced I/O overhead by removing console error listeners - All existing tests pass
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdded UNDERSCORE token and updated parser rules to accept unnamed variables/patterns; converted several listener collections to mutable types with import indexing; introduced a shared ParseTreeWalker and removed default error listeners; added a test ensuring parsing of unnamed variables/patterns does not crash. Changes
Sequence Diagram(s)sequenceDiagram
participant Source as "Java source"
participant Lexer as "ANTLR JavaLexer"
participant Parser as "ANTLR JavaParser"
participant Walker as "Shared ParseTreeWalker"
participant Listener as "JavaFull/Basic Ident Listener"
participant Analyser as "JavaAnalyser"
Note over Source,Lexer: Input fed to lexer
Source->>Lexer: tokenize (recognize `UNDERSCORE`)
Lexer->>Parser: tokens
Parser->>Walker: parse tree
Walker->>Listener: walk tree (enter/exit nodes)
Listener->>Analyser: emit/collect ident data
Analyser-->>Listener: request/consume results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #54 +/- ##
============================================
- Coverage 68.92% 68.90% -0.02%
Complexity 1796 1796
============================================
Files 88 88
Lines 7977 7995 +18
Branches 1624 1626 +2
============================================
+ Hits 5498 5509 +11
- Misses 1452 1454 +2
- Partials 1027 1032 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR aims to optimize Java parser performance through several improvements: converting immutable lists to mutable lists to reduce object creation, adding import indexing for O(1) lookups, removing ANTLR error listeners to reduce I/O overhead, and reusing a single ParseTreeWalker instance. Additionally, it adds grammar support for Java 22+ unnamed variables and patterns (underscore as variable name).
Changes:
- Performance optimizations: MutableList usage, import indexing, ParseTreeWalker reuse, and error listener removal
- Grammar updates: Support for Java 22+ unnamed variables/patterns using underscore
- Bug fix: Corrected exitClassBodyDeclaration in JavaBasicIdentListener to prevent duplicate class nodes
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaAnalyser.kt | Added companion object with shared ParseTreeWalker instance and removed default ANTLR error listeners |
| chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaFullIdentListener.kt | Converted List to MutableList, added import indexing maps, used string templates, and applied toList() for defensive copying |
| chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaBasicIdentListener.kt | Converted List to MutableList, added import indexing, moved class finalization to exitClassDeclaration, and updated containsType with index lookup |
| chapi-ast-java/src/main/antlr/JavaParser.g4 | Added support for UNDERSCORE as identifier in variable declarations, lambda parameters, catch clauses, and switch patterns |
| chapi-ast-java/src/main/antlr/JavaLexer.g4 | Added UNDERSCORE token definition for Java 22+ unnamed variables |
| chapi-ast-java/src/test/kotlin/chapi/ast/javaast/JavaIdentCallAppTest.kt | Added test case for Java 22+ unnamed variables in switch patterns and lambda parameters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Index for fast import lookup by class name | ||
| private var importsByClassName: MutableMap<String, CodeImport> = mutableMapOf() |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to JavaFullIdentListener, the import indexing here uses a simple Map<String, CodeImport> which doesn't handle import collisions. If two classes with the same simple name are imported from different packages (e.g., import com.foo.Widget; import com.bar.Widget;), the second import overwrites the first in the importsByClassName map.
This changes the behavior from the original O(n) implementation which would find the first matching import, to now only retaining the last import for each class name. This could break type resolution in files with import collisions.
| val className = fullSource.substringAfterLast('.') | ||
| importsByClassName[className] = codeImport | ||
| importsByFullSource[fullSource] = codeImport | ||
| // Also index by source for static imports | ||
| if (isStatic) { | ||
| importsByFullSource[codeImport.Source] = codeImport | ||
| } |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import indexing logic incorrectly indexes wildcard imports using the last segment of fullSource, which would be "" or the package name. For wildcard imports like "import java.util.", this code extracts "*" as the className and stores it in importsByClassName. This means:
- All wildcard imports would collide in the index under the same "*" key
- The lookup won't work correctly for classes imported via wildcard imports
The indexing should only be performed for non-wildcard imports (when the className is an actual class name, not "*"). For wildcard imports, the linear search fallback would be needed anyway since we don't know which classes are being imported.
| val className = fullSource.substringAfterLast('.') | |
| importsByClassName[className] = codeImport | |
| importsByFullSource[fullSource] = codeImport | |
| // Also index by source for static imports | |
| if (isStatic) { | |
| importsByFullSource[codeImport.Source] = codeImport | |
| } | |
| importsByFullSource[fullSource] = codeImport | |
| // Also index by source for static imports | |
| if (isStatic) { | |
| importsByFullSource[codeImport.Source] = codeImport | |
| } | |
| // Only index by class name for non-wildcard imports | |
| if (!isWildcard) { | |
| val classNameKey = fullSource.substringAfterLast('.') | |
| importsByClassName[classNameKey] = codeImport | |
| } |
| // Build import index | ||
| val className = fullSource.substringAfterLast('.') | ||
| importsByClassName[className] = codeImport |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to JavaFullIdentListener, the import indexing logic incorrectly indexes wildcard imports. For wildcard imports like "import java.util.", fullSource.substringAfterLast('.') would return "", which would be stored as the className in importsByClassName. This breaks the O(1) lookup optimization since:
- All wildcard imports collide under the "*" key
- Classes imported via wildcard imports won't be found in the index
The indexing should only be performed for non-wildcard imports. A check should be added to skip indexing when the import is a wildcard import.
| // Build import index | |
| val className = fullSource.substringAfterLast('.') | |
| importsByClassName[className] = codeImport | |
| // Build import index for non-wildcard imports only | |
| if (!isWildcard) { | |
| val className = fullSource.substringAfterLast('.') | |
| importsByClassName[className] = codeImport | |
| } |
| val className = fullSource.substringAfterLast('.') | ||
| importsByClassName[className] = codeImport | ||
| importsByFullSource[fullSource] = codeImport | ||
| // Also index by source for static imports | ||
| if (isStatic) { | ||
| importsByFullSource[codeImport.Source] = codeImport |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For static single-member imports (e.g., "import static pkg.Class.member;"), the indexing logic has an issue. The code extracts className from fullSource at line 104, which would be the member name (not a class name). For example, with "import static java.util.Collections.sort;", it would index "sort" as if it were a class name in importsByClassName.
This causes two problems:
- Static import members (which are methods/fields) get indexed as if they were classes
- If there's an actual class named "sort" imported elsewhere, the entries would conflict
The static import should only be indexed by the Source (the actual class), not by the member name. The indexing at lines 104-105 should be skipped for static imports, or only the class name from codeImport.Source should be indexed.
| val className = fullSource.substringAfterLast('.') | |
| importsByClassName[className] = codeImport | |
| importsByFullSource[fullSource] = codeImport | |
| // Also index by source for static imports | |
| if (isStatic) { | |
| importsByFullSource[codeImport.Source] = codeImport | |
| importsByFullSource[fullSource] = codeImport | |
| if (isStatic) { | |
| // For static imports, index by the declaring class name (from Source), | |
| // not by the static member name. | |
| val sourceClassName = codeImport.Source.substringAfterLast('.') | |
| importsByClassName[sourceClassName] = codeImport | |
| // Also index by source for static imports (fully-qualified class name). | |
| importsByFullSource[codeImport.Source] = codeImport | |
| } else if (!isWildcard) { | |
| // For non-static, non-wildcard imports, index by the imported class name. | |
| val className = fullSource.substringAfterLast('.') | |
| importsByClassName[className] = codeImport |
| // Index for fast import lookup by class name (O(1) instead of O(n)) | ||
| private var importsByClassName: MutableMap<String, CodeImport> = mutableMapOf() | ||
| private var importsByFullSource: MutableMap<String, CodeImport> = mutableMapOf() |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import indexing optimization has a fundamental design issue: the index maps (importsByClassName and importsByFullSource) use simple key-value mapping, which doesn't account for import collisions.
Consider these scenarios:
- Multiple wildcard imports (import java.util.; import java.io.;) - they all get indexed under "*"
- Static member imports with the same member name from different classes - the member name key collides
- Classes with the same simple name from different packages (import com.foo.Widget; import com.bar.Widget;) - they collide under "Widget"
For case 3, the second import would overwrite the first in importsByClassName, breaking lookups for the first import. The original O(n) approach with a linear search found the first matching import, but now only the last import for a given class name is retained.
A correct O(1) solution would need to either:
- Use a Map<String, List> to store all imports for each key
- Add special handling for wildcards and collisions
- Fall back to linear search when multiple imports exist for a key
| // Index for fast import lookup by class name (O(1) instead of O(n)) | |
| private var importsByClassName: MutableMap<String, CodeImport> = mutableMapOf() | |
| private var importsByFullSource: MutableMap<String, CodeImport> = mutableMapOf() | |
| // Index for fast import lookup by class name (O(1) instead of O(n)), supporting collisions | |
| private var importsByClassName: MutableMap<String, MutableList<CodeImport>> = mutableMapOf() | |
| private var importsByFullSource: MutableMap<String, MutableList<CodeImport>> = mutableMapOf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaBasicIdentListener.kt`:
- Around line 15-18: containsType() currently only uses importsByClassName and
exact Source.endsWith matches, so types brought in via on-demand (wildcard)
imports are missed; update the code to track wildcard imports separately (e.g.,
add a mutable set/list like importsWildcards or store CodeImport entries with a
flag for onDemand) when parsing imports, then modify containsType(typeName:
String) to: 1) check importsByClassName as before, 2) check the imports list for
any CodeImport that is an on-demand/wildcard import whose package + "." +
typeName would resolve (or whose Source endsWith ".*" and whose package prefix +
"." + typeName matches), and 3) fall back to the existing
Source.endsWith(".${typeName}") logic; ensure buildImplements() uses the updated
containsType() so interfaces from wildcard imports are retained.
🧹 Nitpick comments (3)
chapi-ast-java/src/main/antlr/JavaParser.g4 (1)
257-265: Consider usingvariableNameinvariableDeclaratorIdfor consistency.The new
variableNamerule is a good abstraction, butvariableDeclaratorIdstill uses inline(identifier | UNDERSCORE)instead of referencing it. This is a minor inconsistency.♻️ Optional: Use variableName for consistency
variableDeclaratorId - : (identifier | UNDERSCORE) ('[' ']')* + : variableName ('[' ']')* ;chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaAnalyser.kt (1)
51-57: Consider a configurable error-listener strategyRemoving default listeners silences syntax error diagnostics. If downstream users rely on parse error reporting, consider wiring a custom listener or a flag to opt back into error output.
chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaFullIdentListener.kt (1)
30-30: Consider snapshottingclassNodeswhen returningCodeContainer
classNodesis now mutable; returning it directly allows external mutation of listener state. AtoList()ingetNodeInfo()would preserve a stable snapshot.
| private var classNodes: MutableList<CodeDataStruct> = mutableListOf() | ||
| private var imports: MutableList<CodeImport> = mutableListOf() | ||
| // Index for fast import lookup by class name | ||
| private var importsByClassName: MutableMap<String, CodeImport> = mutableMapOf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, locate the file and check its size
fd -t f JavaBasicIdentListener.ktRepository: phodal/chapi
Length of output: 130
🏁 Script executed:
# Read the full file to understand the implementation
cat -n chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaBasicIdentListener.ktRepository: phodal/chapi
Length of output: 11402
Wildcard imports are not matched in containsType(), causing interface implementations to be silently dropped
The containsType() method (lines 110–115) only checks the class-name index and exact Source.endsWith(".$typeName") matching. For on-demand imports (import pkg.*;), this fails to match unqualified type names, so buildImplements() filters out interfaces that come from wildcard imports.
Example: import com.example.*; followed by implements MyInterface will drop MyInterface from the Implements list because containsType("MyInterface") returns false.
Consider tracking wildcard imports separately and matching unqualified types against them.
🔧 Possible fix
private var importsByClassName: MutableMap<String, CodeImport> = mutableMapOf()
+ private var wildcardImports: MutableSet<String> = mutableSetOf() imports.add(codeImport)
codeContainer.Imports += codeImport
// Build import index
val className = fullSource.substringAfterLast('.')
importsByClassName[className] = codeImport
+ if (isWildcard && !isStatic) {
+ wildcardImports.add(fullSource)
+ } private fun containsType(typeName: String?): Boolean {
if (typeName == null) return false
// Fast lookup using index
return importsByClassName.containsKey(typeName) ||
imports.any { imp -> imp.Source.endsWith(".$typeName") } ||
+ (wildcardImports.isNotEmpty() && !typeName.contains("."))
}Also applies to: 63–68, 104–108, 117–121
🤖 Prompt for AI Agents
In `@chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaBasicIdentListener.kt`
around lines 15 - 18, containsType() currently only uses importsByClassName and
exact Source.endsWith matches, so types brought in via on-demand (wildcard)
imports are missed; update the code to track wildcard imports separately (e.g.,
add a mutable set/list like importsWildcards or store CodeImport entries with a
flag for onDemand) when parsing imports, then modify containsType(typeName:
String) to: 1) check importsByClassName as before, 2) check the imports list for
any CodeImport that is an on-demand/wildcard import whose package + "." +
typeName would resolve (or whose Source endsWith ".*" and whose package prefix +
"." + typeName matches), and 3) fall back to the existing
Source.endsWith(".${typeName}") logic; ensure buildImplements() uses the updated
containsType() so interfaces from wildcard imports are retained.
- 将 importsByClassName 改为 Map<String, MutableList<CodeImport>> 以处理冲突 - 跳过通配符导入的索引(无法有效索引) - 修正静态导入:使用声明类名而不是成员名 - 在 containsType 中为通配符导入添加回退搜索 - 在查找时处理多个匹配的 import 解决了 PR review comments 中的 5 个问题。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaFullIdentListener.kt (1)
586-591: PotentialNullPointerExceptioninbuildPureTargetType.The parameter
targetTypeis nullable (String?), but line 587 usestargetType!!without a null check. If called withnull, this will throw an NPE.🐛 Proposed fix
// remove array from types private fun buildPureTargetType(targetType: String?): String { + if (targetType == null) return "" - val split = targetType!!.split(".") + val split = targetType.split(".") // remove for array val pureTargetType = split[0].replace("[", "").replace("]", "") return pureTargetType }
🤖 Fix all issues with AI agents
In `@chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaFullIdentListener.kt`:
- Around line 548-557: The current isStatic detection in JavaFullIdentListener
(using importByClassName.UsageName.contains(pureTargetType)) is incorrect and
causes regular class imports to be treated as static; replace that logic by
checking the import's Kind field instead (e.g., importByClassName.Kind ==
ImportKind.STATIC or the equivalent enum/string value used in your model) when
computing isStatic for importByClassName, and then return JavaTargetType with
CallType.STATIC only when that kind indicates a static import, otherwise use
CallType.CHAIN.
♻️ Duplicate comments (1)
chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaBasicIdentListener.kt (1)
113-122: Wildcard imports not matched incontainsType()fallback.The fallback at line 121 checks
imp.Source.endsWith(".$typeName"), but for wildcard imports likeimport java.util.*;, theSourceis"java.util"(the package), not the full type path. This means checking"java.util".endsWith(".List")returnsfalse, so types from wildcard imports won't be matched.This can cause interface implementations from wildcard imports to be silently dropped in
buildImplements().
| val importsByName = importsByClassName[pureTargetType] | ||
| if (importsByName != null && importsByName.isNotEmpty()) { | ||
| // Use the first matching import (consistent with original behavior) | ||
| val importByClassName = importsByName[0] | ||
| val isStatic = importByClassName.UsageName.isNotEmpty() && importByClassName.UsageName.contains(pureTargetType) | ||
| return JavaTargetType( | ||
| targetType = importByClassName.Source, | ||
| callType = if (isStatic) CallType.STATIC else CallType.CHAIN | ||
| ) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect isStatic detection logic.
The check at line 552 uses importByClassName.UsageName.contains(pureTargetType) to determine if the import is static. However, for regular named imports like import java.util.List, UsageName is ["List"], so looking up "List" would make isStatic = true, incorrectly returning CallType.STATIC.
The condition should check the import's Kind instead:
🐛 Proposed fix
if (importsByName != null && importsByName.isNotEmpty()) {
// Use the first matching import (consistent with original behavior)
val importByClassName = importsByName[0]
- val isStatic = importByClassName.UsageName.isNotEmpty() && importByClassName.UsageName.contains(pureTargetType)
+ val isStatic = importByClassName.Kind == ImportKind.STATIC
return JavaTargetType(
targetType = importByClassName.Source,
callType = if (isStatic) CallType.STATIC else CallType.CHAIN
)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| val importsByName = importsByClassName[pureTargetType] | |
| if (importsByName != null && importsByName.isNotEmpty()) { | |
| // Use the first matching import (consistent with original behavior) | |
| val importByClassName = importsByName[0] | |
| val isStatic = importByClassName.UsageName.isNotEmpty() && importByClassName.UsageName.contains(pureTargetType) | |
| return JavaTargetType( | |
| targetType = importByClassName.Source, | |
| callType = if (isStatic) CallType.STATIC else CallType.CHAIN | |
| ) | |
| } | |
| val importsByName = importsByClassName[pureTargetType] | |
| if (importsByName != null && importsByName.isNotEmpty()) { | |
| // Use the first matching import (consistent with original behavior) | |
| val importByClassName = importsByName[0] | |
| val isStatic = importByClassName.Kind == ImportKind.STATIC | |
| return JavaTargetType( | |
| targetType = importByClassName.Source, | |
| callType = if (isStatic) CallType.STATIC else CallType.CHAIN | |
| ) | |
| } |
🤖 Prompt for AI Agents
In `@chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaFullIdentListener.kt`
around lines 548 - 557, The current isStatic detection in JavaFullIdentListener
(using importByClassName.UsageName.contains(pureTargetType)) is incorrect and
causes regular class imports to be treated as static; replace that logic by
checking the import's Kind field instead (e.g., importByClassName.Kind ==
ImportKind.STATIC or the equivalent enum/string value used in your model) when
computing isStatic for importByClassName, and then return JavaTargetType with
CallType.STATIC only when that kind indicates a static import, otherwise use
CallType.CHAIN.
Performance Optimizations
This PR optimizes the Java parser performance through several key improvements:
Changes
Reduce Object Creation
ListwithMutableListand useadd()instead of+=Optimize Import Lookups (O(n) → O(1))
importsByClassNameandimportsByFullSourceindex mapsReduce I/O Overhead
Reduce Object Allocation
ParseTreeWalkerinstance across all parsing operationsApply to Both Listeners
JavaFullIdentListener: Full optimizations including import indexingJavaBasicIdentListener: Same optimizations for consistencyPerformance Impact
Testing
✅ All existing tests pass (79 tests)
JavaFullIdentListenerTest: All tests passJavaBasicIdentListenerTest: All tests passJavaIdentCallAppTest: All tests passJavaTypeRefTest: All tests passFiles Changed
chapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaAnalyser.ktchapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaFullIdentListener.ktchapi-ast-java/src/main/kotlin/chapi/ast/javaast/JavaBasicIdentListener.ktNote
SLL prediction mode was tested but not enabled due to compatibility issues with existing grammar rules. The other optimizations provide substantial performance improvements while maintaining full compatibility.
Summary by CodeRabbit
New Features
Performance
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.