-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Optimizing listCodeDefinitionNamesTool for Object-oriented design language. FIX 7330 #7373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! I've reviewed the changes to improve Java method parsing in the listCodeDefinitionNamesTool. The approach to skip line count checks for Java methods and handle annotations is good, but I found some issues that need attention before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test fixture modifications compress multi-line method declarations into single lines. Could this reduce test coverage for real-world Java code formatting? Consider keeping some multi-line examples to ensure the parser handles various formatting styles correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it actually expands to four lines just to accommodate currentMinComponentLines=4
b491b13 to
8970f77
Compare
|
@roomote-agent Please review this pull request.
|
Pull Request Review Complete ✅Hi @NaccOll! I have completed my review of this PR. Here are my findings: 1. Does it complete the objectives of issue #7330? ✅ YESThe PR successfully addresses both issues mentioned in #7330:
2. Does it only affect Java method handling? ✅ YESThe changes are properly scoped to Java-specific handling: Key Changes:
Code Quality Assessment:Strengths:
Minor Suggestions for Future Improvements:
Test Results:Conclusion:This PR effectively solves the reported issues without introducing any side effects. The implementation is clean, well-tested, and properly scoped to Java. The approach of using language-specific rules is a good architectural decision that will make it easy to handle similar issues in other languages if they arise. Recommendation: APPROVE ✅ Great work on this fix! The solution elegantly handles the unique characteristics of Java code while maintaining backward compatibility. |
|
Hey @NaccOll thank you for your contribution! Can this problem be solved by removing the minimum line check for all the languages? I would prefer a general fix rather than creating langauge-specific parsing functions. Let me know what you think! |
There are two problem here.
Because interface definitions often only have method signatures and no implementation, they are only one line. This issue can be addressed with the following method: if (["definition.method"].includes(capture.name)) { I only did this for Java because I wanted to focus on resolving the issue mentioned.
For Java, I modified the syntax capture expression to capture both annotations and method signatures.(by definition.method.start) Please review again. |
… for better accuracy
0e24e97 to
59a3dc9
Compare
daniel-lxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! I've reviewed the changes and while they do address the issue, I'm wondering if there's a simpler approach we could consider.
The PR description mentions that "The original design was to avoid returning small, meaningless blocks, but this doesn't fit with object-oriented design language." Since the core issue is that the 4-line minimum threshold filters out legitimate Java interface methods and single-line methods with annotations, wouldn't it be simpler to just reduce the DEFAULT_MIN_COMPONENT_LINES_VALUE from 4 to 1 or 2 globally?
This would:
- Solve the Java interface method issue (as shown in issue #7330 where TestInterface methods were being filtered out)
- Avoid adding the complexity of special-casing methods with the new OutputItem type and deduplication logic
- Keep the codebase simpler and more maintainable
- Still filter out truly empty or meaningless blocks (0 lines)
What do you think about this simpler approach? It seems like it would achieve the same goal without the added complexity of special handling for method captures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Architectural concern: this PR introduces method-specific logic in the generic tree-sitter pipeline (src/services/tree-sitter/index.ts), which appears Java-driven. I’m okay with Java-specific adjustments in the Java query, but I’d like to avoid central special-casing here. I think the best solution would be a combination of fixing for the min line requirement for all languages and updating the java query.
| currentMinComponentLines = value | ||
| } | ||
|
|
||
| function passesMinLines(lineCount: number, capture: QueryCapture, language: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method-specific bypass (added to handle Java) brings language concerns into the generic processor. Can we keep language-specific handling in queries (e.g., java.ts) or adopt a simpler language-agnostic rule, rather than special-casing in index.ts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about this. In the latest code, I did not introduce any language-specific processing logic in index.ts.







Related GitHub Issue
Closes: #7330
Description
Optimizing listCodeDefinitionNamesTool for object-oriented design language.
Tree-sitter has already parsed the Java grammar. However, there are differences between the listCodeDefinitionNamesTool and the test. The listCodeDefinitionNamesTool function filters out blocks with fewer than four lines, while the test displays all content.
The original design was to avoid returning small, meaningless blocks, but this doesn't fit with object-oriented design language.
Java is an object-oriented language, and projects often contain numerous interfaces. Implementation classes have
@Overridemethods that override methods in interfaces or base classes, and rely on annotations to implement AOP.The previous approach resulted in numerous method names being ignored or only one of multiple annotations being displayed, leading to LLM misinterpreting the code.
The issue was resolved by follow:
Test Procedure
change src\services\tree-sitter_tests_\parseSourceCodeDefinitions.java.spec.ts and src\services\tree-sitter_tests_\fixtures\sample-java.ts
Pre-Submission Checklist
Screenshots / Videos
before:

after:

Documentation Updates
No documentation updates are required.
Additional Notes
Get in Touch
NaccOll
Important
Enhances
processCapturesinindex.tsto improve Java method parsing by skipping line count checks and annotations, with updated tests.processCapturesinindex.tsnow skips line count checks for Javadefinition.methodand skips annotations.shouldSkipandskipJavaAnnotationsfunctions inindex.tsfor Java-specific handling.parseSourceCodeDefinitions.java.spec.tsto test new Java parsing behavior.sample-java.tsto include more Java method cases with annotations.minComponentLinesparameter fromprocessCapturesinindex.ts.This description was created by
for f55baa6a5679ec89bf3cec23ade127aeceada382. You can customize this summary. It will automatically update as commits are pushed.