Skip to content
27 changes: 13 additions & 14 deletions src/services/tree-sitter/__tests__/fixtures/sample-java.ts
Original file line number Diff line number Diff line change
Expand Up @@ -29,16 +29,10 @@ public @interface TestAnnotationDefinition {
// Interface declaration test - at least 4 lines long
public interface TestInterfaceDefinition<T extends Comparable<T>> {
// Interface method declarations
void testInterfaceMethod(
String message,
T data
);
void testInterfaceMethod(String message,T data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test fixture modifications compress multi-line method declarations into single lines. Could this reduce test coverage for real-world Java code formatting? Consider keeping some multi-line examples to ensure the parser handles various formatting styles correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it actually expands to four lines just to accommodate currentMinComponentLines=4


// Default method in interface - 4+ lines
default String testInterfaceDefaultMethod(
String input,
T data
) {
// Default method in interface
default String testInterfaceDefaultMethod(String input,T data) {
return String.format("%s: %s", input, data.toString());
}
}
Expand Down Expand Up @@ -89,12 +83,17 @@ public class TestClassDefinition<T extends Comparable<T>>
instanceCount++;
}

// Method implementation - at least 4 lines long
// Method implementation
@Override
public void testInterfaceMethod(
String message,
T data
) {
public void testInterfaceMethod(String message, T data) {
System.out.println(testInterfaceDefaultMethod(message, data));
}

@TestAnnotationDefinition(
value="test"
)

void testMultipleAnnotationMethod(String message, T data) {
System.out.println(testInterfaceDefaultMethod(message, data));
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,8 @@ describe("parseSourceCodeDefinitionsForFile with Java", () => {

it("should parse method declarations", () => {
expect(parseResult).toMatch(/\d+--\d+ \|\s*void testInterfaceMethod\(/)
expect(parseResult).toMatch(/88--88 \|\s*public void testInterfaceMethod\(String message, T data\) {/)
expect(parseResult).toMatch(/\d+--\d+ \|\s*void testMultipleAnnotationMethod\(String message, T data\) {/)
expect(parseResult).toMatch(/\d+--\d+ \|\s*default String testInterfaceDefaultMethod\(/)
expect(parseResult).toMatch(/\d+--\d+ \|\s*public <R extends Comparable<R>> R testGenericMethodDefinition\(/)
expect(parseResult).toMatch(/\d+--\d+ \|\s*public String formatMessage\(/)
Expand Down
69 changes: 61 additions & 8 deletions src/services/tree-sitter/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,15 @@ import { parseMarkdown } from "./markdownParser"
import { RooIgnoreController } from "../../core/ignore/RooIgnoreController"
import { QueryCapture } from "web-tree-sitter"

type OutputItem = {
startLine: number
endLine: number
type: string
startContent: string
}

const METHOD_CAPTURE = ["definition.method", "definition.method.start"]

// Private constant
const DEFAULT_MIN_COMPONENT_LINES_VALUE = 4

Expand All @@ -27,6 +36,14 @@ export function setMinComponentLines(value: number): void {
currentMinComponentLines = value
}

function passesMinLines(lineCount: number, capture: QueryCapture, language: string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method-specific bypass (added to handle Java) brings language concerns into the generic processor. Can we keep language-specific handling in queries (e.g., java.ts) or adopt a simpler language-agnostic rule, rather than special-casing in index.ts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this. In the latest code, I did not introduce any language-specific processing logic in index.ts.

if (METHOD_CAPTURE.includes(capture.name)) {
// In object-oriented programming languages, method signatures are only one line and should not be ignored.
return false
}
return lineCount < getMinComponentLines()
}

const extensions = [
"tla",
"js",
Expand Down Expand Up @@ -262,7 +279,7 @@ This approach allows us to focus on the most relevant parts of the code (defined
*
* @param captures - The captures to process
* @param lines - The lines of the file
* @param minComponentLines - Minimum number of lines for a component to be included
* @param language - The programming language of the file
* @returns A formatted string with definitions
*/
function processCaptures(captures: QueryCapture[], lines: string[], language: string): string | null {
Expand All @@ -283,7 +300,7 @@ function processCaptures(captures: QueryCapture[], lines: string[], language: st
return null
}

let formattedOutput = ""
const outputArr: OutputItem[] = []

// Sort captures by their start position
captures.sort((a, b) => a.node.startPosition.row - b.node.startPosition.row)
Expand All @@ -310,7 +327,7 @@ function processCaptures(captures: QueryCapture[], lines: string[], language: st
const lineCount = endLine - startLine + 1

// Skip components that don't span enough lines
if (lineCount < getMinComponentLines()) {
if (passesMinLines(lineCount, capture, language)) {
return
}

Expand All @@ -333,13 +350,23 @@ function processCaptures(captures: QueryCapture[], lines: string[], language: st

// Add component name to output regardless of HTML filtering
if (!processedLines.has(lineKey) && componentName) {
formattedOutput += `${startLine + 1}--${endLine + 1} | ${lines[startLine]}\n`
outputArr.push({
startLine: startLine,
endLine: endLine,
type: name,
startContent: lines[startLine],
})
processedLines.add(lineKey)
}
}
// For other component definitions
else if (isNotHtmlElement(startLineContent)) {
formattedOutput += `${startLine + 1}--${endLine + 1} | ${lines[startLine]}\n`
outputArr.push({
startLine: startLine,
endLine: endLine,
type: name,
startContent: lines[startLine],
})
processedLines.add(lineKey)

// If this is part of a larger definition, include its non-HTML context
Expand All @@ -352,16 +379,42 @@ function processCaptures(captures: QueryCapture[], lines: string[], language: st
// Add the full range first
const rangeKey = `${node.parent.startPosition.row}-${contextEnd}`
if (!processedLines.has(rangeKey)) {
formattedOutput += `${node.parent.startPosition.row + 1}--${contextEnd + 1} | ${lines[node.parent.startPosition.row]}\n`
outputArr.push({
startLine: node.parent.startPosition.row,
endLine: contextEnd,
type: name,
startContent: lines[node.parent.startPosition.row],
})
processedLines.add(rangeKey)
}
}
}
}
})

if (formattedOutput.length > 0) {
return formattedOutput
// Deduplication and filtering
const dedupedArr: OutputItem[] = []
const seen = new Set<string>()
for (const item of outputArr) {
// Only skip if there is another item with the same startLine and startContent
if (item.startLine === item.endLine && METHOD_CAPTURE.includes(item.type)) {
const isDuplicate = outputArr.some(
(other) =>
other !== item && other.startLine === item.startLine && other.startContent === item.startContent,
)
if (isDuplicate) {
continue
}
}
const key = `${item.startLine}-${item.endLine}-${item.type}-${item.startContent}`
if (!seen.has(key)) {
dedupedArr.push(item)
seen.add(key)
}
}

if (dedupedArr.length > 0) {
return dedupedArr.map((item) => `${item.startLine + 1}--${item.endLine + 1} | ${item.startContent}`).join("\n")
}

return null
Expand Down
4 changes: 3 additions & 1 deletion src/services/tree-sitter/queries/java.ts
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,9 @@ export default `

; Method declarations
(method_declaration
name: (identifier) @name.definition.method) @definition.method
type: (_) @definition.method.start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional to keep both definition.method and definition.method.start? They can match the same method and lead to duplicate output downstream. You could normalize .start to the parent method node in processing to dedupe, or keep only a single method capture so each method is emitted once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this intentionally. Currently, definition.method.start is only used in Java, but the problem you mentioned does exist. It will cause duplication in some scenarios. I will try again to see if there is any way to solve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @NaccOll, please let me know if you find a solution!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue has been resolved.

The key point is that Java method definitions have two different scenarios:

  1. A method starts with an annotation and contains the complete method definition;
  2. A method has no annotation, or is simply an interface with no implementation;

I use the following gramma to query java

Method declarations
(method_declaration
type: (_) @definition.method.start
name: (identifier) ​​@name.definition.method) @definition.method

When the method starts with an annotation I get the following definition

407--453 | 	@Override
413--413 | 	public void assignRoleToUsers(AssignSysRoleToUsersCommand command) {

When the method doesn't have an annotation, I get the following result:

36--39 | List<SysCodeRulePo> findAll(SysCodeRuleQueryCondition condition) {
36--36 | List<SysCodeRulePo> findAll(SysCodeRuleQueryCondition condition) {

I modified the original simple string concatenation to remove duplicates before returning. For details, see https://github.com/RooCodeInc/Roo-Code/pull/7373/files#diff-5bb6d3bb0949cc7e638ea501f011388fdb3756fe469775775803f35bc620caacR395

Finally I will get the following definition

36--39 | List<SysCodeRulePo> findAll(SysCodeRuleQueryCondition condition) {

Although annotations are a special syntax of Java, this deduplication idea can also be used in other languages. I think this is a common practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this does not mean that duplicates have been completely eliminated. In fact, I have found that duplicates exist in various languages. Only a part of it is filtered by the line count check.

The following is part of the parsing result of src\services\tree-sitter_tests_\parseSourceCodeDefinitions.html.spec.ts

1--88 | <!DOCTYPE html>
1--1 | <!DOCTYPE html>
2--88 | <html lang="en">
2--2 | <html lang="en">
3--9 | <head>
3--3 | <head>

name: (identifier) @name.definition.method)@definition.method


; Inner class declarations
(class_declaration
Expand Down
Loading