
Commit d7ad947

KJ7LNW and Eric Wheeler authored
Tree-sitter Enhancements: TSX, TypeScript, JSON, and Markdown Support (RooCodeInc#2169)
* feat: add Tree-sitter TSX query support

Added support for parsing TSX files with Tree-sitter:
- Created TSX query patterns for React components and functions
- Fixed field name issues by using direct node matching
- Added comprehensive documentation about TSX component structure
- Created a debug tool to inspect the actual tree structure
- Added test coverage for TSX parsing
- Embedded test fixture directly in the test file

Signed-off-by: Eric Wheeler <[email protected]>

* refactor: combine TypeScript and TSX queries

Reduce duplication by:
- Import TypeScript queries from typescript.ts
- Keep only TSX-specific component and JSX queries
- Update documentation to reflect combined approach

Signed-off-by: Eric Wheeler <[email protected]>

* test: update integration for parse definitions

- Adjusted logParseResult to call the actual parse definitions function using WASM from initializeWorkingParser.
- Patched TreeSitter initialization to resolve the WASM path correctly and bypass redundant init() calls.

Signed-off-by: Eric Wheeler <[email protected]>

* test: should successfully call parseSourceCodeDefinitionsForFile

Mock loadRequiredLanguageParsers to use real parser instance from initializeTreeSitter, ensuring proper interaction between parseSourceCodeDefinitionsForFile and its dependencies.

Signed-off-by: Eric Wheeler <[email protected]>

* test: improve test structure for parseSourceCodeDefinitions

- Combine component parsing tests into a single test case
- Update test assertions to match actual parser output
- Fix interface and component definition tests to use VSCodeCheckbox as sample

Signed-off-by: Eric Wheeler <[email protected]>

* feat: improve React component detection in tree-sitter

- Add tests for complex TSX structures (nested components, HOCs)
- Enhance TSX queries to better detect React components
- Document current limitations in test file

* feat: improve React component detection in TSX files

Make TSX/React component detection more generic and robust by:
- Implementing structural pattern matching instead of specific React wrapper functions
- Adding configurable line threshold for React component inclusion (MIN_COMPONENT_LINES)
- Adding robust HTML element filtering with regex patterns
- Improving React component name handling for nested components
- Ensuring proper context handling for multi-line React components

Also update mock captures in tree-sitter tests to span at least 4 lines to meet the MIN_COMPONENT_LINES threshold.

Signed-off-by: Eric Wheeler <[email protected]>

* refactor: use testParseSourceCodeDefinitions in logParseResult

- Moved testParseSourceCodeDefinitions function to top level
- Updated logParseResult to use testParseSourceCodeDefinitions
- Removed duplicate function from describe block
- Added console.log for debugging output

Signed-off-by: Eric Wheeler <[email protected]>

* refactor: improve React component detection output format

- Remove individual line output for components that don't meet MIN_COMPONENT_LINES threshold
- Update tests to match the new behavior where lines < MIN_COMPONENT_LINES are skipped
- Maintain range output (e.g., 1--4) for component definitions
- Preserve context for larger definitions as ranges only

Signed-off-by: Eric Wheeler <[email protected]>

* feat: add switch/case statement support to tree-sitter TypeScript parser

- Added node patterns in typescript.ts query to capture switch statements, case clauses, and default clauses
- Modified index.ts to avoid duplicate line ranges in the output
- Added test for switch/case statements with complex case blocks
- Fixed line range tracking to prevent duplicate output

Signed-off-by: Eric Wheeler <[email protected]>

* feat: add support for enum declarations in tree-sitter TypeScript parser

Signed-off-by: Eric Wheeler <[email protected]>

* feat: add support for namespace declarations in tree-sitter TypeScript parser

Added query pattern for namespace declarations (internal_module nodes) in the TypeScript parser. This allows the parser to identify and extract namespace declarations from TypeScript code.
- Added test case to verify namespace parsing functionality
- Added query pattern to capture namespace declarations in typescript.ts

Signed-off-by: Eric Wheeler <[email protected]>

* feat: add decorator pattern support to tree-sitter TypeScript parser

- Added support for parsing complex decorators with arguments
- Added test case for Component decorator pattern
- Enhanced TypeScript queries to capture decorator definitions

Signed-off-by: Eric Wheeler <[email protected]>

* feat: add generic type declaration support to tree-sitter TypeScript parser

- Added support for parsing generic types with constraints
- Added test case for Dictionary<K extends string | number, V> pattern
- Enhanced TypeScript queries to capture generic type definitions

Signed-off-by: Eric Wheeler <[email protected]>

* test: add conditional type support to tree-sitter TypeScript parser

- Added test case for conditional type patterns like ReturnType<T>
- Verified that conditional types with infer keyword are already supported
- Enhanced inspectTreeStructure with detailed node inspection for debugging

Signed-off-by: Eric Wheeler <[email protected]>

* test: add template literal type support to tree-sitter TypeScript parser

- Added test case for template literal type patterns like EventName<T>
- Verified that template literal types are already partially supported
- Confirmed support for complex template literal patterns in conditional types

Signed-off-by: Eric Wheeler <[email protected]>

* feat: implement tree-sitter compatible markdown processor

Adds a special case implementation for markdown files that:
- Parses markdown headers and section line ranges
- Returns captures in a format compatible with tree-sitter
- Integrates with the existing parseFile function
- Includes comprehensive tests for the implementation

Signed-off-by: Eric Wheeler <[email protected]>

* fix: markdown parser not detecting sections with horizontal rules

The markdownParser was incorrectly interpreting horizontal rules (---) as setext headers when they appeared after non-header text. This caused some sections to be missed in the output.

This fix:
- Makes setext header detection more strict by requiring at least 3 = or - characters
- Adds validation for the text line before a potential setext header
- Ensures horizontal rules are not confused with setext headers

Added a test case to verify the fix works correctly with horizontal rules.

Signed-off-by: Eric Wheeler <[email protected]>

* refactor: move helper functions to dedicated file

Move test helper functions from parseSourceCodeDefinitions.test.ts to a new helpers.ts file. Rename test file to parseSourceCodeDefinitions.tsx.test.ts to indicate it's for TSX tests.

This improves code organization by separating test helpers from test cases.

Signed-off-by: Eric Wheeler <[email protected]>

* feat: enable JSON structure display in list_code_definitions

This change allows list_code_definitions to show JSON structures by:
- Moving JSON query patterns into the JavaScript query file
- Using the JavaScript parser for JSON files
- Removing the separate JSON parser implementation
- Adding comprehensive tests for JSON parsing

Example output for a JSON file:

# test.json
0--90 | {
1--9 | "server": {
4--8 | "ssl": {
10--45 | "database": {
11--24 | "primary": {
14--18 | "credentials": {
19--23 | "pool": {
25--44 | "replicas": [
26--43 | {
30--42 | "status": {
33--41 | "metrics": {
36--40 | "connections": {
46--73 | "features": {
47--72 | "auth": {
48--71 | "providers": {
49--53 | "local": {
54--70 | "oauth": {
56--69 | "providers": [
57--68 | {

Signed-off-by: Eric Wheeler <[email protected]>

* fix: Add compile step to CI workflow to ensure WASM files are available

The TreeSitter tests were failing in CI because the WASM files weren't being copied to the dist directory before running the tests. This adds an explicit compile step to ensure the WASM files are properly built and copied.

Signed-off-by: Eric Wheeler <[email protected]>

* fix: improve tree-sitter test type safety and debug logging

- Replace 'any' types with proper Parser types in helpers.ts
- Add centralized DEBUG flag and debugLog function in helpers.ts
- Update all console.log statements to use debugLog across all test files
- This change appeases @ellipsis-dev by improving type safety and providing a clean way to control debug logging

Signed-off-by: Eric Wheeler <[email protected]>

---------

Signed-off-by: Eric Wheeler <[email protected]>
Co-authored-by: Eric Wheeler <[email protected]>
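
The markdown fix above hinges on telling setext underlines apart from horizontal rules. The commit's markdownParser code is not shown on this page, so the following is only a minimal sketch under stated assumptions: `findMarkdownSections` and `MarkdownSection` are hypothetical names. It follows CommonMark's rule that a setext underline must sit directly under a paragraph line, so a `---` after a blank line, a list item, or a header stays a horizontal rule. It also requires at least three `=` or `-` characters, as the commit message describes. Fenced code blocks are not handled, so a `# comment` inside a fence would be misread as a header.

```typescript
// Sketch only: a simplified header scanner, not the commit's markdownParser.
// MarkdownSection and findMarkdownSections are hypothetical names.
export interface MarkdownSection {
	level: number // 1-6
	title: string
	startLine: number // 0-based line of the header text
	endLine: number // 0-based last line of the section, inclusive
}

// ATX header: 1-6 '#', whitespace, the title, and an optional closing run of '#'.
const ATX_HEADER = /^ {0,3}(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$/
// Setext underline: at least three '=' or '-' (stricter than CommonMark, as in the commit).
const SETEXT_UNDERLINE = /^ {0,3}(={3,}|-{3,})[ \t]*$/
const THEMATIC_BREAK = /^ {0,3}([-*_])([ \t]*\1){2,}[ \t]*$/
const LIST_ITEM = /^ {0,3}([-*+]|\d+[.)])[ \t]/

// Only a plain paragraph line may become a setext header via the line below it.
function isParagraphLine(line: string): boolean {
	return line.trim().length > 0 && !ATX_HEADER.test(line) && !THEMATIC_BREAK.test(line) && !LIST_ITEM.test(line)
}

export function findMarkdownSections(content: string): MarkdownSection[] {
	const lines = content.split("\n")
	const headers: Omit<MarkdownSection, "endLine">[] = []

	for (let i = 0; i < lines.length; i++) {
		const atx = ATX_HEADER.exec(lines[i])
		if (atx) {
			headers.push({ level: atx[1].length, title: atx[2], startLine: i })
			continue
		}
		const underline = i + 1 < lines.length ? SETEXT_UNDERLINE.exec(lines[i + 1]) : null
		// A "---" below a blank line, list item, or header is a horizontal rule, not a header.
		if (underline && isParagraphLine(lines[i])) {
			headers.push({ level: underline[1].startsWith("=") ? 1 : 2, title: lines[i].trim(), startLine: i })
			i++ // the underline line is consumed by this header
		}
	}

	// A section runs until the line before the next header of the same or higher level.
	return headers.map((h, idx) => {
		const next = headers.slice(idx + 1).find((o) => o.level <= h.level)
		return { ...h, endLine: next ? next.startLine - 1 : lines.length - 1 }
	})
}
```

Ending a section just before the next header of the same or higher level is one plausible reading of "section line ranges"; a parser that ends every section at the next header of any level would also be consistent with the commit message.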
1 parent b9f4695 commit d7ad947
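
The React component bullets in the commit message combine three filters: a minimum span (MIN_COMPONENT_LINES), exclusion of plain HTML elements, and de-duplicated `start--end` ranges. The commit's own code and regex patterns are not shown, so this is a hedged sketch of such a post-processing step, assuming captures arrive as 0-based line ranges. `Capture`, `isComponentTagName`, and `formatComponentCaptures` are hypothetical, and the default of 4 lines is inferred only from the note that mock captures were widened to "at least 4 lines". The tag test uses the usual JSX convention that a lowercase identifier names an intrinsic element.

```typescript
// Sketch only: Capture, isComponentTagName and formatComponentCaptures are hypothetical.
export interface Capture {
	name: string // tag or identifier the query captured, e.g. "App", "div", "Foo.Bar"
	startLine: number // 0-based
	endLine: number // 0-based, inclusive
}

// JSX convention: a lowercase identifier (div, my-widget) or a namespaced name (svg:rect)
// is an intrinsic element; member expressions (Foo.Bar) and other identifiers are components.
export function isComponentTagName(name: string): boolean {
	if (name.includes(":")) return false
	if (name.includes(".")) return true
	return !/^[a-z]/.test(name)
}

export function formatComponentCaptures(source: string, captures: Capture[], minLines = 4): string {
	const lines = source.split("\n")
	const seen = new Set<string>()
	const out: string[] = []
	// Sort by position so the output reads top to bottom like the file.
	const sorted = [...captures].sort((a, b) => a.startLine - b.startLine || a.endLine - b.endLine)
	for (const c of sorted) {
		if (!isComponentTagName(c.name)) continue // skip plain HTML elements
		if (c.endLine - c.startLine + 1 < minLines) continue // below the MIN_COMPONENT_LINES-style threshold
		const range = `${c.startLine}--${c.endLine}`
		if (seen.has(range)) continue // print each range only once
		seen.add(range)
		out.push(`${range} | ${lines[c.startLine]}`)
	}
	return out.join("\n")
}
```

The output mirrors the `start--end | first line` shape of the JSON example in the commit message; whether the real implementation filters by name before or after applying the line threshold is not stated.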


14 files changed (+2530 additions, -119 deletions)


.github/workflows/code-qa.yml

Lines changed: 2 additions & 0 deletions
@@ -76,6 +76,8 @@ jobs:
           cache: 'npm'
       - name: Install dependencies
         run: npm run install:all
+      - name: Compile (to build and copy WASM files)
+        run: npm run compile
       - name: Run unit tests
         run: npx jest --silent
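
The compile step exists because the test helpers below read grammars only from `dist/`: `initializeWorkingParser` rewrites every `Language.load` path to `path.join(process.cwd(), "dist", path.basename(wasmPath))`. A sketch of a fail-fast check that could run before the suite, for example from a Jest `globalSetup` file; `missingWasmFiles` and `redirectToDist` are hypothetical helpers, not part of this commit.

```typescript
import * as fs from "fs"
import * as path from "path"

// Hypothetical helper (not part of the commit): list the grammar WASM files that are
// absent from distDir, so a test run can fail with a clear message instead of a load error.
export function missingWasmFiles(distDir: string, wasmFiles: string[]): string[] {
	return wasmFiles.filter((file) => !fs.existsSync(path.join(distDir, file)))
}

// Mirrors the redirect in initializeWorkingParser: keep only the file name and resolve it under dist/.
export function redirectToDist(wasmPath: string, cwd: string = process.cwd()): string {
	return path.join(cwd, "dist", path.basename(wasmPath))
}
```

For instance, a setup script could throw when `missingWasmFiles(path.join(process.cwd(), "dist"), ["tree-sitter-tsx.wasm", "tree-sitter-typescript.wasm"])` returns a non-empty list, pointing the developer at `npm run compile`.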

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
import { jest } from "@jest/globals"
import { parseSourceCodeDefinitionsForFile } from ".."
import * as fs from "fs/promises"
import * as path from "path"
import Parser from "web-tree-sitter"
import tsxQuery from "../queries/tsx"

// Global debug flag - set to 0 to disable debug logging
export const DEBUG = 0

// Debug function to conditionally log messages
export const debugLog = (message: string, ...args: unknown[]) => {
	if (DEBUG) {
		console.debug(message, ...args)
	}
}

// Typed handle to fs/promises. jest.mocked() only adds mock typings; the test file
// must call jest.mock("fs/promises") for readFile to actually be a mock.
const mockedFs = jest.mocked(fs)

// Store the initialized TreeSitter module (the Parser class, not an instance) for reuse
let initializedTreeSitter: typeof Parser | null = null

// Function to initialize tree-sitter
export async function initializeTreeSitter() {
	if (initializedTreeSitter) {
		return initializedTreeSitter
	}

	const TreeSitter = await initializeWorkingParser()
	// Load the TSX grammar once up front so a missing dist/ WASM file fails here
	const wasmPath = path.join(process.cwd(), "dist/tree-sitter-tsx.wasm")
	await TreeSitter.Language.load(wasmPath)

	initializedTreeSitter = TreeSitter
	return TreeSitter
}

// Function to initialize a working parser with correct WASM path
// DO NOT CHANGE THIS FUNCTION
export async function initializeWorkingParser() {
	const TreeSitter = jest.requireActual("web-tree-sitter") as any

	// Initialize directly using the default export or the module itself
	const ParserConstructor = TreeSitter.default || TreeSitter
	await ParserConstructor.init()

	// Override the Parser.Language.load to use dist directory
	const originalLoad = TreeSitter.Language.load
	TreeSitter.Language.load = async (wasmPath: string) => {
		const filename = path.basename(wasmPath)
		const correctPath = path.join(process.cwd(), "dist", filename)
		// console.log(`Redirecting WASM load from ${wasmPath} to ${correctPath}`)
		return originalLoad(correctPath)
	}

	return TreeSitter
}

// Test helper for parsing source code definitions
export async function testParseSourceCodeDefinitions(
	testFilePath: string,
	content: string,
	options: {
		language?: string
		wasmFile?: string
		queryString?: string
		extKey?: string
	} = {},
): Promise<string | undefined> {
	// Set default options
	const language = options.language || "tsx"
	const wasmFile = options.wasmFile || "tree-sitter-tsx.wasm"
	const queryString = options.queryString || tsxQuery
	const extKey = options.extKey || "tsx"

	// Clear any previous mocks
	jest.clearAllMocks()

	// Mock fs.readFile to return our sample content
	mockedFs.readFile.mockResolvedValue(content)

	// Get the mocked loadRequiredLanguageParsers (the test file must jest.mock("../languageParser"))
	const mockedLoadRequiredLanguageParsers = require("../languageParser").loadRequiredLanguageParsers

	// Initialize TreeSitter and create a real parser
	const TreeSitter = await initializeTreeSitter()
	const parser = new TreeSitter()

	// Load language and configure parser
	const wasmPath = path.join(process.cwd(), `dist/${wasmFile}`)
	const lang = await TreeSitter.Language.load(wasmPath)
	parser.setLanguage(lang)

	// Create a real query
	const query = lang.query(queryString)

	// Set up our language parser with real parser and query, keyed by file extension
	const mockLanguageParser: Record<string, { parser: Parser; query: Parser.Query }> = {
		[extKey]: { parser, query },
	}

	// Configure the mock to return our parser
	mockedLoadRequiredLanguageParsers.mockResolvedValue(mockLanguageParser)

	// Call the function under test
	const result = await parseSourceCodeDefinitionsForFile(testFilePath)

	// Verify loadRequiredLanguageParsers was called with the expected file path
	expect(mockedLoadRequiredLanguageParsers).toHaveBeenCalledWith([testFilePath])

	debugLog(`content:\n${content}\n\nResult:\n${result}`)
	return result
}

// Helper function to inspect tree structure
export async function inspectTreeStructure(content: string, language: string = "typescript"): Promise<void> {
	const TreeSitter = await initializeTreeSitter()
	const parser = new TreeSitter()
	const wasmPath = path.join(process.cwd(), `dist/tree-sitter-${language}.wasm`)
	const lang = await TreeSitter.Language.load(wasmPath)
	parser.setLanguage(lang)

	// Parse the content
	const tree = parser.parse(content)

	// Print the tree structure
	debugLog(`TREE STRUCTURE (${language}):\n${tree.rootNode.toString()}`)

	// Add more detailed debug information
	debugLog("\nDETAILED NODE INSPECTION:")

	// Function to recursively print node details
	const printNodeDetails = (node: Parser.SyntaxNode, depth: number = 0) => {
		const indent = " ".repeat(depth)
		debugLog(
			`${indent}Node Type: ${node.type}, Start: ${node.startPosition.row}:${node.startPosition.column}, End: ${node.endPosition.row}:${node.endPosition.column}`,
		)

		// For type_alias_declaration nodes, print the full alias once (not once per child)
		if (node.type === "type_alias_declaration") {
			debugLog(`${indent} TYPE ALIAS: ${node.text}`)
		}

		// Print children
		for (let i = 0; i < node.childCount; i++) {
			const child = node.child(i)
			if (!child) {
				continue
			}

			// For conditional_type nodes, print more details
			if (child.type === "conditional_type") {
				debugLog(`${indent} CONDITIONAL TYPE FOUND: ${child.text}`)
			}

			// For infer_type nodes, print more details
			if (child.type === "infer_type") {
				debugLog(`${indent} INFER TYPE FOUND: ${child.text}`)
			}

			printNodeDetails(child, depth + 1)
		}
	}

	// Start recursive printing from the root node
	printNodeDetails(tree.rootNode)
}
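
`initializeTreeSitter` caches the module only after `initializeWorkingParser()` resolves, so two callers that arrive concurrently would both run `init()` and both wrap `Language.load` (harmless here because the path redirect is idempotent, but wasteful). One common alternative is to cache the in-flight promise instead. The generic `memoizeAsync` below is a hypothetical sketch of that pattern, not code from the commit.

```typescript
// Hypothetical helper, not from the commit: share one async initialization between callers.
export function memoizeAsync<T>(init: () => Promise<T>): () => Promise<T> {
	let pending: Promise<T> | null = null
	return () => {
		if (!pending) {
			// Cache the promise itself, so callers that arrive before it settles reuse it.
			pending = init().catch((err: unknown) => {
				pending = null // forget the failure so a later call can retry
				throw err
			})
		}
		return pending
	}
}
```

Wired in, this would look like `const getTreeSitter = memoizeAsync(initializeWorkingParser)`. Clearing `pending` on failure lets a later call retry, for example after the WASM files have been built.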
