Commit f36aa76

docs: add instructions for integrating tree-sitter-abl and tree-sitter-df as git submodules
- Created comprehensive guide for adding new tree-sitter language support
- Documented steps for adding git submodules in /deps directory
- Included instructions for building WASM files from source
- Explained how to update build process, file extensions, and parsers
- Added guidance for GitHub Actions integration
- Provided troubleshooting tips and alternative approaches

Addresses #7519
1 parent ae01a90 commit f36aa76

1 file changed

+367
-0
lines changed

docs/ADD_TREE_SITTER_LANGUAGES.md

Lines changed: 367 additions & 0 deletions
@@ -0,0 +1,367 @@
# Adding Tree-Sitter Language Support via Git Submodules

This document provides step-by-step instructions for adding new tree-sitter language parsers (specifically tree-sitter-abl and tree-sitter-df) to the Roo Code codebase using git submodules.

## Overview

The goal is to integrate the following tree-sitter repositories:

- [tree-sitter-abl](https://github.com/usagi-coffee/tree-sitter-abl) - For OpenEdge ABL language support
- [tree-sitter-df](https://github.com/usagi-coffee/tree-sitter-df) - For OpenEdge Data Dictionary (.df) file support

## Step-by-Step Instructions

### 1. Add Git Submodules

First, create a `/deps` directory in the project root and add the tree-sitter repositories as submodules:

```bash
# Create deps directory if it doesn't exist
mkdir -p deps

# Add tree-sitter-abl as a submodule
git submodule add https://github.com/usagi-coffee/tree-sitter-abl.git deps/tree-sitter-abl

# Add tree-sitter-df as a submodule
git submodule add https://github.com/usagi-coffee/tree-sitter-df.git deps/tree-sitter-df

# Initialize and update submodules
git submodule update --init --recursive
```

### 2. Build WASM Files from Submodules

You'll need to compile the tree-sitter grammars to WASM format. This requires the tree-sitter CLI tool:

```bash
# Install tree-sitter CLI if not already installed
npm install -g tree-sitter-cli

# Build WASM for tree-sitter-abl
cd deps/tree-sitter-abl
tree-sitter build --wasm
# This creates tree-sitter-abl.wasm

# Build WASM for tree-sitter-df
cd ../tree-sitter-df
tree-sitter build --wasm
# This creates tree-sitter-df.wasm

cd ../..
```

### 3. Update Build Process to Copy WASM Files

Modify `packages/build/src/esbuild.ts` to include the new WASM files in the build process:

```typescript
// In the copyWasms function, after copying from tree-sitter-wasms:

// Copy custom tree-sitter WASM files from deps
const customWasmFiles = [
	{ source: "deps/tree-sitter-abl/tree-sitter-abl.wasm", name: "tree-sitter-abl.wasm" },
	{ source: "deps/tree-sitter-df/tree-sitter-df.wasm", name: "tree-sitter-df.wasm" },
]

customWasmFiles.forEach(({ source, name }) => {
	const sourcePath = path.join(srcDir, "..", source)
	if (fs.existsSync(sourcePath)) {
		fs.copyFileSync(sourcePath, path.join(distDir, name))
		console.log(`[copyWasms] Copied custom ${name} to ${distDir}`)
	} else {
		console.warn(`[copyWasms] Custom WASM file not found: ${sourcePath}`)
	}
})
```

### 4. Add File Extensions to Scanner

Update `src/services/tree-sitter/index.ts` to include the new file extensions:

```typescript
const extensions = [
	// ... existing extensions ...

	// OpenEdge ABL
	"p", // ABL procedure files
	"i", // ABL include files
	"w", // ABL window files
	"cls", // ABL class files

	// OpenEdge Data Dictionary
	"df", // Data dictionary files

	// ... rest of extensions ...
].map((e) => `.${e}`)
```

### 5. Add Language Parser Support

Update `src/services/tree-sitter/languageParser.ts` to handle the new languages:

```typescript
// Add imports for the new query strings (create these first - see step 6).
// The query files in step 6 use `export default`, so these are default imports.
import ablQuery from "./queries/abl"
import dfQuery from "./queries/df"

// In the loadRequiredLanguageParsers function, add cases:
case "p":
case "i":
case "w":
case "cls":
	language = await loadLanguage("abl", sourceDirectory)
	query = new Query(language, ablQuery)
	break

case "df":
	language = await loadLanguage("df", sourceDirectory)
	query = new Query(language, dfQuery)
	break
```
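
The extension-to-grammar mapping that steps 4 and 5 spread across two files can be summarised in one lookup. The following is only an illustrative sketch: `customGrammarByExtension` and `grammarForFile` are hypothetical names that do not exist in the Roo Code codebase, and the grammar names simply mirror the `loadLanguage("abl", ...)` and `loadLanguage("df", ...)` calls above.

```typescript
import * as path from "node:path"

// Hypothetical lookup table: file extension (without the dot) -> grammar name.
// A Map is used so that names inherited from Object.prototype never match.
const customGrammarByExtension = new Map<string, string>([
	["p", "abl"], // ABL procedure files
	["i", "abl"], // ABL include files
	["w", "abl"], // ABL window files
	["cls", "abl"], // ABL class files
	["df", "df"], // Data dictionary files
])

// Returns the grammar name for a file path, or undefined when no custom
// grammar is registered for its extension. Matching is case-insensitive.
function grammarForFile(filePath: string): string | undefined {
	const ext = path.extname(filePath).slice(1).toLowerCase()
	return customGrammarByExtension.get(ext)
}
```

A table like this makes it easy to see at a glance that every extension added to the scanner in step 4 also has a parser case in step 5.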

### 6. Create Query Files

Create query files for the new languages:

**src/services/tree-sitter/queries/abl.ts:**

```typescript
export default `
; ABL Query for code definitions
; Based on tree-sitter-abl grammar

; Procedure definitions
(procedure_statement
  name: (identifier) @name.definition.function)

; Function definitions
(function_statement
  name: (identifier) @name.definition.function)

; Method definitions
(method_statement
  name: (identifier) @name.definition.method)

; Class definitions
(class_statement
  name: (identifier) @name.definition.class)

; Interface definitions
(interface_statement
  name: (identifier) @name.definition.interface)

; Variable definitions
(define_variable_statement
  name: (identifier) @name.definition.variable)

; Property definitions
(define_property_statement
  name: (identifier) @name.definition.property)

; Temp-table definitions
(define_temp_table_statement
  name: (identifier) @name.definition.table)
`
```

**src/services/tree-sitter/queries/df.ts:**

```typescript
export default `
; Data Dictionary Query for schema definitions
; Based on tree-sitter-df grammar

; Table definitions
(table_definition
  name: (identifier) @name.definition.table)

; Field definitions
(field_definition
  name: (identifier) @name.definition.field)

; Index definitions
(index_definition
  name: (identifier) @name.definition.index)

; Sequence definitions
(sequence_definition
  name: (identifier) @name.definition.sequence)
`
```

### 7. Add to Fallback Extensions (Optional)

If the parsers are not stable or complete, you may want to add these extensions to the fallback list in `src/services/code-index/shared/supported-extensions.ts`:

```typescript
export const fallbackExtensions = [
	// ... existing extensions ...
	".p", // ABL - use fallback if parser is incomplete
	".i", // ABL include
	".w", // ABL window
	".cls", // ABL class
	".df", // Data dictionary
]
```

### 8. Update GitHub Actions Workflow

Modify `.github/workflows/code-qa.yml` to handle submodules:

```yaml
- name: Checkout code
  uses: actions/checkout@v4
  with:
    submodules: recursive # Add this line to checkout submodules

# Add a step to build WASM files from submodules
- name: Build custom tree-sitter WASM files
  run: |
    # Install tree-sitter CLI
    npm install -g tree-sitter-cli

    # Build ABL WASM
    if [ -d "deps/tree-sitter-abl" ]; then
      cd deps/tree-sitter-abl
      tree-sitter build --wasm
      cd ../..
    fi

    # Build DF WASM
    if [ -d "deps/tree-sitter-df" ]; then
      cd deps/tree-sitter-df
      tree-sitter build --wasm
      cd ../..
    fi
```

### 9. Add Tests

Create test files to verify the new language support:

**`src/services/tree-sitter/__tests__/parseSourceCodeDefinitions.abl.spec.ts`:**

```typescript
import { describe, it, expect } from "vitest"
import { parseTestFile } from "./helpers"
import ablQuery from "../queries/abl"

describe("parseSourceCodeDefinitions - ABL", () => {
	it("should parse ABL procedure definitions", async () => {
		const { captures } = await parseTestFile({
			language: "abl",
			wasmFile: "tree-sitter-abl.wasm",
			queryString: ablQuery,
			content: `
PROCEDURE myProcedure:
  DEFINE VARIABLE x AS INTEGER NO-UNDO.
  x = 10.
END PROCEDURE.

FUNCTION myFunction RETURNS INTEGER:
  RETURN 42.
END FUNCTION.
`,
		})

		expect(captures).toContainEqual(
			expect.objectContaining({
				name: "name.definition.function",
				node: expect.objectContaining({ text: "myProcedure" }),
			}),
		)
	})
})
```

### 10. Update Documentation

Add the new languages to any relevant documentation:

1. Update README.md to mention OpenEdge ABL support
2. Add to the list of supported languages in documentation
3. Update CHANGELOG.md with the new feature

## Building and Testing

After making all changes:

```bash
# Install dependencies
pnpm install

# Build the project
pnpm build

# Run tests
pnpm test

# Bundle the extension
pnpm bundle
```

## Maintenance

### Updating Submodules

To update the submodules to their latest versions:

```bash
git submodule update --remote --merge
```

### Adding More Languages

Follow the same pattern:

1. Add submodule to `/deps`
2. Build WASM file
3. Add to build process
4. Add file extensions
5. Add parser cases
6. Create query files
7. Add tests

## Troubleshooting

### WASM Build Failures

If the tree-sitter CLI fails to build WASM:

- Ensure you have the latest tree-sitter CLI: `npm update -g tree-sitter-cli`
- Check that the grammar has a valid `grammar.js` file
- Verify Node.js version compatibility
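
One common reason for a missing `grammar.js` is a submodule that was never initialized: a clone made without `--recurse-submodules` leaves the submodule directories empty. The sketch below shows one way to detect that before attempting a build. The helper name `findUninitializedGrammars` is hypothetical, and it assumes each grammar keeps `grammar.js` at the root of its repository.

```typescript
import * as fs from "node:fs"
import * as path from "node:path"

// Returns the submodule directories (relative to repoRoot) that do not
// contain a grammar.js file, which usually means the submodule has not
// been initialized or the checkout is incomplete.
function findUninitializedGrammars(repoRoot: string, submodules: string[]): string[] {
	return submodules.filter((dir) => !fs.existsSync(path.join(repoRoot, dir, "grammar.js")))
}
```

Calling `findUninitializedGrammars(process.cwd(), ["deps/tree-sitter-abl", "deps/tree-sitter-df"])` from the project root returns an empty array when both grammars are present. If it returns any entries, running `git submodule update --init --recursive` is the first thing to try.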

### Parser Not Working

If files are not being parsed:

1. Check that file extensions are added to `src/services/tree-sitter/index.ts`
2. Verify WASM files are being copied to the dist directory
3. Check the VS Code Developer Tools console (Help > Toggle Developer Tools) for WASM loading errors
4. Test with fallback chunking first to isolate parser issues
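
To check item 2 without launching the extension, a small script can confirm that each expected file exists in the output directory and really is a WebAssembly binary (every WASM binary begins with the four bytes `\0asm`). This is a minimal sketch: `checkWasmFile` and `checkDistWasms` are hypothetical helpers, and the location of the dist directory is an assumption you would adjust to your build.

```typescript
import * as fs from "node:fs"
import * as path from "node:path"

// Every WebAssembly binary starts with the magic bytes 0x00 0x61 0x73 0x6d ("\0asm").
const WASM_MAGIC = Buffer.from([0x00, 0x61, 0x73, 0x6d])

type WasmStatus = "ok" | "missing" | "not-wasm"

// Classifies a single file by existence and by its first four bytes.
function checkWasmFile(filePath: string): WasmStatus {
	if (!fs.existsSync(filePath)) return "missing"
	const fd = fs.openSync(filePath, "r")
	try {
		const header = Buffer.alloc(4)
		const bytesRead = fs.readSync(fd, header, 0, 4, 0)
		return bytesRead === 4 && header.equals(WASM_MAGIC) ? "ok" : "not-wasm"
	} finally {
		fs.closeSync(fd)
	}
}

// Checks several file names inside one directory and reports a status per name.
function checkDistWasms(distDir: string, names: string[]): Record<string, WasmStatus> {
	const result: Record<string, WasmStatus> = {}
	for (const name of names) {
		result[name] = checkWasmFile(path.join(distDir, name))
	}
	return result
}
```

For example, `checkDistWasms("dist", ["tree-sitter-abl.wasm", "tree-sitter-df.wasm"])` reports `"missing"` when the copy step in the build did not run and `"not-wasm"` when something other than a compiled grammar ended up at that path.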

### Query Issues

If queries don't capture expected definitions:

- Use tree-sitter playground to test queries
- Check the grammar's node types match query patterns
- Start with simple queries and gradually add complexity
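
The node-type check can be partly automated. The tree-sitter CLI generates a `src/node-types.json` file for each grammar, listing every node type as an object with `type` and `named` fields. The sketch below compares the node names used in a query string against that list. It is a rough heuristic under stated assumptions: the function names are hypothetical, the regular expression does not understand string literals inside queries, and the node names in the usage note are illustrative rather than taken from either grammar.

```typescript
// Minimal shape of one entry in a grammar's src/node-types.json.
interface NodeTypeEntry {
	type: string
	named: boolean
}

// Node names that are built into the query language rather than the grammar.
const BUILT_IN_NAMES = new Set(["_", "ERROR", "MISSING"])

// Collects the named node types a query refers to, for example
// "procedure_statement" from "(procedure_statement name: (identifier) @name)".
function nodeTypesInQuery(query: string): string[] {
	// Drop ";" comments first so commented-out patterns are ignored.
	const withoutComments = query.replace(/;[^\n]*/g, "")
	const pattern = /\(\s*([A-Za-z_][\w.]*)/g
	const names = new Set<string>()
	let match: RegExpExecArray | null
	while ((match = pattern.exec(withoutComments)) !== null) {
		if (!BUILT_IN_NAMES.has(match[1])) names.add(match[1])
	}
	return Array.from(names)
}

// Returns the node names used in the query that the grammar does not define.
function unknownNodeTypes(query: string, nodeTypes: NodeTypeEntry[]): string[] {
	const known = new Set(nodeTypes.filter((n) => n.named).map((n) => n.type))
	return nodeTypesInQuery(query).filter((name) => !known.has(name))
}
```

In practice you would read the grammar's `src/node-types.json` with `JSON.parse` and pass the query string exported from `queries/abl.ts` or `queries/df.ts`. Any name returned by `unknownNodeTypes` is a pattern that cannot match and is worth checking against the grammar before debugging further.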

## Alternative Approach: Using npm Packages

If the repositories provide npm packages with prebuilt WASM files, you could alternatively:

1. Add them as dependencies in `src/package.json`
2. Import WASM files from node_modules
3. Skip the submodule approach entirely

This would be simpler but requires the maintainers to publish npm packages with WASM builds.

## References

- [Tree-sitter Documentation](https://tree-sitter.github.io/tree-sitter/)
- [Web Tree-sitter](https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web)
- [Creating Tree-sitter Parsers](https://tree-sitter.github.io/tree-sitter/creating-parsers)
- [Tree-sitter Queries](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries)
