Commit 9b45c0e

Improve WebTreeSitter query performance
1 parent 7f3c582 commit 9b45c0e

11 files changed

+11305
-86
lines changed

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2018-2024 Max Brunsfeld

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
# Web Tree-sitter

[![npmjs.com badge]][npmjs.com]

[npmjs.com]: https://www.npmjs.org/package/web-tree-sitter
[npmjs.com badge]: https://img.shields.io/npm/v/web-tree-sitter.svg?color=%23BF4A4A

WebAssembly bindings to the [Tree-sitter](https://github.com/tree-sitter/tree-sitter) parsing library.

## Setup

You can download the `web-tree-sitter.js` and `web-tree-sitter.wasm` files from [the latest GitHub release][gh release] and load
them using a standalone script:

```html
<script src="/the/path/to/web-tree-sitter.js"></script>

<script>
  const { Parser } = window.TreeSitter;
  Parser.init().then(() => { /* the library is ready */ });
</script>
```

You can also install [the `web-tree-sitter` module][npm module] from npm and load it using a bundler like Webpack:

```js
const { Parser } = require('web-tree-sitter');
Parser.init().then(() => { /* the library is ready */ });
```

or Vite:

```js
import { Parser } from 'web-tree-sitter';
Parser.init().then(() => { /* the library is ready */ });
```

With Vite, you also need to make sure the `tree-sitter.wasm` file is copied into your `public` directory so that your
server can serve it. You can do this automatically with a `postinstall`
[script](https://docs.npmjs.com/cli/v10/using-npm/scripts) in your `package.json`:

```js
"postinstall": "cp node_modules/web-tree-sitter/tree-sitter.wasm public"
```
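The `cp` command above is not available in every shell (for example, the default Windows command prompt). As a rough, cross-platform alternative, the copy step could be sketched as a small Node.js script. The function name `copyWasm` and the script location (say, `scripts/copy-wasm.js`) are hypothetical; the source and target paths are the ones from the `postinstall` command above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Copy tree-sitter.wasm from node_modules into <projectRoot>/public.
// Returns the target path, or null if the source file is not present.
function copyWasm(projectRoot) {
  const source = path.join(
    projectRoot, 'node_modules', 'web-tree-sitter', 'tree-sitter.wasm');
  if (!fs.existsSync(source)) {
    console.warn(`copy-wasm: ${source} not found, nothing copied`);
    return null;
  }
  const targetDir = path.join(projectRoot, 'public');
  fs.mkdirSync(targetDir, { recursive: true }); // create public/ if needed
  const target = path.join(targetDir, 'tree-sitter.wasm');
  fs.copyFileSync(source, target);
  return target;
}

// npm runs lifecycle scripts from the package root, so cwd is the project.
copyWasm(process.cwd());
```

With a script like this saved in the project, the `postinstall` entry would run it with `node` instead of calling `cp`.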

You can also use this module with [Deno](https://deno.land/):

```js
import { Parser } from "npm:web-tree-sitter";
await Parser.init();
// the library is ready
```

To use the debug version of the library, replace your import of `web-tree-sitter` with `web-tree-sitter/debug`:

```js
import { Parser } from 'web-tree-sitter/debug'; // or require('web-tree-sitter/debug')

Parser.init().then(() => { /* the library is ready */ });
```

This loads the debug versions of the `.js` and `.wasm` files, which include debug symbols and assertions.

> [!NOTE]
> The `web-tree-sitter.js` file on GitHub releases is an ES6 module. If you need a pure CommonJS library, such
> as for Electron, use the `web-tree-sitter.cjs` file instead.

### Basic Usage

First, create a parser:

```js
const parser = new Parser();
```

Then assign a language to the parser. Tree-sitter languages are packaged as individual `.wasm` files (more on this below):

```js
const { Language } = require('web-tree-sitter');
const JavaScript = await Language.load('/path/to/tree-sitter-javascript.wasm');
parser.setLanguage(JavaScript);
```

Now you can parse source code:

```js
const sourceCode = 'let x = 1; console.log(x);';
const tree = parser.parse(sourceCode);
```

and inspect the syntax tree:

```javascript
console.log(tree.rootNode.toString());

// (program
//   (lexical_declaration
//     (variable_declarator (identifier) (number)))
//   (expression_statement
//     (call_expression
//       (member_expression (identifier) (property_identifier))
//       (arguments (identifier)))))

const callExpression = tree.rootNode.child(1).firstChild;
console.log(callExpression);

// { type: 'call_expression',
//   startPosition: {row: 0, column: 11},
//   endPosition: {row: 0, column: 25},
//   startIndex: 11,
//   endIndex: 25 }
```
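Beyond printing a single node, you will often want to visit every node in the tree. The following is a minimal sketch of a depth-first walk, assuming each node exposes `type`, `childCount`, and `child(i)`. The `fakeNode` helper is a hypothetical stand-in for a parsed tree so that the sketch runs without loading a grammar; with a real tree you would pass `tree.rootNode` instead.

```javascript
// Recursively collect the `type` of every node, depth first.
// Assumes each node exposes `type`, `childCount`, and `child(i)`.
function collectTypes(node, types = []) {
  types.push(node.type);
  for (let i = 0; i < node.childCount; i++) {
    collectTypes(node.child(i), types);
  }
  return types;
}

// Hypothetical stand-in for a syntax node (not part of the library).
function fakeNode(type, children = []) {
  return {
    type,
    childCount: children.length,
    child: (i) => children[i],
  };
}

const root = fakeNode('program', [
  fakeNode('lexical_declaration', [fakeNode('variable_declarator')]),
  fakeNode('expression_statement', [fakeNode('call_expression')]),
]);

console.log(collectTypes(root).join(' '));
```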

### Editing

If your source code *changes*, you can update the syntax tree: describe the change with `tree.edit`, then pass the old tree
to `parse` along with the new source. This will take less time than the first parse.

```javascript
// Replace 'let' with 'const'
const newSourceCode = 'const x = 1; console.log(x);';

tree.edit({
  startIndex: 0,
  oldEndIndex: 3,
  newEndIndex: 5,
  startPosition: {row: 0, column: 0},
  oldEndPosition: {row: 0, column: 3},
  newEndPosition: {row: 0, column: 5},
});

const newTree = parser.parse(newSourceCode, tree);
```
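Writing the six fields of the edit object by hand is error-prone. As an illustration only, the sketch below derives them for a single replaced range. `positionAt` and `describeEdit` are hypothetical helper names, not part of the library, and the sketch assumes that indices and columns are plain JavaScript string offsets, as in the example above.

```javascript
// Convert a string index into a {row, column} position by counting newlines.
function positionAt(text, index) {
  let row = 0;
  let lineStart = 0;
  for (let i = 0; i < index; i++) {
    if (text[i] === '\n') {
      row++;
      lineStart = i + 1;
    }
  }
  return { row, column: index - lineStart };
}

// Describe replacing oldText.slice(startIndex, oldEndIndex) with `replacement`.
// Returns the new text and an object shaped like the argument to `tree.edit`.
function describeEdit(oldText, startIndex, oldEndIndex, replacement) {
  const newText =
    oldText.slice(0, startIndex) + replacement + oldText.slice(oldEndIndex);
  const newEndIndex = startIndex + replacement.length;
  return {
    newText,
    edit: {
      startIndex,
      oldEndIndex,
      newEndIndex,
      startPosition: positionAt(oldText, startIndex),
      oldEndPosition: positionAt(oldText, oldEndIndex),
      newEndPosition: positionAt(newText, newEndIndex),
    },
  };
}

// Same change as above: replace 'let' with 'const'.
const { newText, edit } =
  describeEdit('let x = 1; console.log(x);', 0, 3, 'const');
console.log(newText);
console.log(edit);
```

The resulting `edit` object could then be passed to `tree.edit`, and `newText` to `parser.parse` together with the old tree.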

### Parsing Text From a Custom Data Structure

If your text is stored in a data structure other than a single string, you can parse it by supplying a callback to `parse`
instead of a string:

```javascript
const sourceLines = [
  'let x = 1;',
  'console.log(x);'
];

const tree = parser.parse((index, position) => {
  let line = sourceLines[position.row];
  if (line) return line.slice(position.column);
});
```
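When lines are stored without their trailing newline, as above, consecutive lines would be handed to the parser with nothing between them. One possible variation, sketched under the assumption that the callback is called again with the position just past the text it last returned, re-adds the line break. `makeLineReader` and `readAll` are hypothetical names; `readAll` only imitates a consumer so the sketch can run without a parser.

```javascript
// Build a `parse` callback over lines stored without trailing newlines.
// Returns the rest of the requested line plus '\n' (except after the
// last line), or undefined once the input is exhausted.
function makeLineReader(lines) {
  return (index, position) => {
    const line = lines[position.row];
    if (line === undefined) return undefined;
    const isLastLine = position.row === lines.length - 1;
    const text = line.slice(position.column) + (isLastLine ? '' : '\n');
    return text.length > 0 ? text : undefined;
  };
}

// Hypothetical driver that imitates a consumer: it keeps requesting text
// at the position just past what it has already received.
function readAll(read) {
  let out = '';
  const position = { row: 0, column: 0 };
  let chunk;
  while ((chunk = read(out.length, position)) !== undefined) {
    out += chunk;
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === '\n') {
        position.row++;
        position.column = 0;
      } else {
        position.column++;
      }
    }
  }
  return out;
}

const reader = makeLineReader(['let x = 1;', 'console.log(x);']);
console.log(JSON.stringify(readAll(reader)));
```

With a real parser, the callback returned by `makeLineReader` would be passed directly to `parser.parse`.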

### Getting the `.wasm` language files

There are several ways to get the `.wasm` files for the languages you want to parse.

#### From npmjs.com

The recommended way is to install the language package from npm. For example, to parse JavaScript, you can install the `tree-sitter-javascript`
package:

```sh
npm install tree-sitter-javascript
```

Then you can find the `.wasm` file in the `node_modules/tree-sitter-javascript` directory.

#### From GitHub

You can also download the `.wasm` files from GitHub releases, so long as the repository uses our reusable workflow to publish
them.
For example, you can download the JavaScript `.wasm` file from the tree-sitter-javascript [releases page][gh release js].

#### Generating `.wasm` files

You can also generate the `.wasm` file for your desired grammar. Shown below is an example of how to generate the `.wasm`
file for the JavaScript grammar.

**IMPORTANT**: [Emscripten][emscripten], [Docker][docker], or [Podman][podman] needs to be installed.

First install `tree-sitter-cli` and the tree-sitter language for which to generate the `.wasm` file
(`tree-sitter-javascript` in this example):

```sh
npm install --save-dev tree-sitter-cli tree-sitter-javascript
```

Then use the `tree-sitter` CLI to generate the `.wasm` file:

```sh
npx tree-sitter build --wasm node_modules/tree-sitter-javascript
```

If everything works, a `tree-sitter-javascript.wasm` file should be generated in the current directory.

### Running .wasm in Node.js

Note that executing `.wasm` files in Node.js is considerably slower than running the [Node.js bindings][node bindings].
However, this could be useful for testing purposes:

```javascript
const { Parser, Language } = require('web-tree-sitter');

(async () => {
  await Parser.init();
  const parser = new Parser();
  const Lang = await Language.load('tree-sitter-javascript.wasm');
  parser.setLanguage(Lang);
  const tree = parser.parse('let x = 1;');
  console.log(tree.rootNode.toString());
})();
```

### Running .wasm in the browser

`web-tree-sitter` can run in the browser, but there are some common pitfalls.

#### Loading the .wasm file

`web-tree-sitter` needs to load the `tree-sitter.wasm` file. By default, it assumes that this file is available in the
same path as the JavaScript code. Therefore, if the code is being served from `http://localhost:3000/bundle.js`, then
the wasm file should be at `http://localhost:3000/tree-sitter.wasm`.

For server-side frameworks like Next.js, this can be tricky as pages are often served from a path such as
`http://localhost:3000/_next/static/chunks/pages/index.js`. The loader will therefore look for the wasm file at
`http://localhost:3000/_next/static/chunks/pages/tree-sitter.wasm`. The solution is to pass a `locateFile` function in
the `moduleOptions` argument to `Parser.init()`:

```javascript
await Parser.init({
  locateFile(scriptName, scriptDirectory) {
    return scriptName;
  },
});
```

`locateFile` takes two parameters: `scriptName`, i.e. the wasm file name, and `scriptDirectory`, i.e. the directory
where the loader expects the script to be. It returns the path where the loader will look for the wasm file. In the Next.js
case, we want to return just the `scriptName` so that the loader will look at `http://localhost:3000/tree-sitter.wasm`
and not `http://localhost:3000/_next/static/chunks/pages/tree-sitter.wasm`.

For more information on the module options you can pass in, see the [emscripten documentation][emscripten-module-options].
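If the wasm file is served from somewhere other than the site root, the same idea can be generalized. The sketch below builds a `locateFile` function that resolves every requested file against a fixed base path and ignores `scriptDirectory`. `makeLocateFile` and the example base paths are hypothetical, chosen only for illustration.

```javascript
// Build a `locateFile` function that resolves every requested file
// against a fixed base path. `baseUrl` is a hypothetical location
// where the .wasm file is served, e.g. '/' or '/static/wasm'.
function makeLocateFile(baseUrl) {
  const base = baseUrl.endsWith('/') ? baseUrl : baseUrl + '/';
  // scriptDirectory is deliberately ignored: the base path wins.
  return (scriptName, scriptDirectory) => base + scriptName;
}

const locateFile = makeLocateFile('/');
console.log(locateFile('tree-sitter.wasm', '/_next/static/chunks/pages/'));
// prints "/tree-sitter.wasm"
```

The returned function could then be passed as `Parser.init({ locateFile })`.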

#### "Can't resolve 'fs' in 'node_modules/web-tree-sitter'"

Most bundlers will notice that the `web-tree-sitter.js` file attempts to import `fs`, i.e. Node's file system library.
Since this doesn't exist in the browser, bundling fails. For Webpack, you can fix this by adding the
following to your webpack config:

```javascript
{
  resolve: {
    fallback: {
      fs: false
    }
  }
}
```

[docker]: https://www.docker.com
[emscripten]: https://emscripten.org
[emscripten-module-options]: https://emscripten.org/docs/api_reference/module.html#affecting-execution
[gh release]: https://github.com/tree-sitter/tree-sitter/releases/latest
[gh release js]: https://github.com/tree-sitter/tree-sitter-javascript/releases/latest
[node bindings]: https://github.com/tree-sitter/node-tree-sitter
[npm module]: https://www.npmjs.com/package/web-tree-sitter
[podman]: https://podman.io
