| 1 | +# Web Tree-sitter |
| 2 | + |
| 3 | +[![npmjs.com badge]][npmjs.com] |
| 4 | + |
| 5 | +[npmjs.com]: https://www.npmjs.org/package/web-tree-sitter |
| 6 | +[npmjs.com badge]: https://img.shields.io/npm/v/web-tree-sitter.svg?color=%23BF4A4A |
| 7 | + |
| 8 | +WebAssembly bindings to the [Tree-sitter](https://github.com/tree-sitter/tree-sitter) parsing library. |
| 9 | + |
| 10 | +## Setup |
| 11 | + |
| 12 | +You can download the `web-tree-sitter.js` and `web-tree-sitter.wasm` files from [the latest GitHub release][gh release] and load |
| 13 | +them using a standalone script: |
| 14 | + |
| 15 | +```html |
| 16 | +<script src="/the/path/to/web-tree-sitter.js"></script> |
| 17 | + |
| 18 | +<script> |
| 19 | + const { Parser } = window.TreeSitter; |
| 20 | + Parser.init().then(() => { /* the library is ready */ }); |
| 21 | +</script> |
| 22 | +``` |
| 23 | + |
| 24 | +You can also install [the `web-tree-sitter` module][npm module] from npm and load it using a bundler like Webpack: |
| 25 | + |
| 26 | +```js |
| 27 | +const { Parser } = require('web-tree-sitter'); |
| 28 | +Parser.init().then(() => { /* the library is ready */ }); |
| 29 | +``` |
| 30 | + |
| 31 | +or Vite: |
| 32 | + |
| 33 | +```js |
| 34 | +import { Parser } from 'web-tree-sitter'; |
| 35 | +Parser.init().then(() => { /* the library is ready */ }); |
| 36 | +``` |
| 37 | + |
| 38 | +With Vite, you also need to copy the `tree-sitter.wasm` file into your `public` directory so that the dev server can |
| 39 | +serve it. You can do this automatically with a `postinstall` |
| 40 | +[script](https://docs.npmjs.com/cli/v10/using-npm/scripts) in your `package.json`: |
| 41 | + |
| 42 | +```js |
| 43 | +"postinstall": "cp node_modules/web-tree-sitter/tree-sitter.wasm public" |
| 44 | +``` |
| 45 | + |
| 46 | +You can also use this module with [Deno](https://deno.land/): |
| 47 | + |
| 48 | +```js |
| 49 | +import { Parser } from "npm:web-tree-sitter"; |
| 50 | +await Parser.init(); |
| 51 | +// the library is ready |
| 52 | +``` |
| 53 | + |
| 54 | +To use the debug version of the library, replace your import of `web-tree-sitter` with `web-tree-sitter/debug`: |
| 55 | + |
| 56 | +```js |
| 57 | +import { Parser } from 'web-tree-sitter/debug'; // or require('web-tree-sitter/debug') |
| 58 | + |
| 59 | +Parser.init().then(() => { /* the library is ready */ }); |
| 60 | +``` |
| 61 | + |
| 62 | +This loads the debug versions of the `.js` and `.wasm` files, which include debug symbols and assertions. |
| 63 | + |
| 64 | +> [!NOTE] |
| 65 | +> The `web-tree-sitter.js` file on GitHub releases is an ES6 module. If you need a pure CommonJS build, for example |
| 66 | +> for Electron, use the `web-tree-sitter.cjs` file instead. |
| 67 | + |
| 68 | +### Basic Usage |
| 69 | + |
| 70 | +First, create a parser: |
| 71 | + |
| 72 | +```js |
| 73 | +const parser = new Parser(); |
| 74 | +``` |
| 75 | + |
| 76 | +Then assign a language to the parser. Tree-sitter languages are packaged as individual `.wasm` files (more on this below): |
| 77 | + |
| 78 | +```js |
| 79 | +const { Language } = require('web-tree-sitter'); |
| 80 | +const JavaScript = await Language.load('/path/to/tree-sitter-javascript.wasm'); |
| 81 | +parser.setLanguage(JavaScript); |
| 82 | +``` |
| 83 | + |
| 84 | +Now you can parse source code: |
| 85 | + |
| 86 | +```js |
| 87 | +const sourceCode = 'let x = 1; console.log(x);'; |
| 88 | +const tree = parser.parse(sourceCode); |
| 89 | +``` |
| 90 | + |
| 91 | +And inspect the syntax tree: |
| 92 | + |
| 93 | +```javascript |
| 94 | +console.log(tree.rootNode.toString()); |
| 95 | + |
| 96 | +// (program |
| 97 | +// (lexical_declaration |
| 98 | +// (variable_declarator (identifier) (number))) |
| 99 | +// (expression_statement |
| 100 | +// (call_expression |
| 101 | +// (member_expression (identifier) (property_identifier)) |
| 102 | +// (arguments (identifier))))) |
| 103 | + |
| 104 | +const callExpression = tree.rootNode.child(1).firstChild; |
| 105 | +console.log(callExpression); |
| 106 | + |
| 107 | +// { type: 'call_expression', |
| 108 | +// startPosition: {row: 0, column: 11}, |
| 109 | +// endPosition: {row: 0, column: 25}, |
| 110 | +// startIndex: 11, |
| 111 | +// endIndex: 25 } |
| 112 | +``` |
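Beyond `toString()`, a tree can be walked recursively. The following is a minimal sketch, assuming nodes expose `type`, `namedChildCount`, and `namedChild(i)`. The helper names `countNodeTypes` and `fakeNode` are hypothetical, and a plain-object stand-in replaces a real syntax node so that the snippet runs without a grammar loaded:

```javascript
// Count how often each named node type occurs in the subtree rooted at `node`.
function countNodeTypes(node, counts = {}) {
  counts[node.type] = (counts[node.type] || 0) + 1;
  for (let i = 0; i < node.namedChildCount; i++) {
    countNodeTypes(node.namedChild(i), counts);
  }
  return counts;
}

// Stand-in with the same shape as a syntax node (for illustration only).
function fakeNode(type, children = []) {
  return {
    type,
    namedChildCount: children.length,
    namedChild: (i) => children[i],
  };
}

const sample = fakeNode('program', [
  fakeNode('lexical_declaration', [
    fakeNode('variable_declarator', [fakeNode('identifier'), fakeNode('number')]),
  ]),
  fakeNode('expression_statement', [fakeNode('identifier')]),
]);

const typeCounts = countNodeTypes(sample);
console.log(typeCounts);
```

With a real parse result, the same helper would presumably be called as `countNodeTypes(tree.rootNode)`.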
| 113 | + |
| 114 | +### Editing |
| 115 | + |
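| 116 | +If your source code *changes*, you can update the syntax tree instead of parsing from scratch. First describe the change with `tree.edit`, then pass the edited tree to `parse` so that unchanged parts can be reused. This takes less time than the first parse. |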
| 117 | + |
| 118 | +```javascript |
| 119 | +// Replace 'let' with 'const' |
| 120 | +const newSourceCode = 'const x = 1; console.log(x);'; |
| 121 | + |
| 122 | +tree.edit({ |
| 123 | + startIndex: 0, |
| 124 | + oldEndIndex: 3, |
| 125 | + newEndIndex: 5, |
| 126 | + startPosition: {row: 0, column: 0}, |
| 127 | + oldEndPosition: {row: 0, column: 3}, |
| 128 | + newEndPosition: {row: 0, column: 5}, |
| 129 | +}); |
| 130 | + |
| 131 | +const newTree = parser.parse(newSourceCode, tree); |
| 132 | +``` |
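Writing the six edit fields by hand is error-prone, so they can be derived from the old text and the replaced range. This is only a sketch: `pointAt` and `describeEdit` are hypothetical helpers, not part of `web-tree-sitter`, and they assume that indices and columns are counted in JavaScript string units:

```javascript
// Convert a string offset into a {row, column} point by scanning for newlines.
function pointAt(text, index) {
  let row = 0;
  let lineStart = 0;
  for (let i = 0; i < index; i++) {
    if (text[i] === '\n') {
      row++;
      lineStart = i + 1;
    }
  }
  return { row, column: index - lineStart };
}

// Describe replacing oldText[startIndex, oldEndIndex) with `inserted`.
function describeEdit(oldText, startIndex, oldEndIndex, inserted) {
  const newText = oldText.slice(0, startIndex) + inserted + oldText.slice(oldEndIndex);
  const newEndIndex = startIndex + inserted.length;
  return {
    newText,
    edit: {
      startIndex,
      oldEndIndex,
      newEndIndex,
      startPosition: pointAt(oldText, startIndex),
      oldEndPosition: pointAt(oldText, oldEndIndex),
      newEndPosition: pointAt(newText, newEndIndex),
    },
  };
}

// Replace 'let' with 'const', as in the example above.
const { newText, edit } = describeEdit('let x = 1; console.log(x);', 0, 3, 'const');
console.log(newText, edit);
```

The resulting `edit` object has the shape passed to `tree.edit(...)`, and `newText` is what would then be handed to `parser.parse(newText, tree)`.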
| 133 | + |
| 134 | +### Parsing Text From a Custom Data Structure |
| 135 | + |
| 136 | +If your text is stored in a data structure other than a single string, you can parse it by supplying a callback to `parse` |
| 137 | +instead of a string: |
| 138 | + |
| 139 | +```javascript |
| 140 | +const sourceLines = [ |
| 141 | + 'let x = 1;', |
| 142 | + 'console.log(x);' |
| 143 | +]; |
| 144 | + |
| 145 | +const tree = parser.parse((index, position) => { |
| 146 | + const line = sourceLines[position.row]; |
| 147 | + if (line !== undefined) return line.slice(position.column) + '\n'; // keep the line break so rows advance |
| 148 | +}); |
| 149 | +``` |
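The callback can also look text up by `index` rather than by `position`, which suits text stored as a list of chunks. The sketch below uses a hypothetical helper, `makeChunkReader`, and assumes `index` is an offset into the concatenated text in JavaScript string units, and that returning `undefined` signals the end of input:

```javascript
// Build a reader that returns the remaining text of whichever chunk contains `index`.
function makeChunkReader(chunks) {
  // Precompute the starting offset of each chunk in the concatenated text.
  const starts = [];
  let total = 0;
  for (const chunk of chunks) {
    starts.push(total);
    total += chunk.length;
  }
  return function read(index) {
    if (index >= total) return undefined; // past the end of the text
    // Find the last chunk whose starting offset is <= index.
    let i = starts.length - 1;
    while (starts[i] > index) i--;
    return chunks[i].slice(index - starts[i]);
  };
}

const read = makeChunkReader(['let x ', '= 1;\n', 'console.log(x);']);
console.log(read(0)); // text of the first chunk
console.log(read(8)); // remainder of the second chunk
// Usage (not run here): const tree = parser.parse(read);
```

Only the chunk containing the requested offset is sliced, so the full text never has to be joined into one string.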
| 150 | + |
| 151 | +### Getting the `.wasm` language files |
| 152 | + |
| 153 | +There are several ways to get the `.wasm` files for the languages you want to parse. |
| 154 | + |
| 155 | +#### From npmjs.com |
| 156 | + |
| 157 | +The recommended way is to install the grammar package from npm. For example, to parse JavaScript, install the `tree-sitter-javascript` |
| 158 | +package: |
| 159 | + |
| 160 | +```sh |
| 161 | +npm install tree-sitter-javascript |
| 162 | +``` |
| 163 | + |
| 164 | +The `.wasm` file is then available in the `node_modules/tree-sitter-javascript` directory. |
| 165 | + |
| 166 | +#### From GitHub |
| 167 | + |
| 168 | +You can also download the `.wasm` files from GitHub releases, so long as the repository uses our reusable workflow to publish |
| 169 | +them. |
| 170 | +For example, you can download the JavaScript `.wasm` file from the tree-sitter-javascript [releases page][gh release js]. |
| 171 | + |
| 172 | +#### Generating `.wasm` files |
| 173 | + |
| 174 | +You can also generate the `.wasm` file for your desired grammar. Shown below is an example of how to generate the `.wasm` |
| 175 | +file for the JavaScript grammar. |
| 176 | + |
| 177 | +**IMPORTANT**: One of [Emscripten][emscripten], [Docker][docker], or [Podman][podman] must be installed. |
| 178 | + |
| 179 | +First install `tree-sitter-cli` and the grammar package you want to compile to `.wasm` |
| 180 | +(`tree-sitter-javascript` in this example): |
| 181 | + |
| 182 | +```sh |
| 183 | +npm install --save-dev tree-sitter-cli tree-sitter-javascript |
| 184 | +``` |
| 185 | + |
| 186 | +Then use the `tree-sitter` CLI to generate the `.wasm` file: |
| 187 | + |
| 188 | +```sh |
| 189 | +npx tree-sitter build --wasm node_modules/tree-sitter-javascript |
| 190 | +``` |
| 191 | + |
| 192 | +If everything worked, a `tree-sitter-javascript.wasm` file should be generated in the current directory. |
| 193 | + |
| 194 | +### Running .wasm in Node.js |
| 195 | + |
| 196 | +Note that executing `.wasm` files in Node.js is considerably slower than using the native [Node.js bindings][node bindings]. |
| 197 | +However, it can be useful for testing purposes: |
| 198 | + |
| 199 | +```javascript |
| 200 | +const { Parser, Language } = require('web-tree-sitter'); |
| 201 | + |
| 202 | +(async () => { |
| 203 | + await Parser.init(); |
| 204 | + const parser = new Parser(); |
| 205 | + const Lang = await Language.load('tree-sitter-javascript.wasm'); |
| 206 | + parser.setLanguage(Lang); |
| 207 | + const tree = parser.parse('let x = 1;'); |
| 208 | + console.log(tree.rootNode.toString()); |
| 209 | +})(); |
| 210 | +``` |
| 211 | + |
| 212 | +### Running .wasm in the browser |
| 213 | + |
| 214 | +`web-tree-sitter` can run in the browser, but there are some common pitfalls. |
| 215 | + |
| 216 | +#### Loading the .wasm file |
| 217 | + |
| 218 | +`web-tree-sitter` needs to load the `tree-sitter.wasm` file. By default, it assumes that this file is available in the |
| 219 | +same path as the JavaScript code. Therefore, if the code is being served from `http://localhost:3000/bundle.js`, then |
| 220 | +the wasm file should be at `http://localhost:3000/tree-sitter.wasm`. |
| 221 | + |
| 222 | +For server-side frameworks like Next.js, this can be tricky because scripts are often served from a path such as |
| 223 | +`http://localhost:3000/_next/static/chunks/pages/index.js`. The loader will therefore look for the wasm file at |
| 224 | +`http://localhost:3000/_next/static/chunks/pages/tree-sitter.wasm`. The solution is to pass a `locateFile` function in |
| 225 | +the `moduleOptions` argument to `Parser.init()`: |
| 226 | + |
| 227 | +```javascript |
| 228 | +await Parser.init({ |
| 229 | + locateFile(scriptName, scriptDirectory) { |
| 230 | + return scriptName; |
| 231 | + }, |
| 232 | +}); |
| 233 | +``` |
| 234 | + |
| 235 | +`locateFile` takes in two parameters, `scriptName`, i.e. the wasm file name, and `scriptDirectory`, i.e. the directory |
| 236 | +where the loader expects the script to be. It returns the path where the loader will look for the wasm file. In the Next.js |
| 237 | +case, we want to return just the `scriptName` so that the loader will look at `http://localhost:3000/tree-sitter.wasm` |
| 238 | +and not `http://localhost:3000/_next/static/chunks/pages/tree-sitter.wasm`. |
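If the wasm file is served from a dedicated directory rather than the site root, the same idea can be wrapped in a small factory. This is a sketch under assumptions: `makeLocateFile` is a hypothetical helper, `/static/wasm` is a made-up location, and the fallback to `scriptDirectory + scriptName` mirrors what the loader is assumed to do by default:

```javascript
// Build a `locateFile` callback that resolves every `.wasm` file against one base URL.
// `wasmBaseUrl` is hypothetical; use whatever path your server exposes the file under.
function makeLocateFile(wasmBaseUrl) {
  const base = wasmBaseUrl.endsWith('/') ? wasmBaseUrl : wasmBaseUrl + '/';
  return function locateFile(scriptName, scriptDirectory) {
    // Redirect only wasm files; keep the default location for anything else.
    return scriptName.endsWith('.wasm') ? base + scriptName : scriptDirectory + scriptName;
  };
}

const locateFile = makeLocateFile('/static/wasm');
console.log(locateFile('tree-sitter.wasm', 'http://localhost:3000/_next/static/chunks/pages/'));
// Usage (not run here): await Parser.init({ locateFile });
```

Keeping the base URL in one place means a change in hosting layout touches a single argument instead of every call site.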
| 239 | + |
| 240 | +For more information on the module options you can pass in, see the [emscripten documentation][emscripten-module-options]. |
| 241 | + |
| 242 | +#### "Can't resolve 'fs' in 'node_modules/web-tree-sitter'" |
| 243 | + |
| 244 | +Most bundlers will notice that the `web-tree-sitter.js` file attempts to import `fs`, i.e. Node's file system library. |
| 245 | +Since this module doesn't exist in the browser, the bundler fails with the error above. For Webpack, you can fix this by adding the |
| 246 | +following to your webpack config: |
| 247 | + |
| 248 | +```javascript |
| 249 | +{ |
| 250 | + resolve: { |
| 251 | + fallback: { |
| 252 | + fs: false |
| 253 | + } |
| 254 | + } |
| 255 | +} |
| 256 | +``` |
| 257 | + |
| 258 | +[docker]: https://www.docker.com |
| 259 | +[emscripten]: https://emscripten.org |
| 260 | +[emscripten-module-options]: https://emscripten.org/docs/api_reference/module.html#affecting-execution |
| 261 | +[gh release]: https://github.com/tree-sitter/tree-sitter/releases/latest |
| 262 | +[gh release js]: https://github.com/tree-sitter/tree-sitter-javascript/releases/latest |
| 263 | +[node bindings]: https://github.com/tree-sitter/node-tree-sitter |
| 264 | +[npm module]: https://www.npmjs.com/package/web-tree-sitter |
| 265 | +[podman]: https://podman.io |