| 1 | +<p align="center"> |
| 2 | + <img width="300" alt="Codeium" src="codeium.svg"/> |
| 3 | +</p> |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +[Discord](https://discord.gg/3XFf78nAx5) |
| 8 | +[Twitter](https://twitter.com/intent/follow?screen_name=codeiumdev) |
| 9 | + |
| 10 | + |
| 11 | +[Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=Codeium.codeium) |
| 12 | +[JetBrains Marketplace](https://plugins.jetbrains.com/plugin/20540-codeium/) |
| 13 | +[Open VSX](https://open-vsx.org/extension/Codeium/codeium) |
| 14 | +[Chrome Web Store](https://chrome.google.com/webstore/detail/codeium/hobjkcpmjhlegmobgonaagepfckjkceh) |
| 15 | + |
| 16 | +# codeium-parse |
| 17 | + |
| 18 | +This repository contains tools built with [tree-sitter](https://github.com/tree-sitter/tree-sitter) that let you: |
| 19 | +* Inspect the concrete syntax tree of a source file |
| 20 | +* Use pre-written tree-sitter query files to locate important symbols in source code |
| 21 | +* Optionally format output in JSON to use the results in your own applications |
| 22 | + |
| 23 | +Contributions are welcome. Codeium Search uses these queries to index your |
| 24 | +code locally for semantic search, so adding queries for your language here |
| 25 | +helps Codeium Search work better on your own code. |
| 26 | + |
| 27 | +In particular, this repo provides a binary prepackaged with: |
| 28 | +* A recent version of the tree-sitter library |
| 29 | +* A large number of tree-sitter grammars |
| 30 | +* An implementation of many common query predicates |
| 31 | + |
| 32 | +## Usage example |
| 33 | + |
| 34 | +```console |
| 35 | +$ ./download_parse.sh |
| 36 | +$ ./parse -file examples/example.js -named_only |
| 37 | +program [0, 0] - [4, 0] "// Adds two numbers.\n…" |
| 38 | + comment [0, 0] - [0, 20] "// Adds two numbers." |
| 39 | + function_declaration [1, 0] - [3, 1] "function add(a, b) {\n…" |
| 40 | + name: identifier [1, 9] - [1, 12] "add" |
| 41 | + parameters: formal_parameters [1, 12] - [1, 18] "(a, b)" |
| 42 | + identifier [1, 13] - [1, 14] "a" |
| 43 | + identifier [1, 16] - [1, 17] "b" |
| 44 | + body: statement_block [1, 19] - [3, 1] "{\n…" |
| 45 | + return_statement [2, 4] - [2, 17] "return a + b;" |
| 46 | + binary_expression [2, 11] - [2, 16] "a + b" |
| 47 | + left: identifier [2, 11] - [2, 12] "a" |
| 48 | + right: identifier [2, 15] - [2, 16] "b" |
| 49 | +$ ./parse -file examples/example.js -use_tags_query -json | jq ".captures.doc[0].text" |
| 50 | +"// Adds two numbers." |
| 51 | +``` |
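
The `-json` output can also be consumed from a script instead of `jq`. The sketch below is a minimal Python example, not part of this repository. It assumes only the shape implied by the `jq` path above: a top-level `captures` object whose `doc` entry is a list of objects with a `text` field. The helper names (`build_parse_command`, `doc_texts`, `run_parse`) and the sample payload are hypothetical, and the real output likely contains more fields.

```python
import json
import subprocess


def build_parse_command(source_path, binary="./parse"):
    """Build the argv for the tags-query invocation shown above."""
    return [binary, "-file", source_path, "-use_tags_query", "-json"]


def doc_texts(parse_output):
    """Return the text of every `doc` capture in a JSON payload.

    Assumes the shape implied by `.captures.doc[0].text`; missing keys
    yield an empty list rather than an error.
    """
    payload = json.loads(parse_output)
    docs = payload.get("captures", {}).get("doc", [])
    return [d["text"] for d in docs if "text" in d]


def run_parse(source_path):
    """Run the binary (it must already be downloaded) and return doc texts."""
    result = subprocess.run(
        build_parse_command(source_path),
        capture_output=True, text=True, check=True,
    )
    return doc_texts(result.stdout)


# Hypothetical sample payload, trimmed to the fields used here.
SAMPLE = '{"captures": {"doc": [{"text": "// Adds two numbers."}]}}'
print(doc_texts(SAMPLE))
```

Calling `run_parse("examples/example.js")` after running `./download_parse.sh` would invoke the same command as the console example; the parsing step is kept separate so it can be exercised without the binary.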
| 52 | + |
| 53 | +## Support status |
| 54 | + |
| 55 | +### Queries |
| 56 | + |
| 57 | +Queries try to follow the [conventions established by tree-sitter.](https://tree-sitter.github.io/tree-sitter/code-navigation-systems) |
| 58 | + |
| 59 | +Most captures also include documentation as `@doc`. `@definition.function` and `@definition.method` also capture `@codeium.parameters`. |
| 60 | + |
| 61 | +| | Python | TypeScript | JavaScript | Go | |
| 62 | +| ---------------------- | ------ | ---------- | ---------- | --- | |
| 63 | +| `@definition.class` | ✅ | ✅ | ✅ | ✅ | |
| 64 | +| `@definition.function` | ✅ | ✅[^3] | ✅ | ✅ | |
| 65 | +| `@definition.method` | ✅[^1] | ✅[^3] | ✅ | ✅ | |
| 66 | +| `@definition.interface` | N/A | ✅ | N/A | ✅ | |
| 67 | +| `@definition.namespace` | N/A | ✅ | N/A | N/A | |
| 68 | +| `@definition.module` | N/A | ✅ | N/A | N/A | |
| 69 | +| `@definition.type` | N/A | ✅ | N/A | ✅ | |
| 70 | +| `@definition.constant` | ❌ | ❌ | ❌ | ❌ | |
| 71 | +| `@definition.enum` | ❌ | ❌ | ❌ | ❌ | |
| 72 | +| `@reference.call` | ✅ | ✅ | ✅ | ✅ | |
| 73 | +| `@reference.class` | ✅[^2] | ✅ | ✅ | ✅ | |
| 74 | + |
| 75 | +[^1]: Currently functions and methods are not distinguished in Python. |
| 76 | +[^2]: Function calls and class instantiation are indistinguishable in Python. |
| 77 | +[^3]: Function and method signatures are captured individually in TypeScript. Therefore, the `@doc` capture may not exist on all nodes. |
| 78 | + |
| 79 | +Want to write a query for a new language? `tags.scm` and other queries in each language's tree-sitter repository, [like tree-sitter-javascript](https://github.com/tree-sitter/tree-sitter-javascript/blob/5720b249490b3c17245ba772f6be4a43edb4e3b7/queries/tags.scm), are a good place to start. |
| 80 | + |
| 81 | +### Query predicates |
| 82 | + |
| 83 | +```console |
| 84 | +$ ./parse -supported_predicates |
| 85 | +#eq?/#not-eq? |
| 86 | + (#eq? <@capture|"literal"> <@capture|"literal">) |
| 87 | + Checks if two values are equal. |
| 88 | + |
| 89 | +#has-parent?/#not-has-parent? |
| 90 | + (#has-parent? @capture node_type...) |
| 91 | + Checks if @capture has a parent node of any of the given types. |
| 92 | + |
| 93 | +#has-type?/#not-has-type? |
| 94 | + (#has-type? @capture node_type...) |
| 95 | + Checks if @capture has a node of any of the given types. |
| 96 | + |
| 97 | +#match?/#not-match? |
| 98 | + (#match? @capture "regex") |
| 99 | + Checks if the text for @capture matches the given regular expression. |
| 100 | + |
| 101 | +#select-adjacent! |
| 102 | + (#select-adjacent! @capture @anchor) |
| 103 | + Selects @capture nodes contiguous with @anchor (all starting and ending on |
| 104 | + adjacent lines). |
| 105 | + |
| 106 | +#strip! |
| 107 | + (#strip! @capture "regex") |
| 108 | + Removes all matching text from all @capture nodes. |
| 109 | +``` |
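
To make the text-based predicates concrete, here is a rough Python model of what `#match?` and `#strip!` do to a capture's text, following the descriptions above. It is an illustration only: the function names are hypothetical, Python's `re` module stands in for whatever regular-expression engine the binary uses, and treating `#match?` as a search anywhere in the text is an assumption.

```python
import re


def match_predicate(capture_text, pattern):
    """Model of (#match? @capture "regex"): does the text match the regex?"""
    return re.search(pattern, capture_text) is not None


def strip_directive(capture_text, pattern):
    """Model of (#strip! @capture "regex"): remove all matching text."""
    return re.sub(pattern, "", capture_text)


doc = "// Adds two numbers."
print(match_predicate(doc, r"^//"))      # True
print(strip_directive(doc, r"^//\s*"))   # Adds two numbers.
```

A directive like this is one way a query could drop comment markers from a `@doc` capture; whether the bundled queries do so is not stated here.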
| 110 | + |
| 111 | +Need a predicate which hasn't been implemented? [File an issue!](https://github.com/Exafunction/codeium-parse/issues/new) We try to use [predicates from nvim-treesitter.](https://github.com/nvim-treesitter/nvim-treesitter/blob/980f0816cc28c20e45715687a0a21b5b39af59eb/lua/nvim-treesitter/query_predicates.lua) |
| 112 | + |
| 113 | +### Grammars |
| 114 | + |
| 115 | +```console |
| 116 | +$ ./parse -supported_languages |
| 117 | +c |
| 118 | +cpp |
| 119 | +csharp |
| 120 | +css |
| 121 | +dart |
| 122 | +go |
| 123 | +hcl |
| 124 | +html |
| 125 | +java |
| 126 | +javascript |
| 127 | +json |
| 128 | +kotlin |
| 129 | +latex |
| 130 | +markdown |
| 131 | +php |
| 132 | +protobuf |
| 133 | +python |
| 134 | +ruby |
| 135 | +rust |
| 136 | +shell |
| 137 | +svelte |
| 138 | +toml |
| 139 | +tsx |
| 140 | +typescript |
| 141 | +vue |
| 142 | +yaml |
| 143 | +``` |
| 144 | + |
| 145 | +Looking for support for another language? [File an issue](https://github.com/Exafunction/codeium-parse/issues/new) with a link to the repo that contains the grammar. |
| 146 | + |
| 147 | +## Contributing |
| 148 | + |
| 149 | +Pull requests are welcome. For non-issue discussions about `codeium-parse`, [join |
| 150 | +our Discord.](https://discord.gg/3XFf78nAx5) |
| 151 | + |
| 152 | +### Adding and testing queries |
| 153 | + |
| 154 | +* You can create new source files with patterns you want to target in `test_files/`. |
| 155 | +* Look at the syntax tree using `./parse -file test_files/<your file>` to get a sense of how to capture the pattern. |
| 156 | +* Learn the query syntax from [tree-sitter documentation.](https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries) |
| 157 | +* Run `./goldens.sh` to see what your query captures. |