Update adding language docs (#2459)

pokey · web-flow · commit 97d8ee83ab8c · 2024-07-01T15:29:24.000Z
## Checklist

- [-] I have added [tests](https://www.cursorless.org/docs/contributing/test-case-recorder/)
- [x] I have updated the [docs](https://github.com/cursorless-dev/cursorless/tree/main/docs) and [cheatsheet](https://github.com/cursorless-dev/cursorless/tree/main/cursorless-talon/src/cheatsheet)
- [-] I have not broken the cheatsheet
diff --git a/.github/PULL_REQUEST_TEMPLATE/new_programming_language.md b/.github/PULL_REQUEST_TEMPLATE/new_programming_language.md
diff --git a/docs/contributing/adding-a-new-language.md b/docs/contributing/adding-a-new-language.md
@@ -13,31 +13,28 @@ for how to add support for a new parser
 
 If you are adding support for a new language that isn't natively detected by VSCode, you will need to add the appropriate extension to the list of dependencies. The list of languages officially supported by VSCode is listed [in the VSCode docs](https://code.visualstudio.com/docs/languages/identifiers#_known-language-identifiers). If your language is in that list, you can skip this step and proceed to step 3. If your language is not in that list, you need to find a VSCode extension that adds support for your language, and add the id of the given extension to [`packages/common/src/extensionDependencies.ts`](../../packages/common/src/extensionDependencies.ts) and then re-run `pnpm init-vscode-sandbox` to ensure it is installed. If you do not do this you will encounter errors when attempting to execute cursorless commands in the next step. See [#1895](https://github.com/cursorless-dev/cursorless/issues/1895) for more info.
 
-## 3. Define parse tree patterns in Cursorless
+## 3. Register your language with Cursorless
 
-First a few notes / tips:
+1. Create a file with your scope support map to indicate which scopes you support. See eg [`markdown.ts`](../../packages/common/src/scopeSupportFacets/markdown.ts). At the start, you can leave the actual scope support table empty, so it will look something like the following (keeping in mind it's best to look at `markdown.ts` or another support file in case the following snippet rots):
 
-- We suggest opening a draft PR as soon as possible to get early feedback. Please use the new language PR template either by adding `?template=new_programming_language` to the end of the URL you used to open the PR, or just by copying and pasting from the [template](https://github.com/cursorless-dev/cursorless/blob/main/.github/PULL_REQUEST_TEMPLATE/new_programming_language.md?plain=1) to your PR body, if that's easier.
-- We suggest adding tests as early as possible, after each language feature you add. Recording tests is quick and painless using the test case recorder described below. We promise 😇
+   ```ts
+   import {
+     LanguageScopeSupportFacetMap,
+     ScopeSupportFacetLevel,
+   } from "./scopeSupportFacets.types";
 
-To add a new language, you just need to add a `.scm` file to the [`queries` directory](../../queries). The `.scm` query format is documented [here](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax).
+   // eslint-disable-next-line @typescript-eslint/no-unused-vars
+   const { supported, unsupported, notApplicable } = ScopeSupportFacetLevel;
 
-The parse trees exposed by tree-sitter are often pretty close to what we're
-looking for, but we often need to look for specific patterns within the parse
-tree to get the scopes that the user expects. Fortunately, the tree-sitter query language makes these definitions fairly compact.
+   export const markdownScopeSupport: LanguageScopeSupportFacetMap = {};
+   ```
 
-- Check out the [docs](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax) for the query language.
-- Have a look at our custom query predicate operators in [`queryPredicateOperators.ts`](../../packages/cursorless-engine/src/languages/TreeSitterQuery/queryPredicateOperators.ts)
-- Look at the existing language definitions in the [`queries` directory](../../queries) for examples.
-- If you look in the debug console, you'll see debug output every time you move
-  your cursor, which might be helpful.
-- You will likely want to look at `node-types.json` for your language, (eg [java](https://github.com/tree-sitter/tree-sitter-java/blob/master/src/node-types.json)). This file is generated from `grammar.js`, which might also be helpful to look at (eg [java](https://github.com/tree-sitter/tree-sitter-java/blob/master/grammar.js)).
+2. Add an entry pointing to your support table to [the global scope support table](../../packages/common/src/scopeSupportFacets/languageScopeSupport.ts).
 
-### Writing tests
+3. Create an empty `.scm` file in [`queries/`](../../queries) to hold your parse tree patterns. It should be named after your language, eg `java.scm`.
 
-Test cases can be automatically recorded, which should speed things up a lot.
-See the [docs](test-case-recorder.md) for the test case recorder. It will also
-likely be helpful to look at the existing recorded test cases (eg
-[java](../../data/fixtures/recorded/languages/java)) to see how
-they
-should end up looking when they're recorded.
+You can file a PR with just these changes to get the ball rolling.
+
+## 4. Define your language's scopes
+
+Follow the instructions in [Adding a new scope](./adding-a-new-scope.md) to define the scopes for your language. Note that you can file a PR for each added scope, or do a couple at a time, but it's best _**not**_ to do them all at once, as smaller PRs make the review process easier.
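The registration steps in section 3 above can be sketched in isolation as follows. This is only an illustration under stated assumptions: the enum and map type are local stand-ins for the real ones in `scopeSupportFacets.types`, whose exact shape may differ, and the language id `mylang`, the `languageScopeSupport` object, and the `isRegistered` helper are hypothetical names that do not come from the Cursorless codebase.

```typescript
// Local stand-ins for the real types in scopeSupportFacets.types (assumption:
// the real enum and map type may be defined differently).
enum ScopeSupportFacetLevel {
  supported,
  unsupported,
  notApplicable,
}

type LanguageScopeSupportFacetMap = Partial<
  Record<string, ScopeSupportFacetLevel>
>;

// Step 1: the support table for the hypothetical language "mylang" starts empty.
const mylangScopeSupport: LanguageScopeSupportFacetMap = {};

// Step 2: a hypothetical global table, keyed by language id, pointing at it.
const languageScopeSupport: Record<string, LanguageScopeSupportFacetMap> = {
  mylang: mylangScopeSupport,
};

// Hypothetical helper: true if a language id has its own entry in the table.
function isRegistered(languageId: string): boolean {
  return Object.prototype.hasOwnProperty.call(languageScopeSupport, languageId);
}

console.log(isRegistered("mylang")); // prints true
```

Starting from an empty table keeps the first PR small; facets can then be added one scope at a time, as step 4 recommends.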
diff --git a/docs/contributing/adding-a-new-scope.md b/docs/contributing/adding-a-new-scope.md
@@ -0,0 +1,82 @@
+# Adding a new scope
+
+For each scope that your language should support (eg `"funk"`), you need to do the following:
+
+## 1. Find the scope's internal identifier
+
+You'll first need to figure out the internal identifier we use for the given scope. You can do so by looking in your `modifier_scope_types.csv` (see [Customization](../user/customization.md) if you're not sure where that file is). The internal identifier is the second column in the CSV file. For example, the internal identifier for `"funk"` is `namedFunction`. This identifier is what you'll use in the `.scm` file in step 5 below when you define your language's parse tree patterns.
+
+## 2. Find the appropriate scope support facets
+
+Find the _facets_ of the given scope that are relevant to your language. Each scope has several "facets" that indicate different syntactic constructs that should be considered to be the given scope.
+
+For example, `"funk"` (`namedFunction`) has the following facets:
+
+- `namedFunction`, corresponding to a standalone function declaration,
+- `namedFunction.method`, corresponding to a class method declaration, and
+- `namedFunction.constructor`, corresponding to a class constructor declaration.
+
+Have a look in [`scopeSupportFacetInfos`](../../packages/common/src/scopeSupportFacets/scopeSupportFacetInfos.ts) to see which facets the given scope has. The key is the id of the facet, and the value has information about the facet, including a description and a `scopeType` field indicating which scope type the facet corresponds to.
+
+These facet ids will be the keys in your language's scope support table below.
+
+Note that in addition to the straightforward facet IDs that correspond to the scope type, there are also some special facet IDs. In particular:
+
+- `foo.iteration` indicates the iteration scope of a given facet. For example, `namedFunction.method.iteration.class` allows you to indicate that the iteration scope for functions is a class.
+- `textFragment.xxx` scopes allow you to indicate regions in the document that have no syntactic structure. These allow us to support matching pairs inside of strings and comments, where there will be no tokens for delimiters like `(` and `)`.
+
+## 3. Add entries to your language's scope support table
+
+Add entries for each of the facet IDs of the given scope to the scope support table for your language in [the `scopeSupportFacets` directory](../../packages/common/src/scopeSupportFacets).
+
+For example, if you'd like to add support for the facets of the `"funk"` scope (`namedFunction`), you would add entries like the following to your language's scope support table:
+
+```ts
+  namedFunction: supported,
+  "namedFunction.method": supported,
+  "namedFunction.method.iteration.class": supported,
+  "namedFunction.constructor": supported,
+  "namedFunction.iteration": supported,
+  "namedFunction.iteration.document": supported,
+```
+
+If one of the above facets doesn't apply to your language, you can mark it as `notApplicable` instead of `supported`. If the facet does apply to your language, but you'd prefer to add support in a follow-up PR, you can mark it as `unsupported`.
+
+## 4. Add tests for the given scope
+
+We have a bulk test recorder for scope tests. To use it, run Cursorless in debug mode, say `"cursorless record scope"`, and select your language. This will create a temporary file containing a slot for every scope facet in your language that you've marked `supported` but that doesn't yet have any tests. You can then fill in the tests for each facet by providing a small snippet of code exemplifying the given facet.
+
+When you're done, say `"cursorless save scope"` to save the tests to the appropriate files in the `data/fixtures/recorded/scopes` directory.
+
+## 5. Add parse tree patterns for the given scope
+
+Launch your extension in debug mode and open a file in your language. You can create one or more sample files in [`playground/`](../../data/playground) for this purpose; feel free to include them in your PR.
+
+Then add parse tree patterns for the given scope to your language's `.scm` file in the [`queries` directory](../../queries). The parse tree patterns should match the syntactic constructs that should be considered to be the given scope. Tag the nodes in the parse tree that correspond to the given scope with the internal identifier you found in step 1 above, eg `@namedFunction`. Note that you use the scope identifier (`@namedFunction`), not a facet identifier (eg `@namedFunction.method`).
+
+### Notes / tips
+
+- See our [Tree-sitter query syntax](tree-sitter-query-syntax.md) guide for more information on the syntax we support.
+- Look at the existing language definitions in the [`queries` directory](../../queries) for examples.
+- Use the [scope visualizer](../user/scope-visualizer.md) to see your scope highlighted in real time every time you save the `.scm` file.
+- Use the command `"parse tree <target>"` to see the parse tree for a given target. For example `"parse tree line"` will show you the parse tree for the current line, as well as all of its ancestors. This will generate a markdown file with parse tree info, which you can then use to write your patterns. You might find it helpful to open a markdown preview of the file.
+- You will likely want to look at `node-types.json` for your language, (eg [java](https://github.com/tree-sitter/tree-sitter-java/blob/master/src/node-types.json)). This file is generated from the language's `grammar.js`, which might also be helpful to look at (eg [java](https://github.com/tree-sitter/tree-sitter-java/blob/master/grammar.js)).
+
+## 6. Update the tests
+
+The tests generated in step 4 only include the code example. Now that you've told Cursorless how to find the scope, we can automatically update the test cases to indicate where the scope should appear in your code examples.
+
+1. Say `"debug edit subset"` and alter the file to include just the name of your language.
+2. Run the `Update fixtures subset` launch configuration to update your fixtures.
+3. Check that the fixtures now look as expected, and no other tests for your language have been altered. The VSCode source control side bar is useful for this purpose.
+
+## 7. File a PR!
+
+## Examples
+
+Here are a few example PRs adding scopes. Note that in each case the PR also introduced a new facet, but in many cases you will just be able to use an existing facet.
+
+- [#2346](https://github.com/cursorless-dev/cursorless/pull/2346)
+- [#2215](https://github.com/cursorless-dev/cursorless/pull/2215)
+- [#2361](https://github.com/cursorless-dev/cursorless/pull/2361)
+- [#2364](https://github.com/cursorless-dev/cursorless/pull/2364)
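The relationship between facets, scope types, and support levels described in steps 2 to 4 of "Adding a new scope" can be sketched as below. The `facetInfos` table is a hypothetical three-entry excerpt modeled on the description of `scopeSupportFacetInfos`, the string-valued levels stand in for `ScopeSupportFacetLevel`, and both helper functions are illustrative assumptions rather than code from the repository. In particular, `facetsNeedingTests` only mimics the selection rule stated for the bulk test recorder (facets marked `supported` that have no tests yet).

```typescript
type Level = "supported" | "unsupported" | "notApplicable";

interface FacetInfo {
  description: string;
  scopeType: string; // the scope type this facet corresponds to
}

// Hypothetical excerpt; the real table has many more facets.
const facetInfos: Record<string, FacetInfo> = {
  namedFunction: {
    description: "A standalone function declaration",
    scopeType: "namedFunction",
  },
  "namedFunction.method": {
    description: "A class method declaration",
    scopeType: "namedFunction",
  },
  "namedFunction.constructor": {
    description: "A class constructor declaration",
    scopeType: "namedFunction",
  },
};

// Step 2: find every facet id belonging to a given scope type.
function facetsForScope(scopeType: string): string[] {
  return Object.keys(facetInfos).filter(
    (id) => facetInfos[id]?.scopeType === scopeType,
  );
}

// Step 3: an example support table using all three levels.
const exampleSupport: Record<string, Level> = {
  namedFunction: "supported",
  "namedFunction.method": "unsupported", // to be added in a follow-up PR
  "namedFunction.constructor": "notApplicable", // language has no constructors
};

// Step 4: facets marked supported that do not have tests yet.
function facetsNeedingTests(
  table: Record<string, Level>,
  facetsWithTests: ReadonlyArray<string>,
): string[] {
  return Object.keys(table).filter(
    (id) => table[id] === "supported" && facetsWithTests.indexOf(id) === -1,
  );
}
```

With these definitions, `facetsForScope("namedFunction")` yields the three facet ids that become keys in the support table, and `facetsNeedingTests(exampleSupport, [])` yields only `namedFunction`, because the other two facets are not marked `supported`.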
diff --git a/docs/contributing/tree-sitter-query-syntax.md b/docs/contributing/tree-sitter-query-syntax.md
@@ -0,0 +1,24 @@
+# Tree-sitter query syntax
+
+We use the tree-sitter query language to define our parse tree patterns. In addition to the [official tree-sitter query documentation](https://tree-sitter.github.io/tree-sitter/using-parsers#query-syntax), we support a couple of additional features.
+
+## Relationships
+
+In addition to the node corresponding to the scope itself (which we call its _content range_), you can tag different aspects / relationships of the scope. Assuming the internal identifier of our scope is `foo`, we can tag the following aspects of the scope:
+
+- `@foo.domain` indicates the domain of the scope. For example, you could use `@collectionKey.domain` to indicate that the domain of a key is the containing item, which would allow you to say `"take key"` from within the value of a key-value pair to select the key.
+- `@foo.leading` and `@foo.trailing` indicate the leading and trailing delimiters of the scope. For example, you could use `@collectionKey.trailing` to include all the way up to the start of the value as the trailing delimiter, so that `"chuck key"` will leave just the value.
+- `@foo.removal` indicates the removal range of the scope. Note that it is preferred to use `@foo.leading` or `@foo.trailing` instead of `@foo.removal` in situations where you just need to include a leading or trailing delimiter in the removal range.
+- `@foo.interior` indicates the interior of the scope, used for `"inside"`. For example, you could use `@namedFunction.interior` to indicate the interior of a function, which would usually be the function body itself, without any leading or trailing delimiters.
+- `@foo.iteration` indicates the iteration scope of the scope. For example, you could use `@namedFunction.iteration` to indicate that the iteration scope for functions is a class. Note that unlike the other aspects, the iteration scope is not part of the scope itself, but rather a separate, enclosing range that defines where to look for instances of the scope (eg for `"every funk"`). Thus, it should nearly always appear in a separate pattern from the scope itself, unlike the other aspects, which must appear in the same pattern as the scope itself.
+
+## Inline operators
+
+In addition to the above aspects, you can also use the following inline operators to modify the scope:
+
+- `@foo.start` and `@foo.end` to construct the scope using a range between two nodes (inclusive).
+- `@foo.startOf` and `@foo.endOf` to refer to the start and end positions of a node. For example, you could use `@foo.start.endOf` to indicate that the scope should start at the end of the node.
+
+## Query predicate operators
+
+We also support a number of query predicate operators for modifying the scope. See [`queryPredicateOperators.ts`](../../packages/cursorless-engine/src/languages/TreeSitterQuery/queryPredicateOperators.ts) for a list of available operators.
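To make the naming scheme for relationships and inline operators concrete, here is a small sketch that splits a capture name such as `@foo.start.endOf` into its parts. It is not how Cursorless itself processes captures; `parseCaptureName`, `ParsedCapture`, and the field names are hypothetical. It also assumes that a capture has at most one relationship, one of `start` / `end`, and one of `startOf` / `endOf`, in that order, which the documentation above does not state explicitly.

```typescript
const RELATIONSHIPS = [
  "domain",
  "leading",
  "trailing",
  "removal",
  "interior",
  "iteration",
] as const;
type Relationship = (typeof RELATIONSHIPS)[number];

interface ParsedCapture {
  scope: string; // internal scope identifier, eg "namedFunction"
  relationship?: Relationship; // eg "domain" in @foo.domain
  rangePart?: "start" | "end"; // eg "start" in @foo.start
  position?: "startOf" | "endOf"; // eg "endOf" in @foo.start.endOf
}

function isRelationship(s: string | undefined): s is Relationship {
  return s !== undefined && (RELATIONSHIPS as readonly string[]).indexOf(s) !== -1;
}

// Parse from the right so that whatever remains on the left is the scope id.
function parseCaptureName(capture: string): ParsedCapture {
  const parts = capture.replace(/^@/, "").split(".");
  const result: ParsedCapture = { scope: "" };

  const position = parts[parts.length - 1];
  if (parts.length > 1 && (position === "startOf" || position === "endOf")) {
    result.position = position;
    parts.pop();
  }

  const rangePart = parts[parts.length - 1];
  if (parts.length > 1 && (rangePart === "start" || rangePart === "end")) {
    result.rangePart = rangePart;
    parts.pop();
  }

  const relationship = parts[parts.length - 1];
  if (parts.length > 1 && isRelationship(relationship)) {
    result.relationship = relationship;
    parts.pop();
  }

  result.scope = parts.join(".");
  return result;
}
```

For example, `parseCaptureName("@foo.start.endOf")` gives scope `foo` with `rangePart` set to `start` and `position` set to `endOf`, while `parseCaptureName("@collectionKey.trailing")` gives scope `collectionKey` with relationship `trailing`.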