Commit 03b7760

amaanq and dbaynard committed
docs(scanner): add overview to the scan function
Co-authored-by: David Baynard <[email protected]>
1 parent 28f7c6b commit 03b7760

1 file changed: +16 -3 lines changed

docs/src/creating-parsers/4-external-scanners.md

Lines changed: 16 additions & 3 deletions
@@ -68,7 +68,7 @@ void tree_sitter_my_language_external_scanner_destroy(void *payload) {

 This function should free any memory used by your scanner. It is called once when a parser is deleted or assigned a different
 language. It receives as an argument the same pointer that was returned from the _create_ function. If your _create_ function
-didn't allocate any memory, this function can be a noop.
+didn't allocate any memory, this function can be a no-op.

 ## Serialize

@@ -110,6 +110,20 @@ their values from the byte buffer.

 ## Scan

+Typically, one will
+
+- Call `lexer->advance` several times, if the characters are valid for the token being lexed.
+
+- Optionally, call `lexer->mark_end` to mark the end of the token, and "peek ahead"
+to check if the next character (or set of characters) invalidates the token.
+
+- Set `lexer->result_symbol` to the token type.
+
+- Return `true` from the scanning function, indicating that a token was successfully lexed.
+
+Tree-sitter will then push resulting node to the parse stack, and the input position will remain where it reached at the
+point `lexer->mark_end` was called.
+
 ```c
 bool tree_sitter_my_language_external_scanner_scan(
 void *payload,
@@ -120,8 +134,7 @@ bool tree_sitter_my_language_external_scanner_scan(
 }
 ```

-This function is responsible for recognizing external tokens. It should return `true` if a token was recognized, and `false`
-otherwise. It is called with a "lexer" struct with the following fields:
+The second parameter to this function is the lexer, of type `TSLexer`. The `TSLexer` struct has the following fields:

 - **`int32_t lookahead`** — The current next character in the input stream, represented as a 32-bit unicode code point.

