@@ -68,7 +68,7 @@ void tree_sitter_my_language_external_scanner_destroy(void *payload) {
 
 This function should free any memory used by your scanner. It is called once when a parser is deleted or assigned a different
 language. It receives as an argument the same pointer that was returned from the _create_ function. If your _create_ function
-didn't allocate any memory, this function can be a noop.
+didn't allocate any memory, this function can be a no-op.
 
 ## Serialize
 
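To illustrate the create/destroy pairing described in the hunk above, here is a minimal sketch. The `Scanner` struct and its `indent_depth` field are hypothetical, invented only for this example; the point is just that whatever the _create_ function allocates is what the _destroy_ function receives and should release.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scanner state, for illustration only. */
typedef struct {
  uint32_t indent_depth;
} Scanner;

/* Allocate zero-initialized state; the returned pointer becomes `payload`. */
void *tree_sitter_my_language_external_scanner_create(void) {
  return calloc(1, sizeof(Scanner));
}

/* Release exactly what create allocated. free(NULL) does nothing, so this
   is also safe if create returned NULL. */
void tree_sitter_my_language_external_scanner_destroy(void *payload) {
  free(payload);
}
```

If the _create_ function returned `NULL` without allocating anything, the body of the _destroy_ function could simply be left empty.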
@@ -110,6 +110,20 @@ their values from the byte buffer.
 
 ## Scan
 
+Typically, one will:
+
+- Call `lexer->advance` several times, if the characters are valid for the token being lexed.
+
+- Optionally, call `lexer->mark_end` to mark the end of the token, and "peek ahead"
+to check if the next character (or set of characters) invalidates the token.
+
+- Set `lexer->result_symbol` to the token type.
+
+- Return `true` from the scanning function, indicating that a token was successfully lexed.
+
+Tree-sitter will then push the resulting node to the parse stack, and the input position will remain where it was at the
+point `lexer->mark_end` was called.
+
 ```c
 bool tree_sitter_my_language_external_scanner_scan(
   void *payload,
@@ -120,8 +134,7 @@ bool tree_sitter_my_language_external_scanner_scan(
 }
 ```
 
-This function is responsible for recognizing external tokens. It should return `true` if a token was recognized, and `false`
-otherwise. It is called with a "lexer" struct with the following fields:
+The second parameter to this function is the lexer, of type `TSLexer`. The `TSLexer` struct has the following fields:
 
 - **`int32_t lookahead`** — The current next character in the input stream, represented as a 32-bit unicode code point.
 
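The scanning steps added under "Scan" (advance, optionally mark the end and peek ahead, set the result symbol, return `true`) can be sketched as follows. This is an illustration under stated assumptions, not the project's own code: the `TSLexer` declared here is a cut-down stand-in with only the fields the sketch touches (a real scanner would include `tree_sitter/parser.h` instead), and `TokenType`, `scan_integer`, and `StringLexer` are hypothetical names. `StringLexer` exists only so the sketch can run against a plain string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for TSLexer (assumption: only the members used below). */
typedef struct TSLexer TSLexer;
struct TSLexer {
  int32_t lookahead;
  uint16_t result_symbol;
  void (*advance)(TSLexer *, bool skip);
  void (*mark_end)(TSLexer *);
};

/* Hypothetical external token types. */
enum TokenType { COMMENT, INTEGER };

static bool is_digit(int32_t c) { return c >= '0' && c <= '9'; }

/* Lex a run of digits as INTEGER, unless it is followed by '.' and another
   digit (which this sketch treats as belonging to some other token). */
static bool scan_integer(TSLexer *lexer) {
  if (!is_digit(lexer->lookahead)) return false;

  /* Step 1: consume every character that is valid for the token. */
  while (is_digit(lexer->lookahead)) lexer->advance(lexer, false);

  /* Step 2: fix the token's end here, then peek past it. */
  lexer->mark_end(lexer);
  if (lexer->lookahead == '.') {
    lexer->advance(lexer, false);
    if (is_digit(lexer->lookahead)) return false; /* token invalidated */
  }

  /* Steps 3 and 4: report the token type and signal success. */
  lexer->result_symbol = INTEGER;
  return true;
}

/* Test-only lexer over a C string; `base` must stay the first member so a
   TSLexer pointer can be cast back to StringLexer. */
typedef struct {
  TSLexer base;
  const char *input;
  size_t pos;
  size_t marked_end;
} StringLexer;

static void string_lexer_advance(TSLexer *l, bool skip) {
  (void)skip;
  StringLexer *s = (StringLexer *)l;
  if (s->input[s->pos] != '\0') s->pos++;
  l->lookahead = (unsigned char)s->input[s->pos]; /* 0 at end of input */
}

static void string_lexer_mark_end(TSLexer *l) {
  StringLexer *s = (StringLexer *)l;
  s->marked_end = s->pos;
}

static StringLexer string_lexer_new(const char *input) {
  assert(input != NULL);
  StringLexer s;
  s.base.lookahead = (unsigned char)input[0];
  s.base.result_symbol = 0;
  s.base.advance = string_lexer_advance;
  s.base.mark_end = string_lexer_mark_end;
  s.input = input;
  s.pos = 0;
  s.marked_end = 0;
  return s;
}
```

On the input `12.foo`, `scan_integer` advances past the `.` while peeking, but the token still ends after `12` because `mark_end` was called first. On `12.5` it returns `false` and no token is produced.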