Description
Did you check existing issues?
- I have read all the tree-sitter docs if it relates to using the parser
- I have searched the existing issues of tree-sitter-c
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
tree-sitter 0.24.4 (fc8c1863e2e5724a0c40bb6e6cfc8631bfe5908b)
Describe the bug
Regexps containing escape sequences such as `\d` and `\w`, or POSIX character classes such as `[[:alpha:]]`, are parsed incorrectly. The scanner prematurely ends the string content when it encounters a backslash in a regex, treating it as an interpolation boundary rather than as part of the regex.
I used 3 editors to validate this behavior:
- Neovim
- Helix
- Zed
Sorry, I don't know which grammar versions Neovim and Helix use, but Zed uses 71bd32f.
I think this behavior leads to two potential problems:
- Regular expressions are not tokenized properly
- As a consequence, syntax highlighting is wrong
Steps To Reproduce/Bad Parse Tree
- Try to parse `/[[[:alpha:]]\d]+/`
The grammar outputs the following parse tree:
```
(program [0, 0] - [1, 0]
  (call [0, 0] - [0, 31]
    receiver: (regex [0, 0] - [0, 18]
      (string_content [0, 1] - [0, 13])
      (escape_sequence [0, 13] - [0, 15])
      (string_content [0, 15] - [0, 17]))
    method: (identifier [0, 19] - [0, 24])
    arguments: (argument_list [0, 24] - [0, 31]
      (string [0, 25] - [0, 30]
        (string_content [0, 26] - [0, 29])))))
```
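To make the column ranges in this tree concrete, here is a small Ruby sketch that slices the regex literal by the reported spans. It assumes only that the literal starts at column 0, as the `regex [0, 0]` node indicates; the variable names are illustrative.

```ruby
# The regex literal from the report. It starts at column 0 of the parsed
# line, so tree-sitter column offsets index directly into this string.
# Single quotes keep the backslash in \d as a literal character.
src = '/[[[:alpha:]]\d]+/'

# [start, end) column ranges copied from the reported (bad) parse tree.
reported = [
  ["regex",           0...18],
  ["string_content",  1...13],
  ["escape_sequence", 13...15],
  ["string_content",  15...17],
]

# Print each node name, its column range, and the source text it covers.
reported.each do |name, cols|
  puts "#{name.ljust(16)} #{cols.begin}-#{cols.end}  #{src[cols]}"
end
```

The `escape_sequence` node covers exactly `\d`, which splits the single character class across two `string_content` nodes.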
Expected Behavior/Parse Tree
I expect the regex body to be a single `string_content` node, with no `escape_sequence` children:
```
(program [0, 0] - [1, 0]
  (call [0, 0] - [0, 31]
    receiver: (regex [0, 0] - [0, 18]
      (string_content [0, 1] - [0, 17]))
    method: (identifier [0, 19] - [0, 24])
    arguments: (argument_list [0, 24] - [0, 31]
      (string [0, 25] - [0, 30]
        (string_content [0, 26] - [0, 29])))))
```
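For context, assuming the code under test is Ruby (the node names `program`, `call`, `receiver` and `regex` match tree-sitter-ruby), Ruby's own regex engine reads the whole body as one bracket expression. This plain-Ruby check, independent of tree-sitter, shows that `\d` belongs to the character class rather than being a standalone string escape:

```ruby
pattern = /[[[:alpha:]]\d]+/

# The outer [...] is a single character class containing the nested
# POSIX class [[:alpha:]] and the digit shorthand \d, so one run of
# letters and digits matches as a whole.
p "abc123!"[pattern]  # => "abc123"

# Regexp#source returns the body without the slashes; \d is kept verbatim.
p pattern.source
```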
Repro
The minimal repro case is `/\d+/`, but I used `/[[[:alpha:]]\d]+/` above.