Open
Description
It seems that javalang decodes Unicode escapes back to their raw characters (as pointed out in issue #58) in the pre_tokenize method, before tokenizing.
I don't see why this replacement is necessary (pre_tokenize has been there since the initial commit), and it can lead to failures in rare cases.
Example:
>>> import javalang
>>> javalang.parse.parse(r'class Foo { String bar = "\u0022"; }')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python38\lib\site-packages\javalang\parse.py", line 52, in parse
    parser = Parser(tokens)
  File "C:\Program Files\Python38\lib\site-packages\javalang\parser.py", line 95, in __init__
    self.tokens = util.LookAheadListIterator(tokens)
  File "C:\Program Files\Python38\lib\site-packages\javalang\util.py", line 92, in __init__
    self.list = list(iterable)
  File "C:\Program Files\Python38\lib\site-packages\javalang\tokenizer.py", line 535, in tokenize
    self.read_string()
  File "C:\Program Files\Python38\lib\site-packages\javalang\tokenizer.py", line 201, in read_string
    self.error('Unterminated character/string literal')
  File "C:\Program Files\Python38\lib\site-packages\javalang\tokenizer.py", line 576, in error
    raise error
javalang.tokenizer.LexerError: Unterminated character/string literal at """, line 1: class Foo { String bar = """;

PR #96 fixes this issue and maybe we should merge it?
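
For anyone trying to see where the `"""` in the error message comes from, here is a minimal sketch that does not use javalang at all. It assumes the pre-tokenize step simply substitutes each `\uXXXX` escape with the character it names; `decode_unicode_escapes` and `UNICODE_ESCAPE` are hypothetical names made up for this illustration, not javalang's actual code.

```python
import re

# Hypothetical stand-in for an escape-decoding pass; NOT javalang's real code.
# Matches a backslash, one or more 'u', then exactly four hex digits.
# Simplification: does not check whether the backslash is itself escaped.
UNICODE_ESCAPE = re.compile(r'\\u+([0-9a-fA-F]{4})')

def decode_unicode_escapes(source):
    # Replace every matched escape with the character for its code point.
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)

source = r'class Foo { String bar = "\u0022"; }'
decoded = decode_unicode_escapes(source)

# U+0022 is the double-quote character, so the literal now has three quotes.
print(decoded)  # class Foo { String bar = """; }
```

After the substitution the string literal reads `"""`, which matches the text quoted in the LexerError: a tokenizer reading from the first quote sees an empty string followed by an opening quote that is never closed.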