Token categories #11
Replies: 5 comments 21 replies
-
I would find it quite useful indeed. The syntax used makes me think of annotations, and I think that a general mechanism for providing annotations would be useful. Plugin writers could specify code that receives as input the parse tree obtained from parsing an annotated grammar file, and they could use that information to generate different things, including files for syntax highlighting.
-
Trying to sum up, and adding my two cents:
What's unclear at this point is whether these annotations are available:
I'd recommend the former in order to keep things small, and also because, with a decently serialised parse tree, there are plenty of XPath-like tools out there that can be used to locate specific nodes. But maybe I'm missing something?
-
Why can't we use the existing lexer-command feature to mark tokens? I mean introducing a new command, like this:
KEYWORD: 'keyword' -> category(KEYWORD);
KEYWORD2: 'keyword2' -> category(KEYWORD);
...
ID: -> category(ID);
...
NEWLINE: [\r\n]+ -> channel(HIDDEN), category(SPLITTER);
...
LP: '(' -> category(OPEN_PARENTHESIS);
RP: ')' -> category(CLOSE_PARENTHESIS);
...
LINE_COMMENT: '//' ~[\r\n]* -> channel(HIDDEN), category(COMMENT);
The category type could itself take different forms.
All categories are accessible in the built tree and can be used when needed (for instance, for code highlighting).
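If the generator also emitted the categories as plain data (for example, a token-name-to-category map), a highlighting tool could consume them without knowing anything about the grammar. A minimal Python sketch of that idea; the map contents and style names here are assumptions for illustration, not actual generated output:

```python
# Hypothetical output of the generator: token name -> category name.
TOKEN_CATEGORIES = {
    "KEYWORD": "KEYWORD",
    "KEYWORD2": "KEYWORD",
    "ID": "ID",
    "LP": "OPEN_PARENTHESIS",
    "RP": "CLOSE_PARENTHESIS",
    "LINE_COMMENT": "COMMENT",
}

# Editor-side mapping from category to display style (names are illustrative).
STYLE_FOR_CATEGORY = {
    "KEYWORD": "bold",
    "COMMENT": "italic-gray",
}

def style_for_token(token_name: str) -> str:
    """Resolve a token's display style via its category, falling back to plain."""
    category = TOKEN_CATEGORIES.get(token_name, "DEFAULT")
    return STYLE_FOR_CATEGORY.get(category, "plain")
```

With this split, only the two maps vary per grammar; the highlighter itself stays generic, e.g. `style_for_token("KEYWORD2")` yields `"bold"` even though the editor never heard of `KEYWORD2`.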
-
I have my LSP server for ANTLR4 grammars working again, and I'd like to explore how we intend to use annotations/token categories to mark up input for an LSP server.

For an ANTLR4 grammar, parser and lexer rules are defined by the parserRuleSpec and lexerRuleSpec productions. Because LSP distinguishes between a "def" and a "ref" of a symbol, I can't summarily annotate a RULE_REF as always being a parser rule; I have to distinguish between a "def" and a "ref" for a parser rule name. For a "def", it's the RULE_REF in the production to the left of the ':'. I could "annotate" these two rules (parserRuleSpec and lexerRuleSpec) at the RULE_REF and TOKEN_REF with a "definition" category for parser and lexer rules, respectively.

An alternative would be to annotate the RULE_REF and TOKEN_REF rules in the lexer grammar, as we discussed earlier in this thread. Unfortunately, I cannot annotate rules for RULE_REF and TOKEN_REF because the grammar does not actually define these rules--they live in target-specific code! Perhaps this is why tree-sitter puts this information in separate files, outside the grammar.
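A hedged sketch of what such a definition-site annotation could look like, attached to the ANTLRv4 meta-grammar's parserRuleSpec and lexerRuleSpec productions; the @category(...) syntax and the category names are purely illustrative, not an existing ANTLR feature:

```antlr
// Hypothetical '@category(...)' annotation -- not valid ANTLR4 today.
// The defining occurrence of a rule name is marked at its RULE_REF/TOKEN_REF.
parserRuleSpec
    : ruleModifiers? RULE_REF @category(parserRuleDef)
      argActionBlock? ruleReturns? throwsSpec? localsSpec?
      rulePrequel* COLON ruleBlock SEMI exceptionGroup
    ;

lexerRuleSpec
    : FRAGMENT? TOKEN_REF @category(lexerRuleDef)
      optionsSpec? COLON lexerRuleBlock SEMI
    ;
```

Any RULE_REF or TOKEN_REF appearing elsewhere (inside ruleBlock or lexerRuleBlock) would then be classified as a "ref" rather than a "def".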
-
From a parser point of view, apart from EOF, tokens do not convey any meaning.
A developer may see things differently: they may want their favorite IDE to decorate code based on token categories such as literals, keywords, and flow-control constructs.
As of this writing, most IDEs require tokens to be redefined and categorized in order to support syntax highlighting.
To facilitate the generation of basic IDE support, it could be useful to categorize tokens.
A tool that knows only the IDE would then be able to generate a basic editor.
A token category could be applied to a token as follows:
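For instance, reusing the category(...) lexer-command idea floated elsewhere in this thread (illustrative syntax only, not an implemented feature):

```antlr
// Illustrative only: attach a category to each token at definition time.
IF      : 'if'     -> category(KEYWORD);
WHILE   : 'while'  -> category(KEYWORD);
INT_LIT : [0-9]+   -> category(LITERAL);
```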
Alternate proposals are welcome.