This guide explains how the tokenizer, parser, and AST processing work, and how to extend them.

Architecture overview
Meta tokens: higher-level constructs like Comment, DefineStatement, SayStatement, etc. (these can overlap and effectively form a tree overlay). Note that meta tokens are not strictly necessary to parse the document, but are there to help.
Tokens → AST (Parser)
The parser iterates through the token stream and applies rule objects that each have test() and parse() methods.
Rules validate grammar, consume tokens, and build typed AST nodes.
AST → Program model
The AST is processed into an RpyProgram, which contains symbols, scopes, references, and diagnostics.
This model powers features like go-to-definition and find-references.
Core files you’ll interact with:
parser/parser-test.ts — a simple wrapper that runs tokenization and parsing for the active document.
Tokenizer
The tokenizer is mature and used in production today. It’s conceptually similar to VS Code’s highlighting approach (Oniguruma-style), but implemented from scratch in TypeScript.
Emits a strictly ordered sequence of tokens.
Atomic tokens don’t overlap.
Meta tokens can overlap and describe higher-level constructs, effectively giving a rudimentary AST shape before verification (see the illustrative sketch at the end of this section).
There’s a debug command to visualize tokens; by default it's bound to Ctrl+Alt+Shift+T (you may need to bind it manually).
The tokenizer rules are auto-generated from the syntax highlighting rules we use for VS Code’s highlighter.
VS Code also has the internal Ctrl+Alt+Shift+I command to visualize the syntax highlighting tokens, which can be useful to check since we use the same regular expressions.
If you want to know exactly which tokens exist, check tokenizer/renpy-tokens.ts and related files in that folder.
Also check out the syntax highlighter guide to learn more about the tokenizer: #447
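To make the atomic/meta token distinction above concrete, here is a purely illustrative sketch of what a token might carry; the field and type names are assumptions, not the actual definitions from tokenizer/renpy-tokens.ts.

```typescript
// Purely illustrative token shape; the field names are assumptions,
// not the real definitions from tokenizer/renpy-tokens.ts.
interface DebugToken {
    tokenType: string;   // e.g. a keyword or operator, or a meta construct like a define statement
    startOffset: number; // character offset where the token begins
    endOffset: number;   // character offset where the token ends
    isMeta: boolean;     // meta tokens may overlap each other; atomic tokens never do
}

// For a line like `define e = Character("Eileen")`, the stream would conceptually
// contain atomic tokens for the keyword, variable name, `=`, call, and string,
// plus one overlapping meta token spanning the whole define statement.
```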
Parser
The parser is rule-driven and intentionally modular.
Each rule has:
test() — quick predicate to check if the rule applies at the current token.
parse() — consumes tokens and returns a specific AST node or null on error.
A central loop walks a list of rules and executes the first rule whose test() returns true.
Example: DefineStatementRule expects the define keyword, optionally an integer, an assignment, then an end-of-line.
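As a rough illustration of the rule pattern and the central loop, here is a minimal, self-contained sketch; the interfaces and helper names are assumptions for illustration and do not match the real classes in src/parser/renpy-grammar-rules.ts exactly.

```typescript
// Minimal sketch of the rule pattern (hypothetical types; the real base
// classes and token stream live in src/parser/renpy-grammar-rules.ts).
interface AstNode {}

interface TokenStream {
    peekIsKeyword(keyword: string): boolean; // look ahead without consuming
    // ...token-consuming helpers omitted
}

interface GrammarRule {
    test(stream: TokenStream): boolean;          // cheap applicability check
    parse(stream: TokenStream): AstNode | null;  // consume tokens, build a node (or null on error)
}

// Central loop: run the first rule whose test() succeeds at the current token.
function parseNext(stream: TokenStream, rules: GrammarRule[]): AstNode | null {
    for (const rule of rules) {
        if (rule.test(stream)) {
            return rule.parse(stream);
        }
    }
    return null; // no rule applies; the caller handles error recovery
}
```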
We maintain a formal grammar in grammars/renpy.grammar.ebnf. That’s the source of truth we mirror in the rules.
Note: The current grammar is incomplete and in some cases inaccurate, which means we may accept or reject constructs that the official Ren’Py parser handles differently.
Error handling has been improved recently, but it’s still evolving. Comments and some edge cases may need more work; while extending the parser, start with valid syntax and then expand coverage.
AST and semantic processing
Every parse() returns a typed AST node (see src/parser/ast-nodes.ts). Nodes can override a visit(program: RpyProgram) method to:
Declare symbols (labels, defines, characters).
Record references (e.g., a jump to a label creates a reference to the label’s symbol).
Emit diagnostics when constraints aren’t met (e.g., an undefined label).
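For example, a definition-style node might declare its symbol during the walk. This is a hedged sketch with hypothetical API names; the real node base class lives in src/parser/ast-nodes.ts.

```typescript
// Sketch only: the scope/program API names here are assumptions.
interface RpyScope {
    declareSymbol(name: string, definition: unknown): void;
}
interface RpyProgram {
    globalScope: RpyScope;
}

class LabelStatementNode {
    constructor(public labelName: string) {}

    // Register the label in the global scope so jump/call sites can resolve it later.
    visit(program: RpyProgram): void {
        program.globalScope.declareSymbol(this.labelName, this);
    }
}
```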
The RpyProgram is the semantic model. It holds:
A global scope (and more scopes as needed, though Ren’Py/Python semantics often push things toward module/global).
A symbol table with definition locations and a list of references per symbol.
Example usage pattern (as seen in parser-test.ts):
Tokenize and parse the document to an AST.
ast.visit(program) builds the semantic model.
Resolve a symbol: program.globalScope.resolve("e") → returns a RpySymbol with definitionLocation and references.
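In code, the pattern looks roughly like this; the function and class signatures below are assumptions based on the description above, not the actual API (see src/parser/parser-test.ts for the real wiring).

```typescript
// Hypothetical signatures for the steps above; the real entry points live in
// src/parser/parser-test.ts and the tokenizer/parser modules.
declare function tokenizeDocument(text: string): unknown[];
declare function parseDocument(tokens: unknown[]): { visit(program: RpyProgram): void };
declare class RpyProgram {
    globalScope: {
        resolve(name: string): { definitionLocation: unknown; references: unknown[] } | undefined;
    };
}

const tokens = tokenizeDocument('define e = Character("Eileen")');
const ast = parseDocument(tokens);

const program = new RpyProgram();
ast.visit(program); // builds the semantic model

const symbol = program.globalScope.resolve("e");
console.log(symbol?.definitionLocation, symbol?.references);
```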
Implementing a new statement (example: jump)
Here’s the pattern I follow when implementing a new statement.
Define/verify grammar
Confirm the EBNF in grammars/renpy.grammar.ebnf covers the statement (e.g., jump).
Cross-check with Ren’Py’s reference implementation if needed.
Implement the parser rule
Add JumpStatementRule to src/parser/renpy-grammar-rules.ts.
test() should look for the jump keyword and whatever follows according to the grammar (e.g., a label name).
parse() should consume the tokens and build a JumpStatementNode that contains a LabelNameNode.
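A hedged sketch of what that rule could look like, using hypothetical token-stream helpers rather than the real API in src/parser/renpy-grammar-rules.ts:

```typescript
// Sketch only: the helper names on the token stream are assumptions.
interface TokenStream {
    peekIsKeyword(keyword: string): boolean;
    requireKeyword(keyword: string): boolean;
    requireName(): string | null; // consume and return an identifier, or null
    requireEndOfLine(): boolean;
}

class LabelNameNode {
    constructor(public name: string) {}
}

class JumpStatementNode {
    constructor(public target: LabelNameNode) {}
}

class JumpStatementRule {
    // Cheap check: does a jump statement start at the current token?
    test(stream: TokenStream): boolean {
        return stream.peekIsKeyword("jump");
    }

    // Consume `jump <label_name> <end-of-line>` and build the node, or fail with null.
    parse(stream: TokenStream): JumpStatementNode | null {
        if (!stream.requireKeyword("jump")) return null;
        const name = stream.requireName();
        if (name === null || !stream.requireEndOfLine()) return null;
        return new JumpStatementNode(new LabelNameNode(name));
    }
}
```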
Register the rule where it belongs
Add the rule to the statement group list in renpy-grammar-rules.ts so it’s considered at the appropriate point in the top-level loop.
Add AST nodes and processing
In src/parser/ast-nodes.ts, add JumpStatementNode (or a shared call/jump node if that fits better).
Override process(program) to resolve the label symbol and add a reference; if the label isn’t defined (yet), emit a diagnostic.
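The processing step could look roughly like this, again with hypothetical names for the semantic-model API:

```typescript
// Sketch only: the RpyProgram/scope API here is an approximation.
interface RpySymbol {
    references: JumpStatementNode[];
}
interface RpyProgram {
    globalScope: { resolve(name: string): RpySymbol | undefined };
    addDiagnostic(node: unknown, message: string): void;
}

class JumpStatementNode {
    constructor(public labelName: string) {}

    // Resolve the jump target and record this statement as a reference.
    process(program: RpyProgram): void {
        const symbol = program.globalScope.resolve(this.labelName);
        if (symbol === undefined) {
            program.addDiagnostic(this, `Label '${this.labelName}' is not defined.`);
            return;
        }
        symbol.references.push(this);
    }
}
```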
Validate with a unit test
Put a sample jump start in a .rpy test file (e.g., parser_test.rpy).
Run the parser test and confirm the label symbol collects the jump site as a reference after ast.process(program).
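A minimal check might look like the following sketch; parseSource and RpyProgram here are assumed stand-ins for the real tokenize/parse/process flow in src/parser/parser-test.ts.

```typescript
// Sketch only: parseSource and RpyProgram are hypothetical stand-ins.
declare function parseSource(source: string): { process(program: RpyProgram): void };
declare class RpyProgram {
    globalScope: { resolve(name: string): { references: unknown[] } | undefined };
}

const source = [
    "label start:",
    '    "Hello."',
    "",
    "label loop:",
    "    jump start",
].join("\n");

const ast = parseSource(source);
const program = new RpyProgram();
ast.process(program);

// The `jump start` line should now be recorded as a reference on the `start` symbol.
const startSymbol = program.globalScope.resolve("start");
console.log(startSymbol?.references.length); // expected: 1
```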
The same steps work for most statements: start from EBNF → write a rule → register it → add AST nodes → wire references/diagnostics → validate.
LSP server (why and how it fits)
I’m building an LSP server so parsing/indexing runs out-of-process:
Keeps the editor responsive while the server indexes the whole project.
Makes the parsing engine reusable by any LSP client, not just VS Code.
The parser/tokenizer themselves avoid heavy VS Code types; the main dependency you’ll see is the text document API, which I can abstract further if needed.
If you’re consuming this outside the VS Code extension, the LSP server is the cleanest integration point.
Current status and limitations
Tokenizer: stable and production-tested.
Parser: rapidly improving; error handling and recovery are better but not perfect.
Current progress is mostly blocked by the EBNF grammar not yet covering all Ren’Py features.
We can parse basic source files, but coverage of other constructs is still missing.
Raw Python source is currently parsed using AI-generated parser rules. These are likely incorrect in places, but it may already be possible to parse some Python source.
Comments and some edge cases: still being expanded.
Symbol references: infrastructure is in place; some categories may still be incomplete.
Grammar coverage: substantial, but newer features are being added as we go.
AST: walking the AST and processing the resulting data into something usable still needs a proper implementation.
I have made some attempts to make use of the current data; for example, see registerHighlightProvider in src/semantics.ts.
Developer workflow tips
Use src/parser/parser-test.ts to run tokenization/parsing on the active document and print debug output.
Try out/extend parser_test.rpy to exercise specific constructs.
Use the token debug command (bind to Ctrl+Alt+Shift+T) to inspect tokens and their metadata.
When a rule doesn’t fire, check the tokenizer output and ensure the rule is registered in the correct statement group.
Make sure the rules are defined in the correct order. Prioritize rules that have clear grammar and place generic/fallback patterns last.
Emit diagnostics from AST visit() when you need semantic context (e.g., undefined names, duplicates).
You can also dump the AST using logCatMessage(LogLevel.Debug, LogCategory.Parser, ast.toString());
Contributing checklist
Start with the EBNF in grammars/renpy.grammar.ebnf.
Implement or extend a rule in src/parser/renpy-grammar-rules.ts.
Add/adjust AST nodes in src/parser/ast-nodes.ts.
Update process(program) logic to declare symbols, add references, and emit diagnostics.
Validate with src/parser/parser-test.ts and sample .rpy code.
Iterate on error handling and edge cases.
If you need pointers on specific files or abstractions, I’m happy to guide—my goal is to keep the core modular and approachable while we expand grammar coverage and semantic analysis.