Rewrite MASM parser to produce a concrete syntax tree

There are a few goals we have that the current parser gets in the way of:

1. It does not preserve trivia (whitespace, comments), which is critical for implementing a `miden-format` tool.
2. It performs constant propagation throughout parsing, losing important context needed for an LSP. This too prevents us from implementing a `miden-format` tool, because the resulting output would have all the constants erased
3. It provides poor error recovery at best. This is problematic for an LSP, and particularly integrating into text editors/IDEs, as the user experience is really bad.

To address these issues, the current parser needs to be rewritten to have the following characteristics:

* It produces a concrete syntax tree (CST), which can then be converted to an abstract syntax tree (AST). CSTs preserve all of the textual trivia and tokens, and furthermore, allow parts to be invalid/missing while still being able to continue parsing the rest of the tree. For example, if you start typing a new procedure definition in the middle of a file, e.g. `export.foo|` (where `|` here represents the cursor), the file would fail to parse today because the syntax is invalid on its own. With a CST-based parser however, the parser can easily recover by ignoring the invalid subtree, and continuing with the rest of the file. The mechanics of this are too much to get into here, but suffice to say that this is _the_ way to implement parsers that are designed to provide a smooth text editing/IDE experience.
* Constant propagation needs to be deferred to assembly-time. We don't actually need to do it during parsing, but it simplified some things, so that is what we did. However, by doing so, we've hamstrung our ability to reformat MASM and provide more useful diagnostics related to constants.
* We should move from using a parser generator (LALRPOP) to a handwritten recursive-descent parser. In order to properly implement error recovery over the CST, we need granular control over parsing that we just don't get with parser generators. This is actually a fairly trivial lift, as MASM has a very simple grammar, but the bigger task is ensuring that the new parser actually accepts partial syntax trees (i.e. it doesn't just bail at the first problem, it can parse even in situations like I described earlier, where a definition is only partially-complete).

There are some great tools for building parsers in this style, e.g. [rowan](https://crates.io/crates/rowan). `rust-analyzer` actually uses `rowan` internally (in addition to its `syntax` and `parser` crates to complete the picture).

This is a prerequisite to building a proper MASM formatter and LSP.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite MASM parser to produce a concrete syntax tree #1901

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Rewrite MASM parser to produce a concrete syntax tree #1901

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions