Initial prototype for using serde generic parse trees. #591

ratmice · 2025-06-07T13:16:03Z

This is an initial prototype for #589, I think the code is pretty gross, however I also believe it might work,
just from minimal testing with

cargo build --all-features
./target/debug/nimbleparse -t ./lrpar/examples/calc_parsetree/src/calc.{l,y} lrpar/examples/calc_parsetree/src/input.txt

For instance, if I compare the following output with and without -t passed to nimbleparse, it seems like I got the correct nesting (I think, unless my sleep deprived brain is mistaken).

{
    "src": "5 + 4 * 3",
    "ast": [
        Nonterm("Expr", [
            Nonterm("Expr", [
                Nonterm("Term", [
                    Nonterm("Factor", [
                        Term("INT", "5"),
                    ]),
                ]),
            ]),
            Term("+", "+"),
            Nonterm("Term", [
                Nonterm("Term", [
                    Nonterm("Factor", [
                        Term("INT", "4"),
                    ]),
                ]),
                Term("*", "*"),
                Nonterm("Factor", [
                    Term("INT", "3"),
                ]),
            ]),
        ]),
    ],
}

Expr
 Expr
  Term
   Factor
    INT 5
 + +
 Term
  Term
   Factor
    INT 4
  * *
  Factor
   INT 3

It currently only supports Serialize, doesn't do anything when parsing multiple inputs (parse_many) or when inputs are specified via test_files in the %grmtools section.

But it seemed like perhaps a good place to stop and evaluate.

ratmice · 2025-06-08T03:39:40Z

So, the one obvious thing that is present in this serialization, that is not in the ParseGenericTree output is the presence of the Term/Nonterm variants, perhaps there is a way to omit that from the output and would clean up the output perhaps. But it isn't clear to me how we can omit it, and then unambiguously deserialize unless we just start treating Term as having an empty Vec<Node>. It might be worth trying?

Edit:
I tried removing the Term/Nonterm distinction using a struct like the following, but rather than simplifying the output, it more than doubled the length of the serialized output, 78 lines, vs rather than the 26 lines of serialized output above.
So somehow it managed to be both longer/noisier and contain less information.

        #[derive(Serialize)]
        struct NamedNode<'a> {
            name: &'a str,
            lexeme: Option<&'a str>,
            nodes: VecDeque<NamedNode<'a>>,
        }

I'm still interested in how to improve the serialized AST, but that attempt seemed like a dud.

ratmice · 2025-06-08T22:36:02Z

lrpar/src/lib/parser.rs

-                            insert_inner(entries, node, limit, acc + 1)
-                        } else {
-                            top.push_back(node)
-                        }


I think this whole insert_inner function I think is pretty terrible, and is still using the first algorithm
that came to mind. In particular the recursion seems problematic for large a AST (potential stack overflows, and seems inefficient).

Perhaps there is a better way to implement it using something like a BinaryHeap<NamedNode>, and a Vec<NamedNode>, so that the inner elements are being stored/pushed to in the binary heap.
So that rather than starting from the root and recursively traversing the back_mut() to find where we push_back. The element to push onto would be at some lower priority of the BinaryHeap.
Then the Vec<NamedNode> would be the rest of the nodes that in the order they get popped from that heap.

Anyhow I haven't totally thought it through, instead more focused on what the actual AST representation should be, but figured I should drop a note that this here probably needs a rethink.

ratmice · 2025-06-10T07:57:17Z

Feel like this one can be closed, because the parse_map approach suggested in #589 seems like it should reduce a lot of complexity that was mostly not essential to solving the problem, but more just satisfying the type system's need for a Self type.

Initial prototype for using serde generic parse trees.

0df0cd4

rustfmt

f25c75e

ratmice commented Jun 8, 2025

View reviewed changes

ltratt self-assigned this Jun 9, 2025

ratmice mentioned this pull request Jun 10, 2025

Initial implementation of parse_map #592

Merged

ratmice closed this Jun 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Initial prototype for using serde generic parse trees. #591

Initial prototype for using serde generic parse trees. #591

Uh oh!

ratmice commented Jun 7, 2025

Uh oh!

ratmice commented Jun 8, 2025 •

edited

Loading

Uh oh!

ratmice Jun 8, 2025 •

edited

Loading

Uh oh!

ratmice commented Jun 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Initial prototype for using serde generic parse trees. #591

Initial prototype for using serde generic parse trees. #591

Uh oh!

Conversation

ratmice commented Jun 7, 2025

Uh oh!

ratmice commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ratmice Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice commented Jun 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ratmice commented Jun 8, 2025 •

edited

Loading

ratmice Jun 8, 2025 •

edited

Loading