Skip to content

Conversation

@ratmice
Copy link
Collaborator

@ratmice ratmice commented Jun 7, 2025

This is an initial prototype for #589, I think the code is pretty gross, however I also believe it might work,
just from minimal testing with

cargo build --all-features
./target/debug/nimbleparse -t ./lrpar/examples/calc_parsetree/src/calc.{l,y} lrpar/examples/calc_parsetree/src/input.txt

For instance, if I compare the following output with and without -t passed to nimbleparse, it seems like I got the correct nesting (I think, unless my sleep deprived brain is mistaken).

{
    "src": "5 + 4 * 3",
    "ast": [
        Nonterm("Expr", [
            Nonterm("Expr", [
                Nonterm("Term", [
                    Nonterm("Factor", [
                        Term("INT", "5"),
                    ]),
                ]),
            ]),
            Term("+", "+"),
            Nonterm("Term", [
                Nonterm("Term", [
                    Nonterm("Factor", [
                        Term("INT", "4"),
                    ]),
                ]),
                Term("*", "*"),
                Nonterm("Factor", [
                    Term("INT", "3"),
                ]),
            ]),
        ]),
    ],
}
Expr
 Expr
  Term
   Factor
    INT 5
 + +
 Term
  Term
   Factor
    INT 4
  * *
  Factor
   INT 3

It currently only supports Serialize, doesn't do anything when parsing multiple inputs (parse_many) or when inputs are specified via test_files in the %grmtools section.

But it seemed like perhaps a good place to stop and evaluate.

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 8, 2025

So, the one obvious thing that is present in this serialization, that is not in the ParseGenericTree output is the presence of the Term/Nonterm variants, perhaps there is a way to omit that from the output and would clean up the output perhaps. But it isn't clear to me how we can omit it, and then unambiguously deserialize unless we just start treating Term as having an empty Vec<Node>. It might be worth trying?

Edit:
I tried removing the Term/Nonterm distinction using a struct like the following, but rather than simplifying the output, it more than doubled the length of the serialized output, 78 lines, vs rather than the 26 lines of serialized output above.
So somehow it managed to be both longer/noisier and contain less information.

        #[derive(Serialize)]
        struct NamedNode<'a> {
            name: &'a str,
            lexeme: Option<&'a str>,
            nodes: VecDeque<NamedNode<'a>>,
        }

I'm still interested in how to improve the serialized AST, but that attempt seemed like a dud.

insert_inner(entries, node, limit, acc + 1)
} else {
top.push_back(node)
}
Copy link
Collaborator Author

@ratmice ratmice Jun 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole insert_inner function I think is pretty terrible, and is still using the first algorithm
that came to mind. In particular the recursion seems problematic for large a AST (potential stack overflows, and seems inefficient).

Perhaps there is a better way to implement it using something like a BinaryHeap<NamedNode>, and a Vec<NamedNode>, so that the inner elements are being stored/pushed to in the binary heap.
So that rather than starting from the root and recursively traversing the back_mut() to find where we push_back. The element to push onto would be at some lower priority of the BinaryHeap.
Then the Vec<NamedNode> would be the rest of the nodes that in the order they get popped from that heap.

Anyhow I haven't totally thought it through, instead more focused on what the actual AST representation should be, but figured I should drop a note that this here probably needs a rethink.

@ltratt ltratt self-assigned this Jun 9, 2025
@ratmice
Copy link
Collaborator Author

ratmice commented Jun 10, 2025

Feel like this one can be closed, because the parse_map approach suggested in #589 seems like it should reduce a lot of complexity that was mostly not essential to solving the problem, but more just satisfying the type system's need for a Self type.

@ratmice ratmice closed this Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants