Initial implementation of `parse_map` #592

ratmice · 2025-06-10T04:58:01Z

Here is an initial attempt to implement something like parse_map, which was discussed as a part of #589 ,
I'm marking as draft because it is totally untested, it at least still needs the following:

Tests
I should probably port the stuff in Initial prototype for using serde generic parse trees. #591 on top of it, so we're certain it is fit for purpose,
and that the constraints that these function pointers are under from utilizing the %parse_param fields
in this RtParserBuilder context doesn't overconstrain things such that the serde stuff is actually implementable.

But it seemed a good place to check that this is the kind of thing you were thinking of, or if I misinterpreted something.
I wasn't able to figure out what you meant by having separate Node types for Term and Nonterm, or how that
would be implementable, given that the type of the return from parse_* is generically threaded through to constrain the return type of actions for both Term and NonTerm (if that makes sense). But by using a single node type, for both Term and Nonterm seemed feasible.

Questions I have:

Is there a bettern name than Node for the return type? I just fear that we have both parser::Node<...> and Node as a generic type parameter, I would maybe just consider using T

ratmice · 2025-06-10T06:36:15Z

So, after adding the tests I don't really feel any longer that we need porting over the serde stuff.
It seems like something as simple as throwing a #[derive(Serialize)] on the TestParseMap
type in the test would basically be a sufficient serde implementation basically equivalent to the stuff in my previous PR?

ltratt · 2025-06-11T16:00:01Z

I wasn't able to figure out what you meant by having separate Node types for Term and Nonterm, or how that
would be implementable, given that the type of the return from parse_* is generically threaded through to constrain the return type of actions for both Term and NonTerm (if that makes sense). But by using a single node type, for both Term and Nonterm seemed feasible.

Aha, that does sound plausible. At which point I'm fine with this approach!

It seems like something as simple as throwing a #[derive(Serialize)] on the TestParseMap
type in the test would basically be a sufficient serde implementation basically equivalent to the stuff in my previous PR?

That's what I'm hoping!

One test of this would be if we can express the existing parse_actions in terms of parse_map. Is that possible? If so, it would show that the PR really has a superior API to what we had before.

ratmice · 2025-06-11T16:11:40Z

It seems like something as simple as throwing a #[derive(Serialize)] on the TestParseMap
type in the test would basically be a sufficient serde implementation basically equivalent to the stuff in my previous PR?

That's what I'm hoping!

It is the case, the stuff in #589 (comment) was generated this way

One test of this would be if we can express the existing parse_actions in terms of parse_map. Is that possible? If so, it would show that the PR really has a superior API to what we had before.

It isn't possible, parse_actions is used under the hood of parse_map, unless you meant parse_generictree, in which case yes I could easily add another test to assert_eq!(parse_generictree(...).0.unwrap(), parse_map(...).0.unwrap());
I had sworn that I added a review comment to this effect, but I must have forgotten to submit it.

ltratt · 2025-06-11T16:18:58Z

OOOPS! I meant parse_generictree. Sorry!

ratmice · 2025-06-11T16:28:40Z

I went ahead and added that test which checks their equivalence, but perhaps we could remove a lot of code by reimplementing parse_noactions and parse_generictree in terms of parse_map. In which case that test wouldn't do very much.

In hindsight I wonder if you meant actually changing the parse_generictree impl?

Edit: After thinking about it, I don't know if parse_noactions would optimize out the interim vector in the nonterm closure, if not there may be reason to leave that one as is?

ltratt · 2025-06-11T16:59:55Z

I went ahead and added that test which checks their equivalence, but perhaps we could remove a lot of code by reimplementing parse_noactions and parse_generictree in terms of parse_map

I hadn't even thought about parse_noactions, but of course you're right! Yes, reimplementing both in terms of parse_map would be good. The existing tests / examples should pick up if we've done such a conversion correctly. We can then mark both as deprecated and point users at parse_map.

[I'm now wondering if parse_map is the best name or not. Suggestions welcome!]

ratmice · 2025-06-12T02:18:20Z

Off-hand as pretty much a functional programmer, and someone terrible at naming things, I thought parse_map was a great name 🤣

I'm uncertain about deprecating them because e.g. CTParserBuilder ends up calling these functions for YaccOriginalActionKind::GenericParseTree and YaccOriginalActionKind::NoAction, so I think unless we deprecate those
too we're going to end up generating code which has to #[allow(deprecated)], I'm not certain we want to deprecate those though,

For instance, I tend to use the Original formats simply because the format is simpler when I'm not embedding actions.
Then when I go to add actions, I'll upgrade to the Grmtools format, the other place we currently use parse_generictree is in the nimbleparse output. I think sources generated by CTParserBuilder and nimbleparse though are the only non-test cases where we're calling these though.

ltratt · 2025-06-12T07:07:52Z

I'm uncertain about deprecating them because e.g. CTParserBuilder ends up calling these functions for YaccOriginalActionKind::GenericParseTree and YaccOriginalActionKind::NoAction, so I think unless we deprecate those
too we're going to end up generating code which has to #[allow(deprecated)], I'm not certain we want to deprecate those though,

I wonder if we can/should make CTParserBuilder handle those itself? I agree that they're useful, but I'm partly wondering if we can remove some of the complexity from the core parser and just have this one "simpler" API (in the medium term).

Warning: I haven't thought this through, and perhaps keeping things as-is is the best outcome!

ratmice · 2025-06-12T07:15:34Z

I wonder if we can/should make CTParserBuilder handle those itself? I agree that they're useful, but I'm partly wondering if we can remove some of the complexity from the core parser and just have this one "simpler" API (in the medium term).

I think that'll actually work, it means duplicating the parse_map calls, in two places but those are only a few lines,
when I was originally thinking about it I was also taking into account the Node.pp call which would involve a lot more duplication, but we actually don't need to duplicate that if we aren't deprecating Node<...> too, so I was overestimating quite a bit.

ltratt · 2025-06-12T07:31:09Z

Good question about whether Node should be in the core library or not! I think, ideally, we'd push it out of the core parser at least, perhaps into CTParserBuilder? Or it could even put it in generated code perhaps?

ratmice · 2025-06-12T07:49:12Z

lrlex/src/lib/ctbuilder.rs

                                let input = fs::read_to_string(&path)?;
                                let l: LRNonStreamingLexer<LexerTypesT> =
                                    closure_lexerdef.lexer(&input);
+                                #[allow(deprecated)]


Missed one here.

Should be fixed in f8f5157

After this the only calls left to these deprecated functions should be in the testsuite code.

ltratt · 2025-06-12T07:54:49Z

Looks good! Can we move Node or is that too disruptive?

ratmice · 2025-06-12T08:04:33Z

Yeah we can, I think perhaps a good place might be wherever we want the serialization types for test_files to land,
There is some amount of textual representation of an AST similarity to them.

I don't know if we have a good name for that module, I had been calling it lrpartest, but moving Node in doesn't quite follow the test theme. But if we called it something more generic like ast_fmt there is some testing specific code that doesn't quite match that name either.

Maybe if we call it something like ast_utils, testing still somewhat falls under the utility aspect?
Even if we move it, we can re-export it from parser without much trouble, if it is too disruptive to move it without forwarding address.

ltratt · 2025-06-12T08:11:37Z

Where in the module/file hierarchy are you thinking that such a file might live?

ratmice · 2025-06-12T08:14:16Z

I was thinking lrpar::ast_utils in lrpar/src/ast_utils.rs (we can consider it a placeholder for now unless we can find a better name!), with the thinking that test_files code gets used by both nimbleparse and the test_files support in CTParserBuilder so I think it needs to live in lrpar.

ltratt · 2025-06-12T08:15:16Z

I guess, ideally, the test stuff would be moved outside the lrpar crate, but that might not be possible. Idle thought: perhaps we need a generic testing crate of some sorts?

ratmice · 2025-06-12T08:29:39Z

I think it'd be awkward to have the Node itself in a testing focused crate, because it isn't quite testing related, the similarity really is that a lot of the testing stuff is just a reshuffling of the Node type, so they kind of overlap in their implementation but not purpose.

If we did want to have a testing crate I'd be fine with that, It looks like lrlex::CTLexerBuilder and nimbleparse would
need to individually depend upon that testing crate.

grmtools/lrlex/src/lib/ctbuilder.rs

Line 545 in 5895724

glob(&path_joined.to_string_lossy()).map_err(|e| e.to_string())?
grmtools/nimbleparse/src/main.rs

Line 502 in 5895724

if let Some(HeaderValue(_, val)) = self.header.get("test_files") {

So it does look like that testing code at least can be pushed out of lrpar entirely.

ratmice · 2025-06-12T08:44:05Z

So I think rather there is no impediment to moving the testing stuff out of lrpar.
So for the current case of Node, the impediment is the lrpar/examples/calc_parsetree, example and similar usages.

For instance we can see the Node type returned by parse() of the generated module,
and used in the eval impl here: https://github.com/softdevteam/grmtools/blob/master/lrpar/examples/calc_parsetree/src/main.rs#L54

My thinking is that it is pretty user surface visible stuff and seems likely to need a deprecation cycle?

ltratt · 2025-06-12T08:45:50Z

For calc_parsetree I think it might actually be preferable for it to explicitly have its own Node type and use parse_map.

ratmice · 2025-06-12T08:54:20Z

Right, I guess I mean more than calc_parsetree, i.e. that example is just an example usage of
YaccKind::Original(YaccOriginialActionKind::GenericParseTree), so we need to take a further step of deprecating/removing that ActionKind before we can remove Node.

This also brings up an interesting issue, we have added parse_map to RTParserBuilder in this patch, but there isn't an equivalent for CTParserBuilder yet, so I think were kind of getting to the heart of the issue now.

ltratt · 2025-06-12T08:55:35Z

Right, I guess I mean more than calc_parsetree, i.e. that example is just an example usage of
YaccKind::Original(YaccOriginialActionKind::GenericParseTree), so we need to take a further step of deprecating/removing that ActionKind before we can remove Node.

I see your point!

Could/should we put a Node into the generated code for this case? Maybe that breaks the cycle. [We could then, perhaps, mark GenericParseTree as deprecated if we want, but that could be a separate step, and I don't think we need to do so immediately.]

ratmice · 2025-06-12T09:10:27Z

What I would like to try and see if I can make it work is to:

Make the generated code re-export an alias for lrpar::Node, and deprecate lrpar::Node.
Old code would (hopefully) get a deprecation warning if they try to import lrpar::Node.
New code which uses the re-export would not.
Make nimbleparse use it's own copy of Node

So, after a deprecation cycle we can remove the entirety of parse_noaction, parse_generictree, action_generictree, and Node from lrpar, without any regard for the deprecation of GenericParseTree. Then we replace the re-export with a concrete instance of Node

I think that seems feasible, but does it sound reasonable?

ltratt · 2025-06-12T09:32:40Z

Sounds like a good plan to me: I like a route to deprecation where users get a warning (rather than an error).

ratmice · 2025-06-12T10:40:10Z

It required some hacks to do that, e.g. deprecating a type alias and then re-exporting the hidden concrete instance.
But it does appear to work, in that if I change the Node type between lrpar::Node and calc_y::Node I do/don't get the deprecation warning.

One thing to note is that the lr_cactus function used by cpctplus recovery seems to use the Node type,
so the deprecation might have a limited effect of making it pub(crate) lrpar::Node, and perhaps removing the impl Node unless we can migrate that usage somehow.

ltratt · 2025-06-12T13:21:09Z

Looks good! I think we're ready to squash.

ratmice · 2025-06-12T14:18:47Z

I noticed a couple of places where #[allow(deprecated)] didn't get removed, but was no longer needed.
(So I went through all of them in the library, and double-checked they were needed and removed any that werent)
Then fixed up the wording on the deprecation note which needed updating after the deprecation of Node.

Squashed.

… fn.

ratmice · 2025-06-13T07:51:46Z

Had to tweak the docs to avoid the doc link to Node, since it is now a type outside the lrpar dependency tree.
Should be fixed in the force push.

ratmice assigned ltratt Jun 10, 2025

ratmice marked this pull request as ready for review June 11, 2025 03:16

ratmice commented Jun 12, 2025

View reviewed changes

ratmice force-pushed the parse_map branch from 2762b8e to 0ced820 Compare June 12, 2025 14:16

ltratt added this pull request to the merge queue Jun 13, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 13, 2025

Replace parse_generictree, parse_noaction, with a new parse_map…

357349a

… fn.

ratmice force-pushed the parse_map branch from 0ced820 to 357349a Compare June 13, 2025 07:50

ltratt added this pull request to the merge queue Jun 13, 2025

Merged via the queue into softdevteam:master with commit a9466a2 Jun 13, 2025
2 checks passed

ratmice deleted the parse_map branch June 13, 2025 08:06

ratmice mentioned this pull request Oct 8, 2025

Fix compile error in generated code. #600

Merged

ratmice mentioned this pull request Oct 15, 2025

Serde support for ParseGenericTree #589

Closed

Initial implementation of parse_map #592

Initial implementation of parse_map #592

Uh oh!

Conversation

ratmice commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ratmice commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented Jun 11, 2025

Uh oh!

ratmice commented Jun 11, 2025

Uh oh!

ltratt commented Jun 11, 2025

Uh oh!

ratmice commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented Jun 11, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice Jun 12, 2025

Choose a reason for hiding this comment

Uh oh!

ratmice Jun 12, 2025

Choose a reason for hiding this comment

Uh oh!

ratmice Jun 12, 2025

Choose a reason for hiding this comment

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

ltratt commented Jun 12, 2025

Uh oh!

ratmice commented Jun 12, 2025

Uh oh!

Uh oh!

ratmice commented Jun 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Initial implementation of `parse_map` #592

Initial implementation of `parse_map` #592

ratmice commented Jun 10, 2025 •

edited

Loading

ratmice commented Jun 10, 2025 •

edited

Loading

ratmice commented Jun 11, 2025 •

edited

Loading

ratmice commented Jun 12, 2025 •

edited

Loading

ratmice commented Jun 12, 2025 •

edited

Loading

ratmice commented Jun 12, 2025 •

edited

Loading