Skip to content

Conversation

@ratmice
Copy link
Collaborator

@ratmice ratmice commented Jun 10, 2025

Here is an initial attempt to implement something like parse_map, which was discussed as a part of #589 ,
I'm marking as draft because it is totally untested, it at least still needs the following:

  1. Tests
  2. I should probably port the stuff in Initial prototype for using serde generic parse trees. #591 on top of it, so we're certain it is fit for purpose,
    and that the constraints that these function pointers are under from utilizing the %parse_param fields
    in this RtParserBuilder context doesn't overconstrain things such that the serde stuff is actually implementable.

But it seemed a good place to check that this is the kind of thing you were thinking of, or if I misinterpreted something.
I wasn't able to figure out what you meant by having separate Node types for Term and Nonterm, or how that
would be implementable, given that the type of the return from parse_* is generically threaded through to constrain the return type of actions for both Term and NonTerm (if that makes sense). But by using a single node type, for both Term and Nonterm seemed feasible.

Questions I have:

  • Is there a bettern name than Node for the return type? I just fear that we have both parser::Node<...> and Node as a generic type parameter, I would maybe just consider using T

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 10, 2025

So, after adding the tests I don't really feel any longer that we need porting over the serde stuff.
It seems like something as simple as throwing a #[derive(Serialize)] on the TestParseMap
type in the test would basically be a sufficient serde implementation basically equivalent to the stuff in my previous PR?

@ratmice ratmice marked this pull request as ready for review June 11, 2025 03:16
@ltratt
Copy link
Member

ltratt commented Jun 11, 2025

I wasn't able to figure out what you meant by having separate Node types for Term and Nonterm, or how that
would be implementable, given that the type of the return from parse_* is generically threaded through to constrain the return type of actions for both Term and NonTerm (if that makes sense). But by using a single node type, for both Term and Nonterm seemed feasible.

Aha, that does sound plausible. At which point I'm fine with this approach!

It seems like something as simple as throwing a #[derive(Serialize)] on the TestParseMap
type in the test would basically be a sufficient serde implementation basically equivalent to the stuff in my previous PR?

That's what I'm hoping!

One test of this would be if we can express the existing parse_actions in terms of parse_map. Is that possible? If so, it would show that the PR really has a superior API to what we had before.

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 11, 2025

It seems like something as simple as throwing a #[derive(Serialize)] on the TestParseMap
type in the test would basically be a sufficient serde implementation basically equivalent to the stuff in my previous PR?

That's what I'm hoping!

It is the case, the stuff in #589 (comment) was generated this way

One test of this would be if we can express the existing parse_actions in terms of parse_map. Is that possible? If so, it would show that the PR really has a superior API to what we had before.

It isn't possible, parse_actions is used under the hood of parse_map, unless you meant parse_generictree, in which case yes I could easily add another test to assert_eq!(parse_generictree(...).0.unwrap(), parse_map(...).0.unwrap());
I had sworn that I added a review comment to this effect, but I must have forgotten to submit it.

@ltratt
Copy link
Member

ltratt commented Jun 11, 2025

OOOPS! I meant parse_generictree. Sorry!

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 11, 2025

I went ahead and added that test which checks their equivalence, but perhaps we could remove a lot of code by reimplementing parse_noactions and parse_generictree in terms of parse_map. In which case that test wouldn't do very much.

In hindsight I wonder if you meant actually changing the parse_generictree impl?

Edit: After thinking about it, I don't know if parse_noactions would optimize out the interim vector in the nonterm closure, if not there may be reason to leave that one as is?

@ltratt
Copy link
Member

ltratt commented Jun 11, 2025

I went ahead and added that test which checks their equivalence, but perhaps we could remove a lot of code by reimplementing parse_noactions and parse_generictree in terms of parse_map

I hadn't even thought about parse_noactions, but of course you're right! Yes, reimplementing both in terms of parse_map would be good. The existing tests / examples should pick up if we've done such a conversion correctly. We can then mark both as deprecated and point users at parse_map.

[I'm now wondering if parse_map is the best name or not. Suggestions welcome!]

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

Off-hand as pretty much a functional programmer, and someone terrible at naming things, I thought parse_map was a great name 🤣

I'm uncertain about deprecating them because e.g. CTParserBuilder ends up calling these functions for YaccOriginalActionKind::GenericParseTree and YaccOriginalActionKind::NoAction, so I think unless we deprecate those
too we're going to end up generating code which has to #[allow(deprecated)], I'm not certain we want to deprecate those though,

For instance, I tend to use the Original formats simply because the format is simpler when I'm not embedding actions.
Then when I go to add actions, I'll upgrade to the Grmtools format, the other place we currently use parse_generictree is in the nimbleparse output. I think sources generated by CTParserBuilder and nimbleparse though are the only non-test cases where we're calling these though.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

I'm uncertain about deprecating them because e.g. CTParserBuilder ends up calling these functions for YaccOriginalActionKind::GenericParseTree and YaccOriginalActionKind::NoAction, so I think unless we deprecate those
too we're going to end up generating code which has to #[allow(deprecated)], I'm not certain we want to deprecate those though,

I wonder if we can/should make CTParserBuilder handle those itself? I agree that they're useful, but I'm partly wondering if we can remove some of the complexity from the core parser and just have this one "simpler" API (in the medium term).

Warning: I haven't thought this through, and perhaps keeping things as-is is the best outcome!

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

I wonder if we can/should make CTParserBuilder handle those itself? I agree that they're useful, but I'm partly wondering if we can remove some of the complexity from the core parser and just have this one "simpler" API (in the medium term).

I think that'll actually work, it means duplicating the parse_map calls, in two places but those are only a few lines,
when I was originally thinking about it I was also taking into account the Node.pp call which would involve a lot more duplication, but we actually don't need to duplicate that if we aren't deprecating Node<...> too, so I was overestimating quite a bit.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

Good question about whether Node should be in the core library or not! I think, ideally, we'd push it out of the core parser at least, perhaps into CTParserBuilder? Or it could even put it in generated code perhaps?

let input = fs::read_to_string(&path)?;
let l: LRNonStreamingLexer<LexerTypesT> =
closure_lexerdef.lexer(&input);
#[allow(deprecated)]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed one here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed in f8f5157

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this the only calls left to these deprecated functions should be in the testsuite code.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

Looks good! Can we move Node or is that too disruptive?

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

Yeah we can, I think perhaps a good place might be wherever we want the serialization types for test_files to land,
There is some amount of textual representation of an AST similarity to them.

I don't know if we have a good name for that module, I had been calling it lrpartest, but moving Node in doesn't quite follow the test theme. But if we called it something more generic like ast_fmt there is some testing specific code that doesn't quite match that name either.

Maybe if we call it something like ast_utils, testing still somewhat falls under the utility aspect?
Even if we move it, we can re-export it from parser without much trouble, if it is too disruptive to move it without forwarding address.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

Where in the module/file hierarchy are you thinking that such a file might live?

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

I was thinking lrpar::ast_utils in lrpar/src/ast_utils.rs (we can consider it a placeholder for now unless we can find a better name!), with the thinking that test_files code gets used by both nimbleparse and the test_files support in CTParserBuilder so I think it needs to live in lrpar.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

I guess, ideally, the test stuff would be moved outside the lrpar crate, but that might not be possible. Idle thought: perhaps we need a generic testing crate of some sorts?

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

I think it'd be awkward to have the Node itself in a testing focused crate, because it isn't quite testing related, the similarity really is that a lot of the testing stuff is just a reshuffling of the Node type, so they kind of overlap in their implementation but not purpose.

If we did want to have a testing crate I'd be fine with that, It looks like lrlex::CTLexerBuilder and nimbleparse would
need to individually depend upon that testing crate.

So it does look like that testing code at least can be pushed out of lrpar entirely.

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

So I think rather there is no impediment to moving the testing stuff out of lrpar.
So for the current case of Node, the impediment is the lrpar/examples/calc_parsetree, example and similar usages.

For instance we can see the Node type returned by parse() of the generated module,
and used in the eval impl here: https://github.com/softdevteam/grmtools/blob/master/lrpar/examples/calc_parsetree/src/main.rs#L54

My thinking is that it is pretty user surface visible stuff and seems likely to need a deprecation cycle?

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

For calc_parsetree I think it might actually be preferable for it to explicitly have its own Node type and use parse_map.

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

Right, I guess I mean more than calc_parsetree, i.e. that example is just an example usage of
YaccKind::Original(YaccOriginialActionKind::GenericParseTree), so we need to take a further step of deprecating/removing that ActionKind before we can remove Node.

This also brings up an interesting issue, we have added parse_map to RTParserBuilder in this patch, but there isn't an equivalent for CTParserBuilder yet, so I think were kind of getting to the heart of the issue now.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

Right, I guess I mean more than calc_parsetree, i.e. that example is just an example usage of
YaccKind::Original(YaccOriginialActionKind::GenericParseTree), so we need to take a further step of deprecating/removing that ActionKind before we can remove Node.

I see your point!

Could/should we put a Node into the generated code for this case? Maybe that breaks the cycle. [We could then, perhaps, mark GenericParseTree as deprecated if we want, but that could be a separate step, and I don't think we need to do so immediately.]

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

What I would like to try and see if I can make it work is to:

  1. Make the generated code re-export an alias for lrpar::Node, and deprecate lrpar::Node.
    Old code would (hopefully) get a deprecation warning if they try to import lrpar::Node.
    New code which uses the re-export would not.
  2. Make nimbleparse use it's own copy of Node

So, after a deprecation cycle we can remove the entirety of parse_noaction, parse_generictree, action_generictree, and Node from lrpar, without any regard for the deprecation of GenericParseTree. Then we replace the re-export with a concrete instance of Node

I think that seems feasible, but does it sound reasonable?

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

Sounds like a good plan to me: I like a route to deprecation where users get a warning (rather than an error).

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

It required some hacks to do that, e.g. deprecating a type alias and then re-exporting the hidden concrete instance.
But it does appear to work, in that if I change the Node type between lrpar::Node and calc_y::Node I do/don't get the deprecation warning.

One thing to note is that the lr_cactus function used by cpctplus recovery seems to use the Node type,
so the deprecation might have a limited effect of making it pub(crate) lrpar::Node, and perhaps removing the impl Node unless we can migrate that usage somehow.

@ltratt
Copy link
Member

ltratt commented Jun 12, 2025

Looks good! I think we're ready to squash.

@ratmice
Copy link
Collaborator Author

ratmice commented Jun 12, 2025

I noticed a couple of places where #[allow(deprecated)] didn't get removed, but was no longer needed.
(So I went through all of them in the library, and double-checked they were needed and removed any that werent)
Then fixed up the wording on the deprecation note which needed updating after the deprecation of Node.

Squashed.

@ltratt ltratt added this pull request to the merge queue Jun 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 13, 2025
@ratmice
Copy link
Collaborator Author

ratmice commented Jun 13, 2025

Had to tweak the docs to avoid the doc link to Node, since it is now a type outside the lrpar dependency tree.
Should be fixed in the force push.

@ltratt ltratt added this pull request to the merge queue Jun 13, 2025
Merged via the queue into softdevteam:master with commit a9466a2 Jun 13, 2025
2 checks passed
@ratmice ratmice deleted the parse_map branch June 13, 2025 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants