Skip to content

Conversation

@ratmice
Copy link
Collaborator

@ratmice ratmice commented May 26, 2025

I'm going to mark this as a draft, it isn't something I personally am likely to use,
While thinking through #584 and what uses GrammarAST::new has or could have,
issue #484 came to mind, the current workaround described in that issue is to create multiple copies of yacc grammars
and modify the start rule.

As such cc'ing @rneswold just to see whether or not this approach is an improvement on that hack,
or if this is something which if we added it might just go unused?

The current hacks have some drawbacks like needing to disable warnings as errors because if you change the start rule, unused warnings can come up from the top-level start rule.

The idea of this patch is to provide an alternative hack mechanism, where you build a GrammarAST, clone it
and then modify the start field of each clone, in an unsafe manner. Since you only do the actual checking for the first rule, you don't run into running the checks multiple times and subsequent warnings/errors caused by checking a partial grammar with a start rule somewhere in the middle.

It doesn't really fix #484 or share parsing tables or anything nice.
This may well be just a solution looking for a problem.

Also, because of the lrlex dependency I wasn't able to fully test this within cfgrammar (building parsers/lexers and passing input), I should probably be concerned though that the whole &mut self -> &mut GrammarAST seems like it might cause borrow checker issues once we try to actually finish the parser/lexer build process with the modified grammar?

@ratmice
Copy link
Collaborator Author

ratmice commented May 27, 2025

Somewhat silly of me not to think of it, but there is no way to get CTParserBuilder to actually use this modified AST since it always works from paths. So this is fairly unlikely to help.

@rneswold
Copy link
Contributor

The idea of this patch is to provide an alternative hack mechanism, where you build a GrammarAST, clone it and then modify the start field of each clone, in an unsafe manner.

This sounds like it could solve my case: 6 production rules which 3 could be used as a top-level rule.

Making more than one top-level rule doesn't seem like an unsafe operation to me. But maybe it means all rules would be more expensive(?) Or it's the only way to do it with the current GrammarAST? I just hate having to add unsafe code to a project.

I know, I'm asking for a lot! 🤣 Thanks for thinking about this.

@ltratt
Copy link
Member

ltratt commented May 27, 2025

Dumb idea: could we make something like fn clone_and_change_start_rule(rule: ...) -> Self to satisfy this use case?

@ratmice
Copy link
Collaborator Author

ratmice commented May 27, 2025

Making more than one top-level rule doesn't seem like an unsafe operation to me.

True, but the mechanism provided here is much more powerful than that, the unsafe
aspect is that it allows turning an unvalidated AST into a valid one.

For instance, you could also delete a rule while leaving productions referencing it in place.
This would not normally pass validation, but both the ast_mut and set_ast functions allow
one to bypass that.

@ratmice
Copy link
Collaborator Author

ratmice commented May 27, 2025

Dumb idea: could we make something like fn clone_and_change_start_rule(rule: ...) -> Self to satisfy this use case?

While perhaps providing a safer abstraction (I think we'd have to return a Result in case rule isn't a rule in the grammar), this I think runs into much the same problem, of how are we going to call this function from CTParserBuilder.

With either mechanism we could perhaps use a function CTParserBuilder::set_ast(self, ASTWithValidationInfo),
this function would then bypass the filesystem read/parse/validation phase, so it'd only use grammar_path if self.ast is None.

One of the reasons here we need to bypass validation is because we no longer have source which matches the AST, and it is also likely that spans are messed up so if we did do checking and what not and an error did occur it isn't clear how we'd do error printing (which now even relies on source availability within CTParserBuilder!

Both these mechanisms can bypass this problem it seems, (one by being unsafe, the other by potentially returning a Result at clone_and_change_start_rule time).

Edit: I guess clone_and_change_start_rule doesn't require returning a result, precisely because ASTWithValidationInfo owns the vector of errors, so this method can append an error to a previously valid AST.

Edit 2: The impression that I get though is that it might still be better to return a Result here, because the errors inside ASTWithValidationInfo assume spans exist, while we can return a separate error without Span from fn clone_and_change_start_rule... anyhow I'll try and come up with a patch.

@ratmice
Copy link
Collaborator Author

ratmice commented May 28, 2025

Ironically all the new changes, because they rely so much on grammar sources being available within CTParserBuilder::build() make exposing this functionality through CTParserBuilder quite a large/involved change.

This is mostly diagnostic printing, but to some extent also %grmtools section parsing.
It also has sneaked it's way into return values like CTParser for lex/yacc errors like the "missing from" ones.

So given that this is still a pretty hacky solution, I'm not sure it makes sense to go through all that effort over actually trying to do something which enables multiple start rules from a single grammar?

@ratmice
Copy link
Collaborator Author

ratmice commented May 28, 2025

I think that perhaps I just have to think about this differently, we'd need a function that takes both an ASTWithValidationInfo and a src string to the original source from which the AST is derived.
because of the Rule parameter we have a span to the Rule, (rather than the %start directive),
So we can still reference the original sources.

My original attempt was based on the fact that:
A valid AST remains valid, and thus we don't need the source string, but that is basically false, because of errors that can pop up after grammar validation, like missing from parser/lexer. Or rather it only remains true for grammar errors. Not for other cases for which we need the sources.

Anyhow at least in this case (but not arbitrary AST modifications) the original sources should be sufficient.

@ltratt
Copy link
Member

ltratt commented May 28, 2025

I think that perhaps I just have to think about this differently, we'd need a function that takes both an ASTWithValidationInfo and a src string to the original source from which the AST is derived.
because of the Rule parameter we have a span to the Rule, (rather than the %start directive),
So we can still reference the original sources.

This sounds plausible. Would that supersede this PR?

@ratmice
Copy link
Collaborator Author

ratmice commented May 28, 2025

Would that supersede this PR?

Yup

@ltratt
Copy link
Member

ltratt commented May 28, 2025

It sounds like it's worth giving it a go. Perhaps it will be an "unofficial" API of sorts, at least until we gain more confidence in it, but it is a feature that some people really want, so even unofficial/unstable is better than nothing (and double points if it's not unsafe).

@ratmice
Copy link
Collaborator Author

ratmice commented May 28, 2025

b9aba11 and dbb12c4 represent my second attempt at this, they actually include (somewhat hacky) CTParserBuilder support and cttests to check that it actually works.

This doesn't actually bypass the need for disabling warnings, because the unused checks also happen after validation :(

@ratmice
Copy link
Collaborator Author

ratmice commented May 28, 2025

Perhaps it will be an "unofficial" API of sorts

Ahh, I had forgot about this part, I know we have some API for unofficial functions which require features to be made pub, and special "Unstable" arguments. This seems like a good opportunity to try and use those.

Edit: The CTParserBuilder parts should be done now in 2be58ac

For the ASTWithValidationInfo parts though we don't currently have an unstable_api module for cfgrammar, Do we want those parts of the api unstable/unofficial too?

@ltratt
Copy link
Member

ltratt commented May 30, 2025

Thanks -- I think this is a good intermediate step towards future perfection! Please squash.

#![allow(clippy::unnecessary_wraps)]
#![allow(clippy::upper_case_acronyms)]
#![forbid(unsafe_code)]
#![deny(unsafe_code)]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change no longer needed.


impl GrammarAST {
pub fn new() -> GrammarAST {
pub(crate) fn new() -> GrammarAST {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change of visibility was only included in this patch accidentally, I feel like it is good to consider whether we should do this, but probably doesn't belong in this series.

So I'd rather drop it from this PR.

This attempts to improve the status quo regarding issue softdevteam#484.
Previously in order to build a grammar using multiple start rules
you needed multiple copies of the grammar in the filesystem,
each with a different start rule.

This adds a mechanism for modifying the AST by cloning and changing
the start rules, then building a parser from each clone.
The `CTParserBuilder` methods being unstable and require
a build-dependency `feature`.
@ratmice
Copy link
Collaborator Author

ratmice commented May 30, 2025

Squashed with the visibility modification, and unsafe_code changes omitted.
I squashed this into two commits, hoping that 5edbfc5 will also suffice as an example of usage

@ltratt
Copy link
Member

ltratt commented May 30, 2025

I think we can mark this as undraft and merge.

@ratmice ratmice marked this pull request as ready for review May 30, 2025 14:23
@ratmice
Copy link
Collaborator Author

ratmice commented May 30, 2025

Sorry about that, it's no longer in draft now.

@ratmice ratmice changed the title Add unsafe fn for modifying GrammarAST Allow building parsers from GrammarAST with modified start rules. May 30, 2025
@ltratt ltratt added this pull request to the merge queue May 30, 2025
Merged via the queue into softdevteam:master with commit 72636f0 May 30, 2025
2 checks passed
@ratmice ratmice deleted the modified_ast branch May 30, 2025 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants