Allow building parsers from `GrammarAST` with modified start rules. #585

ratmice · 2025-05-26T21:34:16Z

I'm going to mark this as a draft, it isn't something I personally am likely to use,
While thinking through #584 and what uses GrammarAST::new has or could have,
issue #484 came to mind, the current workaround described in that issue is to create multiple copies of yacc grammars
and modify the start rule.

As such cc'ing @rneswold just to see whether or not this approach is an improvement on that hack,
or if this is something which if we added it might just go unused?

The current hacks have some drawbacks like needing to disable warnings as errors because if you change the start rule, unused warnings can come up from the top-level start rule.

The idea of this patch is to provide an alternative hack mechanism, where you build a GrammarAST, clone it
and then modify the start field of each clone, in an unsafe manner. Since you only do the actual checking for the first rule, you don't run into running the checks multiple times and subsequent warnings/errors caused by checking a partial grammar with a start rule somewhere in the middle.

It doesn't really fix #484 or share parsing tables or anything nice.
This may well be just a solution looking for a problem.

Also, because of the lrlex dependency I wasn't able to fully test this within cfgrammar (building parsers/lexers and passing input), I should probably be concerned though that the whole &mut self -> &mut GrammarAST seems like it might cause borrow checker issues once we try to actually finish the parser/lexer build process with the modified grammar?

ratmice · 2025-05-27T07:59:55Z

Somewhat silly of me not to think of it, but there is no way to get CTParserBuilder to actually use this modified AST since it always works from paths. So this is fairly unlikely to help.

rneswold · 2025-05-27T14:02:45Z

The idea of this patch is to provide an alternative hack mechanism, where you build a GrammarAST, clone it and then modify the start field of each clone, in an unsafe manner.

This sounds like it could solve my case: 6 production rules which 3 could be used as a top-level rule.

Making more than one top-level rule doesn't seem like an unsafe operation to me. But maybe it means all rules would be more expensive(?) Or it's the only way to do it with the current GrammarAST? I just hate having to add unsafe code to a project.

I know, I'm asking for a lot! 🤣 Thanks for thinking about this.

ltratt · 2025-05-27T14:15:06Z

Dumb idea: could we make something like fn clone_and_change_start_rule(rule: ...) -> Self to satisfy this use case?

ratmice · 2025-05-27T16:59:14Z

Making more than one top-level rule doesn't seem like an unsafe operation to me.

True, but the mechanism provided here is much more powerful than that, the unsafe
aspect is that it allows turning an unvalidated AST into a valid one.

For instance, you could also delete a rule while leaving productions referencing it in place.
This would not normally pass validation, but both the ast_mut and set_ast functions allow
one to bypass that.

ratmice · 2025-05-27T17:20:27Z

Dumb idea: could we make something like fn clone_and_change_start_rule(rule: ...) -> Self to satisfy this use case?

While perhaps providing a safer abstraction (I think we'd have to return a Result in case rule isn't a rule in the grammar), this I think runs into much the same problem, of how are we going to call this function from CTParserBuilder.

With either mechanism we could perhaps use a function CTParserBuilder::set_ast(self, ASTWithValidationInfo),
this function would then bypass the filesystem read/parse/validation phase, so it'd only use grammar_path if self.ast is None.

One of the reasons here we need to bypass validation is because we no longer have source which matches the AST, and it is also likely that spans are messed up so if we did do checking and what not and an error did occur it isn't clear how we'd do error printing (which now even relies on source availability within CTParserBuilder!

Both these mechanisms can bypass this problem it seems, (one by being unsafe, the other by potentially returning a Result at clone_and_change_start_rule time).

Edit: I guess clone_and_change_start_rule doesn't require returning a result, precisely because ASTWithValidationInfo owns the vector of errors, so this method can append an error to a previously valid AST.

Edit 2: The impression that I get though is that it might still be better to return a Result here, because the errors inside ASTWithValidationInfo assume spans exist, while we can return a separate error without Span from fn clone_and_change_start_rule... anyhow I'll try and come up with a patch.

ratmice · 2025-05-28T05:58:14Z

Ironically all the new changes, because they rely so much on grammar sources being available within CTParserBuilder::build() make exposing this functionality through CTParserBuilder quite a large/involved change.

This is mostly diagnostic printing, but to some extent also %grmtools section parsing.
It also has sneaked it's way into return values like CTParser for lex/yacc errors like the "missing from" ones.

So given that this is still a pretty hacky solution, I'm not sure it makes sense to go through all that effort over actually trying to do something which enables multiple start rules from a single grammar?

ratmice · 2025-05-28T07:28:00Z

I think that perhaps I just have to think about this differently, we'd need a function that takes both an ASTWithValidationInfo and a src string to the original source from which the AST is derived.
because of the Rule parameter we have a span to the Rule, (rather than the %start directive),
So we can still reference the original sources.

My original attempt was based on the fact that:
A valid AST remains valid, and thus we don't need the source string, but that is basically false, because of errors that can pop up after grammar validation, like missing from parser/lexer. Or rather it only remains true for grammar errors. Not for other cases for which we need the sources.

Anyhow at least in this case (but not arbitrary AST modifications) the original sources should be sufficient.

ltratt · 2025-05-28T09:41:31Z

I think that perhaps I just have to think about this differently, we'd need a function that takes both an ASTWithValidationInfo and a src string to the original source from which the AST is derived.
because of the Rule parameter we have a span to the Rule, (rather than the %start directive),
So we can still reference the original sources.

This sounds plausible. Would that supersede this PR?

ratmice · 2025-05-28T10:02:11Z

Would that supersede this PR?

Yup

ltratt · 2025-05-28T10:03:25Z

It sounds like it's worth giving it a go. Perhaps it will be an "unofficial" API of sorts, at least until we gain more confidence in it, but it is a feature that some people really want, so even unofficial/unstable is better than nothing (and double points if it's not unsafe).

ratmice · 2025-05-28T20:50:26Z

b9aba11 and dbb12c4 represent my second attempt at this, they actually include (somewhat hacky) CTParserBuilder support and cttests to check that it actually works.

This doesn't actually bypass the need for disabling warnings, because the unused checks also happen after validation :(

ratmice · 2025-05-28T21:10:32Z

Perhaps it will be an "unofficial" API of sorts

Ahh, I had forgot about this part, I know we have some API for unofficial functions which require features to be made pub, and special "Unstable" arguments. This seems like a good opportunity to try and use those.

Edit: The CTParserBuilder parts should be done now in 2be58ac

For the ASTWithValidationInfo parts though we don't currently have an unstable_api module for cfgrammar, Do we want those parts of the api unstable/unofficial too?

ltratt · 2025-05-30T07:14:39Z

Thanks -- I think this is a good intermediate step towards future perfection! Please squash.

ratmice · 2025-05-30T13:12:34Z

cfgrammar/src/lib/mod.rs

 #![allow(clippy::unnecessary_wraps)]
 #![allow(clippy::upper_case_acronyms)]
-#![forbid(unsafe_code)]
+#![deny(unsafe_code)]


Change no longer needed.

ratmice · 2025-05-30T13:14:57Z

cfgrammar/src/lib/yacc/ast.rs


 impl GrammarAST {
-    pub fn new() -> GrammarAST {
+    pub(crate) fn new() -> GrammarAST {


This change of visibility was only included in this patch accidentally, I feel like it is good to consider whether we should do this, but probably doesn't belong in this series.

So I'd rather drop it from this PR.

This attempts to improve the status quo regarding issue softdevteam#484. Previously in order to build a grammar using multiple start rules you needed multiple copies of the grammar in the filesystem, each with a different start rule. This adds a mechanism for modifying the AST by cloning and changing the start rules, then building a parser from each clone. The `CTParserBuilder` methods being unstable and require a build-dependency `feature`.

ratmice · 2025-05-30T13:42:20Z

Squashed with the visibility modification, and unsafe_code changes omitted.
I squashed this into two commits, hoping that 5edbfc5 will also suffice as an example of usage

ltratt · 2025-05-30T14:22:20Z

I think we can mark this as undraft and merge.

ratmice · 2025-05-30T14:24:13Z

Sorry about that, it's no longer in draft now.

ratmice commented May 30, 2025

View reviewed changes

ratmice force-pushed the modified_ast branch from 2be58ac to 8373d9a Compare May 30, 2025 13:35

Test building parsers with modified start rules.

5edbfc5

ratmice force-pushed the modified_ast branch from 8373d9a to 5edbfc5 Compare May 30, 2025 13:36

ratmice marked this pull request as ready for review May 30, 2025 14:23

ratmice changed the title ~~Add unsafe fn for modifying GrammarAST~~ Allow building parsers from GrammarAST with modified start rules. May 30, 2025

ltratt added this pull request to the merge queue May 30, 2025

Merged via the queue into softdevteam:master with commit 72636f0 May 30, 2025
2 checks passed

ratmice deleted the modified_ast branch May 30, 2025 15:34

Allow building parsers from GrammarAST with modified start rules. #585

Allow building parsers from GrammarAST with modified start rules. #585

Uh oh!

Conversation

ratmice commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ratmice commented May 27, 2025

Uh oh!

rneswold commented May 27, 2025

Uh oh!

ltratt commented May 27, 2025

Uh oh!

ratmice commented May 27, 2025

Uh oh!

ratmice commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ratmice commented May 28, 2025

Uh oh!

ratmice commented May 28, 2025

Uh oh!

ltratt commented May 28, 2025

Uh oh!

ratmice commented May 28, 2025

Uh oh!

ltratt commented May 28, 2025

Uh oh!

ratmice commented May 28, 2025

Uh oh!

ratmice commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented May 30, 2025

Uh oh!

ratmice May 30, 2025

Choose a reason for hiding this comment

Uh oh!

ratmice May 30, 2025

Choose a reason for hiding this comment

Uh oh!

ratmice commented May 30, 2025

Uh oh!

ltratt commented May 30, 2025

Uh oh!

ratmice commented May 30, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Allow building parsers from `GrammarAST` with modified start rules. #585

Allow building parsers from `GrammarAST` with modified start rules. #585

ratmice commented May 26, 2025 •

edited

Loading

ratmice commented May 27, 2025 •

edited

Loading

ratmice commented May 28, 2025 •

edited

Loading