@ratmice
Collaborator

@ratmice ratmice commented Feb 22, 2025

I believe this should implement what we discussed in #349 for lrpar (it doesn't try to tackle lrlex yet).
I'm still not certain I like the syntax, but it didn't seem worth worrying about until I at least had something working.


This allows the yacckind value to be specified within a grammar. When using the YaccKind::SelfDescribing style the grammar must start with an initial %grmtools section, as documented in the variant.

For other YaccKind variants the %grmtools section is allowed (not required) and the YaccKind value given to YaccGrammar::new overrides one given in the %grmtools section.

Fixes #349

@ratmice
Collaborator Author

ratmice commented Feb 23, 2025

One thing I meant to say is that the parsing code is most likely whitespace sensitive;
e.g. I believe it pretty much expects the format where { is followed by a newline and } is on its own line, like

%grmtools {
  ...
}

and would fail with something like %grmtools { yacckind Grmtools}.
Similarly, keys and values are all case sensitive. These seemed like things we could consider relaxing,
but it didn't seem absolutely necessary for an initial prototype.

Edit: It looks like the only case it wasn't handling correctly was the following which is fixed in 8d9520b

%grmtools
{ ... }

@ltratt
Member

ltratt commented Feb 23, 2025

I think we would want the eventual parser to be whitespace insensitive, but it shouldn't be too hard? We just need to define how we want to map the human-writeable text to the internal format. We don't have to make it too "serde like" or even "Rust like".

[I'll try to do a proper review of this later today.]
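
For illustration only, a whitespace-insensitive reading of the `%grmtools` section could look like the sketch below. This is not the parser from this PR: `parse_grmtools_section` is a hypothetical helper, and treating commas as entry separators is an assumption, since the thread only shows a single `yacckind Grmtools` entry.

```rust
/// Hypothetical helper (not part of grmtools): reads a leading
/// `%grmtools { key value }` section regardless of where spaces or newlines
/// fall. Returns the key/value pairs plus the rest of the input, or `None`
/// if the input does not start with a well-formed section.
fn parse_grmtools_section(src: &str) -> Option<(Vec<(String, String)>, &str)> {
    let rest = src.trim_start().strip_prefix("%grmtools")?;
    let rest = rest.trim_start().strip_prefix('{')?;
    let (body, rest) = rest.split_once('}')?;
    let mut pairs = Vec::new();
    // Assumption: entries are separated by commas.
    for entry in body.split(',') {
        let mut words = entry.split_whitespace();
        match (words.next(), words.next(), words.next()) {
            // Empty entry (e.g. a trailing comma): skip it.
            (None, _, _) => continue,
            // Exactly two words: a key followed by its value.
            (Some(key), Some(value), None) => {
                pairs.push((key.to_string(), value.to_string()))
            }
            // A lone key or extra words: reject the section.
            _ => return None,
        }
    }
    Some((pairs, rest))
}
```

With this shape, `%grmtools { yacckind Grmtools}`, the multi-line form, and the form with `{` on the following line would all yield the same single pair. Keys and values remain case sensitive here.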

@ratmice
Collaborator Author

ratmice commented Feb 23, 2025

Yeah, I don't think it should be too hard either. It just isn't how our current parse_ws helpers behave (they either expect a newline or expect no newline), and I didn't want to strike out on writing a separate relaxed function on the first attempt.

Indeed, I didn't do much, if anything, in the way of an internal format; the key parsing just writes directly to self.yacc_kind when needed, in an ad-hoc way. Some form of internal format might make sharing between lex/yacc easier, but that also seemed like something to decide after this initial prototype stage.

Edit (One more thing I recall worth thinking about):

nimbleparse currently defaults to Original(GenericParseTree), with the code doing the defaulting living in nimbleparse, outside of parser.rs.

Currently this patch errors rather than providing a default YaccKind. Two approaches come to mind:

  • Instead of erroring due to missing yacc kind, fall back to a default inside parser.rs.
  • We could perhaps change the variant to SelfDescribing(default: Option<Box<YaccKind>>). When a default is provided, it is used instead of raising a MissingYaccKind error, so outside code such as nimbleparse can still decide what to default to. (Unfortunately, Box gets in the way of implementing the Copy trait.)

The latter option feels more tolerable to me (I went ahead and pushed an exploratory implementation of that). Any opinions?

…arse.

Because of recursive types, this needs to be boxed, and because of box
this enum can no longer be `Copy`, so we had to drop the `Copy` trait
for `YaccKind` in order for this to work.
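
As a rough illustration of the constraint described in this commit message, the sketch below uses a simplified stand-in for `YaccKind` (variant payloads omitted, and `describe`/`use_twice` are hypothetical helpers). Without the `Box`, the enum would contain itself directly and be rejected as an infinitely sized recursive type (E0072); with the `Box`, `Copy` can no longer be derived, so callers have to clone explicitly.

```rust
/// Simplified stand-in for the real enum; payloads are omitted.
#[derive(Clone, Debug, PartialEq)]
enum YaccKind {
    Original,
    Grmtools,
    /// The `Box` gives the recursive field a known size. Adding `Copy` to
    /// the derive list above would fail to compile, as `Box` is not `Copy`.
    SelfDescribing(Option<Box<YaccKind>>),
}

/// Hypothetical consumer that takes the kind by value.
fn describe(kind: YaccKind) -> String {
    format!("{:?}", kind)
}

/// The first call would move `kind`, so passing it on a second time needs
/// an explicit `clone()` now that the type is not `Copy`.
fn use_twice(kind: YaccKind) -> (String, String) {
    (describe(kind.clone()), describe(kind))
}
```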
@ltratt
Member

ltratt commented Feb 24, 2025

I've dumped in some detailed, but mostly not deep, comments.

One high-level thing that I haven't quite wrapped my head around is how we expect the user to use the SelfDescribing(default) case. For example, as a naive user with YaccKind::Original(...), I might expect that adding an initial %grmtools header would override it. I might then be a bit surprised to be told I should use SelfDescribing, and particularly to find that I probably want SelfDescribing(YaccKind::Original(...)) in case I later don't want the %grmtools bit.

@ratmice
Collaborator Author

ratmice commented Feb 24, 2025

I'm still waking up and haven't managed to parse your confusion above, so I don't quite understand the exact scenario yet. I definitely agree that the number of "degrees of freedom" in the overriding can be confusing: it can be overridden "bottom-up", where the %grmtools section takes precedence when YaccKind::SelfDescribing(...) is passed to the constructor, and "top-down", where any other YaccKind value passed to the constructor takes precedence.

Overall, my thinking has been that the typical user shouldn't need to be aware of YaccKind::SelfDescribing(...).
One of my primary motivations for doing this is that when you run nimbleparse foo.{y,l} input, and foo.y uses the Grmtools variant but the parser defaults to Original, the errors can get wonky like this, because it is parsing with the default format, which is wrong for that calc example.

Unfortunately that isn't actually fixed by the patch because, for backwards compatibility, it still uses SelfDescribing(Original), specifying the default rather than None. So we don't actually produce the "nice" error that says the yacc kind is unknown.

My general thinking is that new code should typically use SelfDescribing(None), to encourage %grmtools sections to be added, and an error produced when missing. Then, after they get added, you no longer need to specify e.g. nimbleparse -y grmtools every time; it just uses the format specified by the file.

There are a few reasons to want to specify values other than SelfDescribing(None). I tend to think they fall into two major categories (historical behavior and flexibility). To give more exact reasons:

  1. They want to parse using some exact format regardless of what is specified in the file, such as YaccKind::Original: specify that kind.
  2. They want to parse a grammar using any/all formats, but default to a specific format: SelfDescribing(default).
  3. They want to parse files which don't have the %grmtools section (either of the above).
  4. They want to produce an error if the %grmtools section is missing and only use the value provided by the section: SelfDescribing(None).

So I think, from the perspective of a user, SelfDescribing(None) can produce the best error, in the sense that it can produce MissingYaccKind, whereas defaulting to a format can produce confusing ones because it can unintentionally parse a grammar using the wrong kind.

That might not answer your specific scenario, I've got to go make some ☕ and read it again.

@ltratt
Member

ltratt commented Feb 24, 2025

My general thinking is that new code should typically use SelfDescribing(None), to encourage %grmtools sections to be added, and an error produced when missing.

That's an interesting idea that hadn't occurred to me. I think I like it. Let me chew on it for a bit.

@ltratt
Member

ltratt commented Feb 25, 2025

My general thinking is that new code should typically use SelfDescribing(None), to encourage %grmtools sections to be added, and an error produced when missing.

I agree with this and I tend to think this should become the default in the next major release i.e. we would no longer force API users to specify yacckind.

@ratmice
Collaborator Author

ratmice commented Feb 25, 2025

Cool, well I think I've dealt with all the issues that I had brought up after my initial commits, and feel like I'm satisfied with how it has turned out, unless anything comes up in another round of review.

@ratmice
Collaborator Author

ratmice commented Feb 25, 2025

My general thinking is that new code should typically use SelfDescribing(None), to encourage %grmtools sections to be added, and an error produced when missing.

I agree with this and I tend to think this should become the default in the next major release i.e. we would no longer force API users to specify yacckind.

So after my last comment I got to thinking that I'm unsure how to take this. Initially I thought it meant that the next time we consider making a breaking change, we should make the required yacckind fields Option<YaccKind>.

But I later considered that I could take it to mean that, instead of adding a new variant, we could do much the same with Option,
e.g. None would mean what SelfDescribing(None) means, and Some(yacckind) would mean what SelfDescribing(Some(default)) means?

I do think skipping the new variant and moving right to the breaking change has the benefit of getting rid of the Box, because the type no longer needs to be recursive, so perhaps it is better? I don't feel like I have a strong opinion on this.

@ltratt
Member

ltratt commented Feb 25, 2025

Long and short: whereas in build.rs we force users to do something like ctp.yacckind(YaccKind::Grmtools) I'd like them to be able to avoid that call entirely, assuming they have a %grmtools declaration in the grammar itself.

@ratmice
Collaborator Author

ratmice commented Feb 25, 2025

Ahh yeah, I think I see.

That isn't too hard a task (e.g. the builder code already works with Option<YaccKind> internally); the issue is that it calls YaccGrammar::new, which requires a YaccKind.

So I think the simplest change for that is turning a builder None value into SelfDescribing(None).
I just wasn't sure if we didn't want to take the opportunity to thread Option<YaccKind> all throughout YaccGrammar::new instead (that would perhaps lead to a cleaner data structure, but also would be a rather large breaking change affecting e.g. all of test code).
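
A minimal sketch of that "simplest change", under stated assumptions: `ParserBuilder` and `effective_yacckind` are hypothetical, cut-down stand-ins rather than the real `CTParserBuilder` API, and `YaccKind` is simplified. The idea is only that an unset builder value is turned into `SelfDescribing(None)` before the grammar constructor is called.

```rust
/// Simplified stand-in for the real enum.
#[derive(Clone, Debug, PartialEq)]
enum YaccKind {
    Grmtools,
    SelfDescribing(Option<Box<YaccKind>>),
}

/// Hypothetical builder that, like the one described above, stores the
/// kind as an `Option` internally.
#[derive(Default)]
struct ParserBuilder {
    yacckind: Option<YaccKind>,
}

impl ParserBuilder {
    /// Optional: callers whose grammar has a `%grmtools` section can skip this.
    fn yacckind(mut self, kind: YaccKind) -> Self {
        self.yacckind = Some(kind);
        self
    }

    /// The value that would be handed to the grammar constructor: an unset
    /// kind becomes `SelfDescribing(None)`.
    fn effective_yacckind(&self) -> YaccKind {
        self.yacckind
            .clone()
            .unwrap_or(YaccKind::SelfDescribing(None))
    }
}
```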

@ratmice
Collaborator Author

ratmice commented Feb 25, 2025

I went ahead and made that change relaxing the builder requirements.

@ltratt
Member

ltratt commented Feb 25, 2025

I just wasn't sure if we didn't want to take the opportunity to thread Option all throughout YaccGrammar::new instead (that would perhaps lead to a cleaner data structure, but also would be a rather large breaking change affecting e.g. all of test code).

I must admit I haven't quite managed to visualise what this would do/look like in my head.

@ratmice
Collaborator Author

ratmice commented Feb 25, 2025

I just wasn't sure if we didn't want to take the opportunity to thread Option all throughout YaccGrammar::new instead (that would perhaps lead to a cleaner data structure, but also would be a rather large breaking change affecting e.g. all of test code).

I must admit I haven't quite managed to visualise what this would do/look like in my head.

I think I was most likely wrong that it would be an option, but basically we would take the 3 methods (new, new_with_storaget, new_from...) and change all the YaccKind parameters therein to Option<YaccKind>.

The reason it doesn't seem like a viable approach is because we would have to pick between the top-down and the bottom-up overriding, and it seems like we need both for different use cases.

I suppose that if we did use Option<YaccKind>, we'd still need SelfDescribing(Box<YaccKind>) in place of SelfDescribing(Option<Box<YaccKind>>), which doesn't seem to gain us much.

Edit:
Said a different way: to truly get rid of all forms of the SelfDescribing variant while keeping the same behavior,
we'd have to use something like the Either type from itertools as the yacckind value in YaccGrammar::new. It would end up something like fn new(yacckind: Option<Either<YaccKind, YaccKind>>).

The mapping between the Either form into the YaccKind this patch implements would be like the following:

None => SelfDescribing(None)
Some(Either::Left(x)) => x
Some(Either::Right(x)) => SelfDescribing(Some(x))

It does allow us to get rid of the Box, but I think it's also quite a bit more complex as a type.
I guess we could simplify that slightly with an alias such as type UniformEither<T> = Either<T, T>;
but that is the gist of how I think it would work.
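
The mapping above can be written out as a small function. This is only a sketch: `Either` is a local stand-in defined here so the example is self-contained, `to_yacckind` is a hypothetical name, and `YaccKind` is simplified. The `Box::new` in the last arm is needed only because the target is the boxed variant from this patch.

```rust
/// Simplified stand-in for the enum as implemented in this patch.
#[derive(Clone, Debug, PartialEq)]
enum YaccKind {
    Original,
    Grmtools,
    SelfDescribing(Option<Box<YaccKind>>),
}

/// Local stand-in for an `Either`-style type.
#[derive(Clone, Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical conversion from the `Option<Either<..>>` form back to the
/// `YaccKind` form, following the three-line mapping given above.
fn to_yacckind(arg: Option<Either<YaccKind, YaccKind>>) -> YaccKind {
    match arg {
        // Nothing given: the grammar must describe itself.
        None => YaccKind::SelfDescribing(None),
        // Left: use exactly this kind ("top-down").
        Some(Either::Left(kind)) => kind,
        // Right: prefer the `%grmtools` section, falling back to this kind.
        Some(Either::Right(kind)) => YaccKind::SelfDescribing(Some(Box::new(kind))),
    }
}
```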

@ratmice
Collaborator Author

ratmice commented Feb 25, 2025

I think I'll take a bit of time to make a separate exploratory PR based on the edits in the above comment,
because there are potentially some benefits, e.g. we can get rid of the Box and subsequently keep the Copy impl.

Edit: Just dipping my toes in, suffice it to say it is way more involved than this patch.

  let yacc_y_path = PathBuf::from(&matches.free[1]);
  let yacc_src = read_file(&yacc_y_path);
- let ast_validation = ASTWithValidityInfo::new(yacckind, &yacc_src);
+ let ast_validation = ASTWithValidityInfo::new(yacckind.clone(), &yacc_src);
Collaborator Author

@ratmice ratmice Feb 25, 2025

So, while working on the exploratory PR for the other approach, it became apparent that there is actually an issue here, and quite a major one!

Basically the clone calls are keeping around a YaccKind::SelfDescribing, and because YaccParser.yacc_kind isn't pub, it never gets read back out after parser initialization. The testsuite isn't actually picking this up yet.

We don't seem to have tests for nimbleparse, and I haven't implemented the cttests since modifying CTParserBuilder yet.

After adding the %grmtools {yacckind Grmtools} section to calc.y:
echo -n "1+1" | target/debug/nimbleparse lrpar/examples/calc_actions/src/calc.{l,y} -

thread 'main' panicked at cfgrammar/src/lib/yacc/grammar.rs:154:17:
not implemented: Concrete YaccKind must be known by this point
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

So, that is definitely a hurdle that this PR needs to overcome.

Collaborator Author

This should be fixed in 98b6ee2

Collaborator Author

cttests added, and fixed similarly in 23232f8

Member

Good point about nimbleparse. We could at least test it minimally in .buildbot.sh?

Member

I don't think we want to pub(crate) the yacc_kind: YaccKind field?

Collaborator Author

@ratmice ratmice Feb 25, 2025

Would a pub(crate) fn yacc_kind(&self) -> &YaccKind be okay?
We need it to escape, alongside this YaccParser::ast() function, which is pub(crate)?

Edit: Another option might be having something like that ast function return a (YaccKind, GrammarAST) tuple.

Collaborator Author

I went ahead and tried that latter option, plus renaming the ast function, which seemed like the cleanest to me.

Collaborator Author

For the record, I had a crisis of conscience in regard to that, i.e. whether what I did was right versus adding a yacc_kind field to GrammarAST. So I also tried adding it to GrammarAST, but currently GrammarAST is completely devoid of anything related to syntax; you can even construct one by hand with no syntax at all, whereas YaccKind is kind of a YaccParser-specific thing. So my takeaway from that experiment was that I still think I prefer that finish function.
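
To make the shape of that fix concrete, here is a hedged sketch rather than the code from the commits: `YaccParser`, `GrammarAST` and `YaccKind` below are simplified stand-ins, `parse` takes the header's kind as a parameter instead of reading grammar text, and the missing-kind error path is left out. The point is only that the kind resolved during parsing leaves the parser together with the AST, so callers do not keep using their own `SelfDescribing` clone.

```rust
/// Simplified stand-in for the real enum.
#[derive(Clone, Debug, PartialEq)]
enum YaccKind {
    Grmtools,
    SelfDescribing(Option<Box<YaccKind>>),
}

/// Placeholder: the real `GrammarAST` holds rules, tokens and so on, and
/// deliberately knows nothing about syntax.
#[derive(Debug, Default)]
struct GrammarAST;

struct YaccParser {
    yacc_kind: YaccKind,
    ast: GrammarAST,
}

impl YaccParser {
    fn new(yacc_kind: YaccKind) -> Self {
        YaccParser { yacc_kind, ast: GrammarAST::default() }
    }

    /// Stand-in for parsing: `header_kind` is what a `%grmtools` section,
    /// if present, would have declared.
    fn parse(&mut self, header_kind: Option<YaccKind>) {
        let resolved = match &self.yacc_kind {
            // Self-describing: prefer the header, then any provided default.
            YaccKind::SelfDescribing(default) => {
                header_kind.or_else(|| default.as_deref().cloned())
            }
            // A concrete kind from the caller is left untouched.
            _ => None,
        };
        if let Some(kind) = resolved {
            self.yacc_kind = kind;
        }
    }

    /// Consumes the parser, returning the resolved kind alongside the AST.
    fn finish(self) -> (YaccKind, GrammarAST) {
        (self.yacc_kind, self.ast)
    }
}
```

A caller that keeps only its original argument would still hold `SelfDescribing(..)` after parsing, which matches the panic described earlier in this thread; using the first element of the tuple returned by `finish` avoids that.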

Member

Would a pub(crate) fn yacc_kind(&self) -> &YaccKind be okay?

Yep!

@ratmice ratmice marked this pull request as draft February 25, 2025 17:12
@ratmice
Collaborator Author

ratmice commented Feb 26, 2025

Here are the initial results of that other exploratory branch.
I didn't use Option<Either<YaccKind,YaccKind>> but made an ad-hoc enum with no generics that is roughly equivalent.
I haven't added all of the parsing code from this branch yet (so it doesn't pass the testsuite).

But given that it is a much more breaking change, I wanted to post it for feedback on whether you think that approach is beneficial enough to justify the breakage, before actually going through the effort of making it fully functional.

https://github.com/ratmice/grmtools/tree/grmtools_directive_2

@ltratt
Member

ltratt commented Feb 26, 2025

I think the key bit is:

pub enum YaccKindResolver {
    /// Uses exactly the given yacc kind
    Exactly(YaccKind),
    /// Attempt to read yacc kind from the `%grmtools` section,
    /// otherwise falls back to using the given yacc kind.
    Fallback(YaccKind),
    /// Attempt to read the yacc kind from the `%grmtools` section
    /// otherwise throws a `MissingYaccKind` error.
    SelfDescribing,
}

I can definitely bikeshed the names, but the intent is fairly clear. I'd perhaps call them something like:

enum YaccKindResolver {
    /// The user can specify `%grmtools` in their grammar, but if it differs from this `YaccKind`, it's an error
    Force(YaccKind),
    /// Use `YaccKind` if the user doesn't specify `%grmtools` in their grammar
    Default(YaccKind),
    /// The user must specify `%grmtools` in their grammars or we throw an error
    NoDefault
}

but that's a relatively minor tweak.

Obvious question: what would the default be? NoDefault or ... ?
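
As a sketch of how the first enum quoted above could be resolved against a grammar's header, following its doc comments: `resolve` is a hypothetical function, `header` stands for the kind read from a `%grmtools` section (if any), and `YaccKind` and `MissingYaccKind` are simplified stand-ins. The stricter "it's an error if they differ" reading suggested for `Force` is not implemented here.

```rust
/// Simplified stand-in; without the recursive variant it can be `Copy` again.
#[derive(Clone, Copy, Debug, PartialEq)]
enum YaccKind {
    Original,
    Grmtools,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum YaccKindResolver {
    /// Uses exactly the given yacc kind.
    Exactly(YaccKind),
    /// Reads the kind from the `%grmtools` section, else uses the given kind.
    Fallback(YaccKind),
    /// Reads the kind from the `%grmtools` section, else it is an error.
    SelfDescribing,
}

/// Stand-in for the error named in this thread.
#[derive(Debug, PartialEq)]
struct MissingYaccKind;

/// Hypothetical resolution step combining the caller's choice with the
/// kind found in the grammar's header, if there was one.
fn resolve(
    resolver: YaccKindResolver,
    header: Option<YaccKind>,
) -> Result<YaccKind, MissingYaccKind> {
    match (resolver, header) {
        (YaccKindResolver::Exactly(kind), _) => Ok(kind),
        (YaccKindResolver::Fallback(_), Some(found))
        | (YaccKindResolver::SelfDescribing, Some(found)) => Ok(found),
        (YaccKindResolver::Fallback(kind), None) => Ok(kind),
        (YaccKindResolver::SelfDescribing, None) => Err(MissingYaccKind),
    }
}
```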

@ratmice
Collaborator Author

ratmice commented Feb 26, 2025

Obvious question: what would the default be? NoDefault or ... ?

Well, I've used NoDefault for CTParserBuilder, but nimbleparse (in the current master branch) has the behavior of Force(YaccKind::Original(GenericParseTree)); with this patch I changed it to use Default(YaccKind::Original(GenericParseTree)).

i.e. nimbleparse and CTParserBuilder currently have different defaults in this patch and that other branch.

For nimbleparse, though, I only did that because it is what is most compatible with the current behavior.
Making the behavior NoDefault across the board leads to the best error messages.
So I would lean towards that if changing the behavior of nimbleparse is in play or we want the behavior to be the same across CTParserBuilder and nimbleparse

@ltratt
Member

ltratt commented Feb 26, 2025

Making the behavior NoDefault across the board leads to the best error messages.
So I would lean towards that if changing the behavior of nimbleparse is in play or we want the behavior to be the same across CTParserBuilder and nimbleparse

I am inclined to agree. Yes, it's a breaking change, but it's a pretty sensible one. Even better, we can presumably make it so that lrpar users specifying yacckind(...) have a zero-change upgrade path. Basically the only people really affected are nimbleparse users, and I think changing that is more acceptable than the library API.

Go ahead!

@ratmice
Collaborator Author

ratmice commented Feb 26, 2025

Sounds good, I'll go ahead and close this PR now then and open a new one once I've finished merging all the parsing code from this one, and updated it based on the things we've discussed.

@ratmice ratmice closed this Feb 26, 2025
@ratmice ratmice deleted the self_describing_yacc_kind branch February 28, 2025 15:14
Development

Successfully merging this pull request may close these issues.

Adding a %grammar-kind declaration?
