Remove debug formatting for CTLexerBuilder src gen #494

ratmice · 2025-03-01T16:16:29Z

I was looking through issues to see whether anything there would require a breaking change,
and couldn't remember why I had left this issue open without fully migrating CTLexerBuilder away from debug formats. There were two issues. One was encoding Option types: there is a quote issue about this, which I've referenced and borrowed some code from.

The other is (technically) a breaking change, because it requires us to add a new bound on StorageT,
though I think it should always be satisfied given the AsPrimitive bounds.

I went ahead and also did CTParserBuilder, so this one should fix #323 entirely.

This adds a dependency on the proc_macro2 crate, but technically quote already depends upon it, so I don't think it would actually introduce any additional overhead.

lrlex/Cargo.toml

lrpar/src/lib/ctbuilder.rs

ratmice · 2025-03-01T18:33:07Z

FWIW, there are some more cleanups that can be done here: there are a few places where we encode Options by hand that could be migrated to QuoteOption, so they won't have to match on the Option themselves.

A couple of these cases can be found with ripgrep: rg 'format!\("Some'

Edit: It looks like most of these occurrences aren't fixable, for various reasons. I fixed the ones I could.

let n = match r.name() {
    Some(ref n) => format!("Some({}.to_string())", quote!(#n)),
    None => "None".to_owned(),
};

Here we need the to_string() call in the output string; quote! wants to encode this as Some("String"), without the to_string() call.
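The need for the to_string() call can be seen with std alone. This is only a minimal sketch: `encode_name` is a hypothetical helper name, and `{:?}` stands in for the string-literal escaping that `quote!(#n)` performs in the actual patch.

```rust
/// Hypothetical helper: renders an optional rule name as Rust source text
/// for a value of type `Option<String>`.
fn encode_name(name: Option<&str>) -> String {
    match name {
        // `{:?}` on a `&str` emits an escaped, quoted string literal; it is
        // used here only as a stand-in for `quote!(#n)`.
        Some(n) => format!("Some({:?}.to_string())", n),
        None => "None".to_owned(),
    }
}

fn main() {
    // Without `.to_string()` the emitted text would be `Some("int")`, which
    // has type `Option<&str>` and would not type check as `Option<String>`.
    println!("{}", encode_name(Some("int")));
    println!("{}", encode_name(None));
}
```

The emitted text for a present name is an expression of type Option<String>, which is what the generated code needs at that position.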

let target_state = match &r.target_state() {
    Some((id, op)) => format!("Some(({}, {}))", id, quote!(#op)),
    None => "None".to_owned(),
};

In this one we just lack a ToTokens impl for tuples. I forget the specifics of whether the orphan rules allow us to implement it ourselves. Either way, it seemed more trouble than it was worth.
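The orphan-rule question can be reproduced with std alone. As far as I understand the rules, a direct impl is rejected when both the trait and the tuple type come from outside the crate, while a local newtype is accepted. In this sketch Display stands in for ToTokens, and `TargetState`, the tuple's element types, and the `Op::Push` text are all hypothetical.

```rust
use std::fmt;

// Rejected by the orphan rules, since both `Display` and the tuple type are
// defined outside this crate:
//
// impl fmt::Display for (usize, String) { ... }

/// Hypothetical local newtype; a local type may implement a foreign trait.
struct TargetState(Option<(usize, String)>);

impl fmt::Display for TargetState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.0 {
            // Emit the tuple as source text, mirroring the format! call above.
            Some((id, op)) => write!(f, "Some(({}, {}))", id, op),
            None => write!(f, "None"),
        }
    }
}

fn main() {
    let ts = TargetState(Some((1, "Op::Push".to_owned())));
    println!("{}", ts);
    println!("{}", TargetState(None));
}
```

The same newtype shape would presumably work for ToTokens, at the cost of one wrapper type per tuple shape, which is roughly the trouble being weighed here.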

lrpar/src/lib/ctbuilder.rs

ratmice · 2025-03-02T18:30:01Z

lrlex/src/lib/ctbuilder.rs

 pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>>
 where
-    LexerTypesT::StorageT: Debug + Eq + Hash,
+    LexerTypesT::StorageT: Debug + Eq + Hash + ToTokens,


To me, this bound and the related bounds immediately below it are the most controversial aspects of this patch,
and the part that we need to consider most carefully.

The other thing we could do here is change this ToTokens to Display.

Changing it to ToTokens really just enables one minor cleanup, the QuoteOption(tok_id) cleanup in 18f1d9d.

I think it would definitely be fine to consider using Display here instead of ToTokens and dropping that last cleanup?

One more thing to note here, which I had mentioned in a previous comment on usize:
using ToTokens here also has the effect of including the type in the value, e.g. this patch currently encodes StorageT values as Some (5u32), a change from Some(5) with Debug and Display.
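The difference between the two encodings can be sketched with std only. `SuffixedLiteral` is a hypothetical stand-in that mimics the suffixed literals quote emits for primitive integers; it is not part of quote or grmtools, and whitespace in real token-stream output may differ.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for `ToTokens` on integers: renders a value as a
/// literal that carries its type suffix.
trait SuffixedLiteral {
    fn suffixed(&self) -> String;
}

impl SuffixedLiteral for u8 {
    fn suffixed(&self) -> String {
        format!("{}u8", self)
    }
}

impl SuffixedLiteral for u32 {
    fn suffixed(&self) -> String {
        format!("{}u32", self)
    }
}

/// Debug-based encoding: the literal's type is left to inference.
fn encode_debug<T: Debug>(v: Option<T>) -> String {
    format!("{:?}", v)
}

/// Suffix-based encoding: the literal names its concrete type.
fn encode_suffixed<T: SuffixedLiteral>(v: Option<T>) -> String {
    match v {
        Some(x) => format!("Some({})", x.suffixed()),
        None => "None".to_owned(),
    }
}

fn main() {
    println!("{}", encode_debug(Some(5u32)));    // Some(5)
    println!("{}", encode_suffixed(Some(5u32))); // Some(5u32)
    println!("{}", encode_suffixed(Some(5u8)));  // Some(5u8)
}
```

With the suffixed form, generated code that ends up in a position expecting a different integer type fails to compile instead of being silently inferred.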

Looking at the sources generated by the testsuite, the StorageT values I'm seeing all use u32.
So if we do go with ToTokens it would be good to exercise this in the testsuite with another type, such as u8.

I don't have a very strong opinion here, but I'm somewhat leaning towards liking the 1_u8 form of ToTokens.

I tested the codegen of ToTokens here, by changing the lrpar/examples/calc_actions/build.rs to use
CTLexerBuilder::<DefaultLexerTypes<u8>>::new_with_lexemet() and it makes sense to me that in theory it should be fine for StorageT to include the concrete type in literals, since kind of the point is that it has a concrete size known to the caller.

In other places, where we rely on the value being losslessly convertible into a usize, we could potentially use a newtype around known integer types to print literals without the type suffix.
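A rough sketch of that newtype idea, using std only. `Unsuffixed` and `TABLE` are hypothetical names, and Display stands in for whatever trait would actually emit the literal.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical newtype: holds a value already converted to `usize` and
/// renders it as a bare literal with no type suffix.
struct Unsuffixed(usize);

impl Unsuffixed {
    /// Returns `None` if `v` does not fit in a `usize` on this platform.
    fn new<T>(v: T) -> Option<Self>
    where
        usize: TryFrom<T>,
    {
        usize::try_from(v).ok().map(Unsuffixed)
    }
}

impl fmt::Display for Unsuffixed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

fn main() {
    let tok_id: u32 = 5;
    let idx = Unsuffixed::new(tok_id).expect("token id fits in usize");
    // `TABLE[5]` type checks as an index expression, whereas `TABLE[5u32]`
    // would not, since slices are indexed by `usize`.
    println!("TABLE[{}]", idx);
}
```

The conversion is checked once when the wrapper is built, so the emitted literal can safely omit the suffix.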

Either way, since we didn't remove the Debug bound, at worst we can go back to using Debug?

I'm fine with having the type suffix for literals.

One thing I haven't fully understood is: what types is ToTokens implemented on? I assume it's implemented on all primitive integers by quote? In that sense, this isn't going to be much of a breaking change unless a user does something odd like pass a newtype here: that would force them to know about the ToTokens trait and quote. In that sense, Display would be preferable, but I take your point in 18f1d9d that Display means we end up having to do our own string escaping, which is a bit nasty. On that basis, I'm fine using ToTokens.

It's basically all the concrete value types: bool, numeric types, strings, Option and reference types (minus Arc), and some macro-specific types that represent tokens.

It's sort of just a hodgepodge of various types; I don't see any reason why some things are missing (like tuples and Arc).

The full list is here:
https://docs.rs/quote/latest/quote/trait.ToTokens.html#foreign-impls

So long as the integer types are there, I think we'll have covered nearly every sensible use that I can conceive of.

The one thing I guess I would say is that for the specific case of StorageT we don't have to worry about string escaping at all, and its formatted output is probably about as stable as we could expect any format printing to be. So I don't think it is too far-fetched to use ToTokens internally, but not add the bound to StorageT, and use Debug or Display just for that.

So to me ToTokens over Debug or Display for StorageT mostly revolves around whether we want to be able to include the type suffix in integer literals. It isn't a huge thing but maybe adds a little bit of type checking we don't currently get (for better or worse!).

If our plan is to use quote (and friends) for all code generation inside grmtools, I think the ToTokens bound makes sense: we'll have bought into that ecosystem, and might as well take advantage of it, and make clear that we've done so to users. If we don't plan on going all the way with quote then perhaps we should just fall back on Display.

Makes sense, I'd probably stick with the ToTokens bound then since it seems like using it throughout will allow us to take advantage of it more heavily, by implementing ToTokens for things that contain a StorageT.
Otherwise we're kind of doomed to a mixture of the quote ecosystem plus manually formatted values.

ratmice · 2025-03-02T19:28:31Z

One more thing to consider here, which we aren't currently doing, somewhat to the detriment of the generated code, is
using prettyplease to format the quote!-generated code, as described in the quote! docs.

lrlex/src/lib/ctbuilder.rs

ltratt · 2025-03-03T08:11:12Z

lrlex/src/lib/ctbuilder.rs

 pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>>
 where
-    LexerTypesT::StorageT: Debug + Eq + Hash,
+    LexerTypesT::StorageT: Debug + Eq + Hash + ToTokens,


I'm fine with having the type suffix for literals.

One thing I haven't fully understood is: what types is ToTokens implemented on? I assume it's implemented on all primitive integers by quote? In that sense, this isn't going to be much of a breaking change unless a user does something odd like pass a newtype here: that would force them to know about the ToTokens trait and quote. In that sense, Display would be preferable, but I take your point in 18f1d9d that Display means we end up having to do our own string escaping, which is a bit nasty. On that basis, I'm fine using ToTokens.

lrlex/src/lib/ctbuilder.rs

lrpar/src/lib/ctbuilder.rs

ltratt · 2025-03-03T08:17:02Z

I have a few stylistic comments, and a couple of "deeper but I tend to agree with the approach taken in the PR" comments. I think we can tidy up things and get this merged fairly quickly.

ltratt · 2025-03-04T08:48:27Z

Let me summarise where I think we are:

This PR is complete and ready for squashing/merging.
By adding the ToTokens bound we are implicitly saying "we think we should move the code generation part of grmtools from its current stringy approach to a quote-and-friends approach".

Does that sound correct?

ratmice · 2025-03-04T09:11:45Z

I believe so on both points, with one note on the second: I didn't actually remove the Debug bounds, which allows us to revert entirely to string printing for StorageT if we need to fall back for any reason.
In theory Debug printing of StorageT is still going to be useful for debugging, so it is not being kept around merely just in case.

But yeah, I'd say the intent is there to migrate, as it seems to me like it should lead to cleaner code, but requires some experimentation to get the appropriate code organization.

ltratt · 2025-03-04T09:21:00Z

Looking forward to the migration!

Please squash.

ratmice · 2025-03-04T09:49:52Z

Squashed.

ratmice · 2025-03-04T10:39:06Z

Should be fixed, will need another squash.

ltratt · 2025-03-04T17:25:54Z

Please squash.

ratmice · 2025-03-04T18:21:51Z

Squashed.

ratmice assigned ratmice and ltratt Mar 1, 2025


ratmice marked this pull request as draft March 3, 2025 06:59

ltratt reviewed Mar 3, 2025


ratmice marked this pull request as ready for review March 4, 2025 09:13

ratmice force-pushed the lrlex_323 branch from 8dfa2cb to 4470e22 Compare March 4, 2025 09:49

ltratt added this pull request to the merge queue Mar 4, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2025

ratmice added 2 commits March 4, 2025 10:20

Change CTLexerBuilder src gen to use the quote crate.

d70b294

Change CTParserBuilder src gen to use the quote crate.

3641e71

ratmice force-pushed the lrlex_323 branch from 83c5cec to 3641e71 Compare March 4, 2025 18:21

ltratt added this pull request to the merge queue Mar 4, 2025

Merged via the queue into softdevteam:master with commit 5135a01 Mar 4, 2025
2 checks passed

ratmice deleted the lrlex_323 branch March 4, 2025 20:06
