Skip to content

Conversation

@ratmice
Copy link
Collaborator

@ratmice ratmice commented Mar 1, 2025

I was looking through issues, trying to see if there was anything therein that would require a breaking change,
I couldn't remember why I didn't finish fully migrating away from debug formats for CTLexerBuilder leaving the issue open. There were two issues, one was encoding Option types, there was a quote issue I've referenced/borrowed some code from for this.

The other one is (technically?) a breaking change technically because it required us to add a new bounds for StorageT,
though I think it should always be satisfied given the AsPrimitive bounds.

I went head and also did CTParserBuilder, so this one should fix #323 entirely.

This adds a dependency on the proc_macro2 crate, but technically quote already depends upon it, so I don't think it would actually introduce any additional overhead.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 1, 2025

FWIW, there are some more cleanups that can be done here, there are a few places we are encoding Options that could be migrated to QuoteOption, then they won't have to match

A couple of these cases can be found with ripgrep, rg 'format!\("Some'

Edit: Looks like some/most of these occurrences aren't fixable, for various reasons, I fixed the ones I could

let n = match r.name() {
    Some(ref n) => format!("Some({}.to_string())", quote!(#n)),
    None => "None".to_owned(),
};

Here we need the to_string() call in the output string, quote! wants to encode this as Some("String") without the to_string() call.

let target_state = match &r.target_state() {
   Some((id, op)) => format!("Some(({}, {}))", id, quote!(#op)),
   None => "None".to_owned(),
}

In this one we just lack the ToTokens trait impl for tuples, I forget the specific rules on whether we can impl it ourselves or not, and whether the orphan rules allow us to implement it. Either way, seemed more trouble than it was worth.

pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>>
where
LexerTypesT::StorageT: Debug + Eq + Hash,
LexerTypesT::StorageT: Debug + Eq + Hash + ToTokens,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this here, and the related bounds immediately below this are the most controversial aspects to this patch,
and the part that we need to consider most carefully.

The other thing we could do here is change this ToTokens to Display,

Changing it to ToTokens kind of just enables one minor cleanup, the QuoteOption(tok_id) cleanup in 18f1d9d

I think it would definitely be fine to consider using Display here instead of ToTokens and dropping that last cleanup?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing to note here, which I had mentioned in a previous comment on usize,
using ToTokens here also has the effect of including the type in the value e.g. this patch is currently encoding StorageT values as Some (5u32)), a change from Some(5) with Debug and Display.

Looking at the sources generated by the testsuite these StorageT values i'm seeing are all using u32.
So if we do go with ToTokens it would be good to try and exercise this in the testsuite in a way that uses another type here u8.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a very strong opinion here, but I'm somewhat leaning towards liking the 1_u8 form of ToTokens.

I tested the codegen of ToTokens here, by changing the lrpar/examples/calc_actions/build.rs to use
CTLexerBuilder::<DefaultLexerTypes<u8>>::new_with_lexemet() and it makes sense to me that in theory it should be fine for StorageT to include the concrete type in literals, since kind of the point is that it has a concrete size known to the caller.

Other places where we rely on the can be converted into a usize losslessly we could potentially use a newtype around known integer types to print literals without the type suffix.

Either way, since we didn't remove the Debug suffix, at worst we can go back to using Debug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with having the type suffix for literals.

One thing I haven't fully understood is: what types is ToTokens implemented on? I assume it's implemented on all primitive integers by quote? In that sense, this isn't going to be much of a breaking change unless a user does something odd like pass a newtype here: that would force them to know about the ToTokens trait and quote. In that sense, Display would be preferable, but I take your point in 18f1d9d that Display means we end up having to do our own string escaping, which is a bit nasty. On that basis, I'm fine using ToTokens.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically all the concete value types... bool, numerical, strings, option + reference types (minus Arc), some macro specific types that represent tokens.

It's sort of just a hodgepodge of various types, I don't see any reason why some things seem to be missing (like tuples, Arc)

The full list is here:
https://docs.rs/quote/latest/quote/trait.ToTokens.html#foreign-impls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long as the integer types are there, I think we'll have covered nearly every sensible use that I can conceive of.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I guess I would say is that for the specific case of StorageT we don't have to worry about string escaping at all, I think it is also probably the most stable of all format printing we could likely expect format printing to be. So I don't think it is too far fetched to use ToTokens internally, but not add the bounds for StorageT and use Debug or Display just for that.

So to me ToTokens over Debug or Display for StorageT mostly revolves around whether we want to be able to include the type suffix in integer literals. It isn't a huge thing but maybe adds a little bit of type checking we don't currently get (for better or worse!).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our plan is to use quote (and friends) for all code generation inside grmtools, I think the ToTokens bound makes sense: we'll have bought into that ecosystem, and might as well take advantage of it, and make clear that we've done so to users. If we don't plan on going on all the way with quote then perhaps we should just fall back on Display.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'd probably stick with the ToTokens bound then since it seems like using it throughout will allow us to take advantage of it more heavily, by implementing ToTokens for things that contain a StorageT.
Otherwise we're kind of doomed to a mixture of the quote ecosystem plus manually formatted values.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 2, 2025

One more thing to consider here, which we aren't currently doing, somewhat to the detriment of the code generated is
using prettyplease to format the quote! generated code, as described here in the quote! docs

@ratmice ratmice marked this pull request as draft March 3, 2025 06:59
pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>>
where
LexerTypesT::StorageT: Debug + Eq + Hash,
LexerTypesT::StorageT: Debug + Eq + Hash + ToTokens,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with having the type suffix for literals.

One thing I haven't fully understood is: what types is ToTokens implemented on? I assume it's implemented on all primitive integers by quote? In that sense, this isn't going to be much of a breaking change unless a user does something odd like pass a newtype here: that would force them to know about the ToTokens trait and quote. In that sense, Display would be preferable, but I take your point in 18f1d9d that Display means we end up having to do our own string escaping, which is a bit nasty. On that basis, I'm fine using ToTokens.

@ltratt
Copy link
Member

ltratt commented Mar 3, 2025

I have a few stylistic comments, and a couple of "deeper but I tend to agree with the approach taken in the PR" comments. I think we can tidy up things and get this merged fairly quickly.

@ltratt
Copy link
Member

ltratt commented Mar 4, 2025

Let me summarise where I think we are:

  1. This PR is complete and ready for squashing/merging.
  2. By adding the ToTokens bound we are implicitly saying "we think we should move the code generation part of grmtools from its current stringy approach to a quote-and-friends approach".

Does that sound correct?

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 4, 2025

I believe so to both points, however one thing to note on the second point. I didn't actually remove the Debug bounds, which allows us to revert back entirely to string printing for StorageT, so we could fall back to that for any reason.
In theory Debug printing of StorageT is still going to be useful for debugging, so it is not being kept around merely just-in-case.

But yeah, I'd say the intent is there to migrate, as it seems to me like it should lead to cleaner code, but requires some experimentation to get the appropriate code organization.

@ratmice ratmice marked this pull request as ready for review March 4, 2025 09:13
@ltratt
Copy link
Member

ltratt commented Mar 4, 2025

Looking forward to the migration!

Please squash.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 4, 2025

Squashed.

@ltratt ltratt added this pull request to the merge queue Mar 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2025
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 4, 2025

Should be fixed, will need another squash.

@ltratt
Copy link
Member

ltratt commented Mar 4, 2025

Please squash.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 4, 2025

Squashed.

@ltratt ltratt added this pull request to the merge queue Mar 4, 2025
Merged via the queue into softdevteam:master with commit 5135a01 Mar 4, 2025
2 checks passed
@ratmice ratmice deleted the lrlex_323 branch March 4, 2025 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove debug formatting in non-debug locations

2 participants