-
Notifications
You must be signed in to change notification settings - Fork 38
Remove debug formatting for CTLexerBuilder src gen #494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
FWIW, there are some more cleanups that can be done here, there are a few places we are encoding Options that could be migrated to A couple of these cases can be found with ripgrep, Edit: Looks like some/most of these occurrences aren't fixable, for various reasons, I fixed the ones I could Here we need the In this one we just lack the |
| pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>> | ||
| where | ||
| LexerTypesT::StorageT: Debug + Eq + Hash, | ||
| LexerTypesT::StorageT: Debug + Eq + Hash + ToTokens, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this here, and the related bounds immediately below this are the most controversial aspects to this patch,
and the part that we need to consider most carefully.
The other thing we could do here is change this ToTokens to Display,
Changing it to ToTokens kind of just enables one minor cleanup, the QuoteOption(tok_id) cleanup in 18f1d9d
I think it would definitely be fine to consider using Display here instead of ToTokens and dropping that last cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing to note here, which I had mentioned in a previous comment on usize,
using ToTokens here also has the effect of including the type in the value e.g. this patch is currently encoding StorageT values as Some (5u32)), a change from Some(5) with Debug and Display.
Looking at the sources generated by the testsuite these StorageT values i'm seeing are all using u32.
So if we do go with ToTokens it would be good to try and exercise this in the testsuite in a way that uses another type here u8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a very strong opinion here, but I'm somewhat leaning towards liking the 1_u8 form of ToTokens.
I tested the codegen of ToTokens here, by changing the lrpar/examples/calc_actions/build.rs to use
CTLexerBuilder::<DefaultLexerTypes<u8>>::new_with_lexemet() and it makes sense to me that in theory it should be fine for StorageT to include the concrete type in literals, since kind of the point is that it has a concrete size known to the caller.
Other places where we rely on the can be converted into a usize losslessly we could potentially use a newtype around known integer types to print literals without the type suffix.
Either way, since we didn't remove the Debug suffix, at worst we can go back to using Debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with having the type suffix for literals.
One thing I haven't fully understood is: what types is ToTokens implemented on? I assume it's implemented on all primitive integers by quote? In that sense, this isn't going to be much of a breaking change unless a user does something odd like pass a newtype here: that would force them to know about the ToTokens trait and quote. In that sense, Display would be preferable, but I take your point in 18f1d9d that Display means we end up having to do our own string escaping, which is a bit nasty. On that basis, I'm fine using ToTokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically all the concete value types... bool, numerical, strings, option + reference types (minus Arc), some macro specific types that represent tokens.
It's sort of just a hodgepodge of various types, I don't see any reason why some things seem to be missing (like tuples, Arc)
The full list is here:
https://docs.rs/quote/latest/quote/trait.ToTokens.html#foreign-impls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So long as the integer types are there, I think we'll have covered nearly every sensible use that I can conceive of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one thing I guess I would say is that for the specific case of StorageT we don't have to worry about string escaping at all, I think it is also probably the most stable of all format printing we could likely expect format printing to be. So I don't think it is too far fetched to use ToTokens internally, but not add the bounds for StorageT and use Debug or Display just for that.
So to me ToTokens over Debug or Display for StorageT mostly revolves around whether we want to be able to include the type suffix in integer literals. It isn't a huge thing but maybe adds a little bit of type checking we don't currently get (for better or worse!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our plan is to use quote (and friends) for all code generation inside grmtools, I think the ToTokens bound makes sense: we'll have bought into that ecosystem, and might as well take advantage of it, and make clear that we've done so to users. If we don't plan on going on all the way with quote then perhaps we should just fall back on Display.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'd probably stick with the ToTokens bound then since it seems like using it throughout will allow us to take advantage of it more heavily, by implementing ToTokens for things that contain a StorageT.
Otherwise we're kind of doomed to a mixture of the quote ecosystem plus manually formatted values.
|
One more thing to consider here, which we aren't currently doing, somewhat to the detriment of the code generated is |
| pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>> | ||
| where | ||
| LexerTypesT::StorageT: Debug + Eq + Hash, | ||
| LexerTypesT::StorageT: Debug + Eq + Hash + ToTokens, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with having the type suffix for literals.
One thing I haven't fully understood is: what types is ToTokens implemented on? I assume it's implemented on all primitive integers by quote? In that sense, this isn't going to be much of a breaking change unless a user does something odd like pass a newtype here: that would force them to know about the ToTokens trait and quote. In that sense, Display would be preferable, but I take your point in 18f1d9d that Display means we end up having to do our own string escaping, which is a bit nasty. On that basis, I'm fine using ToTokens.
|
I have a few stylistic comments, and a couple of "deeper but I tend to agree with the approach taken in the PR" comments. I think we can tidy up things and get this merged fairly quickly. |
|
Let me summarise where I think we are:
Does that sound correct? |
|
I believe so to both points, however one thing to note on the second point. I didn't actually remove the But yeah, I'd say the intent is there to migrate, as it seems to me like it should lead to cleaner code, but requires some experimentation to get the appropriate code organization. |
|
Looking forward to the migration! Please squash. |
|
Squashed. |
|
Should be fixed, will need another squash. |
|
Please squash. |
|
Squashed. |
I was looking through issues, trying to see if there was anything therein that would require a breaking change,
I couldn't remember why I didn't finish fully migrating away from debug formats for CTLexerBuilder leaving the issue open. There were two issues, one was encoding
Optiontypes, there was a quote issue I've referenced/borrowed some code from for this.The other one is (technically?) a breaking change technically because it required us to add a new bounds for
StorageT,though I think it should always be satisfied given the
AsPrimitivebounds.I went head and also did CTParserBuilder, so this one should fix #323 entirely.
This adds a dependency on the
proc_macro2crate, but technicallyquotealready depends upon it, so I don't think it would actually introduce any additional overhead.