-
Notifications
You must be signed in to change notification settings - Fork 94
feat(core): handle struct-based UDT literals #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
133b704 to
e4f36ea
Compare
e4f36ea to
cef9798
Compare
cef9798 to
f3379c1
Compare
|
FYI, this PR makes no attempt to actually validate if the struct representation provided matches the definition in the yaml file. I think that this is the right thing to do, but it turned out to be slightly more complicated than I thought, as it involves threading an |
Support both opaque (google.protobuf.Any) and structured (Literal.Struct) encodings for user-defined type literals per Substrait spec. - Split UserDefinedLiteral into UserDefinedAny and UserDefinedStruct - Move type parameters to interface level for parameterized types - Comprehensive test coverage including roundtrip tests - Throw exception on unhandled struct-based representation in isthmus
1eb4fb4 to
f5b6341
Compare
| extensionCollector.getTypeReference(SimpleExtension.TypeAnchor.of(expr.urn(), expr.name())); | ||
| return lit( | ||
| bldr -> { | ||
| try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception doesn't happen anymore because we don't parse the Any here. Instead, we have a reference to the pre-parsed proto directly.
| public ParameterizedType userDefined( | ||
| int ref, java.util.List<io.substrait.type.Type.Parameter> typeParameters) { | ||
| throw new UnsupportedOperationException( | ||
| "User defined types are not supported in Parameterized Types for now"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with the above, where we don't yet support ParamerizedType conversion.
| public DerivationExpression userDefined( | ||
| int ref, java.util.List<io.substrait.type.Type.Parameter> typeParameters) { | ||
| throw new UnsupportedOperationException( | ||
| "User defined types are not supported in Derivation Expressions for now"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also consistent with the above.
core/src/main/java/io/substrait/type/proto/TypeProtoConverter.java
Outdated
Show resolved
Hide resolved
| public RexNode visit(Expression.UserDefinedStructLiteral expr, Context context) | ||
| throws RuntimeException { | ||
| throw new UnsupportedOperationException( | ||
| "UserDefinedStructLiteral representation is not yet supported in Isthmus"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is #612
78488b4 to
e8cb862
Compare
vbarua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started taking a look at this and left some comments. Mostly looks reasonable. Want to come back and look at your tests with fresh eyes, and also think about parameterized types with a fresh 🧠.
| @Override | ||
| public abstract List<io.substrait.type.Type.Parameter> typeParameters(); | ||
|
|
||
| public abstract com.google.protobuf.Any value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release Notes
Capturing the value as an Any instead of a ByteString does feel nicer ✨
| * @see UserDefinedAnyLiteral | ||
| * @see UserDefinedStructLiteral | ||
| */ | ||
| interface UserDefinedLiteral extends Literal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release Notes
We should call out that we don't construct UserDefinedLiterals anymore.
| * parameters (like the {@code 10} in {@code VARCHAR<10>}). This interface provides a type-safe | ||
| * representation of all possible parameter kinds. | ||
| */ | ||
| interface Parameter {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the function of https://github.com/substrait-io/substrait-java/blob/main/core/src/main/java/io/substrait/function/ParameterizedType.java applicable here? It feels vaguely related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yes that is an interesting point. Looking into it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I understand it, ParameterizedType.java is used for representing abstract types with parameters in yaml files. Where as the Parameter above being introduced is actually a concrete argument passed into the type.
So for example, List<any1> could be a ParameterizedType, whereas List<int32> is a type with parameters [int32].
fea5921 to
77f0bd7
Compare
3814053 to
4b4f5ee
Compare
b2742da to
3732cf9
Compare
|
FYI, I have a WIP PR for implementing this in Isthmus but I split it into two PRs because the code there is more complicated. This keeps the PRs a bit smaller! |
|
ACTION NEEDED Substrait follows the Conventional Commits The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
core/src/main/java/io/substrait/expression/ExpressionCreator.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/substrait/expression/ExpressionCreator.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public abstract List<io.substrait.type.Type.Parameter> typeParameters(); | ||
|
|
||
| public abstract com.google.protobuf.Any value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One observation/remark I would have here is that in other places we have been trying to not expose protobuf objects in the Substrait core POJO API like e.g. in AdvancedExtension we provide the empty Optimization and Enhancement interfaces and ask users of the Substrait Java SDK to implement protobuf conversion logic for those instead. At first glance it feels like we would be a little inconsistent if for user defined any literals we expose the protobuf any value directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Niels, this was a great point. I just pushed a commit addressing this by introducing the interface UserDefinedAnyValue. I'll leave the PR open for a little while to give you the opportunity to provide feedback on that component if you'd like. Otherwise, I'll merge it in later today. Thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, after implementing the abstraction pattern similar to AdvancedExtension, I think we should actually defer the change you are suggesting. The abstraction adds an extra layer (UserDefinedAnyValue) without clear benefits. Users still need to work with com.google.protobuf.Any directly to construct values, and the indirection makes the API harder to understand. Since you mentioned this could be revisited later, let's keep it simple for now and use Any directly as the spec defines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, on that point I also wasn't sure how much additional value the additional layer of abstraction would bring.
| * Explains the Sustrait relation | ||
| * | ||
| * @param plan Subsrait relation | ||
| * @param rel Subsrait relation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: this fix is also in #642
nielspardon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm fine with the changes. Just not sure whether we want to revisit exposing the Any proto value directly in a follow-up.
Co-authored-by: Niels Pardon <[email protected]>
This reverts commit 76cd0f0.
Support both opaque (google.protobuf.Any) and
structured (Literal.Struct) encodings for user-defined type literals per Substrait spec.
Closes #611