-
Notifications
You must be signed in to change notification settings - Fork 189
docs: forbid masked_reference for lambda parameters #955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
96477b1 to
34ba3ff
Compare
| // ReferenceSegment.StructField. | ||
| OuterReference outer_reference = 5; | ||
| // Must use direct_reference and the direct_reference | ||
| // must be of type ReferenceSegment.StructField. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a completely artificial constraint. I am unsure whether or not we want to allow masked_references with LambdaParameterReferences.
This particular PR was prompted by substrait-io/substrait-go#183 which is adding lambda support to substrait-go.
I am not very familiar with MaskExpressions. I could imagine using them for lambda if we wanted to implement something like (a1,a2,a3,...,a1000) -> <a3,a49,a235> but that seems like an unrealistic use case. Furthermore, we could always construct that output struct by hand. Since it makes implementation easier, I opted to forbid MaskReference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of forbidding it in the spec, wouldn't it be better to simply call it out as unimplemented in substrait-go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, calling out in substrait-go is a good option although I am fine with this change as I am also not familiar with mask expression and we can relax this later as well.
Does anyone know systems using MaskExpression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tokoko I am okay with allowing it in the spec provided we can think of a legitimate use case for it. Otherwise, I would prefer to be defensive. If we allow, then we may have to answer questions in the future about what it means :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as linked in the PR description, the markdown docs already suggest that StructField is the appropriate way to make this reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some google search and the closest thing I found in the engines was duckdb allowing EXCLUDE inside nested expressions.. my search wasn't exactly thorough though. I guess my point is that whatever reasoning we use for disallowing masked_reference for lambda_parameters specifically could just as easily be used for disallowing (removing) masked_references from substrait entirely.
Having said that, I'm fine with going ahead here if it makes anyone's life easier on the implementation side. We can always come back if it turns out to be an issue.
| MaskExpression masked_reference = 2; | ||
| } | ||
|
|
||
| // The origin of the data being referenced. When this is a RootReference and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, this comment equally applies to OuterReference, so I opted to allow each field in the oneof have its own comment.
|
Closing this issue. Instead, I opened up #956 where we can continue the discussion. I am going to take the position that the use of |
The markdown documentation already implies that lambda parameters must be referenced with
direct_referenceandStructField, but the proto comments didn't explicitly specify this constraint. This PR clarifies the proto to match the documentation.Additionally, updated the
RootReference/OuterReferenceproto comments for consistency.