-
Notifications
You must be signed in to change notification settings - Fork 26
feat: add lambda support #183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add lambda support #183
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #183 +/- ##
==========================================
+ Coverage 68.79% 69.09% +0.29%
==========================================
Files 45 46 +1
Lines 10367 10576 +209
==========================================
+ Hits 7132 7307 +175
- Misses 2886 2912 +26
- Partials 349 357 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
expr/lambda.go
Outdated
| // Recurses into all descendants (e.g., Cast(FieldRef), ScalarFunction(Cast(FieldRef))). | ||
| func resolveLambdaParamTypes(body Expression, params *types.StructType, outerParams []*types.StructType) Expression { | ||
| // Handle if body itself is a nested lambda - must check BEFORE Visit | ||
| // because Lambda.Visit would bypass our nested lambda context handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lambda.Visit(callback) internally calls callback(lambda.Body), which skips the current Lambda wrapper entirely. This loses the context we need to resolve stepsOut>0 references. Therefore, we need to handle if the body is a lambda first.
expr/lambda.go
Outdated
| } | ||
|
|
||
| // Resolve types for FieldReferences with LambdaParameterReference roots | ||
| resolvedBody := resolveLambdaParamTypes(body, b.parameters, outerParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After constructing the lambda, we walk the expression tree and explicitly resolve types by looking up params.Types[fieldIndex] for each lambda parameter reference. For nested lambdas, we pass down the outer lambda's parameters so stepsOut > 0 references can also be resolved.
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure where exactly to say this, but consider reducing the number of comments across all of the added code. In my view, the ideal code is self-documenting, i.e. the code should be optimized for readability. If the intent is clear, then no comments necessary. However, for anything tricky or for docstrings, comments are absolutely essential.
I haven't done a full review yet but here are a couple of things to consider. Great progress! 🚀
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
expr/builder.go
Outdated
| targetParams := lpb.b.lambdaContext[len(lpb.b.lambdaContext)-1-int(lpb.stepsOut)] | ||
|
|
||
| // Validate that the parameter exists by checking field index for StructFieldRef | ||
| if structRef, ok := lpb.ref.(*StructFieldRef); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline, but I think this ref must be a StructFieldRef, though the spec doesn't do a great job of making that clear in the proto messages. Investigating right now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| out := *f | ||
| out.Nullability = n | ||
| return &out | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is not a deep copy, so the same ParameterTypes is used in the implicit parameter and the returned parameter. Is that safe to do in this context? It might be, I genuinely am unsure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StructType also does a shallow copy. Maybe @vbarua can offer context on if this is the intended semantics of the WithNullability function? I don't think you wrote this code, but maybe you have historical context 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment just says:
// WithNullability returns a copy of this type but with
// the nullability set to the passed in valueYes, it is a copy... but unclear if it is meant to be a deep copy or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be very strange if you had a reason to alter the parameters of the type after it had been constructed, but it is probably worth adding a comment somewhere about this invariant if there isn't one already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I thought this function was a bit strange too. This definition was mostly me just following how previous types have defined this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe making a deep copy is too expensive for a deeply nested struct?
Co-authored-by: Ben Bellick <[email protected]>
…substrait-go into slim/adding-lambda-support
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
…substrait-go into slim/adding-lambda-support
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor cosmetic things, but the implementation itself is awesome 💯
Going through the tests now, but wanted to dump comments so that you can work on them in parallel.
| // the current lambda to find the target parameter (0 = current lambda, 1 = outer lambda, etc.). | ||
| // | ||
| // The Build method validates the stepsOut value and field index against the | ||
| // available lambda context, and resolves the reference type automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a discussion with @vbarua, I believe we may end up leaving it in the spec for now that you can use a MaskedExpression reference with lambdas, though we don't know what that means. Thus, this implementation is completely sufficient for now, we can just say we don't build that yet. But it will be good to call it out, specifically linking to an issue in upstream about clarification. I haven't made that issue, but when it exists, we should link it in these docs.
expr/builder.go
Outdated
|
|
||
| // Validate that the parameter field index exists | ||
| if int(lpb.ref.Field) >= len(targetParams.Types) { | ||
| return nil, fmt.Errorf("%w: lambda body references parameter %d but lambda only has %d parameters", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mention something about the steps out here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, it could be ambiguous from the error message which one we are talking about. Let's also make sure some test captures that this info is present in the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a better error message to include the steps out value. I also updated the builder error tests to search for this info in the error message.
| } | ||
| case *proto.Expression_Lambda_: | ||
| if et.Lambda.Parameters == nil { | ||
| return nil, fmt.Errorf("%w: lambda parameters cannot be nil", substraitgo.ErrInvalidExpr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is totally fine, but it is a bummer that we use ErrInvalidExpr throughout this file already. IMO, there should be ErrInvalidExprBuilder and ErrInvalidExprProto or something like that. But that's for a later PR 😅
types/func_type_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually drop this file and move it to type_test.go? We can add the roundtrip tests as just additional cases in:
substrait-go/types/types_test.go
Line 61 in 579861e
| func TestTypeRoundtrip(t *testing.T) { |
I think there is similarly already also a test for type short names that we can add one more case too. Maybe one for the long name though I haven't looked.
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
Co-authored-by: Ben Bellick <[email protected]>
…substrait-go into slim/adding-lambda-support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library doesn't have the facilities to catch these yet, but I think all of the JSON tests here are syntactically correct, but semantically meaningless. To simplify the discussion, I am using a dummy explain format.
This plan here is basically the following thing:
Root
Project
expressions: [λ($0: i32) -> 42]
input: Read(virtual_table [[0]])
However, I think what you intend is something like:
Root
Project
expressions: [transform(field_ref(0), λ($0: i32) -> 42)]
input: Read(
virtual_table [[[0]]],
schema=[values: list<i32>]
)
The difference here is that in the first one, the expression just evaluates to a lambda. We may technically allows lambdas as column values... but I'm not sure about that. But the second says to apply the lambda.
I also think these don't all need to be virtual tables, and it might simplify them to just make a read rel of some made up table:
Root
Project
expressions: [transform(field_ref(0), λ($0: i32) -> 42)]
input: Read(
namedTable="test_table",
schema=[values: list<i32>]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, using transform means that we are testing that functions_list.yaml is loaded correctly. However, it seems like it isn't working locally for me. This YAML file must be missing somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When all that other stuff is said and done though, then I think this is a possible form of that basic plan:
{
"version": {"majorNumber": 0, "minorNumber": 79},
"extensionUrns": [
{
"extensionUrnAnchor": 1,
"urn": "extension:io.substrait:functions_list"
}
],
"extensions": [
{
"extensionFunction": {
"extensionUrnReference": 1,
"functionAnchor": 1,
"name": "transform"
}
}
],
"relations": [{
"root": {
"input": {
"project": {
"common": {"direct": {}},
"input": {
"read": {
"common": {"direct": {}},
"baseSchema": {
"names": ["values"],
"struct": {
"nullability": "NULLABILITY_REQUIRED",
"types": [{
"list": {
"type": {"i32": {"nullability": "NULLABILITY_REQUIRED"}},
"nullability": "NULLABILITY_REQUIRED"
}
}]
}
},
"namedTable": {
"names": ["test_table"]
}
}
},
"expressions": [{
"scalarFunction": {
"functionReference": 1,
"outputType": {
"list": {
"type": {"i32": {"nullability": "NULLABILITY_REQUIRED"}},
"nullability": "NULLABILITY_REQUIRED"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {"field": 0}
},
"rootReference": {}
}
}
},
{
"value": {
"lambda": {
"parameters": {
"nullability": "NULLABILITY_REQUIRED",
"types": [{"i32": {"nullability": "NULLABILITY_REQUIRED"}}]
},
"body": {
"literal": {"i32": 42}
}
}
}
}
]
}
}]
}
},
"names": ["result"]
}
}]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be awesome to be able to have something like substrait-explain for this to make these readable, but that is for another day and another battle.
go.mod
Outdated
| @@ -16,7 +16,7 @@ require ( | |||
| github.com/google/uuid v1.6.0 | |||
| github.com/stretchr/testify v1.10.0 | |||
| github.com/substrait-io/substrait v0.78.1 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| github.com/substrait-io/substrait v0.78.1 | |
| github.com/substrait-io/substrait v0.79.0 |
this should get us access to functions_list.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay... this is opening up a can of worms. If you do this, we will fail parsing these YAML files. To fix this, you then need to update this line:
substrait-go/grammar/generate.go
Line 5 in 579861e
| // using substrait v0.69.0 |
to v0.79.0
Then when you call go generate, it will update to use the new grammar which includes funcs.
Then unfortunately, you'll need to update some more code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unresolving this one, I think you still haven't called go generate to regenerate all of the code here
Co-authored-by: Ben Bellick <[email protected]>
closes #187
This PR adds lambda support to substrait-go.
Note:
ExprFromProtodoes not include the same validation process as.Build().ExprFromProtoresolves types but does not validate field references. I will be adding this in a future PR.LambdaInvocationis not included in this PR and will be implemented separately.