Skip to content

Conversation

@Slimsammylim
Copy link

@Slimsammylim Slimsammylim commented Jan 20, 2026

closes #187

This PR adds lambda support to substrait-go.

Note:

  • ExprFromProto does not include the same validation process as .Build(). ExprFromProto resolves types but does not validate field references. I will be adding this in a future PR.
  • LambdaInvocation is not included in this PR and will be implemented separately.

@codecov
Copy link

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 78.46890% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.09%. Comparing base (579861e) to head (e1a53f0).

Files with missing lines Patch % Lines
types/types.go 67.69% 20 Missing and 1 partial ⚠️
expr/builder.go 85.93% 8 Missing and 1 partial ⚠️
expr/expression.go 55.55% 4 Missing and 4 partials ⚠️
expr/field_reference.go 75.00% 2 Missing and 2 partials ⚠️
expr/lambda.go 93.47% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #183      +/-   ##
==========================================
+ Coverage   68.79%   69.09%   +0.29%     
==========================================
  Files          45       46       +1     
  Lines       10367    10576     +209     
==========================================
+ Hits         7132     7307     +175     
- Misses       2886     2912      +26     
- Partials      349      357       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

expr/lambda.go Outdated
// Recurses into all descendants (e.g., Cast(FieldRef), ScalarFunction(Cast(FieldRef))).
func resolveLambdaParamTypes(body Expression, params *types.StructType, outerParams []*types.StructType) Expression {
// Handle if body itself is a nested lambda - must check BEFORE Visit
// because Lambda.Visit would bypass our nested lambda context handling
Copy link
Author

@Slimsammylim Slimsammylim Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambda.Visit(callback) internally calls callback(lambda.Body), which skips the current Lambda wrapper entirely. This loses the context we need to resolve stepsOut>0 references. Therefore, we need to handle if the body is a lambda first.

expr/lambda.go Outdated
}

// Resolve types for FieldReferences with LambdaParameterReference roots
resolvedBody := resolveLambdaParamTypes(body, b.parameters, outerParams)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After constructing the lambda, we walk the expression tree and explicitly resolve types by looking up params.Types[fieldIndex] for each lambda parameter reference. For nested lambdas, we pass down the outer lambda's parameters so stepsOut > 0 references can also be resolved.

@Slimsammylim Slimsammylim changed the title First try at lambda support feat: Add Lambda support Jan 22, 2026
@Slimsammylim Slimsammylim changed the title feat: Add Lambda support feat: add lambda support Jan 22, 2026
@Slimsammylim Slimsammylim changed the title feat: add lambda support feat: add lambda support (#187) Jan 22, 2026
@Slimsammylim Slimsammylim changed the title feat: add lambda support (#187) feat: add lambda support Jan 22, 2026
Copy link
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure where exactly to say this, but consider reducing the number of comments across all of the added code. In my view, the ideal code is self-documenting, i.e. the code should be optimized for readability. If the intent is clear, then no comments necessary. However, for anything tricky or for docstrings, comments are absolutely essential.

I haven't done a full review yet but here are a couple of things to consider. Great progress! 🚀

expr/builder.go Outdated
targetParams := lpb.b.lambdaContext[len(lpb.b.lambdaContext)-1-int(lpb.stepsOut)]

// Validate that the parameter exists by checking field index for StructFieldRef
if structRef, ok := lpb.ref.(*StructFieldRef); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed offline, but I think this ref must be a StructFieldRef, though the spec doesn't do a great job of making that clear in the proto messages. Investigating right now...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out := *f
out.Nullability = n
return &out
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is not a deep copy, so the same ParameterTypes is used in the implicit parameter and the returned parameter. Is that safe to do in this context? It might be, I genuinely am unsure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructType also does a shallow copy. Maybe @vbarua can offer context on if this is the intended semantics of the WithNullability function? I don't think you wrote this code, but maybe you have historical context 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment just says:

		// WithNullability returns a copy of this type but with
		// the nullability set to the passed in value

Yes, it is a copy... but unclear if it is meant to be a deep copy or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be very strange if you had a reason to alter the parameters of the type after it had been constructed, but it is probably worth adding a comment somewhere about this invariant if there isn't one already.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I thought this function was a bit strange too. This definition was mostly me just following how previous types have defined this function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe making a deep copy is too expensive for a deeply nested struct?

Copy link
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor cosmetic things, but the implementation itself is awesome 💯

Going through the tests now, but wanted to dump comments so that you can work on them in parallel.

// the current lambda to find the target parameter (0 = current lambda, 1 = outer lambda, etc.).
//
// The Build method validates the stepsOut value and field index against the
// available lambda context, and resolves the reference type automatically.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion with @vbarua, I believe we may end up leaving it in the spec for now that you can use a MaskedExpression reference with lambdas, though we don't know what that means. Thus, this implementation is completely sufficient for now, we can just say we don't build that yet. But it will be good to call it out, specifically linking to an issue in upstream about clarification. I haven't made that issue, but when it exists, we should link it in these docs.

expr/builder.go Outdated

// Validate that the parameter field index exists
if int(lpb.ref.Field) >= len(targetParams.Types) {
return nil, fmt.Errorf("%w: lambda body references parameter %d but lambda only has %d parameters",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention something about the steps out here too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, it could be ambiguous from the error message which one we are talking about. Let's also make sure some test captures that this info is present in the error message.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a better error message to include the steps out value. I also updated the builder error tests to search for this info in the error message.

}
case *proto.Expression_Lambda_:
if et.Lambda.Parameters == nil {
return nil, fmt.Errorf("%w: lambda parameters cannot be nil", substraitgo.ErrInvalidExpr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is totally fine, but it is a bummer that we use ErrInvalidExpr throughout this file already. IMO, there should be ErrInvalidExprBuilder and ErrInvalidExprProto or something like that. But that's for a later PR 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we actually drop this file and move it to type_test.go? We can add the roundtrip tests as just additional cases in:

func TestTypeRoundtrip(t *testing.T) {

I think there is similarly already also a test for type short names that we can add one more case too. Maybe one for the long name though I haven't looked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library doesn't have the facilities to catch these yet, but I think all of the JSON tests here are syntactically correct, but semantically meaningless. To simplify the discussion, I am using a dummy explain format.

This plan here is basically the following thing:

Root
  Project
    expressions: [λ($0: i32) -> 42]
    input: Read(virtual_table [[0]])

However, I think what you intend is something like:

Root
  Project
    expressions: [transform(field_ref(0), λ($0: i32) -> 42)]
    input: Read(
             virtual_table [[[0]]],
             schema=[values: list<i32>]
           )

The difference here is that in the first one, the expression just evaluates to a lambda. We may technically allows lambdas as column values... but I'm not sure about that. But the second says to apply the lambda.

I also think these don't all need to be virtual tables, and it might simplify them to just make a read rel of some made up table:

Root
  Project
    expressions: [transform(field_ref(0), λ($0: i32) -> 42)]
    input: Read(
             namedTable="test_table",
             schema=[values: list<i32>]
           )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, using transform means that we are testing that functions_list.yaml is loaded correctly. However, it seems like it isn't working locally for me. This YAML file must be missing somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When all that other stuff is said and done though, then I think this is a possible form of that basic plan:

{
  "version": {"majorNumber": 0, "minorNumber": 79},
  "extensionUrns": [
    {
      "extensionUrnAnchor": 1,
      "urn": "extension:io.substrait:functions_list"
    }
  ],
  "extensions": [
    {
      "extensionFunction": {
        "extensionUrnReference": 1,
        "functionAnchor": 1,
        "name": "transform"
      }
    }
  ],
  "relations": [{
    "root": {
      "input": {
        "project": {
          "common": {"direct": {}},
          "input": {
            "read": {
              "common": {"direct": {}},
              "baseSchema": {
                "names": ["values"],
                "struct": {
                  "nullability": "NULLABILITY_REQUIRED",
                  "types": [{
                    "list": {
                      "type": {"i32": {"nullability": "NULLABILITY_REQUIRED"}},
                      "nullability": "NULLABILITY_REQUIRED"
                    }
                  }]
                }
              },
              "namedTable": {
                "names": ["test_table"]
              }
            }
          },
          "expressions": [{
            "scalarFunction": {
              "functionReference": 1,
              "outputType": {
                "list": {
                  "type": {"i32": {"nullability": "NULLABILITY_REQUIRED"}},
                  "nullability": "NULLABILITY_REQUIRED"
                }
              },
              "arguments": [
                {
                  "value": {
                    "selection": {
                      "directReference": {
                        "structField": {"field": 0}
                      },
                      "rootReference": {}
                    }
                  }
                },
                {
                  "value": {
                    "lambda": {
                      "parameters": {
                        "nullability": "NULLABILITY_REQUIRED",
                        "types": [{"i32": {"nullability": "NULLABILITY_REQUIRED"}}]
                      },
                      "body": {
                        "literal": {"i32": 42}
                      }
                    }
                  }
                }
              ]
            }
          }]
        }
      },
      "names": ["result"]
    }
  }]
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be awesome to be able to have something like substrait-explain for this to make these readable, but that is for another day and another battle.

go.mod Outdated
@@ -16,7 +16,7 @@ require (
github.com/google/uuid v1.6.0
github.com/stretchr/testify v1.10.0
github.com/substrait-io/substrait v0.78.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
github.com/substrait-io/substrait v0.78.1
github.com/substrait-io/substrait v0.79.0

this should get us access to functions_list.yaml

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay... this is opening up a can of worms. If you do this, we will fail parsing these YAML files. To fix this, you then need to update this line:

// using substrait v0.69.0

to v0.79.0

Then when you call go generate, it will update to use the new grammar which includes funcs.

Then unfortunately, you'll need to update some more code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving this one, I think you still haven't called go generate to regenerate all of the code here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement lambdas

2 participants