Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 11 additions & 6 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -1199,7 +1199,8 @@ message Expression {
Type.Struct parameters = 1;

// The lambda body expression. Lambda parameters can be referenced using FieldReference
// with FieldReference.LambdaParameterReference as root_type.
// with FieldReference.LambdaParameterReference as root_type and ReferenceSegment.StructField
// as direct_reference to select specific parameters.
Expression body = 2;
}

Expand Down Expand Up @@ -1575,13 +1576,17 @@ message Expression {
MaskExpression masked_reference = 2;
}

// The origin of the data being referenced. When this is a RootReference and
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this comment equally applies to OuterReference, so I opted to allow each field in the oneof have its own comment.

// direct_reference above is used, the direct_reference must be of a type
// StructField.
// The origin of the data being referenced.
oneof root_type {
Expression expression = 3;
// When used with direct_reference, the direct_reference must be of type
// ReferenceSegment.StructField.
RootReference root_reference = 4;
// When used with direct_reference, the direct_reference must be of type
// ReferenceSegment.StructField.
OuterReference outer_reference = 5;
// Must use direct_reference and the direct_reference
// must be of type ReferenceSegment.StructField.
Copy link
Member Author

@benbellick benbellick Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a completely artificial constraint. I am unsure whether or not we want to allow masked_references with LambdaParameterReferences.

This particular PR was prompted by substrait-io/substrait-go#183 which is adding lambda support to substrait-go.

I am not very familiar with MaskExpressions. I could imagine using them for lambda if we wanted to implement something like (a1,a2,a3,...,a1000) -> <a3,a49,a235> but that seems like an unrealistic use case. Furthermore, we could always construct that output struct by hand. Since it makes implementation easier, I opted to forbid MaskReference here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of forbidding it in the spec, wouldn't it be better to simply call it out as unimplemented in substrait-go?

Copy link
Contributor

@yongchul yongchul Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, calling out in substrait-go is a good option although I am fine with this change as I am also not familiar with mask expression and we can relax this later as well.

Does anyone know systems using MaskExpression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tokoko I am okay with allowing it in the spec provided we can think of a legitimate use case for it. Otherwise, I would prefer to be defensive. If we allow, then we may have to answer questions in the future about what it means :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as linked in the PR description, the markdown docs already suggest that StructField is the appropriate way to make this reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some google search and the closest thing I found in the engines was duckdb allowing EXCLUDE inside nested expressions.. my search wasn't exactly thorough though. I guess my point is that whatever reasoning we use for disallowing masked_reference for lambda_parameters specifically could just as easily be used for disallowing (removing) masked_references from substrait entirely.

Having said that, I'm fine with going ahead here if it makes anyone's life easier on the implementation side. We can always come back if it turns out to be an issue.

LambdaParameterReference lambda_parameter_reference = 6;
}

Expand All @@ -1599,8 +1604,8 @@ message Expression {

// A reference to a lambda parameter within a lambda body expression.
// This identifies which lambda scope to reference, treating its parameters
// as a struct. Use FieldReference with this as root_type to access specific
// parameters or nested fields within parameters.
// as a struct. Further navigation into nested fields within parameters uses
// the child field of StructField.
message LambdaParameterReference {
// Number of lambda boundaries to traverse up for this reference.
// For nested lambdas:
Expand Down