Skip to content

Conversation

@benbellick
Copy link
Member

@benbellick benbellick commented Jan 29, 2026

The markdown documentation already implies that lambda parameters must be referenced with direct_reference and StructField, but the proto comments didn't explicitly specify this constraint. This PR clarifies the proto to match the documentation.

Additionally, updated the RootReference/OuterReference proto comments for consistency.

@benbellick benbellick force-pushed the benbellick/clarify-lambda-param-ref-constraint branch from 96477b1 to 34ba3ff Compare January 29, 2026 22:38
// ReferenceSegment.StructField.
OuterReference outer_reference = 5;
// Must use direct_reference and the direct_reference
// must be of type ReferenceSegment.StructField.
Copy link
Member Author

@benbellick benbellick Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a completely artificial constraint. I am unsure whether or not we want to allow masked_references with LambdaParameterReferences.

This particular PR was prompted by substrait-io/substrait-go#183 which is adding lambda support to substrait-go.

I am not very familiar with MaskExpressions. I could imagine using them for lambda if we wanted to implement something like (a1,a2,a3,...,a1000) -> <a3,a49,a235> but that seems like an unrealistic use case. Furthermore, we could always construct that output struct by hand. Since it makes implementation easier, I opted to forbid MaskReference here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of forbidding it in the spec, wouldn't it be better to simply call it out as unimplemented in substrait-go?

Copy link
Contributor

@yongchul yongchul Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, calling out in substrait-go is a good option although I am fine with this change as I am also not familiar with mask expression and we can relax this later as well.

Does anyone know systems using MaskExpression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tokoko I am okay with allowing it in the spec provided we can think of a legitimate use case for it. Otherwise, I would prefer to be defensive. If we allow, then we may have to answer questions in the future about what it means :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as linked in the PR description, the markdown docs already suggest that StructField is the appropriate way to make this reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some google search and the closest thing I found in the engines was duckdb allowing EXCLUDE inside nested expressions.. my search wasn't exactly thorough though. I guess my point is that whatever reasoning we use for disallowing masked_reference for lambda_parameters specifically could just as easily be used for disallowing (removing) masked_references from substrait entirely.

Having said that, I'm fine with going ahead here if it makes anyone's life easier on the implementation side. We can always come back if it turns out to be an issue.

@benbellick benbellick changed the title docs: clarify FieldReference constraints for struct-based root types docs: forbid masked_reference for lambda parameters Jan 29, 2026
MaskExpression masked_reference = 2;
}

// The origin of the data being referenced. When this is a RootReference and
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this comment equally applies to OuterReference, so I opted to allow each field in the oneof have its own comment.

@benbellick
Copy link
Member Author

Closing this issue. Instead, I opened up #956 where we can continue the discussion. I am going to take the position that the use of masked_reference will blow up in substrait-go for now.

@benbellick benbellick closed this Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants