Skip to content

fix(isthmus): use explicit return type for scalar function expressions#355

Merged
vbarua merged 1 commit intosubstrait-io:mainfrom
nielspardon:par-subtract-return-type
Apr 7, 2025
Merged

fix(isthmus): use explicit return type for scalar function expressions#355
vbarua merged 1 commit intosubstrait-io:mainfrom
nielspardon:par-subtract-return-type

Conversation

@nielspardon
Copy link
Member

When converting a Substrait plan back to Apache Calcite which contains a datetime subtract such as the one generated by SQL-to-Substrait code for TPC-H query no. 1 the mapping fails with a java.lang.ArrayIndexOutOfBoundsException because the Substrait-to-Calcite mapping code relies on Calcite's type inference for determining the return type of the subtract function call.

Since the Substrait plan contains an explicitly return type in this case for this function call we can simply provide the return type explicitly to Calcite and avoid using the return type inference.

This PRs provides the function return type from the Substrait plan to Calcite and adds a test case for verifying that the expression conversion no longer throws an Exception and sets the function return type correctly in Calcite.

@nielspardon nielspardon force-pushed the par-subtract-return-type branch 3 times, most recently from 9497a5a to 9baddc2 Compare March 28, 2025 07:55
@nielspardon nielspardon changed the title fix(isthmus): use explicit return type for datetime subtract fix(isthmus): use explicit return type for scalar function expressions Mar 28, 2025

return rexBuilder.makeCall(operator, args);
RelDataType returnType = typeConverter.toCalcite(typeFactory, expr.outputType());
return rexBuilder.makeCall(returnType, operator, args);
Copy link
Member

@vbarua vbarua Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change also brings us in line with the conversion for window functions

RelDataType outputType = typeConverter.toCalcite(typeFactory, expr.outputType());

and aggregate functions

RelDataType returnType = typeConverter.toCalcite(typeFactory, measure.getFunction().getType());

assertEquals(
TypeConverter.DEFAULT.toCalcite(typeFactory, TypeCreator.REQUIRED.DATE),
calciteExpr.getType());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change above, but I don't think this is the correct test for it.

The fact that the Calcite return type inference throws makes me think that we're mapping "subtract:date_iday" to the wrong Calcite function and/or that the return type inference as defined doesn't correspond with Substrait. Your (good) change sort of papers over the issue, but it's still there.

Ideally, the explicit return type in Substrait AND the inferred return type from Calcite should agree. In fact, when it doesn't it can problems during optimization. This is also an issue beyond the scope of your change here.

All that being said, I'm not sure what a nice way to check this. One idea I have is to construct an invalid scalar function call invoking add:i64_i64 but with return type string, which is obviously wrong in both Calcite and Substrait, and check that the constructed call still has the wrong type. Effectively, we want to directly check a property like functionConversionUsesGivenReturnType even when the return type is wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the Calcite return type inference throws makes me think that we're mapping "subtract:date_iday" to the wrong Calcite function and/or that the return type inference as defined doesn't correspond with Substrait. Your (good) change sort of papers over the issue, but it's still there.

I did do a bit of digging while debugging the ArrayIndexOutOfBoundsException. What happens is that the default Apache Calcite datetime subtract (which we map to) is not a 2 argument function but a 3 argument function which expects the return type as the third argument:

A special operator for the subtraction of two DATETIMEs. The format of DATETIME subtraction is:

    "(" <datetime> "-" <datetime> ")" <interval qualifier>

This operator is special since it needs to hold the additional interval qualifier specification.

https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/fun/SqlStdOperatorTable.html#MINUS_DATE

https://github.com/apache/calcite/blob/b394abca407eea06ee8f8b50f30bd443f286bb42/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L1428-L1439

We're getting the ArrayIndexOutOfBoundsException because that third parameter is missing in this specific case.

Interestingly, there's an alternative implementation for Big Query which has 2 arguments:

https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/fun/SqlInternalOperators.html#MINUS_DATE2

Looks to me there's a bit of a special case going on here with the specific datetime subtract function that triggered my investigation into this and not necessarily a general type inference issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote #377 to capture the specific subtract issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further modifying these tests in #378

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vbarua vbarua merged commit 867697c into substrait-io:main Apr 7, 2025
12 checks passed
@nielspardon nielspardon deleted the par-subtract-return-type branch August 5, 2025 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants