Member

@vbarua vbarua commented Apr 1, 2025

BREAKING CHANGE: removed function-based table lookup conversion methods

Member Author

vbarua commented Apr 1, 2025

This PR removes the following public methods:

  • SqlToSubstrait#execute(String sql, Function<List<String>, NamedStruct> tableLookup)
  • SubstraitToSql#substraitRelToCalciteRel(Rel relRoot, Function<List<String>, NamedStruct> tableLookup)

and adds 2 new methods that replace their functionality:

  • SqlToSubstrait#execute(String sql, Prepare.CatalogReader catalogReader)
  • SubstraitToSql#substraitRelToCalciteRel(Rel relRoot, Prepare.CatalogReader catalog)
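For callers migrating off the removed overloads, the practical shift is from passing a lookup callback to passing a catalog object. The sketch below illustrates that shift with plain Java only. `TableSchema`, `SimpleCatalog` and `fromLookup` are hypothetical stand-ins invented for illustration, not Isthmus or Calcite types; in real code the catalog role is played by a `Prepare.CatalogReader`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LookupMigrationSketch {

  /** Hypothetical stand-in for a table schema (the removed API returned a NamedStruct). */
  static final class TableSchema {
    final List<String> columnNames;

    TableSchema(List<String> columnNames) {
      this.columnNames = List.copyOf(columnNames);
    }
  }

  /** Hypothetical stand-in for a catalog (the new API takes a Prepare.CatalogReader). */
  static final class SimpleCatalog {
    private final Map<List<String>, TableSchema> tables = new LinkedHashMap<>();

    void register(List<String> qualifiedName, TableSchema schema) {
      tables.put(List.copyOf(qualifiedName), schema);
    }

    /** Returns the schema registered under the qualified name, or null if absent. */
    TableSchema getTable(List<String> qualifiedName) {
      return tables.get(qualifiedName);
    }
  }

  /**
   * Bridges the old style to the new one: resolves each known table name through
   * the legacy lookup function and registers the result in a catalog up front.
   * Names the lookup cannot resolve (null result) are skipped.
   */
  static SimpleCatalog fromLookup(
      Function<List<String>, TableSchema> lookup, List<List<String>> tableNames) {
    SimpleCatalog catalog = new SimpleCatalog();
    for (List<String> name : tableNames) {
      TableSchema schema = lookup.apply(name);
      if (schema != null) {
        catalog.register(name, schema);
      }
    }
    return catalog;
  }
}
```

Unlike a lazy callback, this eager bridge needs the table names ahead of time. That is a limitation of the sketch only, not of `Prepare.CatalogReader`.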

The original API was introduced in #26 (along with #33) with the following intent:

This uses a function to look up a table schema without a create table statement. I've intentionally kept the interface free from calcite objects so as to not export them as part of the isthmus public API.

At this point though, we expose a number of Calcite objects in our public API in order for users to refine conversion between Substrait and Calcite.

Additionally, I would make the point that converting from SQL to Substrait is really 2 conversions:

  1. SQL -> Calcite, which is a capability of Calcite that we can leverage
  2. Calcite -> Substrait, which is the capability that Isthmus aims to provide

Given how tightly integrated step 1 is with Calcite, I would argue that instead of hiding Calcite details from users it would be more productive to guide them towards using Calcite effectively to parse and unparse SQL. This is part of the aim of #362, which this PR works towards.

@vbarua vbarua marked this pull request as ready for review April 1, 2025 01:27
- RelDataTypeFactory factory, CalciteCatalogReader catalog, SqlValidator.Config config) {
- return new Validator(SubstraitOperatorTable.INSTANCE, catalog, factory, config);
+ RelDataTypeFactory factory,
+ SqlValidatorCatalogReader validatorCatalog,
Member Author

Calcite only requires a SqlValidatorCatalogReader, not a full CalciteCatalogReader when constructing a validator.

}
}
- return Pair.of(validator, catalogReader);
+ return catalogReader;
Member Author

I've decoupled the creation of the schema from the creation of the validator. The validator is not needed as part of creating the schema; it can be created afterwards from the schema we produce here. Splitting these apart makes this more explicit. It also simplifies some future changes I have in mind.

These methods are all internal.

- var pair = registerCreateTables(tables);
- return executeInner(sql, factory, pair.left, pair.right);
+ CalciteCatalogReader catalogReader = registerCreateTables(tables);
+ SqlValidator validator = Validator.create(factory, catalogReader, SqlValidator.Config.DEFAULT);
Member Author

As indicated above, we now create the schema first and then create the validator from the schema. This works because the validator is not needed to validate the schema, but rather the query we are parsing.
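The shape of this refactoring can be sketched without Calcite on the classpath. In the dependency-free sketch below, `Catalog` and `Validator` are hypothetical stand-ins (not Calcite's `CalciteCatalogReader` or `SqlValidator`), and the method names are illustrative. It shows only the ordering: the registration step returns the catalog alone, and the caller builds the validator from it afterwards.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DecoupledSetupSketch {

  /** Hypothetical stand-in for a catalog of table definitions. */
  static final class Catalog {
    private final Map<String, List<String>> tables = new LinkedHashMap<>();

    void addTable(String name, List<String> columns) {
      tables.put(name, List.copyOf(columns));
    }

    boolean hasColumn(String table, String column) {
      List<String> columns = tables.get(table);
      return columns != null && columns.contains(column);
    }
  }

  /** Hypothetical stand-in for a validator; it only reads from the catalog. */
  static final class Validator {
    private final Catalog catalog;

    private Validator(Catalog catalog) {
      this.catalog = catalog;
    }

    static Validator create(Catalog catalog) {
      return new Validator(catalog);
    }

    boolean isValidReference(String table, String column) {
      return catalog.hasColumn(table, column);
    }
  }

  /** Step 1: build the catalog only. No validator is involved here. */
  static Catalog registerTables(Map<String, List<String>> definitions) {
    Catalog catalog = new Catalog();
    definitions.forEach(catalog::addTable);
    return catalog;
  }

  /** Step 2: create the validator from the finished catalog, then check the query side. */
  static boolean check(Map<String, List<String>> definitions, String table, String column) {
    Catalog catalog = registerTables(definitions);
    Validator validator = Validator.create(catalog);
    return validator.isValidReference(table, column);
  }
}
```

Returning a single object from the registration step also removes the need for a pair type at the call site, which is what the diff above does.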


public SubstraitToSql() {
super(FEATURES_DEFAULT);
CalciteSchema rootSchema = CalciteSchema.createRootSchema(false);
Member Author

This was unused.

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

public class TableLookupTest extends PlanTestBase {
Member Author

This test is no longer needed as this functionality is no longer supported.

Member Author

vbarua commented Apr 1, 2025

Tagging @nielspardon @mbwhite as you've both been poking at the SQL conversion side of things.

Tagging @rymurr, @jinfengni as you added the original APIs that are being removed.

@vbarua vbarua changed the title from "feat: new Prepare.CatalogReader based APIs for SQL to/from Substrait" to "feat(isthmus): new Prepare.CatalogReader based APIs for SQL to/from Substrait" Apr 1, 2025
Contributor

mbwhite commented Apr 1, 2025

Hi @vbarua, I agree with what you've proposed here. I understand the original intent and the logic behind it, but I agree that things have moved on as well.

As found with Calcite 1.39, there's (another) lazy mechanism to get table schema information; not sure how the lookup added in 1.39 is related to the Prepare.CatalogReader ?

Member

@nielspardon nielspardon left a comment

LGTM

Member Author

vbarua commented Apr 1, 2025

As found with Calcite 1.39, there's (another) lazy mechanism to get table schema information; not sure how the lookup added in 1.39 is related to the Prepare.CatalogReader ?

The usage of Prepare.CatalogReader isn't related to the new API. I haven't actually looked at 1.39 yet. I'm choosing Prepare.CatalogReader because that is what's needed to construct a SqlToRelConverter, which is the Calcite class that lets us convert from SQL to Calcite. It provides the most flexibility to users because it's the data structure they would need to build themselves anyway in order to apply the conversion.

return executeInner(sql, validator, catalogReader);
}

public Plan execute(String sql, String name, Schema schema) throws SqlParseException {
Member Author

@vbarua vbarua Apr 1, 2025

This API is somewhat redundant once we add execute(String sql, Prepare.CatalogReader catalogReader), as users can easily convert the Schema into an instance that satisfies Prepare.CatalogReader themselves.

Choosing to leave it in for now.

Contributor

mbwhite commented Apr 2, 2025

I can reapply the Calcite 1.39 changes after this - if they are needed at all after the API change.

@vbarua vbarua merged commit 3852640 into main Apr 3, 2025
13 checks passed
@vbarua vbarua deleted the vbarua/new-sql-conversion-apis branch April 3, 2025 16:59