Unified OpenSearch PPL Data Type #3345

Merged
penghuo merged 9 commits into opensearch-project:main from penghuo:issue3339
Mar 25, 2025
Conversation

@penghuo
Collaborator

@penghuo penghuo commented Feb 24, 2025

Description

This PR introduces a language specification abstraction to support SQL and PPL query processing. The main changes include:

  • A new interface, LangSpec, is added with a default SQL implementation and a custom PPLLangSpec that maps specific expression types (e.g., mapping BYTE to “tinyint”).
  • Updates in index describe requests and query results ensure that the correct type names are used based on the active language specification (SQL or PPL).
  • Updated tests verify the behavior of the language specification mappings and system index utilities.
  • Updated OpenSearch PPL data type doc.
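
The mapping described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code merged in this PR: `CoreType` is a hypothetical stand-in for `ExprCoreType`, and the `typeName` method and `SQL_SPEC` constant are assumed names. Only the four PPL mappings (`tinyint`, `smallint`, `int`, `bigint`) are taken from the diff under review.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical stand-in for the engine's core types (ExprCoreType in the PR). */
enum CoreType {
  BYTE, SHORT, INTEGER, LONG, STRING;

  /** Engine-level name, used when a language does not override it (assumption). */
  String typeName() {
    return name();
  }
}

/** Sketch of a language specification: maps engine types to language-facing names. */
interface LangSpec {
  /** Default (SQL) behavior in this sketch: expose the engine type name unchanged. */
  LangSpec SQL_SPEC = new LangSpec() {};

  default String typeName(CoreType type) {
    return type.typeName();
  }
}

/** PPL overrides only the integer family; every other type uses the default. */
final class PPLLangSpec implements LangSpec {
  private static final Map<CoreType, String> PPL_NAMES = new EnumMap<>(CoreType.class);

  static {
    PPL_NAMES.put(CoreType.BYTE, "tinyint");
    PPL_NAMES.put(CoreType.SHORT, "smallint");
    PPL_NAMES.put(CoreType.INTEGER, "int");
    PPL_NAMES.put(CoreType.LONG, "bigint");
  }

  @Override
  public String typeName(CoreType type) {
    // Fall back to the interface default for types PPL does not rename.
    return PPL_NAMES.getOrDefault(type, LangSpec.super.typeName(type));
  }
}
```

With this shape, a describe request or response formatter would only need a reference to the active `LangSpec`; the SQL path keeps its existing names because it never consults the PPL map.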

To Reviewer

  • Ideally, the query engine should use well-defined data types, with LangSpec serving as the protocol for translating these engine types to language-specific types. Currently, however, the describe implementation relies on OpenSearchDataType (an extension of ExprCoreDataType). After the Calcite migration, all data types should be unified as CalciteDataType, at which point the special handling in describe will no longer be necessary.

Related Issues

#3339

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@penghuo penghuo self-assigned this Feb 24, 2025
@penghuo penghuo added the v3.0.0 label Feb 24, 2025
Signed-off-by: Peng Huo <penghuo@gmail.com>
Comment on lines +25 to +30
static {
  exprTypeToPPLType.put(ExprCoreType.BYTE, "tinyint");
  exprTypeToPPLType.put(ExprCoreType.SHORT, "smallint");
  exprTypeToPPLType.put(ExprCoreType.INTEGER, "int");
  exprTypeToPPLType.put(ExprCoreType.LONG, "bigint");
}
Member


Since this is a breaking change, why don't we directly change the old type to the new type, instead of introducing LangSpec? Isn't this unnecessarily adding complexity?

Collaborator Author


I agree. The core type was not upgraded because I intended for the data type changes to affect only PPL and not SQL. This PR focuses on unifying PPL data types, while SQL data types can be addressed in a separate issue, as changes there would impact JDBC, ODBC, and CLI.

Ideally, the query engine should use well-defined data types, with LangSpec serving as the protocol for translating these engine types to language-specific types. Once the Calcite implementation is complete, CalciteDataType will translate to ExprDataType, and LangSpec will translate from ExprDataType to the PPL response data type.
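
The two-stage translation described in this reply might look roughly like the sketch below. It is purely illustrative of a design that is not implemented in this PR: `PlannerType` and `EngineType` are hypothetical stand-ins for the Calcite and Expr data types, the planner-to-engine pairs are assumed, and only the engine-to-PPL names come from the diff above.

```java
import java.util.Map;

final class TypePipelineSketch {
  /** Hypothetical planner-level types (stand-in for Calcite data types). */
  enum PlannerType { TINYINT, SMALLINT, INTEGER, BIGINT }

  /** Hypothetical engine-level types (stand-in for ExprCoreType). */
  enum EngineType { BYTE, SHORT, INTEGER, LONG }

  /** Stage 1 (assumed pairs): planner type to engine type. */
  static final Map<PlannerType, EngineType> PLANNER_TO_ENGINE = Map.of(
      PlannerType.TINYINT, EngineType.BYTE,
      PlannerType.SMALLINT, EngineType.SHORT,
      PlannerType.INTEGER, EngineType.INTEGER,
      PlannerType.BIGINT, EngineType.LONG);

  /** Stage 2: engine type to PPL response type name, the role LangSpec plays. */
  static final Map<EngineType, String> ENGINE_TO_PPL = Map.of(
      EngineType.BYTE, "tinyint",
      EngineType.SHORT, "smallint",
      EngineType.INTEGER, "int",
      EngineType.LONG, "bigint");

  /** Composes both stages to get the type name reported in a PPL response. */
  static String pplResponseType(PlannerType type) {
    return ENGINE_TO_PPL.get(PLANNER_TO_ENGINE.get(type));
  }

  private TypePipelineSketch() {}
}
```

Keeping the stages separate means the language-facing names can change for PPL without touching the engine types that SQL, JDBC, ODBC, and the CLI depend on.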

Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
dai-chen
dai-chen previously approved these changes Mar 11, 2025
Signed-off-by: Peng Huo <penghuo@gmail.com>
dai-chen
dai-chen previously approved these changes Mar 24, 2025
Member

@LantaoJin LantaoJin left a comment


Should the SystemFunctionIT be updated correspondingly?

 */
public interface LangSpec {
  /** Enumerates the supported language types. */
  enum LangType {
Member


Collaborator Author


done.

Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
@penghuo penghuo merged commit 41917ef into opensearch-project:main Mar 25, 2025
22 checks passed
penghuo added a commit that referenced this pull request Jun 16, 2025
---------

Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: xinyual <xinyual@amazon.com>

3 participants