This guide explains how to develop and implement functions for PPL with Calcite.
- Create an issue describing the purpose and expected behavior of the function
- Ensure the function name is recognized by PPL syntax by checking OpenSearchPPLLexer.g4, OpenSearchPPLParser.g4, and BuiltinFunctionName.java
- Plan the documentation of the function under the docs/user/ppl/functions/ directory
A user-defined function is an instance of SqlOperator that transforms input row expressions (RexNode) into a new row expression.
There are three main approaches to implementing UDFs:
Leverage operators already declared in Calcite's SqlStdOperatorTable or SqlLibraryOperators, and defined in RexImpTable.java.
For example, SqlStdOperatorTable.PLUS is used as one of the implementations for + in PPL.
This approach is useful when the function you need to implement already exists in Apache Calcite and you just need to expose it through your PPL interface.
Adapt Java static methods to UDFs using utility functions like UserDefinedFunctionUtils.adapt*ToUDF.
This approach allows you to leverage existing Java methods by wrapping them as UDFs, which can be more straightforward than implementing from scratch.
Among existing adaptation utilities, adaptExprMethodToUDF adapts a v2 function implementation into a UDF builder, while adaptMathFunctionToUDF adapts a static function from java.lang.Math to a
UDF builder. You can create your own adaptation utilities if you need to adapt other kinds of static methods to UDFs.
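To make the adaptation idea concrete, here is a minimal, self-contained sketch in plain Java. It is not the project's adaptMathFunctionToUDF: the class MathAdapterSketch and the method adaptMath are hypothetical names, and no Calcite types are involved. It only shows the general pattern of looking up a java.lang.Math static method by name and wrapping it as a reusable function object.

```java
import java.lang.reflect.Method;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; not part of the PPL code base.
final class MathAdapterSketch {

    // Looks up Math.<methodName>(double) once and wraps it as a function object.
    static DoubleUnaryOperator adaptMath(String methodName) {
        final Method method;
        try {
            method = Math.class.getMethod(methodName, double.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No Math." + methodName + "(double)", e);
        }
        return x -> {
            try {
                // Static method, so the receiver is null.
                return (Double) method.invoke(null, x);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}
```

For instance, `MathAdapterSketch.adaptMath("sinh").applyAsDouble(0.0)` evaluates `Math.sinh(0.0)`. A real adapter additionally has to carry the return type, null policy, and operand types, which this sketch leaves out.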
Example:
SqlOperator SINH = adaptMathFunctionToUDF(
"sinh", ReturnTypes.DOUBLE_FORCE_NULLABLE, NullPolicy.ANY, PPLOperandTypes.NUMERIC)
.toUDF("SINH");

For more complex functions or when you need complete control over the implementation:
- Implement the ImplementorUDF interface, which is a simplified interface for creating a SqlUserDefinedFunction.
- Instantiate and convert it to a SqlOperator in PPLBuiltinOperators
- For optimal UDF performance, implement any data-independent logic during the compilation phase instead of at runtime. Specifically, use linq4j expressions for these operations rather than internal static method calls, as expressions are evaluated during compilation.
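The compile-time versus runtime distinction in the last point can be illustrated without Calcite. The sketch below is a plain-Java analogy, not linq4j code: CompileTimeSketch and compileMatcher are hypothetical names, and a closure stands in for generated code. The data-independent work (compiling a regular expression from a literal argument) happens once when the function is built, and only the per-row match runs for each row.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical analogy only; real UDFs express this with linq4j expressions.
final class CompileTimeSketch {

    // Counts how often the expensive, data-independent step runs.
    static int patternCompilations = 0;

    // "Compilation" phase: runs once when the function is built.
    static Predicate<String> compileMatcher(String regexLiteral) {
        patternCompilations++;
        Pattern pattern = Pattern.compile(regexLiteral);
        // "Runtime" phase: runs once per row and reuses the compiled pattern.
        return row -> pattern.matcher(row).matches();
    }
}
```

Calling the returned predicate on many rows leaves `patternCompilations` unchanged, which is the effect the guideline aims for.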
Example:
public class MyCustomUDF extends ImplementorUDF {
// Define operand types, return types, null policies, and constructors
// ...
public static class Crc32Implementor implements NotNullImplementor {
@Override
public Expression implement(
RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
// Implementation details go here ...
}
}
}
// Converting to SqlUserDefinedFunction (an extension of SqlOperator)
SqlOperator myOperator = new MyCustomUDF().toUDF("FUNC_NAME");

Type checking ensures that functions receive the correct argument types:
- Each SqlOperator provides an operand type checker via its getOperandTypeChecker method
- Calcite's built-in operators come with predefined type checkers of type SqlOperandTypeChecker
- For custom UDFs, the UDFOperandMetadata interface is used to feed function type information so that a SqlOperandTypeChecker can be retrieved in the same way as Calcite's built-in operators. Most of the operand types are defined in PPLOperandTypes as instances of UDFOperandMetadata. E.g. PPLOperandTypes.NUMERIC_NUMERIC
- Since SqlOperandTypeChecker works on parsed SQL trees (which aren't directly accessible in our architecture), the PPLTypeChecker interface was created to perform actual type checking. Most instances of PPLTypeChecker are created by wrapping Calcite's built-in type checkers.
The following code snippet explains their relationships:
// For built-in Calcite operators
SqlOperandTypeChecker cosSqlTypeChecker = SqlStdOperatorTable.COS.getOperandTypeChecker(); // FamilyOperandTypeChecker(NUMERIC)
// For user defined functions
// UDFOperandMetadata wraps a SqlOperandTypeChecker, so that the type information can be fed to a SqlUserDefinedFunction.
// Refer to the javadoc of UDFOperandMetadata class for more details on why this workaround is necessary
UDFOperandMetadata NUMERIC = UDFOperandMetadata.wrap((FamilyOperandTypeChecker) OperandTypes.NUMERIC);
SqlOperator COSH =
adaptMathFunctionToUDF(
"cosh", ReturnTypes.DOUBLE_FORCE_NULLABLE, NullPolicy.ANY, NUMERIC)
.toUDF("COSH");
SqlOperandTypeChecker coshTypeChecker = COSH.getOperandTypeChecker().getInnerTypeChecker(); // FamilyOperandTypeChecker(NUMERIC)
// SqlOperandTypeChecker works on parsed SQL trees, which don't exist in our architecture, so it cannot be directly
// applied to check operand types. We create another interface PPLTypeChecker to do the actual type checking.
// It works by retrieving operand type information from a SqlOperandTypeChecker, then checking against actual argument types.
PPLTypeChecker cosPplTypeChecker = PPLTypeChecker.wrapFamily(cosSqlTypeChecker);
// Equivalently, PPL type checkers can be created by directly specifying expected operand types
PPLTypeChecker numericTypeChecker = PPLTypeChecker.family(SqlTypeFamily.NUMERIC);

UDFs should be registered in PPLFuncImpTable. The preferred API is AbstractBuilder::registerOperator(BuiltinFunctionName functionName, SqlOperator... operators):
- It automatically extracts type checkers from operators and converts them to PPLTypeChecker instances
- Multiple implementations can be registered to the same function name for overloading
- The system will try to resolve functions based on argument types, with automatic coercion when needed
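The resolution behavior described in these points can be modeled with a small toy registry. This is a sketch under stated assumptions, not PPLFuncImpTable: OverloadRegistrySketch, Overload, and withAdd are hypothetical names, Java classes stand in for SQL type families, and automatic coercion is deliberately not modeled. Each overload carries its own operand check, and a call resolves to the first registered overload whose check accepts the arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model of overload resolution; coercion is not modeled.
final class OverloadRegistrySketch {

    // One candidate implementation: its body plus the operand classes it accepts.
    static final class Overload {
        final Function<List<Object>, Object> body;
        final Class<?>[] operandTypes;

        Overload(Function<List<Object>, Object> body, Class<?>... operandTypes) {
            this.body = body;
            this.operandTypes = operandTypes;
        }

        // Plays the role of a type checker: arity and each operand type must match.
        boolean accepts(List<Object> args) {
            if (args.size() != operandTypes.length) {
                return false;
            }
            for (int i = 0; i < operandTypes.length; i++) {
                if (!operandTypes[i].isInstance(args.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    private final Map<String, List<Overload>> registry = new HashMap<>();

    void register(String name, Overload... overloads) {
        registry.computeIfAbsent(name, k -> new ArrayList<>()).addAll(Arrays.asList(overloads));
    }

    // Resolves to the first overload, in registration order, that accepts the arguments.
    Object call(String name, Object... args) {
        List<Object> argList = Arrays.asList(args);
        for (Overload overload : registry.getOrDefault(name, Collections.emptyList())) {
            if (overload.accepts(argList)) {
                return overload.body.apply(argList);
            }
        }
        throw new IllegalArgumentException("No overload of " + name + " accepts " + argList);
    }

    // Registers numeric addition and string concatenation under one name.
    static OverloadRegistrySketch withAdd() {
        OverloadRegistrySketch registry = new OverloadRegistrySketch();
        registry.register(
            "add",
            new Overload(
                a -> ((Number) a.get(0)).doubleValue() + ((Number) a.get(1)).doubleValue(),
                Number.class, Number.class),
            new Overload(
                a -> (String) a.get(0) + (String) a.get(1),
                String.class, String.class));
        return registry;
    }
}
```

With this model, `withAdd().call("add", 1, 2)` takes the numeric overload and `withAdd().call("add", "foo", "bar")` takes the string overload, while a mixed call such as `call("add", 1, "bar")` is rejected because no coercion step exists in the sketch.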
For example, the following statement registers Calcite's built-in COS operator as the cosine function in PPL. Under the
hood, it first retrieves a SqlOperandTypeChecker from SqlStdOperatorTable.COS, then converts it to a PPLTypeChecker,
and finally registers it as the cos function in the PPL function registry.
registerOperator(COS, SqlStdOperatorTable.COS);

The following example shows how to register overloads under the same function name. The + operator is registered for
both number addition and string concatenation, and the type checkers select between them. That is, if both operands are
numbers, the call resolves to SqlStdOperatorTable.PLUS, since the operand types do not pass the type checking of
SqlStdOperatorTable.CONCAT, which requires two strings.
registerOperator(ADD, SqlStdOperatorTable.PLUS, SqlStdOperatorTable.CONCAT);

AbstractBuilder::register(BuiltinFunctionName functionName, FunctionImp functionImp, PPLTypeChecker typeChecker)
Use this approach when:
- You need a custom type checker
- You want to customize an existing function by tweaking its arguments
- Setting typeChecker to null will bypass type checking (use with caution)
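Why a null type checker deserves caution can be seen in a small plain-Java model. This is only a sketch: CheckedFunctionSketch and its upper factory are hypothetical names, and a Predicate stands in for PPLTypeChecker. With a checker, bad operands are rejected up front with a descriptive error; without one, they reach the function body and fail there instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of a function registered with an optional type checker.
final class CheckedFunctionSketch {
    private final Function<List<Object>, Object> body;
    private final Predicate<List<Object>> typeChecker; // null means "skip type checking"

    CheckedFunctionSketch(
            Function<List<Object>, Object> body, Predicate<List<Object>> typeChecker) {
        this.body = body;
        this.typeChecker = typeChecker;
    }

    Object invoke(Object... args) {
        List<Object> argList = Arrays.asList(args);
        // The check is skipped entirely when no type checker was supplied.
        if (typeChecker != null && !typeChecker.test(argList)) {
            throw new IllegalArgumentException("Unexpected operand types: " + argList);
        }
        return body.apply(argList);
    }

    // A one-argument string function, registered with or without its checker.
    static CheckedFunctionSketch upper(boolean withTypeChecker) {
        Function<List<Object>, Object> body =
            a -> ((String) a.get(0)).toUpperCase(Locale.ROOT);
        Predicate<List<Object>> checker =
            a -> a.size() == 1 && a.get(0) instanceof String;
        return new CheckedFunctionSketch(body, withTypeChecker ? checker : null);
    }
}
```

In this model, `upper(true).invoke(42)` fails with an IllegalArgumentException from the check, whereas `upper(false).invoke(42)` fails later with a ClassCastException inside the body.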
Some function implementations depend on the underlying data source. They should be registered with
PPLFuncImpTable::registerExternalOperator.
For example, the GEOIP function relies on the opensearch-geospatial plugin, so it is registered as an external
function in OpenSearchExecutionEngine.
Comprehensive testing is essential for UDFs:
- Integration tests in Calcite*IT classes to verify function result correctness
- Unit tests in CalcitePPLFunctionTypeTest to validate type checker behavior
- Push-down tests in CalciteExplainIT if the function can be pushed down as a domain-specific language (DSL)
User-defined aggregation functions aggregate data across multiple rows.
There are two main approaches to create a UDAF:
Leverage existing aggregation operators from Calcite if they match your requirements.
For custom aggregation logic:
- Extend SqlUserDefinedAggFunction with custom aggregation logic
- Instantiate the new aggregation function in PPLBuiltinOperators
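Custom aggregations generally follow an accumulator lifecycle: create state, fold each row into it, then produce a result. The sketch below shows that lifecycle in plain Java with no Calcite dependency. RangeAggSketch and its init, add, result, and aggregate methods are hypothetical names chosen for illustration; the actual base types and method signatures in the project may differ.

```java
import java.util.List;

// Hypothetical aggregation that returns max - min over the non-null inputs.
final class RangeAggSketch {

    // Mutable state carried across rows.
    static final class Accumulator {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        boolean seen = false;
    }

    // Creates fresh state for one group.
    static Accumulator init() {
        return new Accumulator();
    }

    // Folds one row into the state; null inputs are ignored.
    static Accumulator add(Accumulator acc, Double value) {
        if (value != null) {
            acc.min = Math.min(acc.min, value);
            acc.max = Math.max(acc.max, value);
            acc.seen = true;
        }
        return acc;
    }

    // Produces the final value, or null when no non-null row was seen.
    static Double result(Accumulator acc) {
        if (!acc.seen) {
            return null;
        }
        return acc.max - acc.min;
    }

    // Drives the lifecycle over a list of rows, as an engine would per group.
    static Double aggregate(List<Double> rows) {
        Accumulator acc = init();
        for (Double row : rows) {
            acc = add(acc, row);
        }
        return result(acc);
    }
}
```

Keeping all state in the accumulator, rather than in the function object, is what lets one function instance serve many groups at once.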
To register a UDAF:
- Use AggBuilder::registerOperator(BuiltinFunctionName functionName, SqlAggFunction aggFunction) for standard registration
- For more control, use AggBuilder::register(BuiltinFunctionName functionName, AggHandler aggHandler, PPLTypeChecker typeChecker)
- For functions dependent on data engines, use PPLFuncImpTable::registerExternalAggOperator
To test a UDAF:
- Verify result correctness in CalcitePPLAggregationIT
- Test logical plans in CalcitePPLAggregationTest