Skip to content

Conversation

@mchades
Copy link
Contributor

@mchades mchades commented Dec 25, 2025

What changes were proposed in this pull request?

This PR implements the core operations for UDF (User-Defined Function) management in Gravitino, including:

ManagedFunctionOperations implementation:

  • registerFunction: Register scalar, aggregate, and table functions with multi-definition support
  • getFunction: Retrieve function by identifier (latest version)
  • listFunctions: List all functions in a schema namespace
  • dropFunction: Remove a function

Unit tests:

  • Added comprehensive unit tests for ManagedFunctionOperations with test cases covering CRUD operations (except alter) and validation scenarios

Why are the changes needed?

Fix: #9561

This is part of the UDF management feature for Gravitino. The core operations layer provides the business logic for function management, ensuring data integrity through proper validations.

Does this PR introduce any user-facing change?

No, this is an internal implementation.

How was this patch tested?

Added unit tests in TestManagedFunctionOperations.java covering:

  • Function registration and listing
  • Function retrieval (get)
  • Function deletion (drop)
  • Error cases for invalid operations

@mchades mchades marked this pull request as draft December 25, 2025 12:12
@mchades mchades changed the title feat(core): Add UDF management core operations and validations [#9561] feat(core): Add UDF management core operations and validations Dec 25, 2025
@mchades mchades force-pushed the pr-udf-core branch 2 times, most recently from 5599a23 to 3c47d32 Compare December 30, 2025 14:26
@mchades mchades marked this pull request as ready for review January 6, 2026 08:32
@mchades mchades requested a review from jerryshao January 6, 2026 08:32
@jerryshao jerryshao requested a review from Copilot January 7, 2026 06:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements core operations for UDF (User-Defined Function) management in Apache Gravitino, providing the foundation for managing user-defined functions within the catalog system. The implementation includes comprehensive validation logic to prevent ambiguous function invocations through arity overlap detection.

Key changes:

  • Implements ManagedFunctionOperations with full CRUD operations (register, get, list, alter, drop) for UDF management
  • Adds FunctionEntity as the metadata storage representation of functions with versioning support
  • Implements sophisticated validation for FunctionChange operations, including arity overlap detection for function definitions with default parameters

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/src/main/java/org/apache/gravitino/catalog/ManagedFunctionOperations.java Core implementation of function catalog operations including registration, retrieval, alteration, and deletion with validation logic
core/src/main/java/org/apache/gravitino/meta/FunctionEntity.java Entity class representing function metadata with builder pattern and support for versioning
core/src/test/java/org/apache/gravitino/catalog/TestManagedFunctionOperations.java Comprehensive unit tests covering all CRUD operations, function change validations, and edge cases
core/src/main/java/org/apache/gravitino/Entity.java Adds FUNCTION to the EntityType enum
core/src/main/java/org/apache/gravitino/storage/relational/JDBCBackend.java Adds FUNCTION entity type handling to garbage collection methods with TODO note for future versioning support

@mchades
Copy link
Contributor Author

mchades commented Jan 8, 2026

@jerryshao could you plz help to review this PR when you have time, thx!

@jerryshao
Copy link
Contributor

I suggest splitting this PR into two. You can move the alterFunction to another PR.

@mchades
Copy link
Contributor Author

mchades commented Jan 8, 2026

I suggest splitting this PR into two. You can move the alterFunction to another PR.

done

}

@Override
public Function getFunction(NameIdentifier ident, int version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do user know which versions they already have and which version to pick? Since you don't have a list version interface.

}
} catch (IOException e) {
throw new RuntimeException("Failed to get function " + ident, e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to distinguish NoSuchFunctionException and NoSuchFunctionVersionException, to handle this you need to call the DB twice, which is not so efficient.

throw new UnsupportedOperationException(
"registerFunction for table-valued functions: FunctionEntity not yet implemented");
return doRegisterFunction(
ident, comment, FunctionType.TABLE, deterministic, null, returnColumns, definitions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to use Optional instead of null, which has no type information.

// TODO: Implement when FunctionEntity is available
throw new UnsupportedOperationException("dropFunction: FunctionEntity not yet implemented");
try {
return store.delete(ident, Entity.EntityType.FUNCTION);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this drop all the versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The version of the function is auto-incrementing with each alteration operation. The plan is to support rolling back the function to a specified version in the future, so this deletion will remove all versions of the function.

* @param definitions The array of definitions to validate.
* @throws IllegalArgumentException If any two definitions have overlapping arities.
*/
private void validateDefinitionsNoArityOverlap(FunctionDefinition[] definitions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest you to add more tests for this method.

* <li>{@code foo(int a, float b)} → arities: {@code ["int,float"]}
* <li>{@code foo(int a, float b default 1.0)} → arities: {@code ["int", "int,float"]}
* <li>{@code foo(int a, float b default 1.0, string c default 'x')} → arities: {@code ["int",
* "int,float", "int,float,string"]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be "int,string"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, "int,string" is not a valid arity here. Since the current arity design is based on positional parameter matching, you cannot skip the middle parameter b to directly pass c.

For engines that support named parameters (like Python), the parameter names and values would be passed together at invocation time (e.g., {a: 1, c: "hello"}). In that case, the engine can resolve the correct definition by mapping parameter names to their positions and validating that any missing parameters have default values. This would be handled at runtime by the engine, not during function registration.

Comment on lines +319 to +326
// Find the first parameter with a default value
int firstDefaultIndex = params.length;
for (int i = 0; i < params.length; i++) {
if (params[i].defaultValue() != Column.DEFAULT_VALUE_NOT_SET) {
firstDefaultIndex = i;
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this loop can be merged with above.

Comment on lines +71 to +72
public static final Field VERSION =
Field.required("version", Integer.class, "The version of the function entity.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was think the necessity of version support for UDF. If we want to support versions, we should let users know the existing versions, and can fetch the versions, otherwise, how do they know which version to use?

new FunctionParam[] {FunctionParams.of("input", Types.StringType.get())};
FunctionDefinition[] definitions = new FunctionDefinition[] {createSimpleDefinition(params)};

org.apache.gravitino.function.Function newFunc =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use full qualified package name.

}

@Test
public void testInvalidParameterOrder() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move all the public test methods above the private methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask] Add UDF management core operations

2 participants