-
Notifications
You must be signed in to change notification settings - Fork 706
[#9561] feat(core): Add UDF management core operations and validations #9560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5599a23 to
3c47d32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements core operations for UDF (User-Defined Function) management in Apache Gravitino, providing the foundation for managing user-defined functions within the catalog system. The implementation includes comprehensive validation logic to prevent ambiguous function invocations through arity overlap detection.
Key changes:
- Implements
ManagedFunctionOperationswith full CRUD operations (register, get, list, alter, drop) for UDF management - Adds
FunctionEntityas the metadata storage representation of functions with versioning support - Implements sophisticated validation for
FunctionChangeoperations, including arity overlap detection for function definitions with default parameters
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
core/src/main/java/org/apache/gravitino/catalog/ManagedFunctionOperations.java |
Core implementation of function catalog operations including registration, retrieval, alteration, and deletion with validation logic |
core/src/main/java/org/apache/gravitino/meta/FunctionEntity.java |
Entity class representing function metadata with builder pattern and support for versioning |
core/src/test/java/org/apache/gravitino/catalog/TestManagedFunctionOperations.java |
Comprehensive unit tests covering all CRUD operations, function change validations, and edge cases |
core/src/main/java/org/apache/gravitino/Entity.java |
Adds FUNCTION to the EntityType enum |
core/src/main/java/org/apache/gravitino/storage/relational/JDBCBackend.java |
Adds FUNCTION entity type handling to garbage collection methods with TODO note for future versioning support |
core/src/main/java/org/apache/gravitino/catalog/ManagedFunctionOperations.java
Show resolved
Hide resolved
core/src/main/java/org/apache/gravitino/catalog/ManagedFunctionOperations.java
Show resolved
Hide resolved
|
@jerryshao could you plz help to review this PR when you have time, thx! |
|
I suggest splitting this PR into two. You can move the |
done |
| } | ||
|
|
||
| @Override | ||
| public Function getFunction(NameIdentifier ident, int version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do user know which versions they already have and which version to pick? Since you don't have a list version interface.
| } | ||
| } catch (IOException e) { | ||
| throw new RuntimeException("Failed to get function " + ident, e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to distinguish NoSuchFunctionException and NoSuchFunctionVersionException, to handle this you need to call the DB twice, which is not so efficient.
| throw new UnsupportedOperationException( | ||
| "registerFunction for table-valued functions: FunctionEntity not yet implemented"); | ||
| return doRegisterFunction( | ||
| ident, comment, FunctionType.TABLE, deterministic, null, returnColumns, definitions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to use Optional instead of null, which has no type information.
| // TODO: Implement when FunctionEntity is available | ||
| throw new UnsupportedOperationException("dropFunction: FunctionEntity not yet implemented"); | ||
| try { | ||
| return store.delete(ident, Entity.EntityType.FUNCTION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this drop all the versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The version of the function is auto-incrementing with each alteration operation. The plan is to support rolling back the function to a specified version in the future, so this deletion will remove all versions of the function.
| * @param definitions The array of definitions to validate. | ||
| * @throws IllegalArgumentException If any two definitions have overlapping arities. | ||
| */ | ||
| private void validateDefinitionsNoArityOverlap(FunctionDefinition[] definitions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest you to add more tests for this method.
| * <li>{@code foo(int a, float b)} → arities: {@code ["int,float"]} | ||
| * <li>{@code foo(int a, float b default 1.0)} → arities: {@code ["int", "int,float"]} | ||
| * <li>{@code foo(int a, float b default 1.0, string c default 'x')} → arities: {@code ["int", | ||
| * "int,float", "int,float,string"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be "int,string"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, "int,string" is not a valid arity here. Since the current arity design is based on positional parameter matching, you cannot skip the middle parameter b to directly pass c.
For engines that support named parameters (like Python), the parameter names and values would be passed together at invocation time (e.g., {a: 1, c: "hello"}). In that case, the engine can resolve the correct definition by mapping parameter names to their positions and validating that any missing parameters have default values. This would be handled at runtime by the engine, not during function registration.
| // Find the first parameter with a default value | ||
| int firstDefaultIndex = params.length; | ||
| for (int i = 0; i < params.length; i++) { | ||
| if (params[i].defaultValue() != Column.DEFAULT_VALUE_NOT_SET) { | ||
| firstDefaultIndex = i; | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this loop can be merged with above.
| public static final Field VERSION = | ||
| Field.required("version", Integer.class, "The version of the function entity."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was think the necessity of version support for UDF. If we want to support versions, we should let users know the existing versions, and can fetch the versions, otherwise, how do they know which version to use?
| new FunctionParam[] {FunctionParams.of("input", Types.StringType.get())}; | ||
| FunctionDefinition[] definitions = new FunctionDefinition[] {createSimpleDefinition(params)}; | ||
|
|
||
| org.apache.gravitino.function.Function newFunc = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use full qualified package name.
| } | ||
|
|
||
| @Test | ||
| public void testInvalidParameterOrder() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move all the public test methods above the private methods.
What changes were proposed in this pull request?
This PR implements the core operations for UDF (User-Defined Function) management in Gravitino, including:
ManagedFunctionOperations implementation:
registerFunction: Register scalar, aggregate, and table functions with multi-definition supportgetFunction: Retrieve function by identifier (latest version)listFunctions: List all functions in a schema namespacedropFunction: Remove a functionUnit tests:
Why are the changes needed?
Fix: #9561
This is part of the UDF management feature for Gravitino. The core operations layer provides the business logic for function management, ensuring data integrity through proper validations.
Does this PR introduce any user-facing change?
No, this is an internal implementation.
How was this patch tested?
Added unit tests in
TestManagedFunctionOperations.javacovering: