Skip to content

Commit 9af6017

Browse files
committed
Add rfc for native built in
1 parent 4a28092 commit 9af6017

File tree

1 file changed

+108
-0
lines changed

1 file changed

+108
-0
lines changed
Lines changed: 108 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,108 @@
1+
2+
3+
## [Register Native Sidecar Functions in default namespace]
4+
5+
Proposers
6+
7+
* Kevin Tang
8+
* Feilong Liu
9+
10+
## [Related Issues]
11+
12+
13+
RFC for new SPI for SQL invoked functions
14+
https://github.com/prestodb/rfcs/pull/40/files
15+
16+
## Summary
17+
18+
Until we can do all the constant folding on the sidecar we need to be able to do constant folding in Java, even when getting the function list from the sidecar. Some functions will be cpp only functions, and those we want to skip constant folding for, since those will fail.
19+
20+
The goal is to be able to enable the sidecar while preserving the constant folding functionality for functions that have java implementation and SQL implementation. Once this is finished, sidecar can be enabled internally and inline_sql_functions can be fully enabled in C++ clusters. There is currently no support for constant folding in C++ functions, but this will be addressed in a followup design.
21+
22+
23+
## Background
24+
25+
### [Requirements]
26+
27+
* All functions with Java implementations get constant folded on the coordinator
28+
* All functions without Java implementations skip constant folding
29+
* Both functions with and without Java implementations get registered as "builtin", meaning that they don't need to be fully qualified to be used
30+
31+
Breakdown by case
32+
* Function with only java implementation: constant fold, existing behavior
33+
* Function with only C++ implementation: no constant fold
34+
* Function with java and c++ implementation: constant fold, existing behavior
35+
* Function with only SQL implementation: use sql implementation, existing behavior
36+
* Function with SQL and C++ implementation: use c++ implementation, new behavior
37+
38+
39+
### Non-goals
40+
41+
[Question]
42+
* Use presto.default namespace for every built in function or have separate namespace for native sidecar functions?
43+
44+
[Answer]
45+
* If we choose to use presto.default. prefix for every function (built in functions from coordinator and functions from sidecar), then we would need to resolve naming collisions by only choosing one implementation when there exists a coordinator implementation and a worker implementation.
46+
47+
* If we choose to have separate namespaces for built in functions from coordinator and functions from sidecar, then this would require extra changes on the worker code. Currently the worker code is using one prefix for every function, and this prefix is configurable. However, you cannot set a different prefix for different functions coming from the sidecar; they would all be the same prefix.
48+
49+
* Since adding more worker changes adds extra complexity and confusion, we decided to use presto.default. prefix for every function from sidecar that is registered.
50+
51+
52+
## Proposed Implementation
53+
54+
The approach will be somewhat similar to this PR: https://github.com/prestodb/presto/pull/25597
55+
56+
Instead of adding a BuiltInPluginFunctionNamespaceManager, add a BuiltInNativeFunctionNamespaceManager. Register the functions from native sidecar under the presto.default namespace. During function evaluation inside of FunctionAndTypeManager#resolveFunctionInternal and getMatchingFunctionHandle, add some logic to determine whether the function in BuiltInTypeAndFunctionNamespaceManager or the function in BuiltInNativeFunctionNamespaceManager is used. This is similar to the PR, except the logic in PR will throw an error if it sees that there is a conflict. If there is a conflict, follow this logic to determine which functions to use:
57+
58+
59+
* If the one already inside the BuiltInTypeAndFunctionNamespaceManager is a Java implementation, then use the one in BuiltInTypeAndFunctionNamespaceManager
60+
* If the one already inside the BuiltInTypeAndFunctionNamespaceManager is a SQL implementation, then use the one in BuiltInNativeFunctionNamespaceManager
61+
62+
63+
## Other Approaches Considered
64+
One other approach that was considered involved selectively registering either the worker or coordinator implementation within BuiltInTypeAndFunctionNamespaceManager. This can be done by adding the Built in functions in getBuiltInFunctions to the function map as well as adding the ones pulled in from sidecar. After that, deduplicate the functions using the same logic adobe when there is a conflict. This will ensure that a function name can only refer to exactly one implementation; each function name has a unique implementation.
65+
66+
The downside of this approach is that it can add complexity to the logic inside of the BuiltInTypeAndFunctionNamespaceManager. Another concern is that the native sidecar requires that there is at least one active worker sidecar node to read the function registry from. However, during coordinator server startup, all the builtin functions and all the plugin functions are registered before the node manager is ready, which means no worker nodes can be used to pull the function registry.
67+
68+
```
69+
// PrestoServer.java
70+
71+
// this is where the built in functions are registered because they are loaded when ServerMainModule is installed
72+
Bootstrap app = new Bootstrap(modules.build());
73+
74+
....
75+
76+
injector.getInstance(PluginManager.class).loadPlugins();
77+
78+
...
79+
80+
// This is where NodeManager is setup. Only after this point can the sidecar successfully fetch the functions from worker
81+
82+
NodeInfo nodeInfo = injector.getInstance(NodeInfo.class);
83+
PluginNodeManager pluginNodeManager = new PluginNodeManager(nodeManager, nodeInfo.getEnvironment());
84+
planCheckerProviderManager.loadPlanCheckerProviders(pluginNodeManager);
85+
86+
```
87+
88+
This should show why the approach is challenging. We need access to the sidecar functions when the server modules are built in order to know whether to keep the coordinator implementation or the worker implementation, but we can only have access to the sidecar functions after node manager setup. This issue does not affect the initially proposed approach because the BuiltInNativeFunctionNamespaceManager can just do something similar to NativeFunctionNamespaceManager by wrapping the FunctionMap in a memoize so that the functions are only attempted to be loaded when needed.
89+
90+
```
91+
// NativeFunctionNamespaceManager.java
92+
93+
this.memoizedFunctionsSupplier = Suppliers.memoize(this::bootstrapNamespace);
94+
95+
```
96+
97+
## Adoption Plan
98+
99+
In order to control rolling out this feature, add a new configuration file, like "etc/builtin-native-functions.properties". If the file exists, enable the feature. Otherwise, do not attempt to load any native functions as built in functions. Within the file, we can also add config that controls which type of functions use the worker implementation override or not. For example, only use worker implementation when there is a SQL implementation.
100+
101+
However, if this use case of only overriding specific functions is not needed anymore, the feature can be removed, and we can set native.default to be default namespace to just use the existing native function namespace manager.
102+
103+
104+
## Test Plan
105+
106+
1. Unit test to invoke a SQL function that has a C++ implementation
107+
2. Unit test to invoke a Java function that has Java implementation
108+
3. All other cases should just use coordinator implementation

0 commit comments

Comments
 (0)