Commit 8d4f04c

Add REST API for remote functions RFC (#25)
1 parent 0588574 commit 8d4f04c

2 files changed

+219
-0
lines changed

RFC-0007-remote-functions.md

Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
1+
# **RFC-0007 for Presto**
2+
3+
## [Title] REST API for remote functions
4+
5+
Proposers
6+
7+
* Tim Meehan
8+
* Abe Varghese
9+
* Joe Abraham
10+
* Jakob Khaliqi
11+
12+
## [Related Issues]
13+
14+
* https://github.com/prestodb/presto/issues/14053
15+
16+
## Summary
17+
18+
### Dynamic functions in remote function servers
19+
20+
A new REST API is defined, along with a REST plugin implementation, which allows for consistent and unified metadata and execution
21+
of remote functions in a single API definition. This API is designed to be dynamic, allowing for the definition of new functions
22+
at runtime.
23+
24+
25+
## Background
26+
27+
[prestodb/presto#14053](https://github.com/prestodb/presto/issues/14053) introduced the ability to define functions as being executed
28+
in a remote server, and made changes to the planner to accommodate remote function execution. A limitation of the current implementation is that it
29+
presumes that the functions returned from the function server are purely static. However, it is common in many cloud data warehousing
30+
systems to allow for defining remote functions at runtime through `CREATE FUNCTION` statements. This RFC proposes to extend the
31+
remote function design to allow for dynamic function registration and execution.
32+
33+
Additionally, the current remote function server plugin implementation is agnostic to the API of the function
34+
server. The lack of a reference API, together with the lack of documentation around function namespace managers, makes it
35+
challenging to create a remote function server: in addition to creating a new function server API, you also need to define a new
36+
namespace manager plugin. To allow for a more consistent and unified experience, this RFC proposes a new REST API for remote function
37+
servers, which will allow for the definition of functions at runtime. The hope is to reduce the work required to integrate a custom
38+
function server with Presto, firstly by reducing or eliminating the work required to write a new plugin, and secondly, by providing
39+
a reference implementation which itself will be extensible and hopefully cover most needs.
40+
41+
### [Optional] Goals
42+
43+
* Standardize on a single preferred API for remote function servers
44+
* Unify the metadata and execution of remote scalar functions under a single API
45+
* Allow for the definition of scalar functions at runtime
46+
47+
### [Optional] Non-goals
48+
49+
* Deprecate existing function server APIs
50+
* Provide support for aggregate functions or table-valued functions
51+
52+
## Proposed Implementation
53+
54+
### Design
55+
56+
![Flow diagram](RFC-0007/Diagram.png)
57+
58+
Fundamentally, the design of the REST API for remote functions will be based on the existing `FunctionNamespaceManager` SPI. Presto C++ currently has no corresponding SPI for function namespace managers. This RFC proposes to
59+
create a new REST-based implementation of the function execution framework. By creating a REST API for remote functions that unifies the metadata and execution of remote functions, C++ deployments can similarly customize their function servers by implementing the REST API, in the same way that current Presto Java users can implement the `FunctionNamespaceManager` SPI.
60+
61+
The REST API will power all `FunctionNamespaceManager` method implementations, including listing functions, retrieving function metadata, executing functions, and providing DDL support for functions (`ADD` and `DROP` support).
62+
63+
The serialization protocol for inputs and outputs will initially be the Presto Page format to support built-in functions, but support for the Arrow format will be added later.
64+
65+
#### Service discovery
66+
67+
For function metadata, a central REST server URI will be provided which will serve metadata requests. For function execution, the URI of the server supporting the function ID may be returned. If that URI is not present, the URI used for metadata should be used instead.
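
The fallback rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposed plugin: the function name `resolve_execution_uri` and the idea that the execution URI arrives as an optional string are assumptions made for the example.

```python
from typing import Optional


def resolve_execution_uri(metadata_uri: str, execution_uri: Optional[str]) -> str:
    """Pick the base URI that execution requests should be sent to.

    Assumption: function metadata may carry an optional execution URI.
    When it is missing or empty, fall back to the metadata URI.
    """
    base = execution_uri if execution_uri else metadata_uri
    # Drop any trailing slash so endpoint paths can be appended uniformly.
    return base.rstrip("/")
```

With this rule, a deployment that runs a single function server only needs to configure the metadata URI, while a deployment that shards execution can point individual functions elsewhere.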
68+
69+
#### Presto C++ special considerations
70+
71+
The current Presto C++ implementation does not have a `FunctionNamespaceManager` SPI. This RFC proposes to extend the Velox function server to support the REST API for remote functions. This will allow for the definition of functions at runtime in Presto C++ deployments, and will unify function server implementations under one API.
72+
73+
In the future, a plugin concept for remote functions should be added to Presto C++ to allow for custom function server implementations; however, this is out of scope for this RFC.
74+
75+
![REST API for remote functions](RFC-0007/Diagram.png)
76+
77+
### Functions API
78+
79+
An OpenAPI specification will be created which defines a REST API for remote function servers.
80+
81+
Functions are defined according to a hierarchy:
82+
83+
* Schema: a namespace for functions. Used for categorizing and cataloging functions. Example: `production`
84+
* Function name: the name used to reference the function. Example: `sum`
85+
* Function ID: a unique identifier of a function given a name and schema. Examples: UUID, serialized function signature
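
As a rough illustration of how the three levels of this hierarchy map onto a URL path, consider the following Python sketch. The class name `FunctionRef` and its `path` method are hypothetical and exist only for this example; the percent-encoding is an assumption that matters when the function ID is a serialized signature containing characters such as parentheses or commas.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class FunctionRef:
    """Hypothetical holder for the schema / function name / function ID triple."""
    schema: str
    name: str
    function_id: str

    def path(self) -> str:
        # Percent-encode each segment so that IDs such as serialized
        # signatures cannot break the path structure.
        parts = (self.schema, self.name, self.function_id)
        return "/v1/functions/" + "/".join(quote(p, safe="") for p in parts)
```

For example, `FunctionRef("production", "sum", "abc-123").path()` yields `/v1/functions/production/sum/abc-123`.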
86+
87+
This API will feature the following endpoints:
88+
89+
#### Functions version
90+
91+
> Endpoint: /v1/functions
92+
>
93+
> HTTP verb: HEAD
94+
>
95+
> Request body: empty
96+
>
97+
> Response body: empty
98+
99+
Returns the headers of the GET response. This is useful for checking the version of the API, which will be returned as a header.
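
A client-side version probe might look like the sketch below, which uses Python's standard `urllib.request`. The helper names are hypothetical, and because this RFC does not name the version header, the header name is passed in by the caller rather than assumed.

```python
import urllib.request


def build_version_probe(base_uri: str) -> urllib.request.Request:
    """Build (but do not send) a HEAD request for /v1/functions."""
    url = base_uri.rstrip("/") + "/v1/functions"
    return urllib.request.Request(url, method="HEAD")


def read_version(headers, header_name: str):
    """Return the version header value, or None when it is absent.

    `headers` is any mapping-like object with a .get method, such as the
    headers attribute of a urllib response. `header_name` is supplied by
    the caller because the RFC does not fix the header name.
    """
    return headers.get(header_name)
```

The request can be sent with `urllib.request.urlopen(build_version_probe(base_uri))`, after which the response's `headers` attribute can be handed to `read_version`.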
100+
101+
#### List all functions
102+
103+
> Endpoint: /v1/functions
104+
>
105+
> HTTP verb: GET
106+
>
107+
> Request body: empty
108+
>
109+
> Response body: JSON array of function metadata objects
110+
111+
Returns the complete listing of functions across all schemas.
112+
113+
#### List functions at schema
114+
115+
> Endpoint: /v1/functions/{schema}
116+
>
117+
> HTTP verb: GET
118+
>
119+
> Request body: empty
120+
>
121+
> Response body: JSON array of function metadata objects
122+
123+
Returns the complete listing of all functions in the specified schema.
124+
125+
#### List functions at schema with name
126+
127+
> Endpoint: /v1/functions/{schema}/{functionName}
128+
>
129+
> HTTP verb: GET
130+
>
131+
> Request body: empty
132+
>
133+
> Response body: JSON array of function metadata objects
134+
135+
Returns the complete listing of function IDs in the specified schema with the specified function name.
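
The three listing endpoints differ only in how many path segments are supplied, which the following Python sketch makes explicit. The helper `listing_path` is a hypothetical name used for illustration only.

```python
from typing import Optional
from urllib.parse import quote


def listing_path(schema: Optional[str] = None,
                 function_name: Optional[str] = None) -> str:
    """Build the GET path for listing functions at the requested scope."""
    if function_name is not None and schema is None:
        # The hierarchy is schema first, then name, so a name alone is invalid.
        raise ValueError("a function name requires a schema")
    path = "/v1/functions"
    for part in (schema, function_name):
        if part is not None:
            path += "/" + quote(part, safe="")
    return path
```

Calling `listing_path()` addresses all schemas, `listing_path("production")` a single schema, and `listing_path("production", "sum")` every function that shares that name within the schema.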
136+
137+
#### Add a function
138+
139+
> Endpoint: /v1/functions/{schema}/{functionName}
140+
>
141+
> HTTP verb: POST
142+
>
143+
> Request body: JSON object representing the function to be added
144+
>
145+
> Response body: the function ID of the newly created function
146+
147+
Creates a new function in the specified schema with the specified name. The function object will contain the metadata of the function, including its arguments, return type, and other metadata. It will return an identifier representing this specific function, which is useful to differentiate multiple functions which share the same name but have different arguments.
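
To make the request shape concrete, here is a minimal Python sketch that assembles an add-function call. The JSON field names `argumentTypes` and `returnType` are placeholders chosen for this example; the actual schema of the function object will be defined by the OpenAPI specification, not by this sketch.

```python
import json
from typing import List, Tuple


def build_add_function_request(schema: str, function_name: str,
                               argument_types: List[str],
                               return_type: str) -> Tuple[str, str, str]:
    """Return (HTTP verb, path, JSON body) for adding a function.

    The body field names are hypothetical placeholders.
    """
    path = f"/v1/functions/{schema}/{function_name}"
    body = json.dumps(
        {"argumentTypes": list(argument_types), "returnType": return_type},
        sort_keys=True,
    )
    return "POST", path, body
```

Adding `sum(bigint, bigint)` and `sum(double, double)` would issue two POST requests to the same path with different bodies, and each response would carry a distinct function ID.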
148+
149+
#### Update a function
150+
151+
> Endpoint: /v1/functions/{schema}/{functionName}/{functionId}
152+
>
153+
> HTTP verb: PUT
154+
>
155+
> Request body: JSON object representing the function to be updated
156+
>
157+
> Response body: the function ID of the updated function
158+
159+
Updates the function in the specified schema with the specified name and function ID.
160+
161+
#### Delete a function
162+
163+
> Endpoint: /v1/functions/{schema}/{functionName}/{functionId}
164+
>
165+
> HTTP verb: DELETE
166+
>
167+
> Request body: empty
168+
>
169+
> Response body: empty
170+
171+
Deletes the function in the specified schema with the specified name and function ID.
172+
173+
#### Execute a function
174+
175+
> Endpoint: /v1/functions/{schema}/{functionName}/{functionId}/{version}
176+
>
177+
> HTTP verb: POST
178+
>
179+
> Request body: Presto Page of input data
180+
>
181+
> Response body: Presto Page of output data
182+
183+
Executes the function in the specified schema with the specified name and function ID. The version parameter is used to specify the version of the function to execute, and is required to ensure a consistent version of the function is used during query execution. The input data is passed as a Presto Page, and the output data is returned as a Presto Page.
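
One way to picture the role of the version parameter is a reference whose version is fixed once and then reused for every execution request in a query. The Python sketch below illustrates this; the class name `PinnedFunction` and the assumption that the version is resolved at planning time are illustrative, not part of this RFC.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class PinnedFunction:
    """Hypothetical function reference with its version fixed up front."""
    schema: str
    name: str
    function_id: str
    version: str

    def execution_url(self, base_uri: str) -> str:
        # Every page sent during the query targets the same versioned URL,
        # so a concurrent update to the function cannot change results mid-query.
        parts = (self.schema, self.name, self.function_id, self.version)
        encoded = "/".join(quote(p, safe="") for p in parts)
        return f"{base_uri.rstrip('/')}/v1/functions/{encoded}"
```

Because the dataclass is frozen, the version cannot be changed after the reference is created, which mirrors the consistency requirement described above.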
184+
185+
186+
### Function Server Plugin
187+
188+
A new implementation of a function namespace manager will be created which will use the Functions API to list functions, retrieve function metadata, execute functions, and provide DDL support for functions (`ADD` and `DROP` support). It will use the REST API defined in this RFC to delegate implementations of these capabilities to a REST server.
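
The delegation described here amounts to a routing table from namespace manager capabilities to the endpoints defined in this RFC. The Python sketch below shows one such table. The operation labels (`list_all`, `add`, and so on) are descriptive names invented for this example and are not method names from the `FunctionNamespaceManager` SPI.

```python
# Maps a capability label (hypothetical) to the HTTP verb and path template
# of the corresponding endpoint in the Functions API.
ROUTES = {
    "list_all": ("GET", "/v1/functions"),
    "list_schema": ("GET", "/v1/functions/{schema}"),
    "list_named": ("GET", "/v1/functions/{schema}/{functionName}"),
    "add": ("POST", "/v1/functions/{schema}/{functionName}"),
    "update": ("PUT", "/v1/functions/{schema}/{functionName}/{functionId}"),
    "delete": ("DELETE", "/v1/functions/{schema}/{functionName}/{functionId}"),
    "execute": ("POST", "/v1/functions/{schema}/{functionName}/{functionId}/{version}"),
}


def route(operation: str, **params: str):
    """Resolve a capability label to a concrete (verb, path) pair."""
    verb, template = ROUTES[operation]
    # str.format raises KeyError if a required path parameter is missing.
    return verb, template.format(**params)
```

For instance, `route("delete", schema="production", functionName="sum", functionId="abc")` resolves to `("DELETE", "/v1/functions/production/sum/abc")`.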
189+
190+
## [Optional] Metrics
191+
192+
The performance of this approach should be measured in terms of the latency of function execution. This should be comparable to the latency of executing a function in a Thrift server.
193+
194+
## [Optional] Other Approaches Considered
195+
196+
Communication protocols:
197+
* Thrift: There is precedent for using Thrift-based communication protocols in Presto. The serialization protocol for Presto may use Thrift, and Thrift communication is used by the resource manager. Thrift was not chosen as the communication protocol because of low external adoption.
198+
* gRPC: gRPC has gained recognition as an efficient, high-performance communication protocol. Nonetheless, because REST support would be added anyway for backwards compatibility of Java-based functions in C++, adopting gRPC would introduce an additional communication protocol for Presto users. It is unclear how much more performant a gRPC-based solution would be, and REST offers operational flexibility due to the richness of HTTP-based infrastructure.
199+
200+
## Adoption Plan
201+
202+
- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar?
203+
- This is a new plugin and API so there is no impact to existing users.
204+
- If we are changing behaviour how will we phase out the older behaviour?
205+
- There is no change in behavior.
206+
- If we need special migration tools, describe them here.
207+
- N/A
208+
- When will we remove the existing behaviour, if applicable.
209+
- N/A
210+
- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed?
211+
- Documentation will need to be updated to include the new API and plugin. Documentation will also be added for the new
212+
REST function namespace manager.
213+
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
214+
- N/A
215+
216+
## Test Plan
217+
218+
There will be unit tests for all components in the function namespace manager, as well as integration tests for the REST API reference implementation. Additionally, there will be tests for the function server plugin to ensure that it can correctly list functions, retrieve function metadata, execute functions, and provide DDL support for functions. Finally, there will be infrastructure tests that show correctness of the function server plugin in a Presto cluster.
219+

RFC-0007/Diagram.png

57.7 KB