1- # ** RFC0x for Presto**
2-
3-
4- ## Replacing HTTP Exchange with Binary Exchange Protocol
1+ # Add support for Binary Exchange Protocol
52
63Proposers
74* Daniel Bauer (
[email protected] )
@@ -18,7 +15,7 @@ Above protocol enhancement is integrated into the proposed binary exchange proto
1815
1916The binary exchange protocol (BinX) is an alternative for the existing HTTP-based exchange protocol that
2017runs between Prestissimo worker nodes. It offers the same functionality and API
21- but uses binary encoding that can be more efficiently parsed than HTTP nessages .
18+ but uses binary encoding that can be more efficiently parsed than HTTP messages .
2219This translates into a performance benefit for exchange-intensive queries.
2320BinX does not replace the control protocol that runs between the coordinator and the
2421worker nodes. The control protocol continues to use HTTP.
@@ -33,7 +30,7 @@ is more complex than decoding binary encoded messages.
3330
3431### Goals
3532
36- The proposal is to use a binary exchange protocol as a light-weight alternative to the existinig HTTP exchange protocol.
33+ The proposal is to use a binary exchange protocol as a light-weight alternative to the existing HTTP exchange protocol.
3734A prototypical implementation shows that such a protocol reduces the run time of exchange-heavy queries by
383520% to 30%.
3936
@@ -143,8 +140,9 @@ with the HTTP exchange.
143140
144141#### Implementation Notes
145142
146- The BinX server uses Wangle. It consists of the following components that are implemented in
147- the file ` BinaryExchangeServer.h ` :
143+ Like Proxygen, the BinX server uses Wangle as its underlying networking library.
144+ The BinX server is implemented in the file ` BinaryExchangeServer.h ` and consists of
145+ several components:
148146
149147* The ` BinaryExchangeServer ` is a controller for starting and stopping the Wangle protocol stack.
150148It takes the port number, the IO thread pool and the CPU thread pool as construction parameters.
@@ -159,9 +157,8 @@ service implementation on top of the stack.
159157The results from the TaskManager are packaged into replies and sent back to the requesting BinX exchange source.
160158This exchange service follows the design of the existing ` TaskResource ` service.
161159
162- The ` TaskManagerStub ` class is an implementation detail that enables the BinX server to interact with
163- a mock TaskManager implementation. This is used in the unit tests and allows to test the BinX server
164- implementation along with the BinX exchange source implementation.
160+ All of above components are templated to allow for different TaskManager implementations. In the production code,
161+ the Prestissimo TaskManager is used while for unit testing, a mock task manager is deployed.
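
To illustrate the templating approach described above, the following is a minimal sketch and not the actual code in `BinaryExchangeServer.h`. The names `DataRequest`, `DataReply`, `MockTaskManager`, `ExchangeService` and `getResults` are hypothetical and chosen only for this example; the real request and reply layout and the real TaskManager interface are not shown in this document.

```cpp
#include <cstdint>
#include <string>

// Hypothetical request type: identifies which data is being asked for.
struct DataRequest {
  std::string taskId;
  std::int64_t bufferId;
  std::int64_t sequence;
};

// Hypothetical reply type: carries the data and a completion flag.
struct DataReply {
  std::string payload;
  bool complete;
};

// Hypothetical mock task manager for unit tests. It returns a canned
// payload derived from the requested sequence number and counts calls.
class MockTaskManager {
 public:
  DataReply getResults(const DataRequest& request) {
    ++calls_;
    return DataReply{"page-" + std::to_string(request.sequence), false};
  }

  int calls() const {
    return calls_;
  }

 private:
  int calls_{0};
};

// Service templated on the task manager type. Any type that provides a
// compatible getResults() member can be plugged in, so production code and
// unit tests share the same service implementation.
template <typename TaskManagerT>
class ExchangeService {
 public:
  explicit ExchangeService(TaskManagerT& taskManager)
      : taskManager_(taskManager) {}

  // Forwards the request to the task manager and returns its reply.
  DataReply handle(const DataRequest& request) {
    return taskManager_.getResults(request);
  }

 private:
  TaskManagerT& taskManager_;
};
```

Under these assumptions, a unit test would instantiate `ExchangeService<MockTaskManager>`, while production code would instantiate the same template with the Prestissimo TaskManager. The template parameter avoids a virtual interface on the task manager, at the cost of keeping the implementation in headers.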
165162
166163### Binary Exchange Source and Binary Exchange Client
167164
@@ -175,7 +172,7 @@ The `PrestoServer` registers a factory method for creating exchange sources. Thi
175172such that ` BinaryExchangeSource ` s are created instead of HTTP exchanges when enabled by configuration.
176173One exception is connections to the
177174Presto coordinator, which always use the HTTP-based exchange protocol. In a Kubernetes environment with its virtual
178- networking, it is unfortunately not straight forward to detect whether the target host is the Presto connector
175+ networking, it is unfortunately not straight forward to detect whether the target host is the Presto coordinator
179176since the coordinator's service IP used in the Presto configuration doesn't correspond to the IP address used by the
180177pod running the coordinator. In order to circumvent this problem, a helper class called ` CoordinatorInfoResolver `
181178uses the node status endpoint of the coordinator to retrieve the coordinator's IP address. Using this address
@@ -237,17 +234,17 @@ the additional complexity.
237234 - There is one additional configuration option to enable BinX. Otherwise, there is no impact on session parameters, no API changes
238235 and no changes to SQL.
239236
240- - If we are changing behaviour how will we phase out the older behaviour ?
237+ - If we are changing behavior how will we phase out the older behavior ?
241238
242239 - The HTTP stack is still required for the control message. The cost of keeping the HttpExchangeSource is minimal.
243240
244241- If we need special migration tools, describe them here.
245242
246243 - No tools required.
247244
248- - When will we remove the existing behaviour , if applicable.
245+ - When will we remove the existing behavior , if applicable.
249246
250- - Existing behaviour will remain as the default option.
247+ - Existing behavior will remain as the default option.
251248
252249- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed?
253250
@@ -261,5 +258,5 @@ the additional complexity.
261258
262259The test plan involves running performance measurements using TPC-DS and TPC-H benchmarks that compare the performance of HTTP versus BinX.
263260
264- The TPC-DS benchmark test has been conducted using a dataset with scale factor 1000 on an on-prem cluster with 8 nodes. The results
261+ The TPC-DS benchmark test has been conducted using a dataset with scale factor 1000 on an on-premise cluster with 8 nodes. The results
265262for this 1TB dataset have shown that overall runtime for the 99 queries was ~ 56 minutes when using HTTP compared to ~ 43 minutes for BinX.
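
As a rough check, the approximate totals reported above can be turned into a relative run-time reduction and a speedup factor. Both inputs are rounded figures, so the results are indicative only:

```latex
\[
\text{reduction} = \frac{56 - 43}{56} = \frac{13}{56} \approx 0.23,
\qquad
\text{speedup} = \frac{56}{43} \approx 1.30
\]
```

That is roughly a 23% shorter overall runtime across the 99 queries. This is an aggregate over the whole benchmark, whereas the 20% to 30% figure in the goals refers to exchange-heavy queries.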