# **RFC0 for Presto**

See [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on creating your RFC and the process surrounding it.

## [Title]

Proposers

* Tim Meehan
* Bryan Cutler
* Rebecca Schlussel

## [Related Issues]

* RFC-0003

## Summary

Add a new SPI to integrate a custom plan checker, and add a plugin to use the Presto sidecar to check if a Presto plan can be
successfully translated into a Velox plan.

## Background

The optimizer makes decisions based in part on the capabilities of the underlying evaluation engine. Because the Presto evaluation
engine is being migrated to Velox while the Presto Java evaluation engine is still supported, the two evaluation engines now differ
in what they support. As a result, the optimizer may generate plans that can't be executed in C++ clusters; likewise, a Presto Java
cluster could be misconfigured to generate plans that only work with C++ clusters.

If a plan is generated that can't be executed in the cluster, the query will fail with an error message during the query execution
phase. This is a poor user experience: the user has to wait for the query to fail before they can take corrective action, and
the failure might occur only after lengthy queueing. Additionally, the failure occurs after worker resources have already been
allocated to the query, which is wasteful. This RFC proposes a plan checker that runs before the query is executed, so that the
plan can be validated quickly.

### [Optional] Goals

* Provide a mechanism to validate Presto-to-Velox plan conversion during the planning phase
* Add an SPI that allows custom validators to be added to suit individual business needs
* Validate fragmented plans prior to scheduling

### [Optional] Non-goals

* Ensure that all Velox plans are executable in a Presto C++ cluster: many checks are done at runtime and may not be caught by the
  plan checker

## Proposed Implementation

### Core SPI

A new SPI, `PlanConverter`, will be added to the Presto codebase. It takes a Presto `PlanFragment` and returns a data
structure with the following fields:

* An optional error message. Its presence indicates that the plan is invalid; its absence indicates that the plan is valid.
* An optional string containing the serialized converted plan fragment. Its presence indicates that the plan was successfully
  converted to a Velox plan fragment; its absence indicates that the plan was not converted.
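
A minimal Java sketch of this contract follows, assuming the result is a small value object with two `Optional<String>` fields. The names `PlanConversionResult`, `converted`, `failed`, and `isValid` are hypothetical, and a JSON string stands in for `PlanFragment` so that the example compiles on its own.

```java
import java.util.Optional;

// Hypothetical shape of the PlanConverter result; names are illustrative, not the final SPI.
final class PlanConversionResult {
    private final Optional<String> errorMessage;
    private final Optional<String> serializedPlan;

    private PlanConversionResult(Optional<String> errorMessage, Optional<String> serializedPlan) {
        this.errorMessage = errorMessage;
        this.serializedPlan = serializedPlan;
    }

    // Conversion succeeded: no error, and the serialized Velox plan fragment is attached.
    static PlanConversionResult converted(String serializedVeloxPlan) {
        return new PlanConversionResult(Optional.empty(), Optional.of(serializedVeloxPlan));
    }

    // Conversion failed: the error explains why, and no plan is attached.
    static PlanConversionResult failed(String errorMessage) {
        return new PlanConversionResult(Optional.of(errorMessage), Optional.empty());
    }

    Optional<String> getErrorMessage() { return errorMessage; }

    Optional<String> getSerializedPlan() { return serializedPlan; }

    // The absence of an error message means the plan is valid.
    boolean isValid() { return errorMessage.isEmpty(); }
}

// Hypothetical SPI. A JSON string stands in for Presto's PlanFragment to keep the sketch self-contained.
interface PlanConverter {
    PlanConversionResult convert(String planFragmentJson);
}
```

Returning a result object instead of throwing is one way to let both the plan checker and `EXPLAIN (TYPE NATIVE)` consume the same conversion outcome.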

### Presto to Velox plan validation

The Presto runtime centralizes plan validation logic in the `PlanChecker` class, which has three phases:

* `validateIntermediatePlan`
* `validateFinalPlan`
* `validateFragmentedPlan`

It is generally useful to allow this class to be configured through an SPI, so that custom plan validation logic can be added.
For example, a business may decide that a certain type is not allowed, or add a check that rejects overly complicated
plans.
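
To illustrate the kind of business rule such an SPI could host, here is a hypothetical validator that rejects a disallowed type or an overly large plan. `SimplePlanNode` is a stand-in for Presto's `PlanNode`, and counting nodes is only one assumed way to measure "overly complicated".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified plan node: a name, the output types it produces, and its sources.
final class SimplePlanNode {
    final String name;
    final List<String> outputTypes;
    final List<SimplePlanNode> sources;

    SimplePlanNode(String name, List<String> outputTypes, List<SimplePlanNode> sources) {
        this.name = name;
        this.outputTypes = outputTypes;
        this.sources = sources;
    }
}

// Hypothetical business rule: reject disallowed types and plans with too many nodes.
final class BusinessRulesValidator {
    private final Set<String> disallowedTypes;
    private final int maxNodes;

    BusinessRulesValidator(Set<String> disallowedTypes, int maxNodes) {
        this.disallowedTypes = disallowedTypes;
        this.maxNodes = maxNodes;
    }

    // Walks the plan iteratively and throws on the first violation found.
    void validate(SimplePlanNode root) {
        Deque<SimplePlanNode> pending = new ArrayDeque<>();
        pending.push(root);
        int visited = 0;
        while (!pending.isEmpty()) {
            SimplePlanNode node = pending.pop();
            visited++;
            if (visited > maxNodes) {
                throw new IllegalStateException("Plan has more than " + maxNodes + " nodes");
            }
            for (String type : node.outputTypes) {
                if (disallowedTypes.contains(type)) {
                    throw new IllegalStateException("Type " + type + " is not allowed (node " + node.name + ")");
                }
            }
            node.sources.forEach(pending::push);
        }
    }
}
```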

An SPI will be added that allows plugins to register additional checks with the `PlanChecker` class for each of the planning
phases. Each registered check specifies the phase of the plan checker in which it runs, and a validator that is run during
that phase.
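
One way to model "a phase plus a validator" is a registry keyed by phase, sketched below under the assumption that a validator signals failure by throwing. `PlanCheckerPhase` and `CustomPlanChecks` are hypothetical names; the three enum values correspond to `validateIntermediatePlan`, `validateFinalPlan`, and `validateFragmentedPlan`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical enum mirroring the three PlanChecker phases.
enum PlanCheckerPhase { INTERMEDIATE, FINAL, FRAGMENTED }

// Hypothetical registry: plugins contribute (phase, validator) pairs. The plan type is generic for the sketch.
final class CustomPlanChecks<P> {
    private final Map<PlanCheckerPhase, List<Consumer<P>>> checks = new EnumMap<>(PlanCheckerPhase.class);

    void register(PlanCheckerPhase phase, Consumer<P> validator) {
        checks.computeIfAbsent(phase, ignored -> new ArrayList<>()).add(validator);
    }

    // Runs the validators registered for one phase, in registration order; a validator fails by throwing.
    void run(PlanCheckerPhase phase, P plan) {
        for (Consumer<P> validator : checks.getOrDefault(phase, List.of())) {
            validator.accept(plan);
        }
    }
}
```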

A new endpoint will be added to the Presto sidecar that attempts to convert a Presto plan to a Velox plan. In the
`presto-native-plugin` module, a new implementation of the SPI will add a check to the `PlanChecker` class that calls the
sidecar to attempt to convert the plan. If the conversion fails, the plan checker fails the plan. The check will be added
at the `validateFragmentedPlan` phase, because it's not until plan fragmentation occurs that we know which portions of the plan
will be executed in the coordinator and which will be executed in the workers. Plan fragments that are executed in the
coordinator, such as those with the `COORDINATOR_ONLY` distribution type, will be skipped by the plugin.

The sidecar code backing the plan checker will run `PrestoToVeloxQueryPlan`, which the workers use to convert a Presto plan
fragment to a Velox plan fragment. If the conversion fails, the plan checker will fail the plan and return the reason for the failure.
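
The fragment-level check described above can be sketched as follows. This assumes a converter function that returns an error message when conversion fails and an empty value on success; `Fragment`, `NativeFragmentCheck`, and `PlanValidationException` are hypothetical, and partitioning is modeled as a plain string rather than Presto's partitioning handle.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical, simplified fragment: an id, its partitioning, and its serialized plan.
record Fragment(String id, String partitioning, String serializedPlan) {}

// Hypothetical exception carrying the sidecar's failure reason back to the user.
final class PlanValidationException extends RuntimeException {
    PlanValidationException(String message) { super(message); }
}

// Sketch of the fragmented-plan check: skip coordinator-only fragments, convert the rest, fail on the first error.
final class NativeFragmentCheck {
    static final String COORDINATOR_ONLY = "COORDINATOR_ONLY";

    // Assumed contract: returns an error message if conversion fails, or empty on success.
    private final Function<String, Optional<String>> converter;

    NativeFragmentCheck(Function<String, Optional<String>> converter) { this.converter = converter; }

    void validateFragmentedPlan(List<Fragment> fragments) {
        for (Fragment fragment : fragments) {
            if (COORDINATOR_ONLY.equals(fragment.partitioning())) {
                continue; // executed on the coordinator, so Velox support does not matter
            }
            converter.apply(fragment.serializedPlan()).ifPresent(error -> {
                throw new PlanValidationException("Fragment " + fragment.id() + " cannot run on Velox: " + error);
            });
        }
    }
}
```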

#### Failing the plan quickly

An additional code change will allow the planner to run prior to queueing, so that the plan checker can run before the query
is queued. This gives the user feedback on the plan before the query is executed, and allows the query to fail quickly if the
plan is invalid.

Because planning that happens ahead of the queue is no longer bounded by the queue's concurrency limits, and too much concurrent
planning may require excessive resources in the coordinator, planning (including the plan checker) will run in a separate thread
pool. This thread pool will be configured with a maximum number of threads that can run concurrently. If the thread pool is full,
planning will wait until a thread is free. The pool size will be set with a new configuration parameter and session property.
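
A bounded planning pool can be approximated with a JDK fixed thread pool, whose unbounded work queue gives the "wait for a free thread" behavior described above. `PreQueuePlanner` and `maxConcurrentPlanning` are hypothetical; the RFC does not name the new configuration parameter or session property, so none is assumed here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical pre-queue planner: at most maxConcurrentPlanning plans are built at once.
final class PreQueuePlanner implements AutoCloseable {
    private final ExecutorService planningPool;

    PreQueuePlanner(int maxConcurrentPlanning) {
        // A fixed pool queues extra tasks until a thread is free, so planning waits instead of failing.
        this.planningPool = Executors.newFixedThreadPool(maxConcurrentPlanning);
    }

    // Plans and checks a query before it is admitted to the queue.
    <T> Future<T> planAndCheck(Supplier<T> planAndCheckTask) {
        Callable<T> task = planAndCheckTask::get;
        return planningPool.submit(task);
    }

    @Override
    public void close() throws InterruptedException {
        planningPool.shutdown();
        planningPool.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```

If the planning task throws (for example, because a plan check failed), `Future.get()` reports it as an `ExecutionException`, which the caller can turn into an immediate query failure.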

### EXPLAIN (TYPE NATIVE)

The `EXPLAIN (TYPE NATIVE)` command will be updated to run the `PlanConverter` over all fragments that are not `COORDINATOR_ONLY`.
This lets the user see the plan that will be executed in the workers, and whether the plan can be
converted to Velox.

Explain plans take a format parameter. The existing format parameters (`TEXT`, `GRAPHVIZ`, and `JSON`) will be added
to the SPI, and the `PlanConverter` will be run with the appropriate format parameter. When the call to the sidecar is made, the
format parameter will be mapped to an appropriate content type and added to the `accept` header of the request. For example, if the
format is `JSON`, then the `accept` header will be set to `application/json`, and the server will be expected to return a JSON
object. The response's `content-type` header will be validated to be `application/json`; if it is not, the call will fail.
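
The format-to-content-type mapping and the response check could look like the following sketch. The enum is hypothetical; `application/json` comes from the text above, while `text/plain` and `application/graphviz` for `TEXT` and `GRAPHVIZ` are assumptions borrowed from the sidecar endpoint section.

```java
import java.util.Locale;

// Hypothetical mapping from explain formats to the content types sent in the accept header.
enum ExplainFormat {
    TEXT("text/plain"),
    GRAPHVIZ("application/graphviz"),
    JSON("application/json");

    private final String contentType;

    ExplainFormat(String contentType) { this.contentType = contentType; }

    // Value sent in the request's accept header.
    String acceptHeader() { return contentType; }

    // Fails the call unless the response content-type matches the requested format.
    // Media-type parameters such as "; charset=utf-8" are ignored, and the comparison is case-insensitive.
    void validateResponseContentType(String responseContentType) {
        String mediaType = responseContentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (!mediaType.equals(contentType)) {
            throw new IllegalStateException(
                    "Expected content-type " + contentType + " for " + name() + " but got " + responseContentType);
        }
    }
}
```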

### Sidecar endpoint

> Endpoint: `/v1/velox/plan`
>
> HTTP verb: `POST`
>
> Request body: serialized plan fragment
>
> Response body: serialized Velox plan fragment or error message if conversion failed (along with an HTTP 400 status code)
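
A caller might interpret the endpoint's responses as shown below, assuming 200 carries the converted plan and 400 carries the failure reason. How to treat other status codes (for example, an unavailable sidecar) is not specified here; this hypothetical `SidecarPlanResponse` simply throws, so that an infrastructure problem is not mistaken for an invalid plan.

```java
import java.util.Optional;

// Hypothetical outcome of one call to /v1/velox/plan: exactly one of the two fields is set.
record SidecarPlanResponse(Optional<String> veloxPlan, Optional<String> error) {

    // Interprets the status code and body as described for this endpoint:
    // 200 carries the converted plan, 400 carries the conversion error.
    static SidecarPlanResponse fromHttp(int statusCode, String body) {
        switch (statusCode) {
            case 200:
                return new SidecarPlanResponse(Optional.of(body), Optional.empty());
            case 400:
                return new SidecarPlanResponse(Optional.empty(), Optional.of(body));
            default:
                // Any other status is a sidecar or transport problem, not a verdict on the plan.
                throw new IllegalStateException("Unexpected sidecar status " + statusCode + ": " + body);
        }
    }
}
```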

The request and response formats will be indicated by the `content-type` header. Initially, the only supported content type for
the request will be `application/json`. The response will initially be `text/plain`, but may support other formats in the future,
such as `application/json` and `application/graphviz`. The client can specify the response format by setting the `accept` header;
for example, `accept: application/json` if the client wants the response in JSON format.
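
Using the JDK's `java.net.http` client, the request described above could be built as shown. The helper `SidecarRequests.planConversionRequest` and the sidecar base URI are assumptions; only the path, verb, and headers come from this RFC.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds the coordinator-side request to the sidecar.
final class SidecarRequests {
    private SidecarRequests() {}

    static HttpRequest planConversionRequest(URI sidecarBaseUri, String planFragmentJson, String acceptedContentType) {
        return HttpRequest.newBuilder(sidecarBaseUri.resolve("/v1/velox/plan"))
                // The request body is the JSON-serialized Presto plan fragment.
                .header("Content-Type", "application/json")
                // The accept header selects the response format (text/plain initially).
                .header("Accept", acceptedContentType)
                .POST(HttpRequest.BodyPublishers.ofString(planFragmentJson))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the resulting status code and body handed to the plugin's response handling.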

#### Additional information

1. What modules are involved
    * `presto-native-sidecar` (note: this is a new module that will be added to the Presto codebase)
    * `presto-main`
    * `presto-spi`
2. Any new terminologies/concepts/SQL language additions
    * NA
3. Method/class/interface contracts which you deem fit for implementation.
    * A new `PlanChecker` interface will be added to the Java SPI so that plugins can implement custom checks. A default
      implementation that validates using the Presto sidecar will be provided.
4. Code flow using bullet points or pseudo code as applicable
    1. A query is fragmented. The fragmented plan is sent to the `PlanChecker`, which runs a series of checks
       against it.
    2. After all preexisting checks have run, the `PlanChecker` runs the plan fragment checks that have been registered
       through the new SPI.
    3. If the `presto-native-sidecar` module has been registered, the `PlanChecker` calls the checker code
       in the `presto-native-sidecar` module.
    4. The `presto-native-sidecar` module marshals the plan fragment into JSON and sends it to the Presto sidecar.
    5. The Presto sidecar attempts to convert the plan fragment to a Velox plan fragment. If it succeeds, a 200
       response is sent. If it fails, a 400 response is sent with the reason for the failure as a JSON object.
5. Any new user facing metrics that can be shown on CLI or UI.
    * NA
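
Steps 1 to 3 imply an ordering guarantee: preexisting checks run before plugin-registered ones, regardless of registration order. The hypothetical `FragmentedPlanValidation` class below is a minimal sketch of that guarantee, assuming checks fail by throwing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of steps 1-3: preexisting checks always run before plugin-registered ones.
final class FragmentedPlanValidation<F> {
    private final List<Consumer<F>> builtInChecks = new ArrayList<>();
    private final List<Consumer<F>> pluginChecks = new ArrayList<>();

    FragmentedPlanValidation<F> addBuiltInCheck(Consumer<F> check) {
        builtInChecks.add(check);
        return this;
    }

    // Called when a plugin (for example, the sidecar plugin) registers a fragment check through the SPI.
    FragmentedPlanValidation<F> addPluginCheck(Consumer<F> check) {
        pluginChecks.add(check);
        return this;
    }

    // Any check may throw to fail the plan; later checks then do not run.
    void validate(F fragmentedPlan) {
        for (Consumer<F> check : builtInChecks) {
            check.accept(fragmentedPlan);
        }
        for (Consumer<F> check : pluginChecks) {
            check.accept(fragmentedPlan);
        }
    }
}
```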

## [Optional] Metrics

This is a 0 to 1 feature and will not have any metrics.

## [Optional] Other Approaches Considered

https://github.com/prestodb/presto/pull/23423 added a hook for a similar plan validation. However, this hook
is added at the plan conversion level at the worker. This RFC proposes a plan validation at the coordinator level
to provide a quicker feedback loop to the user, and to allow this logic to be composed in other components such as
a load balancer or external queueing service.

## Adoption Plan

- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar?
  - No impact on existing users. Because the plan checker is implemented as a plugin, the plugin must explicitly be added to a deployment
    in order to be used. The SPI gains interfaces for custom plan checks, and a new configuration parameter and session property
    control the size of the planning thread pool.
- If we are changing behaviour how will we phase out the older behaviour?
  - NA
- If we need special migration tools, describe them here.
  - A migration to use the Presto sidecar will be needed, which entails additional infrastructure; specifically,
    deployments will need to deploy the sidecar alongside the coordinator.
- When will we remove the existing behaviour, if applicable.
  - NA
- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed?
  - This feature will be documented in the Presto documentation.
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
  - It is not in scope to catch all runtime errors in the plan checker. This is a best-effort attempt to catch as many errors as possible
    before the query is executed.

## Test Plan

Infrastructure tests will be added that prove the end-to-end capability of the plan checker. These will include a test that
validates that a plan that can be converted to Velox passes, and that a plan that can't be converted to Velox fails. Additionally,
unit tests will be added to the `PlanChecker` class to ensure that the SPIs are run in the correct order, and that the Presto sidecar
is called when the SPI is registered.
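
A plain-Java sketch of such unit tests follows, using a fake sidecar that records calls instead of a real HTTP endpoint. `FakeSidecar`, `CheckerUnderTest`, and the `unsupported_fn` marker are hypothetical test fixtures, not existing Presto test utilities; a real suite would likely use the project's own test framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical fake sidecar: records each request and rejects plans that mention unsupported_fn.
final class FakeSidecar implements Function<String, Optional<String>> {
    final List<String> requests = new ArrayList<>();

    @Override
    public Optional<String> apply(String planJson) {
        requests.add(planJson);
        return planJson.contains("unsupported_fn")
                ? Optional.of("Scalar function unsupported_fn not registered")
                : Optional.empty();
    }
}

// Hypothetical checker under test: calls the sidecar only if the sidecar plugin was registered.
final class CheckerUnderTest {
    private final Optional<Function<String, Optional<String>>> sidecar;

    CheckerUnderTest(Optional<Function<String, Optional<String>>> sidecar) { this.sidecar = sidecar; }

    // Returns the failure reason, or empty if the plan passed.
    Optional<String> check(String planJson) {
        return sidecar.flatMap(client -> client.apply(planJson));
    }
}

final class PlanCheckerTests {
    static void run() {
        FakeSidecar sidecar = new FakeSidecar();
        CheckerUnderTest withPlugin = new CheckerUnderTest(Optional.of(sidecar));
        expect(withPlugin.check("{\"fn\":\"abs\"}").isEmpty(), "convertible plan should pass");
        expect(withPlugin.check("{\"fn\":\"unsupported_fn\"}").isPresent(), "unconvertible plan should fail");
        expect(sidecar.requests.size() == 2, "sidecar should be called once per check");

        CheckerUnderTest withoutPlugin = new CheckerUnderTest(Optional.empty());
        expect(withoutPlugin.check("{\"fn\":\"unsupported_fn\"}").isEmpty(), "without the plugin, no sidecar check runs");
    }

    private static void expect(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```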