|
| 1 | += Fault-tolerant execution |
| 2 | +:description: Configure fault-tolerant execution in Trino clusters for improved query resilience and automatic retry capabilities. |
| 3 | +:keywords: fault-tolerant execution, retry policy, exchange manager, spooling, query resilience |
| 4 | + |
| 5 | +Fault-tolerant execution is a mechanism in Trino that enables a cluster to mitigate query failures by retrying queries or their component tasks in the event of failure. |
| 6 | +With fault-tolerant execution enabled, intermediate exchange data is spooled and can be re-used by another worker in the event of a worker outage or other failures during query execution. |
| 7 | + |
| 8 | +By default, if a Trino node lacks the resources to execute a task or otherwise fails during query execution, the query fails and must be run again manually. |
| 9 | +The longer the runtime of a query, the more likely it is to be susceptible to such failures. |
| 10 | + |
| 11 | +NOTE: Fault tolerance does not apply to broken queries or other user errors. |
| 12 | +For example, Trino does not spend resources retrying a query that fails because its SQL cannot be parsed. |
| 13 | + |
| 14 | +Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html[Trino documentation for fault-tolerant execution {external-link-icon}^] to learn more. |
| 15 | + |
| 16 | +== Configuration |
| 17 | + |
| 18 | +Fault-tolerant execution is not enabled by default. |
| 19 | +It can be enabled in the `TrinoCluster` resource by adding a `faultTolerantExecution` section to the cluster configuration. |
| 20 | +The configuration uses a structured approach where you choose either `query` or `task` retry policy, each with their specific configuration options. |
| 21 | + |
| 22 | +=== Query retry policy |
| 23 | + |
| 24 | +A `query` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node. |
| 25 | +This policy is recommended when the majority of the Trino cluster's workload consists of many small queries. |
| 26 | + |
| 27 | +By default, Trino does not implement fault tolerance for queries whose result set exceeds 32Mi in size. |
| 28 | +This limit can be increased by modifying the `exchangeDeduplicationBufferSize` configuration property to be greater than the default value of `32Mi`, but this results in higher memory usage on the coordinator. |
| 29 | + |
| 30 | +[source,yaml] |
| 31 | +---- |
| 32 | +spec: |
| 33 | + clusterConfig: |
| 34 | + faultTolerantExecution: |
| 35 | + query: |
| 36 | + retryAttempts: 3 |
| 37 | + exchangeDeduplicationBufferSize: 64Mi # Increased from default 32Mi |
| 38 | +---- |
| 39 | + |
| 40 | +=== Task retry policy |
| 41 | + |
| 42 | +A `task` retry policy instructs Trino to retry individual query tasks in the event of failure. |
| 43 | +You **must** configure an exchange manager to use the task retry policy. |
| 44 | +This policy is recommended when executing large batch queries, as the cluster can more efficiently retry smaller tasks within the query, rather than retry the whole query. |
| 45 | + |
| 46 | +IMPORTANT: A `task` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume. |
| 47 | +As a best practice, it is recommended to run a dedicated cluster with a `task` retry policy for large batch queries, separate from another cluster that handles short queries. |
| 48 | +There are tools that can help you achieve this by automatically routing queries based on certain criteria (such as query estimates or user) to different Trino clusters. Notable mentions are link:https://github.com/stackabletech/trino-lb[trino-lb {external-link-icon}^] and link:https://github.com/trinodb/trino-gateway[trino-gateway {external-link-icon}^]. |
| 49 | + |
| 50 | +[source,yaml] |
| 51 | +---- |
| 52 | +spec: |
| 53 | + clusterConfig: |
| 54 | + faultTolerantExecution: |
| 55 | + task: |
| 56 | + retryAttemptsPerTask: 4 |
| 57 | + exchangeManager: # Mandatory for Task retry policy |
| 58 | + encryptionEnabled: true |
| 59 | + s3: |
| 60 | + baseDirectories: |
| 61 | + - "s3://trino-exchange-bucket/spooling" |
| 62 | + connection: |
| 63 | + reference: my-s3-connection # <1> |
| 64 | +---- |
| 65 | +<1> Reference to an xref:concepts:s3.adoc[S3Connection] resource |
| 66 | + |
| 67 | +== Exchange manager |
| 68 | + |
| 69 | +Exchange spooling is responsible for storing and managing spooled data for fault-tolerant execution. |
| 70 | +You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, HDFS, or local filesystem. |
| 71 | + |
| 72 | +NOTE: An exchange manager is required when using the `task` retry policy and optional for the `query` retry policy. |
| 73 | + |
| 74 | +=== S3-compatible storage |
| 75 | + |
| 76 | +You can use S3-compatible storage systems for exchange spooling, including AWS S3 and MinIO. |
| 77 | + |
| 78 | +[source,yaml] |
| 79 | +---- |
| 80 | +spec: |
| 81 | + clusterConfig: |
| 82 | + faultTolerantExecution: |
| 83 | + task: |
| 84 | + retryAttemptsPerTask: 4 |
| 85 | + exchangeManager: |
| 86 | + s3: |
| 87 | + baseDirectories: # <1> |
| 88 | + - "s3://exchange-bucket-1/trino-spooling" |
| 89 | + connection: |
| 90 | + reference: minio-s3-connection # <2> |
| 91 | +--- |
| 92 | +apiVersion: s3.stackable.tech/v1alpha1 |
| 93 | +kind: S3Connection |
| 94 | +metadata: |
| 95 | + name: minio-s3-connection |
| 96 | +spec: |
| 97 | + host: minio.default.svc.cluster.local |
| 98 | + port: 9000 |
| 99 | + accessStyle: Path |
| 100 | + credentials: |
| 101 | + secretClass: minio-secret-class |
| 102 | + tls: |
| 103 | + verification: |
| 104 | + server: |
| 105 | + caCert: |
| 106 | + secretClass: tls |
| 107 | +---- |
| 108 | +<1> Multiple S3 buckets can be specified to distribute I/O load |
| 109 | +<2> S3 connection defined as a reference to an xref:concepts:s3.adoc[S3Connection] resource |
| 110 | + |
| 111 | +For storage systems like Google Cloud Storage or Azure Blob Storage, you can use the S3-compatible configuration with `configOverrides` to provide the necessary exchange manager properties. |
| 112 | + |
| 113 | +=== HDFS storage |
| 114 | + |
| 115 | +You can configure HDFS as the exchange spooling destination: |
| 116 | + |
| 117 | +[source,yaml] |
| 118 | +---- |
| 119 | +spec: |
| 120 | + clusterConfig: |
| 121 | + faultTolerantExecution: |
| 122 | + task: |
| 123 | + retryAttemptsPerTask: 4 |
| 124 | + exchangeManager: |
| 125 | + hdfs: |
| 126 | + baseDirectories: |
| 127 | + - "hdfs://simple-hdfs/exchange-spooling" |
| 128 | + hdfs: |
| 129 | + configMap: simple-hdfs # <1> |
| 130 | +---- |
| 131 | +<1> ConfigMap containing HDFS configuration files (created by the HDFS operator) |
| 132 | + |
| 133 | +=== Local filesystem storage |
| 134 | + |
| 135 | +Local filesystem storage is supported but only recommended for development or single-node deployments: |
| 136 | + |
| 137 | +WARNING: It is only recommended to use a local filesystem for exchange in standalone, non-production clusters. |
| 138 | +A local directory can only be used for exchange in a distributed cluster if the exchange directory is shared and accessible from all nodes. |
| 139 | + |
| 140 | +[source,yaml] |
| 141 | +---- |
| 142 | +spec: |
| 143 | + clusterConfig: |
| 144 | + faultTolerantExecution: |
| 145 | + task: |
| 146 | + exchangeManager: |
| 147 | + local: |
| 148 | + baseDirectories: |
| 149 | + - "/trino-exchange" |
| 150 | + coordinators: |
| 151 | + roleGroups: |
| 152 | + default: |
| 153 | + replicas: 1 |
| 154 | + podOverrides: |
| 155 | + spec: |
| 156 | + volumes: |
| 157 | + - name: trino-exchange |
| 158 | + persistentVolumeClaim: |
| 159 | + claimName: trino-exchange-pvc |
| 160 | + containers: |
| 161 | + - name: trino |
| 162 | + volumeMounts: |
| 163 | + - name: trino-exchange |
| 164 | + mountPath: /trino-exchange |
| 165 | + workers: |
| 166 | + roleGroups: |
| 167 | + default: |
| 168 | + replicas: 1 |
| 169 | + podOverrides: |
| 170 | + spec: |
| 171 | + volumes: |
| 172 | + - name: trino-exchange |
| 173 | + persistentVolumeClaim: |
| 174 | + claimName: trino-exchange-pvc |
| 175 | + containers: |
| 176 | + - name: trino |
| 177 | + volumeMounts: |
| 178 | + - name: trino-exchange |
| 179 | + mountPath: /trino-exchange |
| 180 | +--- |
| 181 | +kind: PersistentVolumeClaim |
| 182 | +apiVersion: v1 |
| 183 | +metadata: |
| 184 | + name: trino-exchange-pvc |
| 185 | +spec: |
| 186 | + accessModes: |
| 187 | + - ReadWriteOnce |
| 188 | + resources: |
| 189 | + requests: |
| 190 | + storage: 50Gi |
| 191 | +---- |
| 192 | + |
| 193 | +== Connector support |
| 194 | + |
| 195 | +Support for fault-tolerant execution of SQL statements varies on a per-connector basis. |
| 196 | +Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html#configuration[Trino documentation {external-link-icon}^] to see which connectors support fault-tolerant execution. |
| 197 | + |
| 198 | +When using connectors that do not explicitly support fault-tolerant execution, you may encounter a "This connector does not support query retries" error message. |
| 199 | + |
| 200 | +== Example |
| 201 | + |
| 202 | +Here's an example of a Trino cluster with fault-tolerant execution enabled using the `task` retry policy and MinIO backed S3 as the exchange manager: |
| 203 | + |
| 204 | +[source,bash] |
| 205 | +---- |
| 206 | +stackablectl operator install commons secret listener trino |
| 207 | +helm install minio oci://registry-1.docker.io/bitnamicharts/minio --version 17.0.19 --set auth.rootUser=minio-access-key --set auth.rootPassword=minio-secret-key --set tls.enabled=true --set tls.server.existingSecret=minio-tls-certificates --set tls.existingSecret=minio-tls-certificates --set tls.existingCASecret=minio-tls-certificates --set tls.autoGenerated.enabled=false --set provisioning.enabled=true --set provisioning.buckets[0].name=trino-exchange-bucket --set global.security.allowInsecureImages=true --set image.repository=bitnamilegacy/minio --set clientImage.repository=bitnamilegacy/minio-client --set defaultInitContainers.volumePermissions.image.repository=bitnamilegacy/os-shell --set console.image.repository=bitnamilegacy/minio-object-browser |
| 208 | +---- |
| 209 | + |
| 210 | +[source,yaml] |
| 211 | +---- |
| 212 | +include::example$usage-guide/fault-tolerant-execution.yaml[] |
| 213 | +---- |
0 commit comments