Commit fdd338f

docs: fault-tolerant execution documentation

1 parent 4fbc3fa commit fdd338f

File tree: 4 files changed, +401 -0 lines changed

docs/modules/trino/pages/usage-guide/configuration.adoc

Lines changed: 3 additions & 0 deletions

@@ -18,6 +18,9 @@ For a role or role group, at the same level of `config`, you can specify `config

For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].

TIP: For fault-tolerant execution configuration, use the dedicated `faultTolerantExecution` section in the cluster configuration instead of `configOverrides`.
See xref:usage-guide/fault-tolerant-execution.adoc[] for detailed instructions.

[source,yaml]
----
workers:
docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc (new file)

Lines changed: 288 additions & 0 deletions

@@ -0,0 +1,288 @@

= Fault-tolerant execution
:description: Configure fault-tolerant execution in Trino clusters for improved query resilience and automatic retry capabilities.
:keywords: fault-tolerant execution, retry policy, exchange manager, spooling, query resilience

Fault-tolerant execution is a mechanism in Trino that enables a cluster to mitigate query failures by retrying queries or their component tasks in the event of failure.
With fault-tolerant execution enabled, intermediate exchange data is spooled and can be re-used by another worker in the event of a worker outage or other fault during query execution.

By default, if a Trino node lacks the resources to execute a task or otherwise fails during query execution, the query fails and must be run again manually.
The longer the runtime of a query, the more likely it is to be susceptible to such failures.

NOTE: Fault tolerance does not apply to broken queries or other user error.
For example, Trino does not spend resources retrying a query that fails because its SQL cannot be parsed.

Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html[Trino documentation for fault-tolerant execution {external-link-icon}^] to learn more.
== Configuration

Fault-tolerant execution is turned off by default.
To enable the feature, you need to configure it in your `TrinoCluster` resource by adding a `faultTolerantExecution` section to the cluster configuration:

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: QUERY # <1>
      queryRetryAttempts: 3 # <2>
----
<1> The retry policy, either `QUERY` or `TASK`
<2> Maximum number of times to retry a query (`QUERY` policy only)
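Under the hood, these settings presumably map to Trino's standard fault-tolerant execution properties in `config.properties`. The rendering below is a sketch based on an assumed one-to-one mapping; the commit does not show the operator's actual output:

[source,properties]
----
# Assumed rendering in config.properties
retry-policy=QUERY
query-retry-attempts=3
----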
== Retry policies

The `retryPolicy` configuration property designates whether Trino retries entire queries or a query's individual tasks in the event of failure.

=== QUERY retry policy

A `QUERY` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
A `QUERY` retry policy is recommended when the majority of the Trino cluster's workload consists of many small queries.

By default, Trino does not implement fault tolerance for queries whose result set exceeds 32MB in size.
This limit can be increased by setting the `exchangeDeduplicationBufferSize` configuration property to a value greater than the default of `32MB`, but this results in higher memory usage on the coordinator.

[source,yaml]
----
...
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: QUERY
      queryRetryAttempts: 3
      exchangeDeduplicationBufferSize: 64MB # Increased from the default of 32MB
...
----
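The buffer setting presumably corresponds to Trino's `exchange.deduplication-buffer-size` property. Shown here only as a sketch of an assumed rendering:

[source,properties]
----
# Assumed rendering in config.properties
retry-policy=QUERY
query-retry-attempts=3
exchange.deduplication-buffer-size=64MB
----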
=== TASK retry policy

A `TASK` retry policy instructs Trino to retry individual query tasks in the event of failure.
You **must** configure an exchange manager to use the task retry policy.
This policy is recommended when executing large batch queries, as the cluster can more efficiently retry smaller tasks within the query rather than retry the whole query.

IMPORTANT: A `TASK` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
As a best practice, it is recommended to run a dedicated cluster with a `TASK` retry policy for large batch queries, separate from another cluster that handles short queries.

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: TASK
      taskRetryAttemptsPerTask: 4
      exchangeManager:
        s3:
          baseDirectories:
            - "s3://trino-exchange-bucket/spooling"
          connection:
            reference: my-s3-connection # <1>
----
<1> Reference to an xref:concepts:s3.adoc[S3Connection] resource
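Assuming the operator renders these settings into Trino's standard property files (the exact rendering is internal to the operator and not shown in this commit), the result would resemble:

[source,properties]
----
# config.properties (assumed rendering)
retry-policy=TASK
task-retry-attempts-per-task=4

# exchange-manager.properties (assumed rendering)
exchange-manager.name=filesystem
exchange.base-directories=s3://trino-exchange-bucket/spooling
----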
== Exchange manager

Exchange spooling is responsible for storing and managing spooled data for fault-tolerant execution.
You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, Azure Blob Storage or HDFS.

NOTE: An exchange manager is required when using the `TASK` retry policy and optional for the `QUERY` retry policy.

=== S3-compatible storage

You can use S3-compatible storage systems for exchange spooling, including AWS S3, MinIO, and Google Cloud Storage.
[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: TASK
      exchangeManager:
        s3:
          baseDirectories: # <1>
            - "s3://exchange-bucket-1/trino-spooling"
          connection:
            reference: minio-s3-connection # <2>
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: minio-s3-connection
spec:
  host: minio.default.svc.cluster.local
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: minio-secret-class
  tls:
    verification:
      server:
        caCert:
          secretClass: tls
----
<1> Multiple S3 buckets can be specified to distribute I/O load
<2> S3 connection defined as a reference to an xref:concepts:s3.adoc[S3Connection] resource
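The `minio-secret-class` referenced above must resolve to a Kubernetes Secret holding the S3 credentials. A minimal sketch, assuming the common Stackable pattern of a `k8sSearch`-backed SecretClass with `accessKey`/`secretKey` entries (the Secret name and values below are illustrative):

[source,yaml]
----
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: minio-secret-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials # illustrative name
  labels:
    secrets.stackable.tech/class: minio-secret-class
stringData:
  accessKey: minio-access-key # placeholder
  secretKey: minio-secret-key # placeholder
----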
For Google Cloud Storage, you can use GCS buckets with S3 compatibility:

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      exchangeManager:
        s3:
          baseDirectories:
            - "gs://my-gcs-bucket/trino-spooling"
          connection:
            inline:
              host: storage.googleapis.com
              port: 443
              accessStyle: Path
              credentials:
                secretClass: gcs-hmac-credentials
              tls:
                verification:
                  server:
                    caCert:
                      webPki: {}
          gcsServiceAccountKey:
            secretClass: "gcs-service-account-secret-class"
            key: "service-account.json"
----
=== Azure Blob Storage

You can configure Azure Blob Storage as the exchange spooling destination:

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: TASK
      exchangeManager:
        azure:
          baseDirectories:
            - "abfs://exchange@mystorageaccount.dfs.core.windows.net/exchange-spooling"
          secretClass: azure-credentials # <1>
          key: connectionString # <2>
----
<1> SecretClass providing the Azure connection string
<2> Key name in the Secret that contains the connection string (defaults to `connectionString`)

The Azure connection string should be provided via a SecretClass that refers to a Kubernetes Secret containing the Azure storage account connection string, like this:

[source,yaml]
----
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: azure-credentials
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
----

[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: azure-secret
  labels:
    secrets.stackable.tech/class: azure-credentials
type: Opaque
stringData:
  connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=your_account_key;EndpointSuffix=core.windows.net"
----
=== HDFS storage

You can configure HDFS as the exchange spooling destination:

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: TASK
      exchangeManager:
        hdfs:
          baseDirectories:
            - "hdfs://simple-hdfs/exchange-spooling"
          hdfs:
            configMap: simple-hdfs # <1>
----
<1> ConfigMap containing HDFS configuration files (created by the HDFS operator)
=== Local filesystem storage

Local filesystem storage is supported but only recommended for development or single-node deployments:

WARNING: It is only recommended to use a local filesystem for exchange in standalone, non-production clusters.
A local directory can only be used for exchange in a distributed cluster if the exchange directory is shared and accessible from all nodes.

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: TASK
      exchangeManager:
        local:
          baseDirectories:
            - "/trino-exchange"
  coordinators:
    roleGroups:
      default:
        replicas: 1
        podOverrides:
          spec:
            volumes:
              - name: trino-exchange
                persistentVolumeClaim:
                  claimName: trino-exchange-pvc
            containers:
              - name: trino
                volumeMounts:
                  - name: trino-exchange
                    mountPath: /trino-exchange
  workers:
    roleGroups:
      default:
        replicas: 1
        podOverrides:
          spec:
            volumes:
              - name: trino-exchange
                persistentVolumeClaim:
                  claimName: trino-exchange-pvc
            containers:
              - name: trino
                volumeMounts:
                  - name: trino-exchange
                    mountPath: /trino-exchange
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trino-exchange-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
----

Note that a PersistentVolumeClaim with the `ReadWriteOnce` access mode can only be mounted by pods on a single node; if the coordinator and worker pods may be scheduled on different nodes, use a storage class that supports `ReadWriteMany`.
== Connector support

Support for fault-tolerant execution of SQL statements varies on a per-connector basis.
Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html#configuration[Trino documentation {external-link-icon}^] to see which connectors support fault-tolerant execution.

When using connectors that do not explicitly support fault-tolerant execution, you may encounter a "This connector does not support query retries" error message.

== Examples

* link:https://github.com/stackabletech/trino-operator/blob/main/examples/simple-trino-cluster-fault-tolerant-execution.yaml[TrinoCluster with TASK retry policy and S3 exchange manager {external-link-icon}^]

docs/modules/trino/partials/nav.adoc

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@
** xref:trino:usage-guide/connect_to_trino.adoc[]
** xref:trino:usage-guide/listenerclass.adoc[]
** xref:trino:usage-guide/configuration.adoc[]
** xref:trino:usage-guide/fault-tolerant-execution.adoc[]
** xref:trino:usage-guide/s3.adoc[]
** xref:trino:usage-guide/security.adoc[]
** xref:trino:usage-guide/monitoring.adoc[]
