You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
+41-54Lines changed: 41 additions & 54 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -16,66 +16,51 @@ Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execu
16
16
== Configuration
17
17
18
18
Fault-tolerant execution is not enabled by default.
19
-
To enable the feature, you need to configure it in your `TrinoCluster` resource by adding a `faultTolerantExecution` section to the cluster configuration:
20
-
21
-
[source,yaml]
22
-
----
23
-
spec:
24
-
clusterConfig:
25
-
faultTolerantExecution:
26
-
retryPolicy: Query # <1>
27
-
queryRetryAttempts: 3 # <2>
28
-
----
29
-
<1> The retry policy - either `Query` or `Task`
30
-
<2> Maximum number of times to retry a query (Query policy only)
31
-
32
-
== Retry policies
33
-
34
-
The `retryPolicy` configuration property designates whether Trino retries entire queries or a query's individual tasks in the event of failure.
19
+
To enable the feature, you need to configure it in your `TrinoCluster` resource by adding a `faultTolerantExecution` section to the cluster configuration.
20
+
The configuration uses a structured approach where you choose either `query` or `task` retry policy, each with their specific configuration options.
35
21
36
22
=== Query retry policy
37
23
38
-
A `Query` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
39
-
A `Query` retry policy is recommended when the majority of the Trino cluster's workload consists of many small queries.
24
+
A `query` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
25
+
This policy is recommended when the majority of the Trino cluster's workload consists of many small queries.
40
26
41
27
By default, Trino does not implement fault tolerance for queries whose result set exceeds 32MB in size.
42
28
This limit can be increased by modifying the `exchangeDeduplicationBufferSize` configuration property to be greater than the default value of `32MB`, but this results in higher memory usage on the coordinator.
43
29
44
30
[source,yaml]
45
31
----
46
-
...
47
32
spec:
48
33
clusterConfig:
49
34
faultTolerantExecution:
50
-
retryPolicy: Query
51
-
queryRetryAttempts: 3
52
-
exchangeDeduplicationBufferSize: 64MB # Increased from default 32MB
53
-
...
35
+
query:
36
+
retryAttempts: 3
37
+
exchangeDeduplicationBufferSize: 64MB # Increased from default 32MB
54
38
----
55
39
56
40
=== Task retry policy
57
41
58
-
A `Task` retry policy instructs Trino to retry individual query tasks in the event of failure.
42
+
A `task` retry policy instructs Trino to retry individual query tasks in the event of failure.
59
43
You **must** configure an exchange manager to use the task retry policy.
60
44
This policy is recommended when executing large batch queries, as the cluster can more efficiently retry smaller tasks within the query rather than retry the whole query.
61
45
62
-
IMPORTANT: A `Task` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
63
-
As a best practice, it is recommended to run a dedicated cluster with a `Task` retry policy for large batch queries, separate from another cluster that handles short queries.
46
+
IMPORTANT: A `task` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
47
+
As a best practice, it is recommended to run a dedicated cluster with a `task` retry policy for large batch queries, separate from another cluster that handles short queries.
64
48
There are tools that can help you achieve this by automatically routing queries based on certain criteria (such as query estimates or user) to different Trino clusters. Notable mentions are link:https://github.com/stackabletech/trino-lb[trino-lb {external-link-icon}^] and link:https://github.com/trinodb/trino-gateway[trino-gateway {external-link-icon}^].
65
49
66
50
[source,yaml]
67
51
----
68
52
spec:
69
53
clusterConfig:
70
54
faultTolerantExecution:
71
-
retryPolicy: Task
72
-
taskRetryAttemptsPerTask: 4
73
-
exchangeManager:
74
-
s3:
75
-
baseDirectories:
76
-
- "s3://trino-exchange-bucket/spooling"
77
-
connection:
78
-
reference: my-s3-connection # <1>
55
+
task:
56
+
retryAttemptsPerTask: 4
57
+
exchangeManager: # Mandatory for Task retry policy
58
+
encryptionEnabled: true
59
+
s3:
60
+
baseDirectories:
61
+
- "s3://trino-exchange-bucket/spooling"
62
+
connection:
63
+
reference: my-s3-connection # <1>
79
64
----
80
65
<1> Reference to an xref:concepts:s3.adoc[S3Connection] resource
81
66
@@ -84,7 +69,7 @@ spec:
84
69
Exchange spooling is responsible for storing and managing spooled data for fault-tolerant execution.
85
70
You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, HDFS, or local filesystem.
86
71
87
-
NOTE: An exchange manager is required when using the `Task` retry policy and optional for the `Query` retry policy.
72
+
NOTE: An exchange manager is required when using the `task` retry policy and optional for the `query` retry policy.
88
73
89
74
=== S3-compatible storage
90
75
@@ -95,13 +80,14 @@ You can use S3-compatible storage systems for exchange spooling, including AWS S
95
80
spec:
96
81
clusterConfig:
97
82
faultTolerantExecution:
98
-
retryPolicy: Task
99
-
exchangeManager:
100
-
s3:
101
-
baseDirectories: # <1>
102
-
- "s3://exchange-bucket-1/trino-spooling"
103
-
connection:
104
-
reference: minio-s3-connection # <2>
83
+
task:
84
+
retryAttemptsPerTask: 4
85
+
exchangeManager:
86
+
s3:
87
+
baseDirectories: # <1>
88
+
- "s3://exchange-bucket-1/trino-spooling"
89
+
connection:
90
+
reference: minio-s3-connection # <2>
105
91
---
106
92
apiVersion: s3.stackable.tech/v1alpha1
107
93
kind: S3Connection
@@ -133,13 +119,14 @@ You can configure HDFS as the exchange spooling destination:
133
119
spec:
134
120
clusterConfig:
135
121
faultTolerantExecution:
136
-
retryPolicy: Task
137
-
exchangeManager:
138
-
hdfs:
139
-
baseDirectories:
140
-
- "hdfs://simple-hdfs/exchange-spooling"
122
+
task:
123
+
retryAttemptsPerTask: 4
124
+
exchangeManager:
141
125
hdfs:
142
-
configMap: simple-hdfs # <1>
126
+
baseDirectories:
127
+
- "hdfs://simple-hdfs/exchange-spooling"
128
+
hdfs:
129
+
configMap: simple-hdfs # <1>
143
130
----
144
131
<1> ConfigMap containing HDFS configuration files (created by the HDFS operator)
145
132
@@ -155,11 +142,11 @@ A local directory can only be used for exchange in a distributed cluster if the
155
142
spec:
156
143
clusterConfig:
157
144
faultTolerantExecution:
158
-
retryPolicy: Task
159
-
exchangeManager:
160
-
local:
161
-
baseDirectories:
162
-
- "/trino-exchange"
145
+
task:
146
+
exchangeManager:
147
+
local:
148
+
baseDirectories:
149
+
- "/trino-exchange"
163
150
coordinators:
164
151
roleGroups:
165
152
default:
@@ -212,7 +199,7 @@ When using connectors that do not explicitly support fault-tolerant execution, y
212
199
213
200
== Example
214
201
215
-
Here's an example of a Trino cluster with fault-tolerant execution enabled using the `Task` retry policy and MinIO backed S3 as the exchange manager:
202
+
Here's an example of a Trino cluster with fault-tolerant execution enabled using the `task` retry policy and MinIO backed S3 as the exchange manager:
0 commit comments