
Commit 28260dc

dervoeti, sbernauer, and maltesander authored
feat: Support for fault-tolerant execution (#779)
* CRD update
* feat: fault tolerant execution
* test: fault-tolerant execution integration test
* docs: fault-tolerant execution documentation
* chore: changelog
* fix: lint fixes
* Update docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
  Co-authored-by: Sebastian Bernauer <[email protected]>
* fix: fixed review feedback
* feat!: remove explicit Azure and GCS support
* feat: use PascalCase for Query/Task / allow configOverrides for exchange manager
* fix: always convert durations to seconds
* feat!: restructured CRD
* feat: adapted graceful shutdown docs
* chore: add newlines after attributes
* chore: MinIO legacy charts and updated version
* feat: use quantities instead of strings
* fix: moved to quantities in the FTE docs example
* fix: moved to quantities in the FTE docs
* chore: cargo fmt
* chore: pre-commit fix
* Update docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
  Co-authored-by: Malte Sander <[email protected]>
* Update docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
  Co-authored-by: Malte Sander <[email protected]>
* Update docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
  Co-authored-by: Malte Sander <[email protected]>
* Update docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc
  Co-authored-by: Malte Sander <[email protected]>
* fix: integration test fixes
* fix: integration test fixes

---------

Co-authored-by: Sebastian Bernauer <[email protected]>
Co-authored-by: Malte Sander <[email protected]>
1 parent aa4011a commit 28260dc

30 files changed: +2345 -22 lines changed

.github/ISSUE_TEMPLATE/02-bug_report.yml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ body:
     attributes:
       label: Affected Trino version
       description: Which version of Trino do you see this bug in?
-#
+#
   - type: textarea
     attributes:
       label: Current and expected behavior

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- Support for fault-tolerant execution ([#779]).
+
+[#779]: https://github.com/stackabletech/trino-operator/pull/779
+
 ## [25.7.0] - 2025-07-23
 
 ## [25.7.0-rc1] - 2025-07-18

deploy/helm/trino-operator/crds/crds.yaml

Lines changed: 537 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: trino-fault-tolerant
spec:
  image:
    productVersion: "476"
  clusterConfig:
    catalogLabelSelector:
      matchLabels:
        trino: trino-fault-tolerant
    faultTolerantExecution:
      task:
        retryAttemptsPerTask: 4
        retryInitialDelay: 10s
        retryMaxDelay: 60s
        retryDelayScaleFactor: 2.0
        exchangeDeduplicationBufferSize: 64Mi
        exchangeManager:
          encryptionEnabled: true
          sinkBufferPoolMinSize: 20
          sinkBuffersPerPartition: 4
          sinkMaxFileSize: 2Gi
          sourceConcurrentReaders: 8
          s3:
            baseDirectories:
              - "s3://trino-exchange-bucket/spooling"
            connection:
              reference: minio-connection
            maxErrorRetries: 10
            uploadPartSize: 10Mi
  coordinators:
    roleGroups:
      default:
        replicas: 1
  workers:
    roleGroups:
      default:
        replicas: 3
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: minio-connection
spec:
  host: minio
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: minio-credentials
  tls:
    verification:
      server:
        caCert:
          secretClass: minio-tls-certificates
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: minio-tls-certificates
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-tls-certificates
  labels:
    secrets.stackable.tech/class: minio-tls-certificates
data:
ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQyVENDQXNHZ0F3SUJBZ0lVTmpxdUdZV3R5SjVhNnd5MjNIejJHUmNNbHdNd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2V6RUxNQWtHQTFVRUJoTUNSRVV4R3pBWkJnTlZCQWdNRWxOamFHeGxjM2RwWnkxSWIyeHpkR1ZwYmpFTwpNQXdHQTFVRUJ3d0ZWMlZrWld3eEtEQW1CZ05WQkFvTUgxTjBZV05yWVdKc1pTQlRhV2R1YVc1bklFRjFkR2h2CmNtbDBlU0JKYm1NeEZUQVRCZ05WQkFNTURITjBZV05yWVdKc1pTNWtaVEFnRncweU16QTJNVFl4TWpVeE1ESmEKR0E4eU1USXpNRFV5TXpFeU5URXdNbG93ZXpFTE1Ba0dBMVVFQmhNQ1JFVXhHekFaQmdOVkJBZ01FbE5qYUd4bApjM2RwWnkxSWIyeHpkR1ZwYmpFT01Bd0dBMVVFQnd3RlYyVmtaV3d4S0RBbUJnTlZCQW9NSDFOMFlXTnJZV0pzClpTQlRhV2R1YVc1bklFRjFkR2h2Y21sMGVTQkpibU14RlRBVEJnTlZCQU1NREhOMFlXTnJZV0pzWlM1a1pUQ0MKQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFOblYvdmJ5M1JvNTdhMnF2UVJubjBqZQplS01VMitGMCtsWk5DQXZpR1VENWJtOGprOTFvUFpuazBiaFFxZXlFcm1EUzRXVDB6ZXZFUklCSkpEamZMMEQ4CjQ2QmU3UGlNS2UwZEdqb3FJM3o1Y09JZWpjOGFMUEhTSWxnTjZsVDNmSXJ1UzE2Y29RZ0c0dWFLaUhGNStlV0YKRFJVTGR1NmRzWXV6NmRLanFSaVVPaEh3RHd0VUprRHdQditFSXRxbzBIK01MRkxMWU0wK2xFSWFlN2RONUNRNQpTbzVXaEwyY3l2NVZKN2xqL0VBS0NWaUlFZ0NtekRSRGNSZ1NTald5SDRibjZ5WDIwMjZmUEl5V0pGeUVkTC82CmpBT0pBRERSMEd5aE5PWHJFZXFob2NTTW5JYlFWcXdBVDBrTWh1WFN2d3Zscm5MeVRwRzVqWm00bFVNMzRrTUMKQXdFQUFhTlRNRkV3SFFZRFZSME9CQllFRkVJM1JNTWl5aUJqeVExUlM4bmxPUkpWZDFwQk1COEdBMVVkSXdRWQpNQmFBRkVJM1JNTWl5aUJqeVExUlM4bmxPUkpWZDFwQk1BOEdBMVVkRXdFQi93UUZNQU1CQWY4d0RRWUpLb1pJCmh2Y05BUUVMQlFBRGdnRUJBSHRLUlhkRmR0VWh0VWpvZG1ZUWNlZEFEaEhaT2hCcEtpbnpvdTRicmRrNEhmaEYKTHIvV0ZsY1JlbWxWNm1Cc0xweU11SytUZDhaVUVRNkpFUkx5NmxTL2M2cE9HeG5CNGFDbEU4YXQrQytUakpBTwpWbTNXU0k2VlIxY0ZYR2VaamxkVlE2eGtRc2tNSnpPN2RmNmlNVFB0VjVSa01lSlh0TDZYYW1FaTU0ckJvZ05ICk5yYStFSkJRQmwvWmU5ME5qZVlidjIwdVFwWmFhWkZhYVNtVm9OSERwQndsYTBvdXkrTWpPYkMzU3BnT3ExSUMKUGwzTnV3TkxWOFZiT3I1SHJoUUFvS21nU05iM1A4dmFUVnV4L1gwWWZqeS9TN045a1BCYUs5bUZqNzR6d1Y5dwpxU1ExNEtsNWpPM1YzaHJHV1laRWpET2diWnJyRVgxS1hFdXN0K1E9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR5RENDQXJDZ0F3SUJBZ0lVQ0kyUE5OcnR6cDZRbDdHa3VhRnhtRGE2VUJvd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2V6RUxNQWtHQTFVRUJoTUNSRVV4R3pBWkJnTlZCQWdNRWxOamFHeGxjM2RwWnkxSWIyeHpkR1ZwYmpFTwpNQXdHQTFVRUJ3d0ZWMlZrWld3eEtEQW1CZ05WQkFvTUgxTjBZV05yWVdKc1pTQlRhV2R1YVc1bklFRjFkR2h2CmNtbDBlU0JKYm1NeEZUQVRCZ05WQkFNTURITjBZV05yWVdKc1pTNWtaVEFnRncweU16QTJNVFl4TWpVeE1ESmEKR0E4eU1USXpNRFV5TXpFeU5URXdNbG93WGpFTE1Ba0dBMVVFQmhNQ1JFVXhHekFaQmdOVkJBZ01FbE5qYUd4bApjM2RwWnkxSWIyeHpkR1ZwYmpFT01Bd0dBMVVFQnd3RlYyVmtaV3d4RWpBUUJnTlZCQW9NQ1ZOMFlXTnJZV0pzClpURU9NQXdHQTFVRUF3d0ZiV2x1YVc4d2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUIKQVFDanluVnorWEhCOE9DWTRwc0VFWW1qb2JwZHpUbG93d2NTUU4rWURQQ2tCZW9yMFRiODdFZ0x6SksrSllidQpwb1hCbE5JSlBRYW93SkVvL1N6U2s4ZnUyWFNNeXZBWlk0RldHeEp5Mnl4SXh2UC9pYk9HT1l1aVBHWEsyNHQ2ClpjR1RVVmhhdWlaR1Nna1dyZWpXV2g3TWpGUytjMXZhWVpxQitRMXpQczVQRk1sYzhsNVYvK2I4WjdqTUppODQKbU9mSVB4amt2SXlKcjVVa2VGM1VmTHFKUzV5NExGNHR5NEZ0MmlBZDdiYmZIYW5mdlltdjZVb0RWdE1YdFdvMQpvUVBmdjNzaFdybVJMenc2ZXVJQXRiWGM1Q2pCeUlha0NiaURuQVU4cktnK0IxSjRtdlFnckx3bzNxUHJ5Smd4ClNkaWRtWjJtRVI3RXorYzVCMG0vTGlJaEFnTUJBQUdqWHpCZE1Cc0dBMVVkRVFRVU1CS0NCVzFwYm1sdmdnbHMKYjJOaGJHaHZjM1F3SFFZRFZSME9CQllFRkpRMGdENWtFdFFyK3REcERTWjdrd1o4SDVoR01COEdBMVVkSXdRWQpNQmFBRkVJM1JNTWl5aUJqeVExUlM4bmxPUkpWZDFwQk1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQmNkaGQrClI0Sm9HdnFMQms1OWRxSVVlY2N0dUZzcmRQeHNCaU9GaFlOZ1pxZWRMTTBVTDVEenlmQUhmVk8wTGZTRURkZFgKUkpMOXlMNytrTVUwVDc2Y3ZkQzlYVkFJRTZIVXdUbzlHWXNQcXN1eVpvVmpOcEVESkN3WTNDdm9ubEpWZTRkcQovZ0FiSk1ZQitUU21ZNXlEUHovSkZZL1haellhUGI3T2RlR3VqYlZUNUl4cDk3QXBTOFlJaXY3M0Mwd1ViYzZSCmgwcmNmUmJ5a1NRVWg5dmdWZFhSU1I4RFQzV0NmZHFOek5CWVh2OW1xZlc1ejRzYkdqK2wzd1VsL0kzRi9tSXcKZnlPNEN0aTRha2lHVkhsZmZFeTB3a3pWYUJ4aGNYajJJM0JVVGhCNFpxamxzc2llVmFGa3d2WG1teVJUMG9FVwo1SCtOUEhjcXVTMXpQc2NsCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2QUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktZd2dnU2lBZ0VBQW9JQkFRQ2p5blZ6K1hIQjhPQ1kKNHBzRUVZbWpvYnBkelRsb3d3Y1NRTitZRFBDa0Jlb3IwVGI4N0VnTHpKSytKWWJ1cG9YQmxOSUpQUWFvd0pFbwovU3pTazhmdTJYU015dkFaWTRGV0d4SnkyeXhJeHZQL2liT0dPWXVpUEdYSzI0dDZaY0dUVVZoYXVpWkdTZ2tXCnJlaldXaDdNakZTK2MxdmFZWnFCK1ExelBzNVBGTWxjOGw1Vi8rYjhaN2pNSmk4NG1PZklQeGprdkl5SnI1VWsKZUYzVWZMcUpTNXk0TEY0dHk0RnQyaUFkN2JiZkhhbmZ2WW12NlVvRFZ0TVh0V28xb1FQZnYzc2hXcm1STHp3NgpldUlBdGJYYzVDakJ5SWFrQ2JpRG5BVThyS2crQjFKNG12UWdyTHdvM3FQcnlKZ3hTZGlkbVoybUVSN0V6K2M1CkIwbS9MaUloQWdNQkFBRUNnZ0VBQWQzdDVzdUNFMjdXY0llc3NxZ3NoSFAwZHRzKyswVzF6K3h6WC8xTnhPRFkKWVhWNkJmbi9mRHJ4dFQ4aVFaZ2VVQzJORTFQaHZveXJXdWMvMm9xYXJjdEd1OUFZV29HNjJLdG9VMnpTSFdZLwpJN3VERTFXV2xOdlJZVFdOYW5DOGV4eGpRRzE4d0RKWjFpdFhTeEl0NWJEM3lrL3dUUlh0dCt1SnpyVjVqb2N1CmNoeERMd293aXUxQWo2ZFJDWk5CejlUSnh5TnI1ME5ZVzJVWEJhVC84N1hyRkZkSndNVFZUMEI3SE9uRzdSQlYKUWxLdzhtcVZiYU5lbmhjdk1qUjI5c3hUekhSK2p4SU8zQndPNk9Hai9PRmhGQllVN1RMWGVsZDFxb2UwdmIyRwpiOGhQcEd1cHRyNUF0OWx3MXc1d1EzSWdpdXRQTkg1cXlEeUNwRWw2RVFLQmdRRGNkYnNsT2ZLSmo3TzJMQXlZCkZ0a1RwaWxFMFYzajBxbVE5M0lqclY0K0RSbUxNRUIyOTk0MDdCVVlRUWoxL0RJYlFjb1oyRUVjVUI1cGRlSHMKN0RNRUQ2WExIYjJKVTEyK2E3c1d5Q05kS2VjZStUNy9JYmxJOFR0MzQwVWxIUTZ6U01TRGNqdmZjRkhWZ3YwcwpDYWpoRng3TmtMRVhUWnI4ZlQzWUloajR2UUtCZ1FDK01nWjFVbW9KdzlJQVFqMnVJVTVDeTl4aldlWURUQU8vCllhWEl6d2xnZTQzOE1jYmI0Y04yU2FOU0dEZ1Y3bnU1a3FpaWhwalBZV0lpaU9CcDlrVFJIWE9kUFc0N3N5ZUkKdDNrd3JwMnpWbFVnbGNNWlo2bW1WM1FWYUFOWmdqVTRSU3Y0ZS9WeFVMamJaYWZqUHRaUnNqWkdwSzBZVTFvdApWajhJZVE3Zk5RS0JnQ1ArWk11ekpsSW5VQ1FTRlF4UHpxbFNtN0pNckpPaHRXV2h3TlRxWFZTc050dHV5VmVqCktIaGpneDR1b0JQcFZSVDJMTlVEWmI0RnByRjVPYVhBK3FOVEdyS0s3SU1iUlZidHArSVVVeEhHNGFGQStIUVgKUVhVVFRhNUpRT1RLVmJnWHpWM1lyTVhTUk1valZNcDMyVWJHeTVTc1p2MXpBamJ2QzhYWjYxSFJBb0dBZEJjUQp2aGU1eFpBUzVEbUtjSGkvemlHa3ViZXJuNk9NUGdxYUtJSEdsVytVOExScFR0ajBkNFRtL1Rydk1PUEovVEU1CllVcUtoenBIcmhDaCtjdHBvY0k2U1dXdm5SenpLbzNpbVFaY0Y1VEFqUTBjY3F0RmI5UzlkRHR5bi9YTUNqYWUKYWlNdll5VUVVRll5TFpDelBGWnNycDNoVVpHKzN5RmZoQXB
3TzJrQ2dZQkh3WWFQSWRXNld3NytCMmhpbjBvdwpqYTNjZXN2QTRqYU1Qd1NMVDhPTnRVMUdCU01md2N6TWJuUEhMclJ2Qjg3bjlnUGFSMndRR1VtckZFTzNMUFgvCmtSY09HcFlCSHBEWEVqRGhLa1dkUnVMT0ZnNEhMWmRWOEFOWmxRMFZTY0U4dTNkRERVTzg5cEdEbjA4cVRBcmwKeDlreHN1ZEVWcmtlclpiNVV4RlZxUT09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: minio-credentials
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials-secret
  labels:
    secrets.stackable.tech/class: minio-credentials
stringData:
  accessKey: minio-access-key
  secretKey: minio-secret-key
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: tpch
  labels:
    trino: trino-fault-tolerant
spec:
  connector:
    tpch: {}

docs/modules/trino/pages/usage-guide/configuration.adoc

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,9 @@ For a role or role group, at the same level of `config`, you can specify `config
 
 For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].
 
+TIP: For fault-tolerant execution configuration, use the dedicated `faultTolerantExecution` section in the cluster configuration instead of `configOverrides`.
+See xref:usage-guide/fault-tolerant-execution.adoc[] for detailed instructions.
+
 [source,yaml]
 ----
 workers:
Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
= Fault-tolerant execution
:description: Configure fault-tolerant execution in Trino clusters for improved query resilience and automatic retry capabilities.
:keywords: fault-tolerant execution, retry policy, exchange manager, spooling, query resilience

Fault-tolerant execution is a mechanism in Trino that enables a cluster to mitigate query failures by retrying queries or their component tasks in the event of failure.
With fault-tolerant execution enabled, intermediate exchange data is spooled and can be re-used by another worker in the event of a worker outage or other failures during query execution.

By default, if a Trino node lacks the resources to execute a task or otherwise fails during query execution, the query fails and must be run again manually.
The longer the runtime of a query, the more likely it is to be susceptible to such failures.

NOTE: Fault tolerance does not apply to broken queries or other user errors.
For example, Trino does not spend resources retrying a query that fails because its SQL cannot be parsed.

Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html[Trino documentation for fault-tolerant execution {external-link-icon}^] to learn more.

== Configuration

Fault-tolerant execution is not enabled by default.
It can be enabled in the `TrinoCluster` resource by adding a `faultTolerantExecution` section to the cluster configuration.
The configuration uses a structured approach where you choose either `query` or `task` retry policy, each with their specific configuration options.

=== Query retry policy

A `query` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
This policy is recommended when the majority of the Trino cluster's workload consists of many small queries.

By default, Trino does not implement fault tolerance for queries whose result set exceeds 32Mi in size.
This limit can be increased by modifying the `exchangeDeduplicationBufferSize` configuration property to be greater than the default value of `32Mi`, but this results in higher memory usage on the coordinator.

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      query:
        retryAttempts: 3
        exchangeDeduplicationBufferSize: 64Mi # Increased from default 32Mi
----

=== Task retry policy

A `task` retry policy instructs Trino to retry individual query tasks in the event of failure.
You **must** configure an exchange manager to use the task retry policy.
This policy is recommended when executing large batch queries, as the cluster can more efficiently retry smaller tasks within the query, rather than retry the whole query.

IMPORTANT: A `task` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
As a best practice, it is recommended to run a dedicated cluster with a `task` retry policy for large batch queries, separate from another cluster that handles short queries.
There are tools that can help you achieve this by automatically routing queries based on certain criteria (such as query estimates or user) to different Trino clusters. Notable mentions are link:https://github.com/stackabletech/trino-lb[trino-lb {external-link-icon}^] and link:https://github.com/trinodb/trino-gateway[trino-gateway {external-link-icon}^].

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      task:
        retryAttemptsPerTask: 4
        exchangeManager: # Mandatory for Task retry policy
          encryptionEnabled: true
          s3:
            baseDirectories:
              - "s3://trino-exchange-bucket/spooling"
            connection:
              reference: my-s3-connection # <1>
----
<1> Reference to an xref:concepts:s3.adoc[S3Connection] resource

== Exchange manager

Exchange spooling is responsible for storing and managing spooled data for fault-tolerant execution.
You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, HDFS, or local filesystem.

NOTE: An exchange manager is required when using the `task` retry policy and optional for the `query` retry policy.
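
To illustrate the optional case, a `query` retry policy combined with an exchange manager might look like the sketch below. It assumes that `exchangeManager` is accepted under `query` with the same structure the examples show under `task`; the bucket and connection names are placeholders rather than values taken from this commit.

```yaml
spec:
  clusterConfig:
    faultTolerantExecution:
      query:
        retryAttempts: 3
        exchangeManager: # assumption: same structure as under `task`
          s3:
            baseDirectories:
              - "s3://trino-exchange-bucket/spooling" # placeholder bucket
            connection:
              reference: my-s3-connection # placeholder S3Connection name
```

Checking the generated CRD in `deploy/helm/trino-operator/crds/crds.yaml` is the reliable way to confirm the exact field layout before applying a manifest like this.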

=== S3-compatible storage

You can use S3-compatible storage systems for exchange spooling, including AWS S3 and MinIO.

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      task:
        retryAttemptsPerTask: 4
        exchangeManager:
          s3:
            baseDirectories: # <1>
              - "s3://exchange-bucket-1/trino-spooling"
            connection:
              reference: minio-s3-connection # <2>
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: minio-s3-connection
spec:
  host: minio.default.svc.cluster.local
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: minio-secret-class
  tls:
    verification:
      server:
        caCert:
          secretClass: tls
----
<1> Multiple S3 buckets can be specified to distribute I/O load
<2> S3 connection defined as a reference to an xref:concepts:s3.adoc[S3Connection] resource
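
Callout <1> says that several buckets can be listed, while the example shows only one. A minimal sketch of the multi-bucket form follows, assuming a second bucket named `exchange-bucket-2` (hypothetical) that is reachable through the same `S3Connection`:

```yaml
spec:
  clusterConfig:
    faultTolerantExecution:
      task:
        retryAttemptsPerTask: 4
        exchangeManager:
          s3:
            baseDirectories:
              - "s3://exchange-bucket-1/trino-spooling"
              - "s3://exchange-bucket-2/trino-spooling" # hypothetical second bucket
            connection:
              reference: minio-s3-connection # both buckets are assumed to share this endpoint and credentials
```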

For storage systems like Google Cloud Storage or Azure Blob Storage, you can use the S3-compatible configuration with `configOverrides` to provide the necessary exchange manager properties.

=== HDFS storage

You can configure HDFS as the exchange spooling destination:

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      task:
        retryAttemptsPerTask: 4
        exchangeManager:
          hdfs:
            baseDirectories:
              - "hdfs://simple-hdfs/exchange-spooling"
            hdfs:
              configMap: simple-hdfs # <1>
----
<1> ConfigMap containing HDFS configuration files (created by the HDFS operator)

=== Local filesystem storage

Local filesystem storage is supported but only recommended for development or single-node deployments:

WARNING: It is only recommended to use a local filesystem for exchange in standalone, non-production clusters.
A local directory can only be used for exchange in a distributed cluster if the exchange directory is shared and accessible from all nodes.

[source,yaml]
----
spec:
  clusterConfig:
    faultTolerantExecution:
      task:
        exchangeManager:
          local:
            baseDirectories:
              - "/trino-exchange"
  coordinators:
    roleGroups:
      default:
        replicas: 1
        podOverrides:
          spec:
            volumes:
              - name: trino-exchange
                persistentVolumeClaim:
                  claimName: trino-exchange-pvc
            containers:
              - name: trino
                volumeMounts:
                  - name: trino-exchange
                    mountPath: /trino-exchange
  workers:
    roleGroups:
      default:
        replicas: 1
        podOverrides:
          spec:
            volumes:
              - name: trino-exchange
                persistentVolumeClaim:
                  claimName: trino-exchange-pvc
            containers:
              - name: trino
                volumeMounts:
                  - name: trino-exchange
                    mountPath: /trino-exchange
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: trino-exchange-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
----
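
The warning in this section requires the exchange directory to be shared across all nodes, and a `ReadWriteOnce` claim can only be mounted read-write by a single Kubernetes node. For pods spread over several nodes, a claim along the following lines may be a better fit. This is a sketch that assumes the cluster provides a storage class supporting `ReadWriteMany`; the class name `shared-storage` is hypothetical.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: trino-exchange-pvc
spec:
  accessModes:
    - ReadWriteMany # lets pods on different nodes mount the same volume
  storageClassName: shared-storage # hypothetical; use a class that supports ReadWriteMany
  resources:
    requests:
      storage: 50Gi
```

The `podOverrides` for coordinators and workers can stay as they are, since they reference the claim only by name.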

== Connector support

Support for fault-tolerant execution of SQL statements varies on a per-connector basis.
Take a look at the link:https://trino.io/docs/current/admin/fault-tolerant-execution.html#configuration[Trino documentation {external-link-icon}^] to see which connectors support fault-tolerant execution.

When using connectors that do not explicitly support fault-tolerant execution, you may encounter a "This connector does not support query retries" error message.

== Example

Here's an example of a Trino cluster with fault-tolerant execution enabled using the `task` retry policy and MinIO backed S3 as the exchange manager:

[source,bash]
----
stackablectl operator install commons secret listener trino
helm install minio oci://registry-1.docker.io/bitnamicharts/minio --version 17.0.19 --set auth.rootUser=minio-access-key --set auth.rootPassword=minio-secret-key --set tls.enabled=true --set tls.server.existingSecret=minio-tls-certificates --set tls.existingSecret=minio-tls-certificates --set tls.existingCASecret=minio-tls-certificates --set tls.autoGenerated.enabled=false --set provisioning.enabled=true --set provisioning.buckets[0].name=trino-exchange-bucket --set global.security.allowInsecureImages=true --set image.repository=bitnamilegacy/minio --set clientImage.repository=bitnamilegacy/minio-client --set defaultInitContainers.volumePermissions.image.repository=bitnamilegacy/os-shell --set console.image.repository=bitnamilegacy/minio-object-browser
----

[source,yaml]
----
include::example$usage-guide/fault-tolerant-execution.yaml[]
----
