Commit 52b6b35

feat: use PascalCase for Query/Task / allow configOverrides for exchange manager
1 parent f0d1b50 commit 52b6b35

5 files changed, +109 -114 lines changed

deploy/helm/trino-operator/crds/crds.yaml

Lines changed: 12 additions & 6 deletions
@@ -118,7 +118,7 @@ spec:
 nullable: true
 type: boolean
 exchangeManager:
-description: Exchange manager configuration for spooling intermediate data during fault tolerant execution. Required when using `TASK` retry policy, optional for `QUERY` retry policy.
+description: Exchange manager configuration for spooling intermediate data during fault tolerant execution. Required when using `Task` retry policy, optional for `Query` retry policy.
 nullable: true
 oneOf:
 - required:
@@ -128,6 +128,12 @@ spec:
 - required:
 - local
 properties:
+configOverrides:
+additionalProperties:
+type: string
+default: {}
+description: The `configOverrides` allow overriding arbitrary exchange manager properties.
+type: object
 hdfs:
 description: HDFS-based exchange manager.
 properties:
@@ -344,7 +350,7 @@ spec:
 type: integer
 type: object
 queryRetryAttempts:
-description: Maximum number of times Trino may attempt to retry a query before declaring it failed. Only applies to `QUERY` retry policy.
+description: Maximum number of times Trino may attempt to retry a query before declaring it failed. Only applies to `Query` retry policy.
 format: uint32
 minimum: 0.0
 nullable: true
@@ -363,13 +369,13 @@ spec:
 nullable: true
 type: string
 retryPolicy:
-description: The retry policy for fault tolerant execution. `QUERY` retries entire queries, `TASK` retries individual tasks. When set to `TASK`, an exchange manager must be configured.
+description: The retry policy for fault tolerant execution. `Query` retries entire queries, `Task` retries individual tasks. When set to `Task`, an exchange manager must be configured.
 enum:
-- query
-- task
+- Query
+- Task
 type: string
 taskRetryAttemptsPerTask:
-description: Maximum number of times Trino may attempt to retry a single task before declaring the query failed. Only applies to `TASK` retry policy.
+description: Maximum number of times Trino may attempt to retry a single task before declaring the query failed. Only applies to `Task` retry policy.
 format: uint32
 minimum: 0.0
 nullable: true
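
For illustration, the `configOverrides` map that this commit adds under `exchangeManager` could be set roughly as follows. This is a minimal sketch, not taken from the commit: the nesting under `spec.clusterConfig.faultTolerantExecution` follows the usage guide changed in the same commit, while the bucket name, the `S3Connection` name, the `reference` form of `connection`, and the overridden property `exchange.s3.max-error-retries` are assumptions chosen for the example. Per the schema, every value in the map must be a string.

```yaml
# Sketch only: bucket, connection name and the overridden property are illustrative assumptions.
spec:
  clusterConfig:
    faultTolerantExecution:
      retryPolicy: Task                # PascalCase after this commit (the docs previously showed TASK)
      exchangeManager:
        configOverrides:               # map of string keys to string values, per the CRD schema
          exchange.s3.max-error-retries: "5"   # assumed example property; quoted because values are strings
        s3:
          baseDirectories:
            - "s3://my-bucket/trino-spooling"  # hypothetical bucket
          connection:
            reference: my-s3-connection        # hypothetical S3Connection resource
```

Because `configOverrides` sits next to `hdfs`, `s3`, and `local` rather than inside them, the `oneOf` constraint on the backend is presumably unaffected; exactly one backend still has to be chosen.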

docs/modules/trino/examples/usage-guide/fault-tolerant-execution.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ spec:
 matchLabels:
 trino: trino-fault-tolerant
 faultTolerantExecution:
-retryPolicy: TASK
+retryPolicy: Task
 taskRetryAttemptsPerTask: 4
 retryInitialDelay: 10s
 retryMaxDelay: 60s

docs/modules/trino/pages/usage-guide/fault-tolerant-execution.adoc

Lines changed: 20 additions & 94 deletions
Original file line numberDiff line numberDiff line change
@@ -23,20 +23,20 @@ To enable the feature, you need to configure it in your `TrinoCluster` resource
 spec:
 clusterConfig:
 faultTolerantExecution:
-retryPolicy: QUERY # <1>
+retryPolicy: Query # <1>
 queryRetryAttempts: 3 # <2>
 ----
-<1> The retry policy - either `QUERY` or `TASK`
-<2> Maximum number of times to retry a query (QUERY policy only)
+<1> The retry policy - either `Query` or `Task`
+<2> Maximum number of times to retry a query (Query policy only)

 == Retry policies

 The `retryPolicy` configuration property designates whether Trino retries entire queries or a query's individual tasks in the event of failure.

-=== QUERY retry policy
+=== Query retry policy

-A `QUERY` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
-A `QUERY` retry policy is recommended when the majority of the Trino cluster's workload consists of many small queries.
+A `Query` retry policy instructs Trino to automatically retry a query in the event of an error occurring on a worker node.
+A `Query` retry policy is recommended when the majority of the Trino cluster's workload consists of many small queries.

 By default, Trino does not implement fault tolerance for queries whose result set exceeds 32MB in size.
 This limit can be increased by modifying the `exchangeDeduplicationBufferSize` configuration property to be greater than the default value of `32MB`, but this results in higher memory usage on the coordinator.
@@ -47,28 +47,28 @@ This limit can be increased by modifying the `exchangeDeduplicationBufferSize` c
 spec:
 clusterConfig:
 faultTolerantExecution:
-retryPolicy: QUERY
+retryPolicy: Query
 queryRetryAttempts: 3
 exchangeDeduplicationBufferSize: 64MB # Increased from default 32MB
 ...
 ----

-=== TASK retry policy
+=== Task retry policy

-A `TASK` retry policy instructs Trino to retry individual query tasks in the event of failure.
+A `Task` retry policy instructs Trino to retry individual query tasks in the event of failure.
 You **must** configure an exchange manager to use the task retry policy.
 This policy is recommended when executing large batch queries, as the cluster can more efficiently retry smaller tasks within the query rather than retry the whole query.

-IMPORTANT: A `TASK` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
-As a best practice, it is recommended to run a dedicated cluster with a `TASK` retry policy for large batch queries, separate from another cluster that handles short queries.
+IMPORTANT: A `Task` retry policy is best suited for long-running queries, but this policy can result in higher latency for short-running queries executed in high volume.
+As a best practice, it is recommended to run a dedicated cluster with a `Task` retry policy for large batch queries, separate from another cluster that handles short queries.
 There are tools that can help you achieve this by automatically routing queries based on certain criteria (such as query estimates or user) to different Trino clusters. Notable mentions are link:https://github.com/stackabletech/trino-lb[trino-lb {external-link-icon}^] and link:https://github.com/trinodb/trino-gateway[trino-gateway {external-link-icon}^].

 [source,yaml]
 ----
 spec:
 clusterConfig:
 faultTolerantExecution:
-retryPolicy: TASK
+retryPolicy: Task
 taskRetryAttemptsPerTask: 4
 exchangeManager:
 s3:
@@ -82,20 +82,20 @@ spec:
 == Exchange manager

 Exchange spooling is responsible for storing and managing spooled data for fault-tolerant execution.
-You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, Azure Blob Storage or HDFS.
+You can configure a filesystem-based exchange manager that stores spooled data in a specified location, such as AWS S3 and S3-compatible systems, HDFS, or local filesystem.

-NOTE: An exchange manager is required when using the `TASK` retry policy and optional for the `QUERY` retry policy.
+NOTE: An exchange manager is required when using the `Task` retry policy and optional for the `Query` retry policy.

 === S3-compatible storage

-You can use S3-compatible storage systems for exchange spooling, including AWS S3, MinIO, and Google Cloud Storage.
+You can use S3-compatible storage systems for exchange spooling, including AWS S3 and MinIO.

 [source,yaml]
 ----
 spec:
 clusterConfig:
 faultTolerantExecution:
-retryPolicy: TASK
+retryPolicy: Task
 exchangeManager:
 s3:
 baseDirectories: # <1>
@@ -122,81 +122,7 @@ spec:
 <1> Multiple S3 buckets can be specified to distribute I/O load
 <2> S3 connection defined as a reference to an xref:concepts:s3.adoc[S3Connection] resource

-For Google Cloud Storage, you can use GCS buckets with S3 compatibility:
-
-[source,yaml]
-----
-spec:
-clusterConfig:
-faultTolerantExecution:
-exchangeManager:
-s3:
-baseDirectories:
-- "gs://my-gcs-bucket/trino-spooling"
-connection:
-inline:
-host: storage.googleapis.com
-port: 443
-accessStyle: Path
-credentials:
-secretClass: gcs-hmac-credentials
-tls:
-verification:
-server:
-caCert:
-webPki: {}
-gcsServiceAccountKey:
-secretClass: "gcs-service-account-secret-class"
-key: "service-account.json"
-----
-
-=== Azure Blob Storage
-
-You can configure Azure Blob Storage as the exchange spooling destination:
-
-[source,yaml]
-----
-spec:
-clusterConfig:
-faultTolerantExecution:
-retryPolicy: TASK
-exchangeManager:
-azure:
-baseDirectories:
-- "abfs://[email protected]/exchange-spooling"
-secretClass: azure-credentials # <1>
-key: connectionString # <2>
-----
-<1> SecretClass providing the Azure connection string
-<2> Key name in the Secret that contains the connection string (defaults to `connectionString`)
-
-The Azure connection string should be provided via a SecretClass that refers to a Kubernetes Secret containing the Azure storage account connection string, like this:
-
-[source,yaml]
-----
-apiVersion: secrets.stackable.tech/v1alpha1
-kind: SecretClass
-metadata:
-name: azure-credentials
-spec:
-backend:
-k8sSearch:
-searchNamespace:
-pod: {}
-----
-
-[source,yaml]
-----
-apiVersion: v1
-kind: Secret
-metadata:
-name: azure-secret
-labels:
-secrets.stackable.tech/class: azure-credentials
-type: Opaque
-stringData:
-connectionString: "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=your_account_key;EndpointSuffix=core.windows.net"
-----
+For storage systems like Google Cloud Storage or Azure Blob Storage, you can use the S3-compatible configuration with `configOverrides` to provide the necessary exchange manager properties.

 === HDFS storage

@@ -207,7 +133,7 @@ You can configure HDFS as the exchange spooling destination:
 spec:
 clusterConfig:
 faultTolerantExecution:
-retryPolicy: TASK
+retryPolicy: Task
 exchangeManager:
 hdfs:
 baseDirectories:
@@ -229,7 +155,7 @@ A local directory can only be used for exchange in a distributed cluster if the
 spec:
 clusterConfig:
 faultTolerantExecution:
-retryPolicy: TASK
+retryPolicy: Task
 exchangeManager:
 local:
 baseDirectories:
@@ -286,7 +212,7 @@ When using connectors that do not explicitly support fault-tolerant execution, y

 == Example

-Here's an example of a Trino cluster with fault-tolerant execution enabled using the `TASK` retry policy and MinIO backed S3 as the exchange manager:
+Here's an example of a Trino cluster with fault-tolerant execution enabled using the `Task` retry policy and MinIO backed S3 as the exchange manager:

 [source,bash]
 ----
