Commit 66c6837

razvan and sbernauer authored
feat: add support for the client spooling protocol (#793)
* add new client protocol module
* update crd fields to match decision
* add spooling config to STS and some unit tests
* add kuttl test
* fix spool secret length
* integration test successful
* rename python script
* increase the number of rows to fetch from Trino
* update changelog
* update docs
* remove unused enum
* remove Optional from config_overrides
* remove the "enabled" property
* refactor crd to use spec.clusterConfig.clientProtocol.spooling
* remove clientProtocol.configOverrides field
* handle config overrides for spooling-manager.properties
* update docs
* use config-utils to resolve S3 credentials
* add comment to function
* revert the unsafe function
* fix merge and update test
* remove test timeout
* not all Trino versions support spooling
* Apply suggestions from code review

Co-authored-by: Sebastian Bernauer <[email protected]>

* remove leftovers
* remove `clusterConfig.faultTolerantExecution.configOverrides` property
* amend FTE changelog
* client_protocol: refactor s3 config into crd::s3 and config::s3 to make it reusable
* fte: refactor to reuse crd::s3 and move resolved struct out of the crd module
* Apply suggestions from code review

Co-authored-by: Sebastian Bernauer <[email protected]>

* remove unused imports after suggestion
* remove crd::s3 which included iam fields
* fte tests: replace inline structs with indoc
* controller tests: apply PR review patch
* client-spooling: update schema for docs and tests
* Apply suggestions from code review

Co-authored-by: Sebastian Bernauer <[email protected]>

* client spooling: make s3 filesystem consistent with FTE backend
* ensure error messages are lowercased
* client spooling: raise error for trino 451

---------

Co-authored-by: Sebastian Bernauer <[email protected]>
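The commit message notes that not all Trino versions support spooling and that the operator now raises an error for Trino 451. The std-only Rust sketch below shows one way such a version gate could look. The names (`spooling_support`, `validate_spooling`, `SpoolingSupport`) are hypothetical, and the thresholds (466 for availability, 476 for reliable operation) are taken from the IMPORTANT note on the documentation page in this commit; the operator's actual validation may use different rules.

```rust
// Hypothetical sketch: gate the client spooling protocol on the Trino version.
// Thresholds come from the documentation note (466 / 476); the operator's
// real validation may differ.

const SPOOLING_INTRODUCED: u16 = 466;
const SPOOLING_RELIABLE: u16 = 476;

#[derive(Debug, PartialEq)]
enum SpoolingSupport {
    /// The protocol does not exist in this Trino version.
    Unsupported,
    /// The protocol exists but is documented as not working reliably.
    Unreliable,
    /// The protocol is expected to work reliably.
    Supported,
}

/// Classify a Trino version (e.g. 451, 476) with respect to client spooling.
fn spooling_support(trino_version: u16) -> SpoolingSupport {
    if trino_version < SPOOLING_INTRODUCED {
        SpoolingSupport::Unsupported
    } else if trino_version < SPOOLING_RELIABLE {
        SpoolingSupport::Unreliable
    } else {
        SpoolingSupport::Supported
    }
}

/// Reject spooling on versions without support; only "Unsupported" is an error.
/// The message is lowercase, matching the convention adopted in this commit.
fn validate_spooling(trino_version: u16) -> Result<(), String> {
    match spooling_support(trino_version) {
        SpoolingSupport::Unsupported => Err(format!(
            "the client spooling protocol is not supported by trino {trino_version}"
        )),
        SpoolingSupport::Unreliable | SpoolingSupport::Supported => Ok(()),
    }
}

fn main() {
    for version in [451, 470, 476] {
        println!(
            "{version}: {:?}, {:?}",
            spooling_support(version),
            validate_spooling(version)
        );
    }
}
```

Splitting classification from validation keeps the policy decision (hard error versus tolerating an "unreliable" version) in one small function that is easy to change.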
1 parent 4cff1ba commit 66c6837

38 files changed (+1838 / -824 lines)

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -6,10 +6,12 @@ All notable changes to this project will be documented in this file.
 
 ### Added
 
-- Support for fault-tolerant execution ([#779]).
+- Support for fault-tolerant execution ([#779], [#793]).
+- Support for the client spooling protocol ([#793]).
 - Helm: Allow Pod `priorityClassName` to be configured ([#798]).
 
 [#779]: https://github.com/stackabletech/trino-operator/pull/779
+[#793]: https://github.com/stackabletech/trino-operator/pull/793
 [#798]: https://github.com/stackabletech/trino-operator/pull/798
 
 ## [25.7.0] - 2025-07-23

deploy/helm/trino-operator/crds/crds.yaml

Lines changed: 148 additions & 28 deletions
@@ -105,6 +105,154 @@ spec:
       description: matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.
       type: object
   type: object
+clientProtocol:
+  description: Client spooling protocol configuration.
+  nullable: true
+  oneOf:
+  - required:
+    - spooling
+  properties:
+    spooling:
+      properties:
+        filesystem:
+          oneOf:
+          - required:
+            - s3
+          properties:
+            s3:
+              properties:
+                connection:
+                  oneOf:
+                  - required:
+                    - inline
+                  - required:
+                    - reference
+                  properties:
+                    inline:
+                      description: S3 connection definition as a resource. Learn more on the [S3 concept documentation](https://docs.stackable.tech/home/nightly/concepts/s3).
+                      properties:
+                        accessStyle:
+                          default: VirtualHosted
+                          description: Which access style to use. Defaults to virtual hosted-style as most of the data products out there. Have a look at the [AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html).
+                          enum:
+                          - Path
+                          - VirtualHosted
+                          type: string
+                        credentials:
+                          description: If the S3 uses authentication you have to specify you S3 credentials. In the most cases a [SecretClass](https://docs.stackable.tech/home/nightly/secret-operator/secretclass) providing `accessKey` and `secretKey` is sufficient.
+                          nullable: true
+                          properties:
+                            scope:
+                              description: '[Scope](https://docs.stackable.tech/home/nightly/secret-operator/scope) of the [SecretClass](https://docs.stackable.tech/home/nightly/secret-operator/secretclass).'
+                              nullable: true
+                              properties:
+                                listenerVolumes:
+                                  default: []
+                                  description: The listener volume scope allows Node and Service scopes to be inferred from the applicable listeners. This must correspond to Volume names in the Pod that mount Listeners.
+                                  items:
+                                    type: string
+                                  type: array
+                                node:
+                                  default: false
+                                  description: The node scope is resolved to the name of the Kubernetes Node object that the Pod is running on. This will typically be the DNS name of the node.
+                                  type: boolean
+                                pod:
+                                  default: false
+                                  description: The pod scope is resolved to the name of the Kubernetes Pod. This allows the secret to differentiate between StatefulSet replicas.
+                                  type: boolean
+                                services:
+                                  default: []
+                                  description: The service scope allows Pod objects to specify custom scopes. This should typically correspond to Service objects that the Pod participates in.
+                                  items:
+                                    type: string
+                                  type: array
+                              type: object
+                            secretClass:
+                              description: '[SecretClass](https://docs.stackable.tech/home/nightly/secret-operator/secretclass) containing the LDAP bind credentials.'
+                              type: string
+                          required:
+                          - secretClass
+                          type: object
+                        host:
+                          description: 'Host of the S3 server without any protocol or port. For example: `west1.my-cloud.com`.'
+                          type: string
+                        port:
+                          description: Port the S3 server listens on. If not specified the product will determine the port to use.
+                          format: uint16
+                          minimum: 0.0
+                          nullable: true
+                          type: integer
+                        region:
+                          default:
+                            name: us-east-1
+                          description: |-
+                            Bucket region used for signing headers (sigv4).
+
+                            This defaults to `us-east-1` which is compatible with other implementations such as Minio.
+
+                            WARNING: Some products use the Hadoop S3 implementation which falls back to us-east-2.
+                          properties:
+                            name:
+                              default: us-east-1
+                              type: string
+                          type: object
+                        tls:
+                          description: Use a TLS connection. If not specified no TLS will be used.
+                          nullable: true
+                          properties:
+                            verification:
+                              description: The verification method used to verify the certificates of the server and/or the client.
+                              oneOf:
+                              - required:
+                                - none
+                              - required:
+                                - server
+                              properties:
+                                none:
+                                  description: Use TLS but don't verify certificates.
+                                  type: object
+                                server:
+                                  description: Use TLS and a CA certificate to verify the server.
+                                  properties:
+                                    caCert:
+                                      description: CA cert to verify the server.
+                                      oneOf:
+                                      - required:
+                                        - webPki
+                                      - required:
+                                        - secretClass
+                                      properties:
+                                        secretClass:
+                                          description: Name of the [SecretClass](https://docs.stackable.tech/home/nightly/secret-operator/secretclass) which will provide the CA certificate. Note that a SecretClass does not need to have a key but can also work with just a CA certificate, so if you got provided with a CA cert but don't have access to the key you can still use this method.
+                                          type: string
+                                        webPki:
+                                          description: Use TLS and the CA certificates trusted by the common web browsers to verify the server. This can be useful when you e.g. use public AWS S3 or other public available services.
+                                          type: object
+                                      type: object
+                                  required:
+                                  - caCert
+                                  type: object
+                              type: object
+                          required:
+                          - verification
+                          type: object
+                      required:
+                      - host
+                      type: object
+                    reference:
+                      type: string
+                  type: object
+              required:
+              - connection
+              type: object
+          type: object
+        location:
+          type: string
+      required:
+      - filesystem
+      - location
+      type: object
+  type: object
 faultTolerantExecution:
   description: Fault tolerant execution configuration. When enabled, Trino can automatically retry queries or tasks in case of failures.
   nullable: true
@@ -132,12 +280,6 @@ spec:
           - required:
             - local
           properties:
-            configOverrides:
-              additionalProperties:
-                type: string
-              default: {}
-              description: The `configOverrides` allow overriding arbitrary exchange manager properties.
-              type: object
             encryptionEnabled:
               description: Whether to enable encryption of spooling data.
               nullable: true
@@ -312,14 +454,6 @@ spec:
                     reference:
                       type: string
                   type: object
-                externalId:
-                  description: External ID for the IAM role trust policy.
-                  nullable: true
-                  type: string
-                iamRole:
-                  description: IAM role to assume for S3 access.
-                  nullable: true
-                  type: string
                 maxErrorRetries:
                   description: Maximum number of times the S3 client should retry a request.
                   format: uint32
@@ -394,12 +528,6 @@ spec:
           - required:
             - local
           properties:
-            configOverrides:
-              additionalProperties:
-                type: string
-              default: {}
-              description: The `configOverrides` allow overriding arbitrary exchange manager properties.
-              type: object
             encryptionEnabled:
               description: Whether to enable encryption of spooling data.
               nullable: true
@@ -574,14 +702,6 @@ spec:
                     reference:
                       type: string
                   type: object
-                externalId:
-                  description: External ID for the IAM role trust policy.
-                  nullable: true
-                  type: string
-                iamRole:
-                  description: IAM role to assume for S3 access.
-                  nullable: true
-                  type: string
                 maxErrorRetries:
                   description: Maximum number of times the S3 client should retry a request.
                   format: uint32
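The new `clientProtocol` schema nests three `oneOf` choices: `clientProtocol` must contain `spooling`, `filesystem` must contain `s3`, and `connection` must be either `inline` or `reference`. A oneOf with named alternatives maps naturally onto Rust enums, so the std-only sketch below models the schema that way. All type and function names are hypothetical and only illustrate the shape of the schema; the operator generates its CRD from its own types, which are not shown on this page.

```rust
// Hypothetical, std-only model of the `clientProtocol` schema in the diff above.
// Type names are illustrative, not the operator's actual types.

/// `connection` is a oneOf: an inline definition or the name of an S3Connection.
#[derive(Debug, Clone, PartialEq)]
enum S3Connection {
    Inline { host: String, port: Option<u16> },
    Reference(String),
}

/// `filesystem` is a oneOf with a single variant so far: `s3`.
#[derive(Debug, Clone, PartialEq)]
enum SpoolingFilesystem {
    S3 { connection: S3Connection },
}

/// `clientProtocol` is a nullable oneOf whose only variant is `spooling`,
/// which requires both `location` and `filesystem`.
#[derive(Debug, Clone, PartialEq)]
enum ClientProtocol {
    Spooling { location: String, filesystem: SpoolingFilesystem },
}

/// An absent (null) `clientProtocol` means spooling is disabled.
fn spooling_enabled(client_protocol: Option<&ClientProtocol>) -> bool {
    matches!(client_protocol, Some(ClientProtocol::Spooling { .. }))
}

/// Summarize where spooled segments go by walking the oneOf branches.
fn describe(protocol: &ClientProtocol) -> String {
    // Single-variant enums can be destructured with an irrefutable `let`.
    let ClientProtocol::Spooling { location, filesystem } = protocol;
    let SpoolingFilesystem::S3 { connection } = filesystem;
    match connection {
        S3Connection::Reference(name) => format!("spool to {location} via S3Connection '{name}'"),
        S3Connection::Inline { host, port: Some(port) } => {
            format!("spool to {location} via {host}:{port}")
        }
        S3Connection::Inline { host, port: None } => format!("spool to {location} via {host}"),
    }
}

fn main() {
    // Mirrors the YAML example on the client spooling documentation page.
    let cfg = ClientProtocol::Spooling {
        location: "s3://spooling-bucket/trino/".to_string(),
        filesystem: SpoolingFilesystem::S3 {
            connection: S3Connection::Reference("minio".to_string()),
        },
    };
    assert!(spooling_enabled(Some(&cfg)));
    assert!(!spooling_enabled(None));
    println!("{}", describe(&cfg));
}
```

Once a second filesystem or protocol variant is added, the irrefutable `let` bindings in `describe` stop compiling, which points directly at the code that must handle the new case.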
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+= Client Spooling Protocol
+:description: Enable and configure the Client Spooling Protocol in Trino for efficient handling of large result sets.
+:keywords: client spooling protocol, Trino, large result sets, memory management
+:trino-docs-spooling-url: https://trino.io/docs/476/client/client-protocol.html
+
+The Client Spooling Protocol in Trino is designed to efficiently handle large result sets. When enabled, this protocol allows the Trino server to spool results to external storage systems, reducing memory consumption and improving performance for queries that return large datasets.
+
+For more details, refer to the link:{trino-docs-spooling-url}[Trino documentation on the Client Spooling Protocol {external-link-icon}^].
+
+[IMPORTANT]
+====
+The client spooling protocol was introduced in Trino 466, but it only works reliably starting with Trino 476.
+====
+
+== Configuration
+
+The client spooling protocol is disabled by default.
+To enable it, configure the `spec.clusterConfig.clientProtocol.spooling` property as shown below.
+
+[source,yaml]
+----
+spec:
+  clusterConfig:
+    clientProtocol:
+      spooling:
+        location: "s3://spooling-bucket/trino/" # <1>
+        filesystem:
+          s3: # <2>
+            connection:
+              reference: "minio"
+----
+<1> The location where spooled data is stored. This example uses an S3 bucket.
+<2> The filesystem used for spooling. Currently, S3 is the only filesystem supported by the custom resource definition.
+
+The operator automatically fills in additional settings required by Trino, such as `protocol.spooling.shared-secret-key`.
+To add or replace properties in the generated `spooling-manager.properties` file, use the `configOverrides` property as described in xref:usage-guide/configuration.adoc[].
+
+[IMPORTANT]
+====
+Even when the client spooling protocol is enabled, Trino may decide not to use it for certain queries. Clients cannot force Trino to use it.
+====
+
+Clients need access to the same storage location that is configured for spooling.
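The documentation page says the operator fills in `protocol.spooling.shared-secret-key` automatically. As we understand the Trino documentation, this property expects a random 256-bit key encoded as base64. The std-only Rust sketch below shows what such a value looks like: it reads 32 random bytes from `/dev/urandom` (an assumption: a Unix-like OS) and encodes them with a hand-written RFC 4648 base64 encoder. The function names are hypothetical, and this is not the operator's implementation.

```rust
use std::fs::File;
use std::io::Read;

/// RFC 4648 standard base64 alphabet.
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 encoding with '=' padding.
fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group; missing bytes count as zero.
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let group = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        // n input bytes produce n + 1 characters; the rest of the quartet is '='.
        for i in 0..4 {
            if i <= chunk.len() {
                let index = ((group >> (18 - 6 * i)) & 0x3f) as usize;
                out.push(char::from(ALPHABET[index]));
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Generate a random 256-bit key and return it base64-encoded.
/// Assumes a Unix-like OS that exposes /dev/urandom.
fn generate_shared_secret() -> std::io::Result<String> {
    let mut key = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut key)?;
    Ok(base64_encode(&key))
}

fn main() -> std::io::Result<()> {
    let secret = generate_shared_secret()?;
    // 32 bytes encode to 44 characters, ending in a single '=' of padding.
    assert_eq!(secret.len(), 44);
    assert!(secret.ends_with('=') && !secret.ends_with("=="));
    println!("protocol.spooling.shared-secret-key={secret}");
    Ok(())
}
```

The fixed output length (44 characters for 32 bytes) gives a cheap sanity check for a generated or user-supplied key before it is written into a configuration file.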

docs/modules/trino/pages/usage-guide/configuration.adoc

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ For a role or role group, at the same level of `config`, you can specify `config
 * `password-authenticator.properties`
 * `security.properties`
 * `exchange-manager.properties`
+* `spooling-manager.properties`
 
 For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].
 
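With `spooling-manager.properties` added to the files accepted by `configOverrides`, the operator-generated entries and the user overrides have to be merged. The std-only Rust sketch below assumes that overrides simply add or replace keys in the generated file, as the spooling documentation page describes. The property names (`spooling-manager.name`, `fs.s3.enabled`, `fs.location`, `s3.endpoint`, `s3.region`, `s3.path-style-access`) follow the filesystem spooling manager settings in the Trino documentation as we understand them, and the mapping of `accessStyle: Path` to `s3.path-style-access=true` is also an assumption. `ResolvedSpooling`, both functions, and all values are hypothetical; the operator's real output may contain different or additional keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical values resolved from the CRD (`location` plus the S3 connection).
struct ResolvedSpooling<'a> {
    location: &'a str,
    endpoint: &'a str,
    region: &'a str,
    path_style_access: bool,
}

/// Build spooling-manager.properties: operator-derived entries first, then
/// user `configOverrides`, which add new keys or replace generated ones.
fn spooling_manager_properties(
    spooling: &ResolvedSpooling<'_>,
    overrides: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut props: BTreeMap<String, String> = BTreeMap::new();
    props.insert("spooling-manager.name".into(), "filesystem".into());
    props.insert("fs.s3.enabled".into(), "true".into());
    props.insert("fs.location".into(), spooling.location.into());
    props.insert("s3.endpoint".into(), spooling.endpoint.into());
    props.insert("s3.region".into(), spooling.region.into());
    props.insert(
        "s3.path-style-access".into(),
        spooling.path_style_access.to_string(),
    );
    // `extend` inserts each override, replacing any generated value for that key.
    props.extend(overrides.iter().map(|(k, v)| (k.clone(), v.clone())));
    props
}

/// Render the map as `key=value` lines. No escaping is done, which is only
/// safe for simple values like the ones used here.
fn render(props: &BTreeMap<String, String>) -> String {
    props.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
}

fn main() {
    let spooling = ResolvedSpooling {
        location: "s3://spooling-bucket/trino/",
        endpoint: "http://minio:9000",
        region: "us-east-1",
        path_style_access: true,
    };
    // A user override that replaces the generated region.
    let mut overrides = BTreeMap::new();
    overrides.insert("s3.region".to_string(), "eu-central-1".to_string());
    print!("{}", render(&spooling_manager_properties(&spooling, &overrides)));
}
```

Using a `BTreeMap` keeps the rendered keys sorted, so regenerating the file for unchanged input produces byte-identical output.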

docs/modules/trino/partials/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 ** xref:trino:usage-guide/listenerclass.adoc[]
 ** xref:trino:usage-guide/configuration.adoc[]
 ** xref:trino:usage-guide/fault-tolerant-execution.adoc[]
+** xref:trino:usage-guide/client-spooling-protocol.adoc[]
 ** xref:trino:usage-guide/s3.adoc[]
 ** xref:trino:usage-guide/security.adoc[]
 ** xref:trino:usage-guide/monitoring.adoc[]

rust/operator-binary/src/authentication/mod.rs

Lines changed: 5 additions & 5 deletions
@@ -43,26 +43,26 @@ const HTTP_SERVER_AUTHENTICATION_TYPE: &str = "http-server.authentication.type";
 #[derive(Snafu, Debug)]
 pub enum Error {
     #[snafu(display(
-        "The Trino Operator does not support the AuthenticationClass provider [{authentication_class_provider}] from AuthenticationClass [{authentication_class}]."
+        "the Trino Operator does not support the AuthenticationClass provider [{authentication_class_provider}] from AuthenticationClass [{authentication_class}]."
     ))]
     AuthenticationClassProviderNotSupported {
         authentication_class_provider: String,
         authentication_class: ObjectRef<core::v1alpha1::AuthenticationClass>,
     },
 
-    #[snafu(display("Failed to format trino authentication java properties"))]
+    #[snafu(display("failed to format trino authentication java properties"))]
     FailedToWriteJavaProperties {
         source: product_config::writer::PropertiesWriterError,
     },
 
-    #[snafu(display("Failed to configure trino password authentication"))]
+    #[snafu(display("failed to configure trino password authentication"))]
     InvalidPasswordAuthenticationConfig { source: password::Error },
 
-    #[snafu(display("Failed to configure trino OAuth2 authentication"))]
+    #[snafu(display("failed to configure trino OAuth2 authentication"))]
     InvalidOauth2AuthenticationConfig { source: oidc::Error },
 
     #[snafu(display(
-        "OIDC authentication details not specified. The AuthenticationClass {auth_class_name:?} uses an OIDC provider, you need to specify OIDC authentication details (such as client credentials) as well"
+        "oidc authentication details not specified. The AuthenticationClass {auth_class_name:?} uses an OIDC provider, you need to specify OIDC authentication details (such as client credentials) as well"
     ))]
     OidcAuthenticationDetailsNotSpecified { auth_class_name: String },
 
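This commit lowercases the `display` messages of the authentication errors. That matches the guidance in the Rust standard library's `std::error::Error` documentation that error messages are typically concise lowercase sentences without trailing punctuation, which keeps them readable when errors are chained ("failed to ...: the Trino Operator does not support ..."). The std-only sketch below implements `Display` by hand for a hypothetical two-variant error, roughly what snafu's `#[snafu(display(...))]` attribute generates, and adds a hypothetical `follows_convention` helper that checks the rule. The first message in the diff above still ends with a period, which such a check would flag.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for a snafu-derived error enum.
#[derive(Debug)]
enum AuthError {
    ProviderNotSupported { provider: String, class: String },
    OidcDetailsNotSpecified { auth_class_name: String },
}

impl fmt::Display for AuthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Lowercase start, no trailing period, so the message composes
            // cleanly after a "context: " prefix in an error chain.
            AuthError::ProviderNotSupported { provider, class } => write!(
                f,
                "the Trino Operator does not support the AuthenticationClass provider [{provider}] from AuthenticationClass [{class}]"
            ),
            AuthError::OidcDetailsNotSpecified { auth_class_name } => write!(
                f,
                "oidc authentication details not specified for AuthenticationClass {auth_class_name:?}"
            ),
        }
    }
}

impl Error for AuthError {}

/// True if the message does not start with an uppercase letter and does not
/// end with a period. Empty messages are rejected.
fn follows_convention(message: &str) -> bool {
    let starts_lowercase = message.chars().next().map_or(false, |c| !c.is_uppercase());
    starts_lowercase && !message.ends_with('.')
}

fn main() {
    let errors: Vec<Box<dyn Error>> = vec![
        Box::new(AuthError::ProviderNotSupported {
            provider: "kerberos".to_string(),
            class: "my-auth-class".to_string(),
        }),
        Box::new(AuthError::OidcDetailsNotSpecified {
            auth_class_name: "my-oidc".to_string(),
        }),
    ];
    for error in &errors {
        let message = error.to_string();
        println!("{message} (follows convention: {})", follows_convention(&message));
    }
}
```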
