
Commit de378af

feat(presto-clp): Update docs and config generation to support reading archives from S3 and Presto split-filtering config. (#1228)
1 parent bcf44ee commit de378af

9 files changed: +375 -202 lines changed


docs/src/user-docs/guides-using-presto.md

Lines changed: 32 additions & 11 deletions
@@ -16,7 +16,7 @@ been merged into the main Presto repository so that you can use official Presto
 
 ## Requirements
 
-* [CLP][clp-releases] (clp-json) v0.4.0 or higher
+* [CLP][clp-releases] (clp-json) v0.5.0 or higher
 * [Docker] v28 or higher
 * [Docker Compose][docker-compose] v2.20.2 or higher
 * Python
@@ -31,7 +31,7 @@ Using Presto with CLP requires:
 
 ### Setting up CLP
 
-1. Follow the [quick-start](./quick-start/index.md) guide to download and extract the CLP package,
+1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package,
 but don't start the package just yet.
 2. Before starting the package, update the package's config as follows:
 
@@ -55,7 +55,15 @@ Using Presto with CLP requires:
 deployment infrastructure.
 :::
 
-3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP and
+3. If you'd like to store your compressed logs on S3, follow the
+[using object storage](guides-using-object-storage/index.md) guide.
+
+:::{note}
+Currently, the Presto integration only supports the
+[credentials](guides-using-object-storage/clp-config.md#credentials) authentication type.
+:::
+
+4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP and
 compress your logs. A sample dataset that works well with Presto is [postgresql].
 
 ### Setting up Presto
@@ -78,17 +86,19 @@ Using Presto with CLP requires:
 
 4. Configure Presto to use CLP's metadata database as follows:
 
-* Open and edit `coordinator/config-template/metadata-filter.json`.
+* Open and edit `coordinator/config-template/split-filter.json`.
 * For each dataset you want to query, add a filter config of the form:
 
 ```json
 {
   "clp.default.<dataset>": [
     {
       "columnName": "<timestamp-key>",
-      "rangeMapping": {
-        "lowerBound": "begin_timestamp",
-        "upperBound": "end_timestamp"
+      "customOptions": {
+        "rangeMapping": {
+          "lowerBound": "begin_timestamp",
+          "upperBound": "end_timestamp"
+        }
       },
       "required": false
     }
@@ -108,7 +118,7 @@ Using Presto with CLP requires:
 docker compose up
 ```
 
-* To use more than Presto worker, you can use the `--scale` option as follows:
+* To use more than one Presto worker, you can use the `--scale` option as follows:
 
 ```bash
 docker compose up --scale presto-worker=<num-workers>
@@ -143,8 +153,20 @@ Each dataset in CLP shows up as a table in Presto. To show all available dataset
 SHOW TABLES;
 ```
 
+:::{note}
 If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored
-in the `default` dataset. To query the logs in this dataset:
+in the `default` dataset.
+:::
+
+To show all available columns in the `default` dataset:
+
+```sql
+DESCRIBE default;
+```
+
+If you wish to show the columns of a different dataset, replace `default` above.
+
+To query the logs in this dataset:
 
 ```sql
 SELECT * FROM default LIMIT 1;
@@ -164,11 +186,10 @@ The Presto CLP integration has the following limitations at present:
 * Nested fields containing special characters cannot be queried (see [y-scope/presto#8]). Allowed
 characters are alphanumeric characters and underscores. To get around this limitation, you'll
 need to preprocess your logs to remove any special characters.
-* Only logs stored on the filesystem, rather than S3, can be queried through Presto.
 
 These limitations will be addressed in a future release of the Presto integration.
 
-[clp-connector-docs]: https://docs.yscope.com/presto/connector/clp.html#metadata-filter-config-file
+[clp-connector-docs]: https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file
 [clp-releases]: https://github.com/y-scope/clp/releases
 [docker-compose]: https://docs.docker.com/compose/install/
 [Docker]: https://docs.docker.com/engine/install/
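
Review note: when several datasets need a filter entry, the `split-filter.json` content described in the guide could be generated rather than hand-edited. The following is a minimal sketch and is not part of this commit. The helper name `build_split_filter` and the sample timestamp key `timestamp` are hypothetical; only the JSON shape (`columnName`, `customOptions.rangeMapping`, `required`) is taken from the updated documentation.

```python
import json


def build_split_filter(datasets, lower="begin_timestamp", upper="end_timestamp"):
    """Build a split-filter config dict.

    `datasets` maps a dataset name to the name of its timestamp key
    (both supplied by the caller; nothing is looked up from CLP).
    """
    config = {}
    for dataset, timestamp_key in datasets.items():
        # One table entry per dataset, keyed as "clp.default.<dataset>".
        config[f"clp.default.{dataset}"] = [
            {
                "columnName": timestamp_key,
                "customOptions": {
                    "rangeMapping": {
                        "lowerBound": lower,
                        "upperBound": upper,
                    }
                },
                "required": False,
            }
        ]
    return config


if __name__ == "__main__":
    # "timestamp" is an assumed key name used purely for illustration.
    print(json.dumps(build_split_filter({"default": "timestamp"}), indent=2))
```

The printed JSON could then be saved as `coordinator/config-template/split-filter.json`, after checking it against the connector documentation linked as `[clp-connector-docs]`.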
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+/worker/config-template/clp.properties

tools/deployment/presto-clp/coordinator.env

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 # clp.properties
 PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql
+PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
 PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER=mysql
 
 # config.properties

tools/deployment/presto-clp/coordinator/config-template/clp.properties

Lines changed: 2 additions & 1 deletion
@@ -6,4 +6,5 @@ clp.metadata-db-user=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_USER}
 clp.metadata-db-password=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_PASSWORD}
 clp.metadata-table-prefix=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_TABLE_PREFIX}
 clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER}
-clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json
+clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+clp.split-filter-config=/opt/presto-server/etc/split-filter.json
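
Review note: the template relies on its `${...}` placeholders being filled from variables such as those in `coordinator.env`. How the deployment actually performs that substitution is not shown in this diff. The sketch below only illustrates the idea with Python's standard `string.Template`; `parse_env_file` and `render_properties` are hypothetical helper names, not functions from the repository.

```python
from string import Template


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def render_properties(template_text, env):
    """Fill ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    env_text = (
        "# clp.properties\n"
        "PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql\n"
    )
    template_text = (
        "clp.split-filter-provider-type="
        "${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}\n"
    )
    print(render_properties(template_text, parse_env_file(env_text)), end="")
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly, which is usually preferable for generated config files.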

tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py

Lines changed: 0 additions & 187 deletions
This file was deleted.
