-
Notifications
You must be signed in to change notification settings - Fork 83
feat(presto-clp): Update docs and config generation to support reading archives from S3 and Presto split-filtering config. #1228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
af8ce88
e0a1820
4deb2ba
728efe2
2ba53ca
aa3e18d
67e16ce
ba16edf
78139be
8d6744d
2afac89
d4bf46a
5e13836
d7478f8
14b3a80
c09641b
d59af37
80959fc
ada03bb
9d92fc9
968c29c
cdaf3f5
40c352c
e5e1ed6
b9a898f
e012fb0
354a1aa
d1aa25b
29c4232
c4bcb8b
5cb13b9
570c18a
9842ef6
f769282
a388247
72ae05b
ccaf9a3
34617b9
b3694b7
6bcacfb
0a670b0
33592c9
bc86d60
3402f9c
2ac911a
06b5771
bc90d7f
2b3c624
8ac0b03
2306ba1
f4d08b4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,7 +16,7 @@ been merged into the main Presto repository so that you can use official Presto | |
|
|
||
| ## Requirements | ||
|
|
||
| * [CLP][clp-releases] (clp-json) v0.4.0 or higher | ||
| * [CLP][clp-releases] (clp-json) v0.5.0 or higher | ||
| * [Docker] v28 or higher | ||
| * [Docker Compose][docker-compose] v2.20.2 or higher | ||
| * Python | ||
|
|
@@ -31,7 +31,7 @@ Using Presto with CLP requires: | |
|
|
||
| ### Setting up CLP | ||
|
|
||
| 1. Follow the [quick-start](./quick-start/index.md) guide to download and extract the CLP package, | ||
| 1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package, | ||
| but don't start the package just yet. | ||
| 2. Before starting the package, update the package's config as follows: | ||
|
|
||
|
|
@@ -55,7 +55,15 @@ Using Presto with CLP requires: | |
| deployment infrastructure. | ||
| ::: | ||
|
|
||
| 3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP and | ||
| 3. If you'd like to store your compressed logs on S3, follow the | ||
| [using object storage](guides-using-object-storage/index.md) guide. | ||
|
|
||
| :::{note} | ||
| Currently, the Presto integration only supports the | ||
| [credentials](guides-using-object-storage/clp-config.md#credentials) authentication type. | ||
| ::: | ||
|
|
||
| 4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP and | ||
| compress your logs. A sample dataset that works well with Presto is [postgresql]. | ||
|
|
||
| ### Setting up Presto | ||
|
|
@@ -78,17 +86,19 @@ Using Presto with CLP requires: | |
|
|
||
| 4. Configure Presto to use CLP's metadata database as follows: | ||
|
|
||
| * Open and edit `coordinator/config-template/metadata-filter.json`. | ||
| * Open and edit `coordinator/config-template/split-filter.json`. | ||
| * For each dataset you want to query, add a filter config of the form: | ||
|
|
||
| ```json | ||
| { | ||
| "clp.default.<dataset>": [ | ||
| { | ||
| "columnName": "<timestamp-key>", | ||
| "rangeMapping": { | ||
| "lowerBound": "begin_timestamp", | ||
| "upperBound": "end_timestamp" | ||
| "customOptions": { | ||
| "rangeMapping": { | ||
| "lowerBound": "begin_timestamp", | ||
| "upperBound": "end_timestamp" | ||
| } | ||
| }, | ||
| "required": false | ||
| } | ||
|
|
@@ -108,7 +118,7 @@ Using Presto with CLP requires: | |
| docker compose up | ||
| ``` | ||
|
|
||
| * To use more than Presto worker, you can use the `--scale` option as follows: | ||
| * To use more than one Presto worker, you can use the `--scale` option as follows: | ||
|
|
||
| ```bash | ||
| docker compose up --scale presto-worker=<num-workers> | ||
|
|
@@ -143,8 +153,20 @@ Each dataset in CLP shows up as a table in Presto. To show all available dataset | |
| SHOW TABLES; | ||
| ``` | ||
|
|
||
| :::{note} | ||
| If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored | ||
| in the `default` dataset. To query the logs in this dataset: | ||
| in the `default` dataset. | ||
| ::: | ||
|
|
||
| To show all available columns in the `default` dataset: | ||
|
|
||
| ```sql | ||
| DESCRIBE default; | ||
| ``` | ||
|
|
||
| If you wish to show the columns of a different dataset, replace `default` above. | ||
|
|
||
| To query the logs in this dataset: | ||
|
|
||
| ```sql | ||
| SELECT * FROM default LIMIT 1; | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to mention the limitation that doesn't work for postgreqsql (or potential other datasets)?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does work for postgresql. The problem was that we needed to update the worker image to work with the latest clp. For elasticsearch, it works, but
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For mongodb (original dataset w/o preprocessing) the query directly failed (
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have issues for all of the bugs causing those failures?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
@@ -164,11 +186,10 @@ The Presto CLP integration has the following limitations at present: | |
| * Nested fields containing special characters cannot be queried (see [y-scope/presto#8]). Allowed | ||
| characters are alphanumeric characters and underscores. To get around this limitation, you'll | ||
| need to preprocess your logs to remove any special characters. | ||
| * Only logs stored on the filesystem, rather than S3, can be queried through Presto. | ||
|
|
||
| These limitations will be addressed in a future release of the Presto integration. | ||
|
|
||
| [clp-connector-docs]: https://docs.yscope.com/presto/connector/clp.html#metadata-filter-config-file | ||
| [clp-connector-docs]: https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file | ||
| [clp-releases]: https://github.com/y-scope/clp/releases | ||
| [docker-compose]: https://docs.docker.com/compose/install/ | ||
| [Docker]: https://docs.docker.com/engine/install/ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /worker/config-template/clp.properties |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,5 +1,6 @@ | ||||||||||||||||||||||||||||||||||
| # clp.properties | ||||||||||||||||||||||||||||||||||
| PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql | ||||||||||||||||||||||||||||||||||
| PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql | ||||||||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 💡 Verification agent 🧩 Analysis chainAdd worker parity and consider parameterizing split-filter config path Good addition; it matches the new property in the coordinator template. Two follow-ups:
Proposed env additions here: # clp.properties
PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql
PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
+PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.jsonAnd update the coordinator clp.properties template accordingly (see my comment on that file). To verify parity and catch drift across files, run: 🏁 Script executed: #!/bin/bash
# Search for worker-side parity and usages.
rg -nP -C3 '\bPRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE\b|clp\.split-filter-provider-type|split-filter\.json' --glob 'tools/deployment/**'Length of output: 925 Add worker parity and parameterize split-filter config The split-filter provider setting and config path are only defined for the coordinator; to prevent runtime divergence, you must add the same settings for the worker. Additionally, rather than hard-coding the config file path in the template, expose it via an environment variable for flexibility. • tools/deployment/presto-clp/coordinator.env PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
+PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.json• tools/deployment/presto-clp/coordinator/config-template/clp.properties - clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
- clp.split-filter-config=/opt/presto-server/etc/split-filter.json
+ clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+ clp.split-filter-config=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}• tools/deployment/presto-clp/worker.env (create or update) +PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}• tools/deployment/presto-clp/worker/config-template/clp.properties (create or update) + clp.split-filter-provider-type=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+ clp.split-filter-config=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG}📝 Committable suggestion
Suggested change
Suggested change
Suggested change
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||
| PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER=mysql | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| # config.properties | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,4 +6,5 @@ clp.metadata-db-user=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_USER} | |
| clp.metadata-db-password=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_PASSWORD} | ||
| clp.metadata-table-prefix=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_TABLE_PREFIX} | ||
| clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER} | ||
| clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json | ||
| clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE} | ||
| clp.split-filter-config=/opt/presto-server/etc/split-filter.json | ||
|
Comment on lines
+9
to
+10
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainParameterize the split-filter config path to avoid hard-coding The new properties look correct and align with the coordinator.env addition. Recommend wiring the file path through an env var to make the template relocatable and easier to override in different deployments. Apply: clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER}
-clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
-clp.split-filter-config=/opt/presto-server/etc/split-filter.json
+clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+clp.split-filter-config=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}Please confirm the worker’s clp.properties template is updated similarly so both roles read identical settings. 🏁 Script executed: #!/bin/bash
# Compare coordinator and worker template keys for split-filter
rg -nP -C2 'clp\.split-filter-(provider-type|config)|PRESTO_(COORDINATOR|WORKER)_CLPPROPERTIES_SPLIT_FILTER' tools/deploymentLength of output: 1171 Ensure worker CLP templates mirror coordinator for split-filter settings The RG comparison shows no occurrences of • tools/deployment/presto-clp/worker/config-template/clp.properties add: +clp.split-filter-provider-type=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+clp.split-filter-config=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG}• tools/deployment/presto-clp/worker.env +PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
+PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.jsonThis mandatory refactor ensures both coordinator and worker read identical, parametrized settings. 🤖 Prompt for AI Agents |
||
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use
quick-start/index.md#using-clp? Because at L63 it isguides-using-object-storage/clp-config.md#credentialsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure