Merged
Changes from 40 commits
51 commits
af8ce88
Rename generate-user-env-vars-file since it will also need to generat…
kirkrodrigues Aug 12, 2025
e0a1820
Move CLP config load out of _add_clp_env_vars.
kirkrodrigues Aug 13, 2025
4deb2ba
Add S3 support
anlowee Aug 18, 2025
728efe2
Fix
anlowee Aug 18, 2025
2ba53ca
Fix
anlowee Aug 18, 2025
aa3e18d
Add logs
anlowee Aug 18, 2025
67e16ce
Revert "Add logs"
anlowee Aug 18, 2025
ba16edf
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 18, 2025
78139be
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 18, 2025
8d6744d
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 19, 2025
2afac89
Fix lint
anlowee Aug 19, 2025
d4bf46a
Merge branch 'xwei/s3-support-config' of github.com:anlowee/clp into …
anlowee Aug 19, 2025
5e13836
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 19, 2025
d7478f8
Address coderabbitai comments
anlowee Aug 20, 2025
14b3a80
Address coderabbitai comments
anlowee Aug 20, 2025
c09641b
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 20, 2025
d59af37
Update docs to remove the limitation that only local file system file…
anlowee Aug 20, 2025
80959fc
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 20, 2025
ada03bb
Generate clp.properties by pythong script
anlowee Aug 25, 2025
9d92fc9
Merge branch 'xwei/s3-support-config' of github.com:anlowee/clp into …
anlowee Aug 25, 2025
968c29c
Lint fix
anlowee Aug 25, 2025
cdaf3f5
Update the docs and config
anlowee Aug 25, 2025
40c352c
Merge branch 'main' into xwei/s3-support-config
anlowee Aug 25, 2025
e5e1ed6
Refactor _generate_worker_clp_properties.
kirkrodrigues Sep 2, 2025
b9a898f
Remove worker's clp.properties since it'll be generated.
kirkrodrigues Sep 2, 2025
e012fb0
Undo unnecessary changes in generate-configs.sh.
kirkrodrigues Sep 2, 2025
354a1aa
Refactor s3 config reading.
kirkrodrigues Sep 2, 2025
d1aa25b
Refactor path resolution. Use correct key for staging_directory.
kirkrodrigues Sep 2, 2025
29c4232
Extract CLP S3 env var extraction.
kirkrodrigues Sep 2, 2025
c4bcb8b
Remove obsolete method.
kirkrodrigues Sep 2, 2025
5cb13b9
Apply linter.
kirkrodrigues Sep 2, 2025
570c18a
Edit set-up-config.sh.
kirkrodrigues Sep 2, 2025
9842ef6
Note how to configure S3 config and known issue in docs.
kirkrodrigues Sep 2, 2025
f769282
Remove blank line.
kirkrodrigues Sep 2, 2025
a388247
Fix: Require secret_access_key.
kirkrodrigues Sep 2, 2025
72ae05b
Use correct type annotations.
kirkrodrigues Sep 2, 2025
ccaf9a3
Address coderabbitai comments
anlowee Sep 2, 2025
34617b9
Merge remote-tracking branch 'origin/main' into xwei/s3-support-config
anlowee Sep 2, 2025
b3694b7
Fix a bug
anlowee Sep 2, 2025
6bcacfb
Update issue
anlowee Sep 2, 2025
0a670b0
Merge branch 'main' into xwei/s3-support-config
anlowee Sep 3, 2025
33592c9
docs: Change supported release to clp v0.5.0; Remove SELECT * warning…
kirkrodrigues Sep 3, 2025
bc86d60
Set PRESTO_WORKER_CLPPROPERTIES_STORAGE_TYPE in both fs and s3 cases …
kirkrodrigues Sep 3, 2025
3402f9c
Fix type annotation for _get_config_value.
kirkrodrigues Sep 3, 2025
2ac911a
Use _get_required_config_value for database credentials; Add config f…
kirkrodrigues Sep 3, 2025
06b5771
Minor touch-up.
kirkrodrigues Sep 3, 2025
bc90d7f
Update link to split config file syntax.
kirkrodrigues Sep 3, 2025
2b3c624
Merge branch 'main' into xwei/s3-support-config
anlowee Sep 3, 2025
8ac0b03
Address comments
anlowee Sep 3, 2025
2306ba1
Merge branch 'xwei/s3-support-config' of github.com:anlowee/clp into …
anlowee Sep 3, 2025
f4d08b4
Merge branch 'main' into xwei/s3-support-config
anlowee Sep 3, 2025
43 changes: 35 additions & 8 deletions docs/src/user-docs/guides-using-presto.md
@@ -55,7 +55,14 @@ Using Presto with CLP requires:
deployment infrastructure.
:::

-3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP and
+3. If you'd like to store your compressed logs on S3, follow the
+[using object storage](guides-using-object-storage/index.md) guide.
+
+:::{note}
+Currently, the Presto integration only supports the
+[credentials](guides-using-object-storage/clp-config.md#credentials) authentication type.
+
+4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP and
Contributor Author

Can we use `quick-start/index.md#using-clp` here (without the leading `./`)? The link at L63 is written as `guides-using-object-storage/clp-config.md#credentials`.

Member

Sure

compress your logs. A sample dataset that works well with Presto is [postgresql].

### Setting up Presto
@@ -78,17 +85,19 @@ Using Presto with CLP requires:

4. Configure Presto to use CLP's metadata database as follows:

-* Open and edit `coordinator/config-template/metadata-filter.json`.
+* Open and edit `coordinator/config-template/split-filter.json`.
* For each dataset you want to query, add a filter config of the form:

```json
{
"clp.default.<dataset>": [
{
"columnName": "<timestamp-key>",
-"rangeMapping": {
-"lowerBound": "begin_timestamp",
-"upperBound": "end_timestamp"
+"customOptions": {
+"rangeMapping": {
+"lowerBound": "begin_timestamp",
+"upperBound": "end_timestamp"
+}
},
"required": false
}
@@ -143,13 +152,30 @@ Each dataset in CLP shows up as a table in Presto. To show all available dataset
SHOW TABLES;
```

+:::{note}
If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored
-in the `default` dataset. To query the logs in this dataset:
+in the `default` dataset.
+:::
+
+To show all available columns in the `default` dataset:

```sql
-SELECT * FROM default LIMIT 1;
+DESCRIBE default;
```

+If you wish to show the columns of a different dataset, replace `default` above.
+
+To query the logs in this dataset:
+
+```sql
+SELECT user FROM default LIMIT 1;
+```
+
+:::{warning}
+`SELECT *` currently causes a crash due to a [known issue][y-scope/velox#28]. This will be resolved
+soon. See the [limitations](#limitations) section for all current limitations.
+:::
+
All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs
contain the field `foo.bar`, you can query it using:

@@ -161,10 +187,10 @@ SELECT foo.bar FROM default LIMIT 1;

The Presto CLP integration has the following limitations at present:

+* `SELECT *` currently causes a crash due to a [known issue][y-scope/velox#27].
Contributor

⚠️ Potential issue

Fix broken Velox reference (build failure under MD052).

Text cites [y-scope/velox#27] but only #28 is defined below; also your warning above references #28. Unify to #28.

Apply:

-* `SELECT *` currently causes a crash due to a [known issue][y-scope/velox#27].
+* `SELECT *` currently causes a crash due to a [known issue][y-scope/velox#28].
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

190-190: Reference links and images should use a label that is defined
Missing link or image reference definition: "y-scope/velox#27"

(MD052, reference-links-images)

🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-presto.md around line 190, the markdown
reference uses [y-scope/velox#27] but the defined reference and the warning
above use #28; update the inline citation to [y-scope/velox#28] so it matches
the defined reference and resolves the MD052 build failure.
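
To illustrate the kind of check MD052 performs, here is a minimal Python sketch that collects reference definitions and reports full reference links whose label has no definition. It is a simplification for illustration only, not markdownlint's actual implementation; the regexes and the function name `find_undefined_reference_labels` are assumptions.

```python
import re

# Matches a reference definition such as "[y-scope/velox#28]: https://..."
DEFINITION_RE = re.compile(r"^[ ]{0,3}\[([^\]]+)\]:\s*\S+", re.MULTILINE)
# Matches a full reference link such as "[known issue][y-scope/velox#27]"
FULL_REFERENCE_RE = re.compile(r"\[[^\]]+\]\[([^\]]+)\]")


def find_undefined_reference_labels(markdown_text: str) -> list[str]:
    """Returns labels used in full reference links that have no definition (hypothetical helper)."""
    # Reference labels are compared case-insensitively.
    defined = {m.group(1).lower() for m in DEFINITION_RE.finditer(markdown_text)}
    undefined: list[str] = []
    for match in FULL_REFERENCE_RE.finditer(markdown_text):
        label = match.group(1)
        if label.lower() not in defined and label not in undefined:
            undefined.append(label)
    return undefined


# Sample text mirroring the mismatch flagged in this comment.
sample = (
    "* `SELECT *` currently causes a crash due to a [known issue][y-scope/velox#27].\n"
    "\n"
    "[y-scope/velox#28]: https://github.com/y-scope/velox/issues/28\n"
)
print(find_undefined_reference_labels(sample))
```

Changing the inline label to `#28`, as suggested above, makes the reported list empty.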

* Nested fields containing special characters cannot be queried (see [y-scope/presto#8]). Allowed
characters are alphanumeric characters and underscores. To get around this limitation, you'll
need to preprocess your logs to remove any special characters.
-* Only logs stored on the filesystem, rather than S3, can be queried through Presto.

These limitations will be addressed in a future release of the Presto integration.

@@ -175,4 +201,5 @@ These limitations will be addressed in a future release of the Presto integratio
[postgresql]: https://zenodo.org/records/10516401
[Presto]: https://prestodb.io/
[y-scope/presto#8]: https://github.com/y-scope/presto/issues/8
+[y-scope/velox#28]: https://github.com/y-scope/velox/issues/28
[yscope-presto]: https://github.com/y-scope/presto
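
For anyone with an existing `metadata-filter.json`, the format change in the `split-filter.json` hunk above (moving `rangeMapping` under `customOptions`) could be applied with something like the following Python sketch. It assumes that relocation is the only structural change, which is all this diff shows; the helper names and the `timestamp` column name are hypothetical.

```python
import json
from typing import Any


def migrate_filter_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Moves a top-level "rangeMapping" under "customOptions" (hypothetical helper)."""
    migrated = {key: value for key, value in entry.items() if key != "rangeMapping"}
    if "rangeMapping" in entry:
        # Preserve any options that are already nested under "customOptions".
        custom_options = dict(migrated.get("customOptions", {}))
        custom_options["rangeMapping"] = entry["rangeMapping"]
        migrated["customOptions"] = custom_options
    return migrated


def migrate_filter_config(config: dict[str, list[dict[str, Any]]]) -> dict[str, list[dict[str, Any]]]:
    """Applies migrate_filter_entry to every filter in every scope, without mutating the input."""
    return {
        scope: [migrate_filter_entry(entry) for entry in entries]
        for scope, entries in config.items()
    }


# Old-style config; "timestamp" is a placeholder for the dataset's timestamp key.
old_config = {
    "clp.default.default": [
        {
            "columnName": "timestamp",
            "rangeMapping": {
                "lowerBound": "begin_timestamp",
                "upperBound": "end_timestamp",
            },
            "required": False,
        }
    ]
}
new_config = migrate_filter_config(old_config)
print(json.dumps(new_config, indent=2))
```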
1 change: 1 addition & 0 deletions tools/deployment/presto-clp/.gitignore
@@ -0,0 +1 @@
+/worker/config-template/clp.properties
1 change: 1 addition & 0 deletions tools/deployment/presto-clp/coordinator.env
@@ -1,5 +1,6 @@
# clp.properties
PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql
+PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
Contributor

💡 Verification agent

🧩 Analysis chain

Add worker parity and consider parameterizing split-filter config path

Good addition; it matches the new property in the coordinator template. Two follow-ups:

  • Ensure the worker gets the same provider-type setting to avoid coordinator/worker divergence at runtime.
  • Consider exposing the split-filter config path as an env var for flexibility instead of hard-coding it in the template.

Proposed env additions here:

 # clp.properties
 PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_PROVIDER_TYPE=mysql
 PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
+PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.json

And update the coordinator clp.properties template accordingly (see my comment on that file).

To verify parity and catch drift across files, run:


🏁 Script executed:

#!/bin/bash
# Search for worker-side parity and usages.
rg -nP -C3 '\bPRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE\b|clp\.split-filter-provider-type|split-filter\.json' --glob 'tools/deployment/**'

Length of output: 925


Add worker parity and parameterize split-filter config

The split-filter provider setting and config path are only defined for the coordinator; to prevent runtime divergence, you must add the same settings for the worker. Additionally, rather than hard-coding the config file path in the template, expose it via an environment variable for flexibility.

• tools/deployment/presto-clp/coordinator.env

 PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
+PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.json

• tools/deployment/presto-clp/coordinator/config-template/clp.properties

- clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
- clp.split-filter-config=/opt/presto-server/etc/split-filter.json
+ clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+ clp.split-filter-config=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}

• tools/deployment/presto-clp/worker.env (create or update)

+PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}

• tools/deployment/presto-clp/worker/config-template/clp.properties (create or update)

+ clp.split-filter-provider-type=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+ clp.split-filter-config=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG}
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator.env around line 3, the split-filter
provider is hard-coded only for the coordinator; add the same variable to the
worker environment and parameterize the clp.properties path via an env var.
Create or update tools/deployment/presto-clp/worker.env to include
PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE with the same value
as coordinator.env, and add a new env var (e.g. PRESTO_CLP_PROPERTIES_PATH) in
both coordinator.env and worker.env instead of embedding the file path in
templates. Update
tools/deployment/presto-clp/coordinator/config-template/clp.properties and
tools/deployment/presto-clp/worker/config-template/clp.properties to reference
the PRESTO_CLP_PROPERTIES_PATH env var (or use template substitution) so the
config file location is configurable at deployment time.
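
As a rough illustration of the parameterization proposed in this comment, the Python sketch below renders `${VAR}` placeholders in a properties template from env-file entries using `string.Template`. The deployment's real substitution mechanism is not shown in this section, so treat this as an assumption-laden sketch: `parse_env_file` and `render_properties` are hypothetical helpers, and `PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG` is the reviewer's proposed variable, not one the PR defines.

```python
from string import Template


def parse_env_file(text: str) -> dict[str, str]:
    """Parses simple KEY=VALUE lines, skipping blanks and '#' comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def render_properties(template_text: str, env: dict[str, str]) -> str:
    """Substitutes ${VAR} placeholders; raises KeyError if a variable is undefined."""
    return Template(template_text).substitute(env)


# The second variable below is the reviewer's proposal, not part of the merged change.
coordinator_env = """\
# clp.properties
PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.json
"""
properties_template = """\
clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
clp.split-filter-config=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}
"""
rendered = render_properties(properties_template, parse_env_file(coordinator_env))
print(rendered)
```

Using `substitute` rather than `safe_substitute` means a missing variable fails loudly instead of leaving a literal `${...}` in the rendered file.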

PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER=mysql

# config.properties
tools/deployment/presto-clp/coordinator/config-template/clp.properties
@@ -6,4 +6,5 @@ clp.metadata-db-user=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_USER}
clp.metadata-db-password=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_DATABASE_PASSWORD}
clp.metadata-table-prefix=${PRESTO_COORDINATOR_CLPPROPERTIES_METADATA_TABLE_PREFIX}
clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER}
-clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json
+clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+clp.split-filter-config=/opt/presto-server/etc/split-filter.json
Comment on lines +9 to +10
Contributor

💡 Verification agent

🧩 Analysis chain

Parameterize the split-filter config path to avoid hard-coding

The new properties look correct and align with the coordinator.env addition. Recommend wiring the file path through an env var to make the template relocatable and easier to override in different deployments.

Apply:

 clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER}
-clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
-clp.split-filter-config=/opt/presto-server/etc/split-filter.json
+clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+clp.split-filter-config=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_CONFIG}

Please confirm the worker’s clp.properties template is updated similarly so both roles read identical settings.
You can scan for drift with:


🏁 Script executed:

#!/bin/bash
# Compare coordinator and worker template keys for split-filter
rg -nP -C2 'clp\.split-filter-(provider-type|config)|PRESTO_(COORDINATOR|WORKER)_CLPPROPERTIES_SPLIT_FILTER' tools/deployment

Length of output: 1171


Ensure worker CLP templates mirror coordinator for split-filter settings

The RG comparison shows no occurrences of clp.split-filter-provider-type, clp.split-filter-config or their corresponding PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_* variables under tools/deployment/presto-clp/worker. We need to align the worker role with the coordinator to avoid configuration drift and support relocatable deployments.

• tools/deployment/presto-clp/worker/config-template/clp.properties
After the existing

clp.split-provider-type=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_PROVIDER}  

add:

+clp.split-filter-provider-type=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
+clp.split-filter-config=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG}

• tools/deployment/presto-clp/worker.env
Define the new environment variables with suitable defaults, for example:

+PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE=mysql
+PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG=/opt/presto-server/etc/split-filter.json

This mandatory refactor ensures both coordinator and worker read identical, parametrized settings.

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/clp.properties (after the
line that sets clp.split-provider-type), add the same two properties present in
the coordinator: clp.split-filter-provider-type using
${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE} and
clp.split-filter-config using
${PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG}; then update
tools/deployment/presto-clp/worker.env to declare
PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE and
PRESTO_WORKER_CLPPROPERTIES_SPLIT_FILTER_CONFIG with sensible defaults (e.g., a
default provider type value to match coordinator and
/opt/presto-server/etc/split-filter.json for the config path) so worker
templates mirror coordinator and remain parametrized.
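
The drift check that the `rg` command above performs by eye could also be expressed as a small Python sketch that compares property keys between two templates. This is only an illustration: the helper names are hypothetical, the sample worker template contains just the one line quoted in this comment, and whether the worker actually needs the split-filter keys is the open question of this thread.

```python
def property_keys(properties_text: str) -> set[str]:
    """Collects keys from KEY=VALUE lines, skipping blanks and comments (hypothetical helper)."""
    keys: set[str] = set()
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        keys.add(line.partition("=")[0].strip())
    return keys


def keys_missing_from_worker(
    coordinator_text: str, worker_text: str, prefix: str = "clp.split-filter-"
) -> list[str]:
    """Lists coordinator keys with the given prefix that the worker template lacks."""
    coordinator_keys = {key for key in property_keys(coordinator_text) if key.startswith(prefix)}
    return sorted(coordinator_keys - property_keys(worker_text))


# Coordinator lines are taken from the diff above; the worker sample is an assumption.
coordinator_template = """\
clp.split-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_PROVIDER}
clp.split-filter-provider-type=${PRESTO_COORDINATOR_CLPPROPERTIES_SPLIT_FILTER_PROVIDER_TYPE}
clp.split-filter-config=/opt/presto-server/etc/split-filter.json
"""
worker_template = """\
clp.split-provider-type=${PRESTO_WORKER_CLPPROPERTIES_SPLIT_PROVIDER}
"""
missing_in_worker = keys_missing_from_worker(coordinator_template, worker_template)
print(missing_in_worker)
```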

187 changes: 0 additions & 187 deletions tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py

This file was deleted.
