Merged
105 changes: 78 additions & 27 deletions docs/src/user-docs/guides-using-presto.md
@@ -33,27 +33,60 @@ Using Presto with CLP requires:

1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package,
but don't start the package just yet.
2. Before starting the package, update the package's config as follows:
2. Before starting the package, update the package's config file (`etc/clp-config.yml`) as follows:

* Open `etc/clp-config.yml` located within the package.
* Uncomment the `database` section.
* Change `database.host` value to a non-localhost hostname/IP.
* After the change, the `database` section should look something like this:
* Set the `package.query_engine` key to `"presto"`.

```yaml
database:
  type: "mariadb" # "mariadb" or "mysql"
  host: "<new-IP-address>"
  port: 3306
  name: "clp-db"
package:
  storage_engine: "clp-s"
  query_engine: "presto"
```

:::{note}
This change is necessary since the Presto containers run on a Docker network, whereas CLP's
database runs on the host network. So `localhost` refers to two different entities in those
networks. This limitation will be addressed in the future when we unify Presto and CLP's
deployment infrastructure.
:::
* Set the `database.host` key to a non-localhost hostname/IP.

```yaml
database:
  # type: "mariadb"
  host: "<non-local-ip-address>"
  # port: 3306
  # name: "clp-db"
```

:::{note}
This change is necessary because the Presto containers run on a Docker network, and CLP's
database runs on the host network. `localhost` will refer to a different entity in each of
those contexts. This limitation will be addressed in the future when we unify Presto and CLP's
deployment infrastructure.
:::
Comment on lines +46 to +61
Member

(I figure this docs PR should provide better context, although the code for the Presto Docker Compose project was originally added in #1132.)

The mitigation documented above is practical but may be inconvenient for users operating in diverse network environments.

In the long term, we plan to integrate the Presto services into the CLP Package Docker Compose project introduced in #1178. Running `du -h tools/deployment/presto-clp` shows the directory is only 132 KB, so the integration would not significantly increase the package size while substantially improving user convenience.
@coderabbitai please file an issue to track this long-term integration plan.

Since implementing and reviewing this integration may take time, a short-term solution is needed before the next release. After #1178, the webui service may also encounter connectivity issues with the Presto connector, as host network mode was removed in that update.
A likely mitigation is to allow the Presto project to attach to an external Docker network defined in the CLP Package Compose project.
@coderabbitai please create a separate issue to track this short-term fix.
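
For illustration only, here is a minimal sketch of what attaching a Presto container to a shared Docker network could look like when done by hand with `docker network` commands. This is not the Compose-level change proposed above, and the network name `clp-presto-net` and the container name in the usage comment are hypothetical placeholders, not names taken from either Compose project.

```bash
# Hypothetical network name; substitute the network that the CLP Package
# Compose project actually defines.
NETWORK="clp-presto-net"

# Create the shared network only if it does not exist yet.
ensure_network() {
  docker network inspect "$NETWORK" >/dev/null 2>&1 \
    || docker network create "$NETWORK"
}

# Attach an already-running container (name passed as $1) to the network.
attach_container() {
  docker network connect "$NETWORK" "$1"
}

# Usage (the container name is a placeholder):
#   ensure_network
#   attach_container presto-coordinator
```

Declaring the network as external in the Presto Compose file would likely be the more durable form of the same idea, since manual attachments have to be repeated whenever containers are recreated.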

Contributor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash

# Get the repository owner and name from the current directory
REPO="y-scope/clp"
PR_NUMBER="1310"
COMMENT_URL="https://github.com/y-scope/clp/pull/1310#discussion_r2462666131"

# Create the long-term issue for full Presto integration
echo "Creating long-term issue for Presto integration..."
gh issue create \
  --repo "$REPO" \
  --title "Integrate Presto services into CLP Package Docker Compose project" \
  --body "## Context

As documented in PR #1310, the current Presto integration requires users to set \`database.host\` to a non-localhost hostname/IP because the Presto containers run on a Docker network while CLP's database runs on the host network. This mitigation is practical but inconvenient for users operating in diverse network environments.

## Long-term Solution

Integrate the Presto services into the CLP Package Docker Compose project introduced in #1178. The \`tools/deployment/presto-clp\` directory is only 132 KB in size, so this integration would not significantly increase the package size while substantially improving user convenience.

## Benefits

- Eliminates the need for non-localhost hostname configuration
- Provides a unified deployment experience
- Simplifies network connectivity between CLP and Presto services

## References

- PR #1310: $COMMENT_URL
- Original Presto Docker Compose addition: #1132
- CLP Package Docker Compose project: #1178
- Requested by: @junhaoliao" \
  --assignee junhaoliao

echo ""
echo "Creating short-term issue for external Docker network..."

# Create the short-term issue for external Docker network attachment
gh issue create \
  --repo "$REPO" \
  --title "Allow Presto project to attach to external Docker network for CLP Package Compose" \
  --body "## Context

After #1178, the \`webui\` service may encounter connectivity issues with the Presto connector because host network mode was removed in that update. Since implementing and reviewing the full integration (tracked separately) may take time, a short-term solution is needed before the next release.

## Short-term Solution

Allow the Presto project to attach to an external Docker network defined in the CLP Package Compose project. This will enable proper connectivity between the webui service and Presto without requiring full integration.

## Implementation Notes

- This is a temporary mitigation until the full integration is completed
- Should be implemented before the next release
- Will resolve connectivity issues between webui and Presto services

## References

- PR #1310: $COMMENT_URL
- CLP Package Docker Compose project: #1178
- Requested by: @junhaoliao" \
  --assignee junhaoliao

echo ""
echo "Both issues created successfully!"
```

Length of output: 229


@junhaoliao, I've created both issues as requested:

1. Long-term integration: #1497 - Integrate Presto services into CLP Package Docker Compose project
   - Addresses the inconvenience of requiring non-localhost hostname configuration
   - Proposes full integration of the 132 KB Presto project into the CLP Package Docker Compose project
2. Short-term fix: #1498 - Allow Presto project to attach to external Docker network for CLP Package Compose

Both issues have been assigned to you and include references to PR #1310 and the relevant context.


🧠 Learnings used
Learnt from: davemarco
PR: y-scope/clp#1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.


* Set the `results_cache.retention_period` key to `null` since the CLP + Presto integration
doesn't yet support garbage collection.

```yaml
results_cache:
  # host: "localhost"
  # port: 27017
  # db_name: "clp-query-results"
  # stream_collection_name: "stream-files"
  #
  # # Retention period for search results, in minutes. Set to null to disable automatic deletion.
  retention_period: null
```

* Update the `presto` key with the host and port of the Presto cluster. If you follow the
[Setting up Presto](#setting-up-presto) section, the host is `localhost` and the port is
`8889`.

```yaml
presto:
  host: "<ip-address>"
  port: <port>
```

:::{note}
Presto doesn't need to be running before you start CLP.
:::

3. If you'd like to store your compressed logs on S3, follow the
[using object storage](guides-using-object-storage/index.md) guide.
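
For the `database.host` step above, one way to find a candidate non-localhost address is sketched below. It assumes a Linux host where `hostname -I` is available (the flag is not portable), and the helper name `first_non_loopback_ipv4` is made up for this example. The first address reported is not guaranteed to be the right interface, so confirm that it is reachable from the Presto containers before using it.

```bash
# Print the first non-loopback IPv4 address reported by `hostname -I`.
# Assumption: Linux host; other platforms need a different command.
first_non_loopback_ipv4() {
  hostname -I \
    | tr ' ' '\n' \
    | grep -E '^[0-9]+(\.[0-9]+){3}$' \
    | grep -v '^127\.' \
    | head -n 1
}

# Usage:
#   first_non_loopback_ipv4
```

The printed value is what would replace `<non-local-ip-address>` in `etc/clp-config.yml`.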
@@ -115,7 +148,7 @@ Using Presto with CLP requires:
5. Start a Presto cluster by running:

```bash
docker compose up
docker compose up --detach
Member

It might be better to also add the `--wait` flag so that users are notified if any service fails to start successfully.

@coderabbitai add an issue to track

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track the suggestion

gh issue create \
  --repo y-scope/clp \
  --title "docs(presto): Add --wait flag to docker compose up command in Presto documentation" \
  --body "## Description

Add the \`--wait\` flag to the \`docker compose up --detach\` command in the Presto integration documentation so users can be notified if any service fails to start successfully.

## Current command
\`\`\`bash
docker compose up --detach
\`\`\`

## Suggested command
\`\`\`bash
docker compose up --detach --wait
\`\`\`

## Rationale
The \`--wait\` flag ensures that the command waits until services are running and healthy before returning. This provides better feedback to users if any service fails to start successfully.

## References
- PR: https://github.com/y-scope/clp/pull/1310
- Comment: https://github.com/y-scope/clp/pull/1310#discussion_r2461661299
- Requested by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've created issue #1495 to track adding the `--wait` flag to the `docker compose up --detach` command in the Presto documentation. This will help users get notified if any service fails to start successfully.

```

* To use more than one Presto worker, you can use the `--scale` option as follows:
@@ -128,24 +161,22 @@ Using Presto with CLP requires:

### Stopping the Presto cluster

To stop the Presto cluster, use CTRL + C.
To stop the Presto cluster:

```bash
docker compose stop
```

To clean up the Presto cluster entirely:

```bash
docker compose rm
docker compose down
```
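
As a rough sketch, the lifecycle commands can be wrapped in small helper functions. The function names are invented for this example. The `--wait` flag is the reviewer's suggestion tracked in #1495 rather than part of the documented command, and it assumes a Docker Compose version that supports it.

```bash
# Run these from the directory containing the Presto Compose project.

# Start in the background and block until services are running or healthy;
# a non-zero exit status means at least one service failed to start.
presto_up() {
  docker compose up --detach --wait
}

# Stop the containers but keep them, so they can be started again later.
presto_stop() {
  docker compose stop
}

# Stop and remove the containers and the project's default network.
presto_down() {
  docker compose down
}

# Usage:
#   presto_up || echo "Presto failed to start" >&2
```

The difference between the last two helpers is that `stop` leaves the containers in place for a later restart, whereas `down` removes them.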

## Querying your logs through Presto

To query your logs through Presto, you can use the Presto CLI:

```bash
docker compose exec presto-coordinator \
presto-cli \
--catalog clp \
--schema default
```
You can query your compressed logs in your browser from [CLP's UI](#querying-from-clps-ui), or
from the command line using the [Presto CLI](#querying-from-the-presto-cli).

Each dataset in CLP shows up as a table in Presto. To show all available datasets:

@@ -179,6 +210,26 @@ contain the field `foo.bar`, you can query it using:
SELECT foo.bar FROM default LIMIT 1;
```

### Querying from CLP's UI

CLP's UI should be available at [http://localhost:4000](http://localhost:4000) (if you changed
`webui.host` or `webui.port` in `etc/clp-config.yml`, use the new values).

:::{note}
The UI can only run one query at a time, and queries must not end with a `;`.
:::

### Querying from the Presto CLI

To access the Presto CLI, navigate to the `tools/deployment/presto-clp` directory and run:

```bash
docker compose exec presto-coordinator \
  presto-cli \
    --catalog clp \
    --schema default
```
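
For scripted use, a query can probably also be run non-interactively. The sketch below assumes that the bundled `presto-cli` accepts the standard `--execute` option and uses `-T` to disable pseudo-TTY allocation in `docker compose exec`. The helper name `presto_query` is invented for this example.

```bash
# Run a single SQL statement (passed as $1) through the Presto CLI and exit.
# Run from the tools/deployment/presto-clp directory.
presto_query() {
  docker compose exec -T presto-coordinator \
    presto-cli \
      --catalog clp \
      --schema default \
      --execute "$1"
}

# Usage:
#   presto_query 'SELECT foo.bar FROM default LIMIT 1;'
```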

## Limitations

The Presto CLP integration has the following limitations at present: