-
Notifications
You must be signed in to change notification settings - Fork 83
feat(presto-clp): Add Docker compose setup for Presto cluster that can connect to clp-json. #1132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
3a84840
9574059
8dd8795
a8af40c
2015a4a
b78682f
1247bba
1400a90
91040d1
9ec648c
4097478
b1ce135
70a56d7
99ac4e1
6dde297
b22746d
708756d
2eeda5c
8bb98f8
9fc487d
64e8edf
00bd3f1
28f5b70
cccf9fe
04f3d2a
3381eb8
bea5bb6
1f68965
94b0210
22c53e0
2f0de28
d621bf3
7300d20
4bc2f58
80c41d4
9d2146b
ff06c62
cf11694
2618f13
782b87d
8107518
4d896e5
713670a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # Setup local docker stack for presto + clp | ||
|
|
||
| ## Install docker | ||
|
|
||
| Follow the guide here: [docker] | ||
|
|
||
| # Launch clp-package | ||
|
||
|
|
||
| 1. Find the clp-package for test on our official website [clp-json-v0.4.0]. Here is a sample dataset for demo testing: [postgresql dataset]. | ||
|
|
||
| 2. Untar the clp-package and the postgresql dataset. | ||
|
|
||
| 3. Launch: | ||
|
|
||
| ```bash | ||
| # You probably want to run a python 3.9 or newer virtual environment | ||
| sbin/start-clp.sh | ||
| ``` | ||
|
|
||
| 5. Compress: | ||
|
|
||
| ```bash | ||
| # You can also use your own dataset | ||
| sbin/compress.sh --timestamp-key 'timestamp' /path/to/postgresql.log | ||
| ``` | ||
|
|
||
| 6. Use the following command to update `.env`: | ||
|
|
||
| ```bash | ||
| scripts/set-up-config.sh /path/to/clp-json-package | ||
| ``` | ||
|
|
||
| # Create Docker Cluster | ||
|
|
||
| Create a local docker stack: | ||
|
|
||
| ```bash | ||
| docker compose up | ||
| ``` | ||
|
|
||
| To create a docker stack with more than 1 worker (e.g., 3 workers): | ||
| ``` | ||
| docker compose up --scale presto-worker=3 | ||
| ``` | ||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| # Use cli: | ||
|
|
||
| After all containers are in "Started" states (check by `docker ps`): | ||
|
|
||
| ```bash | ||
| # On your host | ||
| docker exec -it compose-presto-coordinator-1 sh | ||
|
|
||
| # In presto-coordinator container | ||
| /opt/presto-cli --catalog clp --schema default --server localhost:8080 | ||
| ``` | ||
|
|
||
| Example query: | ||
| ```sql | ||
| SELECT * FROM default LIMIT 1; | ||
| ``` | ||
|
||
|
|
||
| # Delete docker Cluster | ||
|
|
||
| ```bash | ||
| docker compose down | ||
| ``` | ||
|
|
||
|
|
||
|
|
||
| [clp-json-v0.4.0]: https://github.com/y-scope/clp/releases/tag/v0.4.0 | ||
| [docker]: https://docs.docker.com/engine/install | ||
| [postgresql dataset]: https://zenodo.org/records/10516402 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| PRESTO_COORDINATOR_HTTPPORT="8080" | ||
| PRESTO_COORDINATOR_SERVICENAME="presto-coordinator" | ||
|
|
||
| # node.properties | ||
| PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT="production" | ||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # clp.properties | ||
| PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE="mysql" | ||
| PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER="mysql" | ||
|
|
||
| # config.properties | ||
| PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY="1GB" | ||
| PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE="1GB" | ||
|
|
||
| # jvm.config | ||
| PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE="4G" | ||
| PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE="32M" | ||
|
|
||
| # log.properties | ||
| PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL="DEBUG" | ||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| connector.name=clp | ||
| clp.metadata-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE} | ||
| clp.metadata-db-url=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL} | ||
| clp.metadata-db-name=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME} | ||
| clp.metadata-db-user=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER} | ||
| clp.metadata-db-password=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD} | ||
| clp.metadata-table-prefix=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX} | ||
| clp.split-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER} | ||
| clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json | ||
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| coordinator=true | ||
| node-scheduler.include-coordinator=false | ||
| http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT} | ||
| query.max-memory=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY} | ||
| query.max-memory-per-node=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE} | ||
| discovery-server.enabled=true | ||
| discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT} | ||
| optimizer.optimize-hash-generation=false | ||
| regex-library=RE2J | ||
| use-alternative-function-signatures=true | ||
| inline-sql-functions=false | ||
| nested-data-serialization-enabled=false | ||
| native-execution-enabled=true |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| -server | ||
| -Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE} | ||
| -XX:+UseG1GC | ||
| -XX:G1HeapRegionSize=${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE} | ||
| -XX:+UseGCOverheadLimit | ||
| -XX:+ExplicitGCInvokesConcurrent | ||
| -XX:+HeapDumpOnOutOfMemoryError | ||
| -XX:+ExitOnOutOfMemoryError | ||
| -Djdk.attach.allowAttachSelf=true |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1 @@ | ||||||
| com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL} | ||||||
|
||||||
| com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL} | |
| com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL:-INFO} |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/log.properties at
line 1, the log level value substituted from the environment variable may
include quotes, causing invalid log level syntax. To fix this, ensure that the
substitution process strips any surrounding quotes from the environment variable
value before inserting it into the config file, or update the .env file to
provide the log level without quotes.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| { | ||
| } | ||
|
Comment on lines
+1
to
+2
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Provide a minimal example or document expected schema An empty JSON object is syntactically valid, but future maintainers may be unsure what keys are supported. A commented exemplar or pointer to docs beside this file would improve clarity without affecting runtime. 🤖 Prompt for AI Agents
Comment on lines
+1
to
+2
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @anlowee I think we should explain to the user how to configure this for the timestamp field in their logs, right? And also that it may need to be different for each dataset they compress.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we direct them to the related presto-doc section?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's not published and is more general than they need, right? We should write a simplified section for them here. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | ||
| node.id=${PRESTO_COORDINATOR_SERVICENAME} |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,17 @@ | ||||||||||||||||||
| #!/bin/sh | ||||||||||||||||||
|
|
||||||||||||||||||
| # Exit on error | ||||||||||||||||||
| set -e | ||||||||||||||||||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||
|
|
||||||||||||||||||
| PRESTO_CONFIG_DIR="/opt/presto-server/etc" | ||||||||||||||||||
|
|
||||||||||||||||||
| # Substitute environemnt variables in config template | ||||||||||||||||||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||
| find /configs -type f | while read -r f; do | ||||||||||||||||||
| ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")" | ||||||||||||||||||
| done | ||||||||||||||||||
|
||||||||||||||||||
| # Substitute environemnt variables in config template | |
| find /configs -type f | while read -r f; do | |
| ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")" | |
| done | |
| # Substitute environment variables in config template | |
| find /configs -type f | while read -r f; do | |
| envsubst < "$f" > "${PRESTO_CONFIG_DIR}/$(basename "$f")" | |
| done |
🧰 Tools
🪛 Shellcheck (0.10.0)
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh around
lines 8 to 11, the script uses shell evaluation to substitute environment
variables in config templates, which is a security risk and mishandles filenames
with spaces. Replace the current shell evaluation approach with the use of
`envsubst` to safely substitute environment variables without executing
arbitrary shell code, and ensure all file path variables are properly quoted to
handle spaces correctly.
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| version: "3.9" | ||
|
|
||
| services: | ||
| presto-coordinator: | ||
| image: "ghcr.io/y-scope/presto/coordinator:dev" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Consider using stable image tags instead of 'dev' Using the 'dev' tag in production deployments can lead to unpredictable behavior as the image content may change without notice. Consider using specific version tags or at minimum a more stable tag: - image: "ghcr.io/y-scope/presto/coordinator:dev"
+ image: "ghcr.io/y-scope/presto/coordinator:${PRESTO_VERSION:-latest}"And similarly for the worker: - image: "ghcr.io/y-scope/presto/prestissimo-worker:dev"
+ image: "ghcr.io/y-scope/presto/prestissimo-worker:${PRESTO_VERSION:-latest}"🤖 Prompt for AI Agents |
||
| entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] | ||
| env_file: | ||
| - ".env" | ||
| - "coordinator-common.env" | ||
| - "coordinator.env" | ||
| volumes: | ||
| - "./coordinator/scripts:/scripts:ro" | ||
| - "coordinator-config:/opt/presto-server/etc" | ||
| - "./coordinator/config-template:/configs:ro" | ||
| networks: | ||
| - "presto" | ||
|
|
||
| presto-worker: | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should define a @coderabbitai file an issue
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've filed issue #1504 to track adding a healthcheck to the Presto worker service. The issue has been assigned to you. |
||
| image: "ghcr.io/y-scope/presto/prestissimo-worker:dev" | ||
| depends_on: | ||
| presto-coordinator: | ||
| condition: "service_started" | ||
| entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"] | ||
| env_file: | ||
| - ".env" | ||
| - "coordinator-common.env" | ||
| - "worker.env" | ||
| volumes: | ||
| - "./worker/scripts:/scripts:ro" | ||
| - "worker-config:/opt/presto-server/etc" | ||
| - "./worker/config-template:/configs:ro" | ||
| - "${CLP_PACKAGE_ARCHIVES}:${CLP_PACKAGE_ARCHIVES}" | ||
| networks: | ||
| - "presto" | ||
|
|
||
| volumes: | ||
| coordinator-config: | ||
| worker-config: | ||
|
|
||
| networks: | ||
| presto: | ||
| driver: "bridge" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /.venv/ |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,120 @@ | ||||||||||||||||||||||||||||||||||||||||||||
| import argparse | ||||||||||||||||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||||||||||||||||
| from typing import Optional | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| import yaml | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| # Set up console logging | ||||||||||||||||||||||||||||||||||||||||||||
| logging_console_handler = logging.StreamHandler() | ||||||||||||||||||||||||||||||||||||||||||||
| logging_formatter = logging.Formatter( | ||||||||||||||||||||||||||||||||||||||||||||
| "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| logging_console_handler.setFormatter(logging_formatter) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| # Set up root logger | ||||||||||||||||||||||||||||||||||||||||||||
| root_logger = logging.getLogger() | ||||||||||||||||||||||||||||||||||||||||||||
| root_logger.setLevel(logging.INFO) | ||||||||||||||||||||||||||||||||||||||||||||
| root_logger.addHandler(logging_console_handler) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| # Create logger | ||||||||||||||||||||||||||||||||||||||||||||
| logger = logging.getLogger(__name__) | ||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+10
to
+23
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Consider using structured logging configuration The logging setup could be more maintainable using dictionary configuration or the logging.basicConfig approach. Consider simplifying the logging setup: -# Set up console logging
-logging_console_handler = logging.StreamHandler()
-logging_formatter = logging.Formatter(
- "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S"
-)
-logging_console_handler.setFormatter(logging_formatter)
-
-# Set up root logger
-root_logger = logging.getLogger()
-root_logger.setLevel(logging.INFO)
-root_logger.addHandler(logging_console_handler)
-
-# Create logger
-logger = logging.getLogger(__name__)
+# Set up logging
+logging.basicConfig(
+ level=logging.INFO,
+ format="%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s",
+ datefmt="%Y-%m-%dT%H:%M:%S"
+)
+logger = logging.getLogger(__name__)📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| def main(argv=None) -> int: | ||||||||||||||||||||||||||||||||||||||||||||
| if argv is None: | ||||||||||||||||||||||||||||||||||||||||||||
| argv = sys.argv | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| args_parser = argparse.ArgumentParser( | ||||||||||||||||||||||||||||||||||||||||||||
| description="Generates an environment variables file for any user-configured properties." | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| args_parser.add_argument( | ||||||||||||||||||||||||||||||||||||||||||||
| "--clp-package-dir", help="CLP package directory.", required=True, type=Path | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| args_parser.add_argument( | ||||||||||||||||||||||||||||||||||||||||||||
| "--output-file", help="Path for the environment variables file.", required=True, type=Path | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| parsed_args = args_parser.parse_args(argv[1:]) | ||||||||||||||||||||||||||||||||||||||||||||
| clp_package_dir: Path = parsed_args.clp_package_dir.resolve() | ||||||||||||||||||||||||||||||||||||||||||||
| output_file: Path = parsed_args.output_file | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| clp_config_file_path = clp_package_dir / "etc" / "clp-config.yml" | ||||||||||||||||||||||||||||||||||||||||||||
| with open(clp_config_file_path, "r") as clp_config_file: | ||||||||||||||||||||||||||||||||||||||||||||
| clp_config = yaml.safe_load(clp_config_file) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| database_host = _get_config_value(clp_config, "database.host", "localhost") | ||||||||||||||||||||||||||||||||||||||||||||
| database_port = _get_config_value(clp_config, "database.port", 3306) | ||||||||||||||||||||||||||||||||||||||||||||
| database_name = _get_config_value(clp_config, "database.name", "clp-db") | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| clp_archive_output_storage_type = _get_config_value( | ||||||||||||||||||||||||||||||||||||||||||||
| clp_config, "archive_output.storage.type", "fs" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| if "fs" != clp_archive_output_storage_type: | ||||||||||||||||||||||||||||||||||||||||||||
| logger.error( | ||||||||||||||||||||||||||||||||||||||||||||
| "Expected CLP's archive_output.storage.type to be fs but found '%s'. Presto currently only supports" | ||||||||||||||||||||||||||||||||||||||||||||
| " reading archives from the fs storage type.", | ||||||||||||||||||||||||||||||||||||||||||||
| clp_archive_output_storage_type, | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
coderabbitai[bot] marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||||||||||||||||
| clp_archives_dir = _get_config_value( | ||||||||||||||||||||||||||||||||||||||||||||
| clp_config, | ||||||||||||||||||||||||||||||||||||||||||||
| "archive_output.storage.directory", | ||||||||||||||||||||||||||||||||||||||||||||
| str(clp_package_dir / "var" / "data" / "archives"), | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| credentials_file_path = clp_package_dir / "etc" / "credentials.yml" | ||||||||||||||||||||||||||||||||||||||||||||
| with open(credentials_file_path, "r") as credentials_file: | ||||||||||||||||||||||||||||||||||||||||||||
| credentials = yaml.safe_load(credentials_file) | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| database_user = _get_config_value(credentials, "database.user") | ||||||||||||||||||||||||||||||||||||||||||||
| database_password = _get_config_value(credentials, "database.password") | ||||||||||||||||||||||||||||||||||||||||||||
| if not database_user or not database_password: | ||||||||||||||||||||||||||||||||||||||||||||
| logger.error( | ||||||||||||||||||||||||||||||||||||||||||||
| "database.user and database.password must be specified in '%s'.", credentials_file_path | ||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shall we tell user that to launch clp-package first? Because that will generate the credentials. |
||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| return 1 | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| with open(output_file, "w") as env_file: | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| "PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL" | ||||||||||||||||||||||||||||||||||||||||||||
| f"=jdbc:mysql://{database_host}:{database_port}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME={database_name}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER={database_user}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write( | ||||||||||||||||||||||||||||||||||||||||||||
| f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD={database_password}\n" | ||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write(f"PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX=clp_\n") | ||||||||||||||||||||||||||||||||||||||||||||
| env_file.write(f"CLP_PACKAGE_ARCHIVES={clp_archives_dir}\n") | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| return 0 | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| def _get_config_value(config: dict, key: str, default_value: Optional[str] = None) -> str: | ||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||
| Gets the value corresponding to `key` from `config` if it exists. | ||||||||||||||||||||||||||||||||||||||||||||
| :param config: The config. | ||||||||||||||||||||||||||||||||||||||||||||
| :param key: The key to look for in the config, in dot notation (e.g., "database.host"). | ||||||||||||||||||||||||||||||||||||||||||||
| :param default_value: The value to return if `key` doesn't exist in `config`. | ||||||||||||||||||||||||||||||||||||||||||||
| :return: The value corresponding to `key` if it exists, otherwise `default_value`. | ||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| keys = key.split(".") | ||||||||||||||||||||||||||||||||||||||||||||
| value = config | ||||||||||||||||||||||||||||||||||||||||||||
| for k in keys: | ||||||||||||||||||||||||||||||||||||||||||||
| if isinstance(value, dict) and k in value: | ||||||||||||||||||||||||||||||||||||||||||||
| value = value[k] | ||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||
| return default_value | ||||||||||||||||||||||||||||||||||||||||||||
| return value | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| if "__main__" == __name__: | ||||||||||||||||||||||||||||||||||||||||||||
| sys.exit(main(sys.argv)) | ||||||||||||||||||||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1 @@ | ||||||||||
| PyYAML | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Pin dependency version to ensure deterministic builds Unversioned requirements can silently break the deployment if PyYAML ships a backwards-incompatible update. Pin to a tested version range, e.g.: -PyYAML
+PyYAML>=6.0,<7.0and revisit periodically. 📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,28 @@ | ||||||||||||||||||
| #!/usr/bin/env bash | ||||||||||||||||||
|
|
||||||||||||||||||
| set -eu | ||||||||||||||||||
| set -o pipefail | ||||||||||||||||||
|
|
||||||||||||||||||
| script_dir=$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) | ||||||||||||||||||
|
|
||||||||||||||||||
| cUsage="Usage: ${BASH_SOURCE[0]} <clp-package-dir>" | ||||||||||||||||||
| if [ "$#" -lt 1 ] ; then | ||||||||||||||||||
| echo "$cUsage" | ||||||||||||||||||
| exit | ||||||||||||||||||
| fi | ||||||||||||||||||
|
||||||||||||||||||
| if [ "$#" -lt 1 ] ; then | |
| echo "$cUsage" | |
| exit | |
| fi | |
| if [ "$#" -lt 1 ] ; then | |
| echo "$cUsage" >&2 | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/set-up-config.sh at lines 9 to 12, the
script uses 'exit' without an argument on incorrect usage, which returns the
previous command's status (likely 0). Change the 'exit' command to 'exit 1' to
ensure the script returns a non-zero status code indicating failure when called
incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Quote positional parameter when capturing it
Keeps paths with spaces intact.
-clp_package_dir=$1
+clp_package_dir="$1"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| clp_package_dir=$1 | |
| clp_package_dir="$1" |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/set-up-config.sh at line 13, the
positional parameter $1 is assigned to clp_package_dir without quotes, which can
break paths containing spaces. Fix this by enclosing $1 in double quotes when
assigning it to clp_package_dir to preserve any spaces in the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the .env file should be added to https://github.com/y-scope/clp/blob/main/tools/deployment/presto-clp/.gitignore
@coderabbitai create an issue to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if .gitignore exists at tools/deployment/presto-clp/.gitignore
if [ -f "tools/deployment/presto-clp/.gitignore" ]; then
echo "=== Existing .gitignore file found ==="
cat tools/deployment/presto-clp/.gitignore
else
echo "=== No .gitignore file exists at tools/deployment/presto-clp/.gitignore ==="
fi
echo ""
echo "=== Creating GitHub issue ==="
# Create the issue
gh issue create \
--title "Add .env file to .gitignore in tools/deployment/presto-clp" \
--body "The generated \`.env\` file in \`tools/deployment/presto-clp/\` should be added to a \`.gitignore\` file to prevent it from being committed to the repository.
**Context:**
The \`set-up-config.sh\` script generates a \`.env\` file containing environment variables at \`tools/deployment/presto-clp/.env\`. This file may contain sensitive information (database credentials, etc.) and should not be tracked by Git.
**Requested by:** @junhaoliao
**Related PR:** https://github.com/y-scope/clp/pull/1132
**Comment:** https://github.com/y-scope/clp/pull/1132#discussion_r2433423044" \
--assignee junhaoliao
Length of output: 149
@junhaoliao, I've created issue #1422 to track adding the .env file to .gitignore in the tools/deployment/presto-clp/ directory. The issue has been assigned to you and includes all the relevant context and backlinks to this PR and comment.
🧠 Learnings used
Learnt from: quinntaylormitchell
PR: y-scope/clp#1170
File: tools/deployment/presto-clp/scripts/generate-metadata-filter-file.py:0-0
Timestamp: 2025-09-04T12:26:54.788Z
Learning: The `set-up-config.sh` script in tools/deployment/presto-clp/scripts/ ensures that the output directory for metadata-filter.json already exists before calling generate-metadata-filter-file.py, so directory creation in the Python script is not strictly necessary.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,4 @@ | ||||||||||||||||||
| PRESTO_WORKER_HTTPPORT="8080" | ||||||||||||||||||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||
|
|
||||||||||||||||||
| # node.properties | ||||||||||||||||||
| PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" | ||||||||||||||||||
|
||||||||||||||||||
| PRESTO_WORKER_HTTPPORT="8080" | |
| # node.properties | |
| PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" | |
| PRESTO_WORKER_HTTP_PORT="8080" | |
| # node.properties | |
| PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker.env lines 1 to 4, rename the environment
variable PRESTO_WORKER_HTTPPORT to PRESTO_WORKER_HTTP_PORT to improve
readability and maintain consistency with other variables like
PRESTO_COORDINATOR_HTTP_PORT. After renaming, update all references to this
variable in the generation scripts to match the new name.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| connector.name=clp |
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should set @coderabbitai file an issue
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've created issue #1503 to track the |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,6 @@ | ||||||
| discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT} | ||||||
| presto.version=REPLACE_ME | ||||||
|
||||||
| presto.version=REPLACE_ME | |
| presto.version=${PRESTO_VERSION} |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/config.properties at line
2, replace the hard-coded "REPLACE_ME" for presto.version with an environment
variable placeholder like "${PRESTO_VERSION}". This change allows the bootstrap
script to export PRESTO_VERSION dynamically and use envsubst for substitution,
making the template declarative and avoiding sed ordering issues.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,4 @@ | ||||||||||||||||||
| node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | ||||||||||||||||||
| node.internal-address=REPLACE_ME | ||||||||||||||||||
| node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | ||||||||||||||||||
| node.id=REPLACE_ME | ||||||||||||||||||
|
||||||||||||||||||
| node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | |
| node.internal-address=REPLACE_ME | |
| node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | |
| node.id=REPLACE_ME | |
| node.environment=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT} | |
| node.internal-address=REPLACE_ME | |
| node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | |
| node.id=REPLACE_ME |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/node.properties lines 1 to
4, the environment variable used for node.environment is referencing a
coordinator-specific variable, creating an unintended dependency. Change the
variable to a worker-specific environment variable by renaming it to something
like PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT or create an alias in the
worker.env file that maps to the coordinator variable to avoid coupling and
ensure the worker config is self-contained.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| mutable-config=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Black/Ruff may recurse through large non-code trees under tools/deployment.
Running
black .fromtools/deploymentwill walk every sub-directory (configs, templates, logs). That can add noticeable time and, in worst cases, choke on extremely large data files a user may drop under that tree. Consider narrowing the path totools/deployment/**/*.pyor adding an--extend-excludefor obvious non-code directories (e.g.,config-template,scripts/generated).🤖 Prompt for AI Agents