feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments. #1178

junhaoliao · 2025-08-08T09:51:50Z

Description

Docker Compose implementation: replace ad-hoc docker run orchestration with a controller-driven Compose workflow that provisions env/config, writes a .env, and brings services up/down with docker compose.
- Service dependencies: Compose files now encode inter-service health checks, startup ordering, and graceful stop windows for DB, queue, Redis, Mongo, schedulers, workers, reducer, UI, and garbage collector.
- Volume management: host paths for logs/data/archives/streams/AWS config are mounted consistently across compose projects, with root-aware chown handling.
- Health checks: critical services use active health probes (cli client / TCP checks) to surface readiness before dependent jobs run.
New controller architecture: introduce BaseController and DockerComposeController to encapsulate provisioning, environment generation, and orchestrator-specific lifecycle commands while persisting shared container config for the UI.
- Environment-based configuration: .env synthesis covers credentials, storage paths, concurrency, logging levels, AWS auth, and service endpoints.
- Component provisioning helpers: each CLP component now has a dedicated _set_up_env_for_* method that validates directories/configs, pre-creates paths, and resolves hostnames for Compose.
- Configuration transformation: CLPConfig.transform_for_container() (and per-subconfig overrides) translate host settings into container-friendly service names and absolute paths; generate_docker_compose_container_config() returns the transformed copy used by Compose.
Simplified start/stop scripts: start_clp.py/stop_clp.py now validate config, ensure directories exist, and delegate lifecycle management to DockerComposeController.
Packaging updates: Compose manifests (docker-compose.base.yaml, docker-compose.yaml) ship with the package, taskfile.yaml copies them into releases, and new docs describe the design and prerequisites.
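
The controller split described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in controller.py: the class names `BaseController` and `DockerComposeController` come from this PR, but the method names (`set_up_env`, `write_env_file`, `start_command`, `stop_command`), the project-name format, and the two environment keys are hypothetical and chosen only for illustration.

```python
import pathlib
import tempfile
from abc import ABC, abstractmethod


class BaseController(ABC):
    """Orchestrator-agnostic provisioning (hypothetical shape)."""

    def __init__(self, clp_home: pathlib.Path, instance_id: str):
        self.clp_home = clp_home
        self.instance_id = instance_id

    def set_up_env(self) -> dict[str, str]:
        # The PR describes one helper per component; two illustrative
        # keys stand in for that aggregation here.
        return {
            "CLP_QUEUE_PORT": "5672",
            "CLP_LOGS_DIR_HOST": str(self.clp_home / "var" / "log"),
        }

    @abstractmethod
    def start_command(self) -> list[str]: ...

    @abstractmethod
    def stop_command(self) -> list[str]: ...


class DockerComposeController(BaseController):
    def _project_name(self) -> str:
        # Assumed naming scheme, not taken from the PR.
        return f"clp-package-{self.instance_id}"

    def write_env_file(self) -> pathlib.Path:
        # Emit KEY=VALUE lines that `docker compose` reads from `.env`.
        env_path = self.clp_home / ".env"
        lines = [f"{key}={value}" for key, value in sorted(self.set_up_env().items())]
        env_path.write_text("\n".join(lines) + "\n")
        return env_path

    def start_command(self) -> list[str]:
        return ["docker", "compose", "--project-name", self._project_name(), "up", "--detach"]

    def stop_command(self) -> list[str]:
        return ["docker", "compose", "--project-name", self._project_name(), "down"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        controller = DockerComposeController(pathlib.Path(tmp), "abcd")
        print(controller.write_env_file().read_text(), end="")
        print(" ".join(controller.start_command()))
```

The sketch only builds the command lines; a real controller would pass them to `subprocess.run` from the directory that holds the Compose files.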

Breaking Changes

sbin/start-clp.sh and sbin/stop-clp.sh no longer accept component names or partial targets. The new controller always spins up or tears down the full Compose project, so existing automation that started individual services (e.g., start-clp.sh query_worker) will fail.
- The --force option, which prevented removal of killed containers when not specified, is also removed from sbin/stop-clp.sh.
Support for multi-node deployments is temporarily removed. Configuration fields for controller hosts remain, but the package scripts now assume single-host Docker Compose orchestration and the docs instruct users accordingly.
- The --num-workers option, which specifies a concurrency count for targeted component launches (e.g., compression_worker and query_worker), is also removed as a result. We will bring back the configuration ability with a potentially different interface. See Revisit worker parallelism strategy after multi-container worker migration #1424 for more updates.
Data and log directories are no longer auto-namespaced by hostname when loading clp-config.yml. Existing installations relying on <base>/<hostname> directories must update paths or pre-create per-host directories themselves.
docker --version, instead of command -v docker, is now used to check docker availability.
Minimum runtime requirements increased: Docker Compose plugin 2.28.1+ is required to start the project.
Python API removals/changes:
- clp_py_utils.clp_config.get_components_for_target() and related component group/target constants were deleted; external tooling importing these symbols will break.
- CLPConfig.get_runnable_components() was replaced by CLPConfig.get_deployment_type(), altering how callers determine which services to launch.
- clp_package_utils.general.check_dependencies() was replaced by check_docker_dependencies(...) (different signature/semantics); scripts calling the old helper need to migrate.
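
Automation that wraps the package scripts may want to verify the new Compose minimum before calling them. The following is a minimal sketch, not the PR's `check_docker_dependencies()`: the function names are hypothetical, and it assumes `docker compose version --short` prints a version string such as `2.28.1` (optionally prefixed with `v`).

```python
import re
import subprocess

MIN_COMPOSE_VERSION = (2, 28, 1)  # minimum stated in this PR description


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first major.minor.patch triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"No version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text: str, minimum=MIN_COMPOSE_VERSION) -> bool:
    """Compare versions as integer tuples, so 2.9.0 sorts below 2.28.1."""
    return parse_version(version_text) >= minimum


def installed_compose_version() -> str:
    # Raises FileNotFoundError if docker is missing, and CalledProcessError
    # if the Compose plugin is not installed.
    result = subprocess.run(
        ["docker", "compose", "version", "--short"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

A caller could then gate startup with `meets_minimum(installed_compose_version())`.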

Added

components/clp-package-utils/clp_package_utils/controller.py: controller abstraction, Compose-specific implementation, shared config dump, .env writer, worker auto-sizing, instance ID persistence.
Compose descriptors under tools/deployment/package/: base stack and full-stack YAML defining services, volumes, health checks, and deployment-type variants.
Documentation: docs/src/dev-docs/design-docker-compose.md, index link, build doc note, quick-start Docker version requirements.
Config transformations: container path constants, deployment type enum, transform_for_container() overrides for DB, queue, Redis, Mongo, schedulers, reducers, FS/S3 storage, and log input configurations.
Dependency validation: check_docker_dependencies() verifies docker availability and Compose project state before start/stop.
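
To illustrate the config transformations listed above, here is a minimal sketch of what a `transform_for_container()` override might look like for one sub-config. The `QueueConfig` class, its field names, and the choice to return a copy are assumptions made for this example; only the method name, the `DEFAULT_PORT` class variable, the `queue` service hostname, and port 5672 appear in this PR.

```python
from dataclasses import dataclass, replace
from typing import ClassVar


@dataclass(frozen=True)
class QueueConfig:
    """Hypothetical stand-in for one CLP sub-config."""

    DEFAULT_PORT: ClassVar[int] = 5672

    host: str = "localhost"
    port: int = DEFAULT_PORT

    def transform_for_container(self) -> "QueueConfig":
        # Inside the Compose network, peers reach the queue by its service
        # hostname and container port, whatever the host-side mapping is.
        return replace(self, host="queue", port=self.DEFAULT_PORT)
```

Returning a copy keeps the host-side settings intact for anything that still needs them, such as published-port mappings.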

Other Removed

Legacy per-component orchestration logic (start_clp.py helpers, Docker mount builders, manual docker exec/wait loops, get_components_for_target API) in favour of Compose-driven lifecycle management.

Documentation

Added Docker Compose design doc detailing controller pattern, env generation, deployment types, and service topology.
Quick-start guide updated with explicit Docker package version minimums and guidance that Compose orchestrates the stack.
Multi-node guide replaced with a warning about the temporary removal of multi-node support, leaving only the SeaweedFS appendix.
Build doc cross-links to Compose design and mentions Compose assets inclusion in packages.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

(to be updated with more details once they're all revalidated with the latest batch of refactoring changes). Previously:

All scripts in sbin:
1. admin-tools/archive-manager.sh
2. admin-tools/dataset-manager.sh
3. compress.sh
4. decompress.sh
5. search.sh
6. start-clp.sh
7. stop-clp.sh
And the webui:
1. Viewing ingestion stats / jobs
2. Submitting and viewing queries
3. Extracting stream files and opening them in the log viewer
With the following config / use-case variations:
1. storage_engine: clp / clp-s
2. with clp-s:
  1. s3 input
  2. s3 archive output
  3. s3 stream output
  4. non-default dataset
3. garbage collector enablement

Summary by CodeRabbit

New Features
- Controller-driven Docker Compose start/stop with instance-ID management and automated per-component environment (.env) generation; packaged docker-compose manifests for single-node deployment.
Bug Fixes / Reliability
- Stronger Docker dependency/state checks, improved ownership handling, hostname→IP resolution, and safer startup/stop flows.
Documentation
- New deployment orchestration design doc, updated quick-start Docker requirements, and multi-node guide replaced with a Docker Compose notice.
Chores
- Container-friendly config transforms, packaging of deployment assets, and relocation of default config constant.

coderabbitai · 2025-08-08T09:51:57Z

Walkthrough

Adds Docker Compose orchestration and packaging: new BaseController and DockerComposeController to prepare per-service environments, emit a .env, and start/stop compose; container-aware config transforms and defaults; moves a default config constant; adds compose manifests, packaging steps, and documentation.

Changes

Cohort / File(s)	Summary
Controller & orchestration `components/clp-package-utils/clp_package_utils/controller.py`	New `BaseController` (ABC) with per-component env setup helpers and utilities; new `DockerComposeController` implementing start/stop, worker-count logic, env aggregation, `.env` emission, instance-id and chown helpers, and hostname→IP resolution.
Docker dependency & validation `components/clp-package-utils/clp_package_utils/general.py`	Added Docker-related exception hierarchy and `check_docker_dependencies()` (with project-running checks); `_is_docker_compose_project_running()`; centralized data/log directory validators; `generate_docker_compose_container_config()` added; `dump_shared_container_config()` signature changed.
Config model & container transforms `components/clp-py-utils/clp_py_utils/clp_config.py`	Added `DeploymentType` enum, container path constants (`CONTAINER_*`), CLP default path constants, DEFAULT_PORT classvars and `transform_for_container()` across components, `dump_to_primitive_dict()`, and `get_deployment_type()`; defaults adjusted for data/logs and worker config.
Start/Stop refactor `components/clp-package-utils/clp_package_utils/scripts/start_clp.py`, `components/clp-package-utils/clp_package_utils/scripts/stop_clp.py`	Start/stop scripts refactored to use `DockerComposeController` and `get_or_create_instance_id()`; per-target CLI logic removed; start delegates to `controller.start()` and stop to `controller.stop()`.
CLP_DEFAULT_CONFIG relocation (imports updated) `components/clp-package-utils/clp_package_utils/scripts/*.py`, `components/clp-package-utils/clp_package_utils/scripts/native/*.py`	`CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH` moved from `clp_package_utils.general` to `clp_py_utils.clp_config`; import sites updated across scripts.
Docker Compose manifests `tools/deployment/package/docker-compose.base.yaml`, `tools/deployment/package/docker-compose.yaml`	New base and package compose files: centralized service defaults, healthchecks, anchors and full set of CLP services plus package-level query services; package compose adds top-level name and include.
Packaging update `taskfile.yaml`	Include `tools/deployment/package/*/` in package sources and add rsync steps to copy deployment package into build OUTPUT_DIR.
Documentation `docs/src/dev-docs/design-deployment-orchestration.md`, `docs/src/dev-docs/building-package.md`, `docs/src/dev-docs/index.md`, `docs/src/user-docs/guides-multi-node.md`, `docs/src/user-docs/quick-start/index.md`	Added deployment orchestration design doc; building guide references docker-compose and package inclusion; index toctree updated; multi-node guide simplified to note Docker Compose usage; quick-start adds Docker/containerd minimum versions.
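
The table mentions `get_or_create_instance_id()`, which lets start and stop agree on the same deployment. A minimal sketch of that idea follows, assuming the ID is a short random hex string persisted in a file; the signature, the `instance-id` file name, and the ID length are hypothetical and may differ from the PR's implementation.

```python
import pathlib
import tempfile
import uuid


def get_or_create_instance_id(state_dir: pathlib.Path) -> str:
    """Return the ID stored under state_dir, creating one on first use."""
    id_file = state_dir / "instance-id"  # assumed file name
    if id_file.exists():
        return id_file.read_text().strip()
    state_dir.mkdir(parents=True, exist_ok=True)
    instance_id = uuid.uuid4().hex[-4:]
    id_file.write_text(instance_id)
    return instance_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = get_or_create_instance_id(pathlib.Path(tmp))
        second = get_or_create_instance_id(pathlib.Path(tmp))
        print(first == second)
```

Because the second call reads the file written by the first, a later stop invocation can address the same Compose project that the start invocation created.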

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User
    participant StartScript as start_clp.py
    participant Controller as DockerComposeController
    participant EnvPrep as BaseController
    participant Compose as Docker Compose

    User->>StartScript: run
    StartScript->>StartScript: load config & instance-id
    StartScript->>Controller: instantiate
    StartScript->>Controller: controller.start()
    Controller->>EnvPrep: _set_up_env()
    EnvPrep->>EnvPrep: build per-component envs, update webui settings, resolve hosts
    EnvPrep-->>Controller: aggregated env (.env)
    Controller->>Compose: docker compose up (with .env, project name)
    Compose->>Compose: start services (depends_on / healthchecks)
    Controller-->>StartScript: return status

sequenceDiagram
    autonumber
    participant User
    participant StopScript as stop_clp.py
    participant Controller as DockerComposeController
    participant Compose as Docker Compose

    User->>StopScript: run
    StopScript->>StopScript: load config & instance-id
    StopScript->>Controller: instantiate
    StopScript->>Controller: controller.stop()
    Controller->>Compose: docker compose down (project)
    Compose->>Compose: stop & remove services
    Controller-->>StopScript: return status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Migrate start_clp.py to Docker Compose #1177 — Related: this PR implements Docker Compose orchestration and refactors start/stop to use a DockerComposeController, aligning with the migration described in the issue.
Upgrade third-party container images to latest stable versions #1427 — Related: this PR adds docker-compose manifests that declare third‑party service images, which overlaps with the issue about auditing/upgrading those container images.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments." clearly identifies the primary change: migration from ad-hoc docker run orchestration to a Docker Compose–based controller workflow. The title is specific and technical, accurately reflecting the main code changes including the new BaseController and DockerComposeController abstractions, environment synthesis, Docker Compose manifests, and refactored start/stop scripts. The breaking-change indicator (!) is appropriately applied, and the secondary point about multi-node removal is directly related to this primary migration rather than a separate concern. The title successfully communicates the key objective without vague terms or excessive noise.


coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/src/user-docs/guides-multi-node.md (1)
5-8: Add missing comma and date reference for breaking change clarity.

The warning text is missing a comma after "orchestration" (flagged in prior feedback), and should include a "Since" reference to indicate when this breaking change took effect.

Apply this diff:
 :::{warning}
-CLP now uses [Docker Compose][docker-compose] for orchestration and support for multi-node
-deployments has been removed temporarily.
+CLP now uses [Docker Compose][docker-compose] for orchestration, and support for multi-node
+deployments has been removed temporarily. Since: 2025‑08‑08 (PR #1178).
 :::

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed25fc3 and 127cb79.

📒 Files selected for processing (2)

docs/src/dev-docs/building-package.md (2 hunks)
docs/src/user-docs/guides-multi-node.md (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-25T19:26:32.436Z

Learnt from: junhaoliao
PR: y-scope/clp#1335
File: taskfiles/docker-images.yaml:15-15
Timestamp: 2025-09-25T19:26:32.436Z
Learning: In the CLP project's Taskfile, the `:package` task creates the G_PACKAGE_BUILD_DIR directory structure, so any task that depends on `:package` (like `docker-images:package`) can safely assume this directory exists without needing additional mkdir commands.

Applied to files:

docs/src/dev-docs/building-package.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
GitHub Check: manylinux_2_28-x86_64-static-linked-bins
GitHub Check: package-image
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: ubuntu-jammy-lint
GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
GitHub Check: musllinux_1_2-x86_64-static-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: build-macos (macos-14, true)
GitHub Check: build-macos (macos-14, false)
GitHub Check: build-macos (macos-15, false)
GitHub Check: build-macos (macos-15, true)

🔇 Additional comments (1)

docs/src/dev-docs/building-package.md (1)

91-92: Reference definitions are correct.

The new link references align properly with usage in the note block and point to appropriate destinations. No concerns here.

coderabbitai · 2025-10-23T13:24:57Z

docs/src/dev-docs/building-package.md

+:::{note}
+The package includes a `docker-compose.yaml` file that can be used to deploy CLP using [Docker
+Compose][docker-compose]. If you want to manually deploy with Docker Compose instead of using the
+package scripts, see the [Deployment orchestration][design-deployment-orchestration] design doc for
+more information.
+:::


⚠️ Potential issue | 🟠 Major

Correct filename and add manual Compose example.

Two issues in this note:

Filename mismatch (critical): Line 65 references docker-compose.yaml, but the package artifacts ship as docker-compose.yml. This will mislead users trying to locate the file.

Missing user example: Past review cycles suggested adding a quick bash snippet showing the basic Compose commands so users don't need to navigate elsewhere to understand how to manually start/stop services.

Apply this diff to correct the filename and add the example:

 :::{note}
-The package includes a `docker-compose.yaml` file that can be used to deploy CLP using [Docker
+The package includes a `docker-compose.yml` file that can be used to deploy CLP using [Docker
 Compose][docker-compose]. If you want to manually deploy with Docker Compose instead of using the
 package scripts, see the [Deployment orchestration][design-deployment-orchestration] design doc for
 more information.
+
+Example:
+
+```bash
+# From the package root where docker-compose.yml resides
+docker compose up -d
+# Tear down when done (preserves volumes; append -v to also remove volumes)
+docker compose down
+```
 :::

🤖 Prompt for AI Agents

In docs/src/dev-docs/building-package.md around lines 64 to 69, the note incorrectly references docker-compose.yaml (the package ships docker-compose.yml) and is missing a quick manual Compose example; update the filename to docker-compose.yml and insert a short bash snippet showing how to start and stop the stack (from package root): a one-line docker compose up -d and a docker compose down comment about preserving volumes (and that -v removes volumes) so users can run the commands without leaving the doc.

junhaoliao · 2025-10-23T14:48:38Z

tools/deployment/package/docker-compose.base.yaml

+      # NOTE: Only one of `CLP_ARCHIVE_OUTPUT_DIR_HOST` and `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` are
+      # set at a time, but since `./var/data` on the host is mounted into the container and both
+      # `CLP_ARCHIVE_OUTPUT_DIR_HOST` and `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` default to
+      # directories under `./var/data`, we need to use a hack to avoid having Docker create the
+      # unset directory on the host (as root).
+      #
+      # For example, let's say we use the following as the mount for staged archives:
+      # "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-empty}:/var/data/staged-archives". If
+      # `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` is unset, Docker will create
+      # `/var/data/staged-archives` in the container, but it will also create
+      # `./var/data/staged-archives` on the host as root; this is because `/var/data` in the
+      # container is bind mounted to `./var/data` on the host.
+      #
+      # The hack to avoid this is if one of `CLP_ARCHIVE_OUTPUT_DIR_HOST` or
+      # `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` is unset, we set the target for the corresponding mount
+      # to a path that's not under `/var/data` in the container.


@coderabbitai please create an issue to track removal of this hack

An idea is to accept a var/tmp directory config from the user and generate all temporary runtime data in that directory instead of the current var/data directory. Then we wouldn't have to add the /var/data bind mount below.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create GitHub issue to track removal of Docker Compose mount hack
gh issue create \
  --title "Remove Docker Compose mount hack for archive/staged-archive directories" \
  --body "## Context

Currently, the Docker Compose configuration contains a hack to avoid Docker creating directories as root on the host when mounting archive and staged-archive directories. The issue arises because only one of \`CLP_ARCHIVE_OUTPUT_DIR_HOST\` and \`CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST\` is set at a time, but since \`./var/data\` on the host is mounted into the container and both variables default to directories under \`./var/data\`, Docker would create the unset directory on the host as root if we naively mount with a default path under \`/var/data\`.

## Current Hack

When one of the variables is unset, we set the target for the corresponding mount to a path that's not under \`/var/data\` in the container to prevent Docker from creating it on the host.

See: https://github.com/y-scope/clp/blob/docker-compose/tools/deployment/package/docker-compose.base.yaml#L267-L283

## Proposed Solution

Accept a \`var/tmp\` directory config from the user and generate all temporary runtime data in that directory instead of the current \`var/data\` directory. This would eliminate the need for the \`/var/data\` bind mount and remove the hack.

## References

- PR #1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r2455414543
- Requested by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 42

@junhaoliao I've created issue #1478 to track the removal of this Docker Compose mount hack. The issue describes the current workaround and includes your proposed solution of using a separate var/tmp directory for temporary runtime data.
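
To make the mount hack concrete, the sketch below emulates the two Compose interpolation forms it relies on, `${VAR:-default}` and `${VAR:+replacement}`, and applies them to the compression-worker mount strings quoted in the review of docker-compose.base.yaml. The `interpolate` function is illustrative only: it is not part of Compose or CLP, and it covers just these two operators plus plain `${VAR}`.

```python
import re

# Matches ${NAME}, ${NAME:-default}, and ${NAME:+replacement}.
_PATTERN = re.compile(r"\$\{(\w+)(?:(:-|:\+)([^}]*))?\}")


def interpolate(template: str, env: dict[str, str]) -> str:
    """Expand a small subset of Compose-style variable interpolation."""

    def expand(match: re.Match) -> str:
        name, op, arg = match.groups()
        value = env.get(name, "")
        if op == ":-":  # use arg when the variable is unset or empty
            return value or arg
        if op == ":+":  # use arg only when the variable is set and non-empty
            return arg if value else ""
        return value

    return _PATTERN.sub(expand, template)


ARCHIVES_MOUNT = (
    "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-empty}"
    ":${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}/var/data/archives"
)
STAGED_MOUNT = (
    "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-empty}"
    ":${CLP_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}/var/data/staged-archives"
)
```

With only `CLP_ARCHIVE_OUTPUT_DIR_HOST` set to `/mnt/data`, `interpolate(ARCHIVES_MOUNT, env)` yields `/mnt/data:/var/data/archives`, while `interpolate(STAGED_MOUNT, env)` yields `empty:/tmp/var/data/staged-archives`. The unset directory is therefore mounted outside `/var/data` in the container, which is the behaviour the code comment describes.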

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (10)

tools/deployment/package/docker-compose.base.yaml (10)
88-107: Add explicit user to db-table-creator after removing global default.

Once the global user is removed from x-service-defaults, add an explicit user to this CLP-owned service:
 db-table-creator:
   <<: *service_defaults
   hostname: "db_table_creator"
+  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
   environment:
Based on learnings

199-214: Add explicit user to results-cache-indices-creator after removing global default.

Once the global user is removed from x-service-defaults, add an explicit user to this CLP-owned service:
 results-cache-indices-creator:
   <<: *service_defaults
   hostname: "results_cache_indices_creator"
+  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
   environment:
Based on learnings

4-9: 🔴 CRITICAL: Remove global user override; vendor services will fail to start with forced host UID/GID.

The global user setting in x-service-defaults (line 9) forces all services—including vendor images (MySQL, RabbitMQ, Redis, MongoDB)—to run as the host UID/GID. This breaks their packaged entrypoints and causes permission/ownership errors. Based on learnings, only CLP-owned workloads should specify a user; vendor services must run with their image defaults.

Remove line 9 entirely. Then add explicit user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" only to CLP-owned services: db-table-creator, results-cache-indices-creator, compression-scheduler, compression-worker, webui, and garbage-collector.
 x-service-defaults: &service_defaults
   image: "${CLP_PACKAGE_CONTAINER_IMAGE_REF:-clp-package}"
   logging:
     driver: "local"
   stop_grace_period: "60s"
-  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
52-87: 🔴 CRITICAL: Remove user override from database service; vendor image requires packaged user.

Line 56 forces the MySQL/MariaDB container to run as the host UID/GID, which breaks the vendor entrypoint. MySQL requires its packaged user (usually mysql). Remove this line entirely.
 database:
   <<: *service_defaults
   image: "${CLP_DB_IMAGE:-mariadb:10-jammy}"
   hostname: "database"
-  user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}"
   environment:
Additionally, the stop_grace_period: "60s" inherited from x-service-defaults is appropriate for safe MySQL shutdown.

132-167: 🔴 CRITICAL: Remove user override from redis service; Redis requires its packaged user.

Line 136 forces Redis to run as the host UID/GID, which breaks the vendor entrypoint. Remove this line:
 redis:
   <<: *service_defaults
   image: "redis:7.2.4"
   hostname: "redis"
-  user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}"
   ports:
350-379: Add explicit user to garbage-collector after removing global default.

Once the global user is removed from x-service-defaults, add an explicit user to this CLP-owned service:
 garbage-collector:
   <<: *service_defaults
   hostname: "garbage_collector"
+  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
   stop_grace_period: "10s"
   deploy:
Based on learnings

215-247: Add explicit user to compression-scheduler after removing global default.

Once the global user is removed from x-service-defaults, add an explicit user to this CLP-owned service:
 compression-scheduler:
   <<: *service_defaults
   hostname: "compression_scheduler"
+  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
   stop_grace_period: "300s"
   environment:
Based on learnings

168-198: 🔴 CRITICAL: Remove user override and fix trailing comma in command array.

Two issues:

Line 172: Remove the vendor user override so MongoDB runs with its packaged user:
 results-cache:
   <<: *service_defaults
   image: "mongo:7.0.1"
   hostname: "results_cache"
-  user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}"
   ports:
Line 191: Fix the trailing comma in the command array (YAML syntax error):
   command: [
     "--config", "/etc/mongo/mongod.conf",
-    "--bind_ip", "0.0.0.0",
+    "--bind_ip", "0.0.0.0"
   ]
304-349: Add explicit user to webui after removing global default.

Once the global user is removed from x-service-defaults, add an explicit user to this CLP-owned service:
 webui:
   <<: *service_defaults
   hostname: "webui"
+  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
   environment:
Based on learnings

248-303: 🔴 CRITICAL: Add explicit user to compression-worker and fix archive mount paths.

Two issues:

Add explicit user after removing global default:
 compression-worker:
   <<: *service_defaults
   hostname: "compression_worker"
+  user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
   environment:
Fix archive mount path logic (lines 284–287). The conditional expansions ${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:+/tmp} and ${CLP_ARCHIVE_OUTPUT_DIR_HOST:+/tmp} incorrectly inject /tmp into the container target paths. When CLP_ARCHIVE_OUTPUT_DIR_HOST is set to /mnt/data, the mount becomes /mnt/data:/tmp/var/data/archives instead of /mnt/data:/var/data/archives. Use fixed container paths:
       - "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-empty}\
-        :${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}/var/data/archives"
+        :/var/data/archives"
       - "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-empty}\
-        :${CLP_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}/var/data/staged-archives"
+        :/var/data/staged-archives"
Based on learnings

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 127cb79 and 649276f.

📒 Files selected for processing (1)

tools/deployment/package/docker-compose.base.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-09-25T05:13:13.298Z

Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

tools/deployment/package/docker-compose.base.yaml

📚 Learning: 2025-01-16T16:58:43.190Z

Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

tools/deployment/package/docker-compose.base.yaml

📚 Learning: 2025-10-13T03:32:19.293Z

Learnt from: junhaoliao
PR: y-scope/clp#1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

tools/deployment/package/docker-compose.base.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: manylinux_2_28-x86_64-static-linked-bins
GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
GitHub Check: musllinux_1_2-x86_64-static-linked-bins
GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
GitHub Check: ubuntu-jammy-lint
GitHub Check: build-macos (macos-14, false)
GitHub Check: build-macos (macos-15, true)
GitHub Check: lint-check (macos-15)
GitHub Check: lint-check (ubuntu-24.04)

coderabbitai · 2025-10-23T14:49:41Z

tools/deployment/package/docker-compose.base.yaml

+  queue:
+    <<: *service_defaults
+    image: "rabbitmq:3.9.8"
+    hostname: "queue"
+    user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}"
+    environment:
+      RABBITMQ_DEFAULT_PASS: "${CLP_QUEUE_PASS:?Please set a value.}"
+      RABBITMQ_DEFAULT_USER: "${CLP_QUEUE_USER:?Please set a value.}"
+      RABBITMQ_LOGS: "/var/log/rabbitmq/rabbitmq.log"
+    ports:
+      - host_ip: "${CLP_QUEUE_HOST:-127.0.0.1}"
+        published: "${CLP_QUEUE_PORT:-5672}"
+        target: 5672
+    volumes:
+      - type: "bind"
+        source: "${CLP_QUEUE_LOGS_DIR_HOST:-./var/log/queue}"
+        target: "/var/log/rabbitmq"
+    healthcheck:
+      <<: *healthcheck_defaults
+      test: [
+        "CMD",
+        "rabbitmq-diagnostics", "check_running"
+      ]
+


⚠️ Potential issue | 🔴 Critical

🔴 CRITICAL: Remove user override from queue service; RabbitMQ requires its packaged user.

Line 112 forces RabbitMQ to run as the host UID/GID, which breaks the vendor entrypoint. RabbitMQ requires its packaged user. Remove this line:

 queue:
   <<: *service_defaults
   image: "rabbitmq:3.9.8"
   hostname: "queue"
-  user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}"
   ports:

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In tools/deployment/package/docker-compose.base.yaml around lines 108 to 131, remove the user override line in the queue service (the "user: ${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}" entry) because RabbitMQ requires its packaged user and the override breaks the vendor entrypoint; delete that line and ensure no other uid/gid overrides are applied to this service (leave environment, ports, volumes and healthcheck as-is).

kirkrodrigues

For the PR title, how about:

feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

docs/src/dev-docs/design-deployment-orchestration.md (3)
16-16: Fix missing conjunction; verify this was previously addressed.

The sentence still reads "lists the services their functions" and is missing "and". Past review comments indicate this was addressed in commit ed25fc3, but the error remains. Please verify whether this fix was reverted or incompletely applied.
-[Table 1](#table-1) below lists the services their functions, while [Table 2](#table-2) lists the
+[Table 1](#table-1) below lists the services and their functions, while [Table 2](#table-2) lists the
9-9: Add comma after parenthetical for clarity.

The phrase reads more naturally with a comma between the parenthetical and the following phrase.
-[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology) in the CLP
+[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology), in the CLP
153-153: Use "to" instead of "in order to" for conciseness.
  * `<Orchestrator>Controller` that implements (and/or overrides) any of the methods in
    `BaseController` (`<Orchestrator>` is a placeholder for the specific orchestrator for which the
-   class is being implemented in order to
+   class is being implemented to
    orchestrate a deployment).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 649276f and 7139740.

📒 Files selected for processing (1)

docs/src/dev-docs/design-deployment-orchestration.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

docs/src/dev-docs/design-deployment-orchestration.md

[uncategorized] ~9-~9: Possible missing comma found.
Context: ...in orchestrator terminology) in the CLP package as well as their dependencies. The CLP ...

(AI_HYDRA_LEO_MISSING_COMMA)

[style] ~153-~153: Consider a more concise word here.
Context: ...specific derived classes must implement in order to orchestrate a deployment. * `<Orche...

(IN_ORDER_TO_PREMIUM)

🪛 markdownlint-cli2 (0.18.1)

docs/src/dev-docs/design-deployment-orchestration.md

9-9: Link fragments should be valid

(MD051, link-fragments)

16-16: Link fragments should be valid

(MD051, link-fragments)

16-16: Link fragments should be valid

(MD051, link-fragments)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: musllinux_1_2-x86_64-static-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: ubuntu-jammy-lint
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: package-image
GitHub Check: manylinux_2_28-x86_64-static-linked-bins
GitHub Check: build-macos (macos-14, true)
GitHub Check: build-macos (macos-15, true)
GitHub Check: build-macos (macos-14, false)
GitHub Check: lint-check (macos-15)

coderabbitai · 2025-10-23T15:00:01Z

docs/src/dev-docs/design-deployment-orchestration.md

+
+## Architecture
+
+[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology) in the CLP


🧹 Nitpick | 🔵 Trivial

Verify link fragment syntax for cross-references.

The Markdown link syntax (e.g., [Figure 1](#figure-1)) references anchors defined with MyST syntax (e.g., (figure-1)= at line 19). Consider using MyST cross-reference syntax ({ref}) instead, which integrates more reliably with the documentation build system.

-[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology), in the CLP
+{ref}`figure-1` shows the components (*services* in orchestrator terminology), in the CLP

Similarly, update line 16 to use {ref} instead of Markdown link syntax.

Also applies to: 16-16

🧰 Tools

🪛 LanguageTool

[uncategorized] ~9-~9: Possible missing comma found.
Context: ...in orchestrator terminology) in the CLP package as well as their dependencies. The CLP ...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)

9-9: Link fragments should be valid

(MD051, link-fragments)

🤖 Prompt for AI Agents

In docs/src/dev-docs/design-deployment-orchestration.md around line 9 (and also update line 16), replace the Markdown anchor-style links like [Figure 1](#figure-1) with MyST cross-reference syntax using the defined anchor name (figure-1) — e.g., change the Markdown link to {ref}`figure-1`; do the same for the link on line 16 so all cross-references use {ref}`<anchor-name>` matching the (figure-1)= anchor at line 19.
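
As a rough illustration of the rewrite described above, the following sketch converts Markdown fragment links into MyST `{ref}` roles. It is not part of the PR: the helper name `to_myst_refs` and the regular expression are assumptions, and it emits the explicit-title form (`{ref}` with `label <anchor>`) so the visible link text is preserved even when the target has no caption.

```python
import re

# Matches Markdown fragment links such as [Figure 1](#figure-1).
# Links to external URLs are not matched because the target must start with "#".
_FRAGMENT_LINK = re.compile(r"\[([^\]]+)\]\(#([A-Za-z0-9_-]+)\)")


def to_myst_refs(text: str) -> str:
    """Rewrite [label](#anchor) links as {ref}`label <anchor>` roles (hypothetical helper)."""
    return _FRAGMENT_LINK.sub(
        lambda m: f"{{ref}}`{m.group(1)} <{m.group(2)}>`",
        text,
    )


if __name__ == "__main__":
    line = "[Table 1](#table-1) below lists the services, while [Table 2](#table-2) lists the"
    print(to_myst_refs(line))
```

Whether the docs build resolves these roles still depends on the `(figure-1)=` style targets being defined, so the output should be checked against an actual docs build.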

coderabbitai · 2025-10-23T15:00:01Z

docs/src/dev-docs/design-deployment-orchestration.md

+  subgraph Databases
+    database
+    queue
+    redis
+    results_cache
+  end


🧹 Nitpick | 🔵 Trivial

Consider renaming "Databases" group to reflect its broader role.

Past review comments discussed whether this grouping should be renamed to "State persistence", "Shared data stores", or similar, since it contains not only a database (MySQL) but also a message broker (RabbitMQ), cache (Redis), and results storage (MongoDB). No final decision was recorded. If there is team consensus, update the label to avoid confusion about the role of these components.

🤖 Prompt for AI Agents

In docs/src/dev-docs/design-deployment-orchestration.md around lines 70 to 75, the diagram subgraph labeled "Databases" groups heterogeneous components (MySQL, RabbitMQ, Redis, MongoDB); update the label to reflect their broader role—e.g., "State persistence", "Shared data stores", or "Storage & Messaging"—to remove ambiguity. Pick the team-consensus name (or propose one if none) and replace the subgraph title text accordingly, then update any nearby explanatory text to use the new term for consistency.

junhaoliao · 2025-10-23T18:48:54Z

tools/deployment/package/docker-compose.yaml

+      CLP_DB_PASS: "${CLP_DB_PASS:?Please set a value.}"
+      CLP_DB_USER: "${CLP_DB_USER:?Please set a value.}"
+      CLP_LOGGING_LEVEL: "${CLP_QUERY_SCHEDULER_LOGGING_LEVEL:-INFO}"
+      CLP_LOGS_DIR: "/var/log"


The PR wrongly changes the query scheduler's logs directory base to /var/log (it should be /var/log/query_scheduler). A similar issue is found in the compression scheduler.

@coderabbitai create an issue to track the fix

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track the incorrect logs directory configuration
gh issue create \
  --repo y-scope/clp \
  --title "Fix incorrect logs directory base for schedulers in Docker Compose" \
  --body "The PR #1178 incorrectly configures the logs directory base for query scheduler and compression scheduler.

**Issue:**
- Query scheduler's \`CLP_LOGS_DIR\` is set to \`/var/log\` but should be \`/var/log/query_scheduler\`
- Compression scheduler has a similar issue and should use \`/var/log/compression_scheduler\`

**Context:**
Reported by @junhaoliao in https://github.com/y-scope/clp/pull/1178#discussion_r2456746396

**Files affected:**
- \`tools/deployment/package/docker-compose.yaml\`
- \`tools/deployment/package/docker-compose.base.yaml\` (likely)" \
  --assignee junhaoliao

Length of output: 42

@junhaoliao, I've created issue #1483 to track the incorrect logs directory configuration for the query scheduler and compression scheduler.

The issue documents that:

Query scheduler's CLP_LOGS_DIR should be /var/log/query_scheduler instead of /var/log

Compression scheduler has the same issue and should use /var/log/compression_scheduler

The issue has been assigned to you and includes a link back to this discussion.
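
The intended layout can be sketched as follows. This is an illustration only, not the fix that landed: the helper name `component_logs_dir` and the constant are assumptions, and only the `/var/log` base and the two scheduler directory names come from the discussion above.

```python
from pathlib import PurePosixPath

# In-container logs base named in the thread; each component logs beneath it.
CONTAINER_LOGS_BASE = PurePosixPath("/var/log")


def component_logs_dir(component_name: str) -> PurePosixPath:
    """Return the per-component logs directory (hypothetical helper)."""
    if not component_name or "/" in component_name:
        raise ValueError(f"Invalid component name: {component_name!r}")
    return CONTAINER_LOGS_BASE / component_name


if __name__ == "__main__":
    for name in ("query_scheduler", "compression_scheduler"):
        print(f"{name}: CLP_LOGS_DIR={component_logs_dir(name)}")
```

Deriving the directory from the component name keeps the two schedulers from sharing a log directory by accident.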

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp#1152 File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613 Timestamp: 2025-08-08T06:59:42.436Z Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

sitaowang1998 · 2025-10-28T20:57:33Z

When running the compression workflow, I see the following logs at the very start of some compression workers:

[2025-10-28 19:54:36,496: ERROR/MainProcess] consumer: Cannot connect to amqp://clp-user:**@queue:5672//: [Errno 111] Connection refused.
Trying again in 2.00 seconds... (1/100)

[2025-10-28 19:54:38,499: ERROR/MainProcess] consumer: Cannot connect to amqp://clp-user:**@queue:5672//: [Errno 111] Connection refused.
Trying again in 4.00 seconds... (2/100)

This is caused by a missing dependency between the compression worker and RabbitMQ: the worker starts before RabbitMQ is fully set up and therefore cannot connect to it. Once RabbitMQ is fully operational, everything works as normal.
@CodeRabbit Could you please open an issue about this, which also tracks the other missing dependencies in docker compose services?

coderabbitai · 2025-10-28T20:58:41Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the docker-compose files to understand current compression-worker dependencies
echo "=== docker-compose.base.yaml ==="
cat tools/deployment/package/docker-compose.base.yaml

echo -e "\n\n=== docker-compose.yaml ==="
cat tools/deployment/package/docker-compose.yaml

Length of output: 17710

🏁 Script executed:

#!/bin/bash
# Create GitHub issue tracking missing Docker Compose service dependencies
gh issue create \
  --title "Add missing service dependencies in Docker Compose configuration" \
  --body "## Description

During compression workflow execution, compression workers intermittently fail to connect to RabbitMQ at startup because they begin before RabbitMQ is fully operational. While workers eventually retry and connect successfully, the initial connection failures generate error logs and indicate missing \`depends_on\` health check dependencies in the Docker Compose configuration.

## Error Logs

\`\`\`
[2025-10-28 19:54:36,496: ERROR/MainProcess] consumer: Cannot connect to amqp://clp-user:**@queue:5672//: [Errno 111] Connection refused.
Trying again in 2.00 seconds... (1/100)

[2025-10-28 19:54:38,499: ERROR/MainProcess] consumer: Cannot connect to amqp://clp-user:**@queue:5672//: [Errno 111] Connection refused.
Trying again in 4.00 seconds... (2/100)
\`\`\`

## Root Cause

The compression worker service does not declare an explicit dependency on the RabbitMQ (\`queue\`) service with a health check condition. As a result, Docker Compose starts the worker before verifying that RabbitMQ is healthy and ready to accept connections.

## Scope

This issue tracks:
1. Adding \`depends_on\` with \`condition: service_healthy\` for compression worker → RabbitMQ
2. Reviewing and adding any other missing service dependencies across all Docker Compose services (compression-scheduler, query-scheduler, query-worker, reducer, webui, garbage-collector, etc.)

## Related

- PR: https://github.com/y-scope/clp/pull/1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r<will-be-generated>

Reported by: @sitaowang1998" \
  --assignee sitaowang1998

Length of output: 42

@sitaowang1998: I've opened issue #1516 to track this problem.

After examining the Docker Compose configuration, I can confirm the root cause: the compression-worker service in docker-compose.base.yaml is missing a depends_on block entirely. While the compression-scheduler correctly declares:

depends_on:
  queue:
    condition: "service_healthy"

The compression-worker has no such dependency declaration, allowing it to start before RabbitMQ is ready. The same issue also affects query-worker, which also connects to the queue but doesn't wait for it to be healthy.

The issue will track adding the missing dependencies for both workers and reviewing all other services for similar gaps.
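
A review of the remaining services could be partly automated along these lines. The sketch below is an assumption-laden illustration, not project code: the function name is hypothetical, the `services` dict is a hand-written stand-in for a parsed Compose file, and only the service names and the `service_healthy` condition come from this thread.

```python
from typing import Dict, Iterable, List


def services_missing_healthy_dependency(
    services: Dict[str, dict], dependency: str, consumers: Iterable[str]
) -> List[str]:
    """List consumers that do not wait for `dependency` to report healthy."""
    missing = []
    for name in consumers:
        depends_on = services.get(name, {}).get("depends_on", {})
        if isinstance(depends_on, list):
            # Short syntax only waits for the dependency to start, not to be healthy.
            missing.append(name)
            continue
        condition = depends_on.get(dependency, {}).get("condition")
        if condition != "service_healthy":
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hand-written stand-in for the parsed Compose services mapping.
    services = {
        "queue": {"healthcheck": {"test": ["CMD", "rabbitmq-diagnostics", "check_running"]}},
        "compression-scheduler": {"depends_on": {"queue": {"condition": "service_healthy"}}},
        "compression-worker": {},
        "query-worker": {},
    }
    consumers = ["compression-scheduler", "compression-worker", "query-worker"]
    print(services_missing_healthy_dependency(services, "queue", consumers))
```

The list of consumers has to be supplied by hand, since a Compose file does not record which services actually connect to the queue.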

junhaoliao · 2025-10-30T07:01:39Z

components/clp-py-utils/clp_py_utils/clp_config.py

-    directory: SerializablePath = CLP_DEFAULT_DATA_DIRECTORY_PATH / "archives"
+    directory: SerializablePath = CLP_DEFAULT_ARCHIVES_DIRECTORY_PATH
+
+    def transform_for_container(self):


The "native" scripts - particularly native/dataset_manager.py - should have been updated to use the new in-container path (/var/data) instead of relying on the previous direct host-to-container path mapping. Otherwise, the dataset manager isn't able to remove archives at the moment.

@coderabbitai create an issue to track the archive deletion bug in the dataset manager

and another one to migrate the sbin tools to Docker Compose - or at least update their Docker mappings to align with the Package Compose project

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create issue 1: Archive deletion bug in dataset manager
gh issue create \
  --repo y-scope/clp \
  --title "Archive deletion fails in dataset manager due to path mapping mismatch" \
  --body "## Description

The native dataset manager script (\`native/dataset_manager.py\`) is currently unable to remove archives because it hasn't been updated to use the new in-container path (\`/var/data\`) introduced in the Docker Compose migration.

## Root Cause

The script still relies on the previous direct host-to-container path mapping instead of the new container path structure.

## Related

- PR: https://github.com/y-scope/clp/pull/1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r2476634906
- Reported by: @junhaoliao

## Expected Fix

Update \`native/dataset_manager.py\` and related native scripts to use the correct in-container paths that align with the Docker Compose deployment model." \
  --assignee junhaoliao

echo "---"

# Create issue 2: Migrate sbin tools to Docker Compose
gh issue create \
  --repo y-scope/clp \
  --title "Migrate sbin tools to Docker Compose or align their Docker mappings" \
  --body "## Description

The sbin tools need to be migrated to use Docker Compose orchestration, or at minimum, their Docker volume/path mappings should be updated to align with the Package Compose project structure.

## Context

Following the migration to Docker Compose in PR #1178, the sbin tools' container orchestration and path mappings are out of sync with the new deployment model.

## Scope

Either:
- Fully migrate sbin tools to Docker Compose orchestration, or
- Update their Docker run commands and volume mappings to align with the Package Compose project

## Related

- PR: https://github.com/y-scope/clp/pull/1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r2476634906
- Reported by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 89

@junhaoliao, I've created both issues to track these follow-up items:

Archive deletion bug: #1526 - Archive deletion fails in dataset manager due to path mapping mismatch

Tracks the immediate bug where native/dataset_manager.py can't remove archives due to the path mapping mismatch with the new /var/data in-container path structure.

Sbin tools migration: #1527 - Migrate sbin tools to Docker Compose or align their Docker mappings

Tracks the broader task of either fully migrating sbin tools to Docker Compose or updating their Docker volume/path mappings to align with the Package Compose project.

Both issues are assigned to you and include backlinks to this PR and comment thread.
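
To make the path mismatch concrete, here is a minimal sketch of translating a host path into its in-container equivalent. It is not the project's `transform_for_container()` implementation: the helper name `to_container_path` and the example host directory `/opt/clp/var/data` are assumptions, and only the `/var/data` container path comes from the comment above.

```python
from pathlib import PurePosixPath


def to_container_path(
    host_path: PurePosixPath, host_root: PurePosixPath, container_root: PurePosixPath
) -> PurePosixPath:
    """Map a path under `host_root` to the same relative location under `container_root`.

    Raises ValueError if `host_path` is not inside `host_root`.
    """
    return container_root / host_path.relative_to(host_root)


if __name__ == "__main__":
    host_root = PurePosixPath("/opt/clp/var/data")  # hypothetical host data directory
    container_root = PurePosixPath("/var/data")  # in-container path from the thread
    archive = host_root / "archives" / "example-archive-id"
    print(to_container_path(archive, host_root, container_root))
```

A script running inside the container that keeps using the host path would look for archives in a location that is not mounted there, which matches the deletion failure described above.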

🧠 Learnings used

Learnt from: junhaoliao PR: y-scope/clp#1178 File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47 Timestamp: 2025-10-07T07:54:32.427Z Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Learnt from: haiqi96 PR: y-scope/clp#1144 File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170 Timestamp: 2025-08-13T15:07:37.767Z Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved/absolute Path object because FsStorage.make_config_paths_absolute() processes the directory path during configuration loading, making additional .resolve() calls redundant.

Learnt from: haiqi96 PR: y-scope/clp#1144 File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114 Timestamp: 2025-08-13T14:48:49.020Z Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.

Learnt from: haiqi96 PR: y-scope/clp#594 File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110 Timestamp: 2024-11-15T16:21:52.122Z Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Learnt from: junhaoliao PR: y-scope/clp#1501 File: tools/deployment/presto-clp/scripts/init.py:10-13 Timestamp: 2025-10-27T07:07:37.901Z Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Learnt from: junhaoliao PR: y-scope/clp#1178 File: components/clp-package-utils/clp_package_utils/controller.py:315-315 Timestamp: 2025-10-17T19:59:25.596Z Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Learnt from: junhaoliao PR: y-scope/clp#1414 File: tools/docker-images/clp-package/Dockerfile:20-24 Timestamp: 2025-10-13T03:32:19.293Z Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Learnt from: Bill-hbrhbr PR: y-scope/clp#1122 File: components/core/src/clp/clp/CMakeLists.txt:175-195 Timestamp: 2025-07-23T09:54:45.185Z Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

junhaoliao changed the title ~~feat(deployment): Migrate package orchestration to Docker Compose (resolves #1177).~~ DO NOT MERGE (pending many other dependency PRs) - feat(deployment): Migrate package orchestration to Docker Compose (resolves #1177). Aug 8, 2025

junhaoliao force-pushed the docker-compose branch from 78b99f1 to fa2aff7 Compare August 20, 2025 22:40

junhaoliao and others added 16 commits August 24, 2025 10:10

remove FIXME

934a83c

Merge branch 'main' into docker-compose

5bf23c2

reformat

83cc9d1

reformat

82abc07

Update garbage collector logs directory mapping

0fb2294

Remove unused component argument parsers

c8ffb94

Remove unused component argument parsers

83e902a

Refactor dependency checks to include docker-compose status validation

5f2e5cd

Refactor log directory handling to use constant path definitions

edfa9c9

Add constants for archive and stream directory paths

7b3965e

remove unused component groups and functions

cd84be8

Remove unused CONTROLLER_TARGET_NAME constant from start_clp.py

aa12bdb

fix staging dirs

5365722

fix: update command to check if Docker Compose is running

21ef703

add AWS env credentials support

d2cdfbc

Merge branch 'main' into docker-compose

4f56709

junhaoliao mentioned this pull request Aug 25, 2025

feat(docker): Add container image containing the CLP package, for Docker Compose integration (resolves #1164). #1166

Merged

5 tasks

junhaoliao added 10 commits August 26, 2025 15:21

Update container image name in start_clp.py

f0db07f

Merge remote-tracking branch 'origin/main' into docker-compose

655600d

# Conflicts: # components/clp-package-utils/clp_package_utils/scripts/start_clp.py

add support for configurable CLP WebUI rate limiting

5f24ce7

update WebUI server path in start_clp.py and docker-compose configuration

3df20fc

copy docker-compose.yml in package task

c6f81ad

use absolute paths in archive and stream storage configurations

db9c20f

refactor: centralize environment variable management and enhance validation

ea03e17

fix: use List[str] type hint for command parameter in start_clp.py

60994ee

refactor: remove dump_to_env_vars_dict methods and centralize environment variable handling

7e25d75

lint

3e24e4e

kirkrodrigues and others added 3 commits October 23, 2025 09:07

Update docs/src/dev-docs/design-deployment-orchestration.md

ed25fc3

Co-authored-by: Junhao Liao <[email protected]>

building-package.md: Minor fixes.

4926450

Remove other details from multi-node doc until we add multi-node support back in.

127cb79

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

Add comment to explain the +/tmp hack.

649276f

junhaoliao commented Oct 23, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Oct 23, 2025

Remove Docker Compose mount hack for archive/staged-archive directories #1478

Closed

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

docs(design-deployment): Rename "Shared data stores" subgraph to "Databases"

7139740

kirkrodrigues approved these changes Oct 23, 2025

View reviewed changes

junhaoliao changed the title ~~feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177).~~ feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments. Oct 23, 2025

coderabbitai bot reviewed Oct 23, 2025

View reviewed changes

junhaoliao merged commit 9ff731b into y-scope:main Oct 23, 2025
28 of 32 checks passed

junhaoliao commented Oct 23, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Oct 23, 2025

Fix incorrect logs directory base for schedulers in Docker Compose #1483

Closed

This was referenced Oct 23, 2025

fix(clp-package): Use component-specific log directories for compression-scheduler and query-scheduler (fixes #1483). #1484

Merged

feat(presto-clp): Add Docker compose setup for Presto cluster that can connect to clp-json. #1132

Merged

coderabbitai bot mentioned this pull request Oct 25, 2025

Fix file mounting inconsistency between Presto and CLP Package Docker Compose projects #1496

Closed

junhaoliao mentioned this pull request Oct 25, 2025

docs(presto-clp): Add instructions for connecting CLP's UI to Presto; Simplify some set up instructions. #1310

Merged

3 tasks

This was referenced Oct 25, 2025

Integrate Presto services into CLP Package Docker Compose project #1497

Open

Allow Presto project to attach to external Docker network for CLP Package Compose #1498

Closed

junhaoliao mentioned this pull request Oct 25, 2025

feat(deployment): Add --setup-only flag to start-clp.sh to set up the package without starting components (resolves #1475). #1502

Merged

3 tasks

coderabbitai bot mentioned this pull request Oct 28, 2025

Add missing service dependencies in Docker Compose configuration #1516

Open

junhaoliao commented Oct 30, 2025

View reviewed changes

This was referenced Oct 30, 2025

Archive deletion fails in dataset manager due to path mapping mismatch #1526

Closed

Migrate sbin tools to Docker Compose or align their Docker mappings #1527

Open

This was referenced Nov 4, 2025

Fix scheduler logging level configuration in generic_start_scheduler #1220

Closed

fix(clp-package): Ensure at least one worker in default calculations (fixes #1509). #1547

Merged


3 participants
