-
Notifications
You must be signed in to change notification settings - Fork 83
feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments. #1178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds Docker Compose orchestration and packaging: new BaseController and DockerComposeController to prepare per-service environments, emit a .env, and start/stop compose; container-aware config transforms and defaults; moves a default config constant; adds compose manifests, packaging steps, and documentation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User
participant StartScript as start_clp.py
participant Controller as DockerComposeController
participant EnvPrep as BaseController
participant Compose as Docker Compose
User->>StartScript: run
StartScript->>StartScript: load config & instance-id
StartScript->>Controller: instantiate
StartScript->>Controller: controller.start()
Controller->>EnvPrep: _set_up_env()
EnvPrep->>EnvPrep: build per-component envs, update webui settings, resolve hosts
EnvPrep-->>Controller: aggregated env (.env)
Controller->>Compose: docker compose up (with .env, project name)
Compose->>Compose: start services (depends_on / healthchecks)
Controller-->>StartScript: return status
sequenceDiagram
autonumber
participant User
participant StopScript as stop_clp.py
participant Controller as DockerComposeController
participant Compose as Docker Compose
User->>StopScript: run
StopScript->>StopScript: load config & instance-id
StopScript->>Controller: instantiate
StopScript->>Controller: controller.stop()
Controller->>Compose: docker compose down (project)
Compose->>Compose: stop & remove services
Controller-->>StopScript: return status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
78b99f1 to
fa2aff7
Compare
# Conflicts: # components/clp-package-utils/clp_package_utils/scripts/start_clp.py
…onment variable handling
Co-authored-by: Junhao Liao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/src/user-docs/guides-multi-node.md (1)
5-8: Add missing comma and date reference for breaking change clarity.The warning text is missing a comma after "orchestration" (flagged in prior feedback), and should include a "Since" reference to indicate when this breaking change took effect.
Apply this diff:
:::{warning} -CLP now uses [Docker Compose][docker-compose] for orchestration and support for multi-node -deployments has been removed temporarily. +CLP now uses [Docker Compose][docker-compose] for orchestration, and support for multi-node +deployments has been removed temporarily. Since: 2025‑08‑08 (PR #1178). :::
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/dev-docs/building-package.md(2 hunks)docs/src/user-docs/guides-multi-node.md(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T19:26:32.436Z
Learnt from: junhaoliao
PR: y-scope/clp#1335
File: taskfiles/docker-images.yaml:15-15
Timestamp: 2025-09-25T19:26:32.436Z
Learning: In the CLP project's Taskfile, the `:package` task creates the G_PACKAGE_BUILD_DIR directory structure, so any task that depends on `:package` (like `docker-images:package`) can safely assume this directory exists without needing additional mkdir commands.
Applied to files:
docs/src/dev-docs/building-package.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (1)
docs/src/dev-docs/building-package.md (1)
91-92: Reference definitions are correct.The new link references align properly with usage in the note block and point to appropriate destinations. No concerns here.
| :::{note} | ||
| The package includes a `docker-compose.yaml` file that can be used to deploy CLP using [Docker | ||
| Compose][docker-compose]. If you want to manually deploy with Docker Compose instead of using the | ||
| package scripts, see the [Deployment orchestration][design-deployment-orchestration] design doc for | ||
| more information. | ||
| ::: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct filename and add manual Compose example.
Two issues in this note:
-
Filename mismatch (critical): Line 65 references
docker-compose.yaml, but the package artifacts ship asdocker-compose.yml. This will mislead users trying to locate the file. -
Missing user example: Past review cycles suggested adding a quick bash snippet showing the basic Compose commands so users don't need to navigate elsewhere to understand how to manually start/stop services.
Apply this diff to correct the filename and add the example:
:::{note}
-The package includes a `docker-compose.yaml` file that can be used to deploy CLP using [Docker
+The package includes a `docker-compose.yml` file that can be used to deploy CLP using [Docker
Compose][docker-compose]. If you want to manually deploy with Docker Compose instead of using the
package scripts, see the [Deployment orchestration][design-deployment-orchestration] design doc for
more information.
+
+Example:
+
+```bash
+# From the package root where docker-compose.yml resides
+docker compose up -d
+# Tear down when done (preserves volumes; append -v to also remove volumes)
+docker compose down
+```
:::🤖 Prompt for AI Agents
In docs/src/dev-docs/building-package.md around lines 64 to 69, the note
incorrectly references docker-compose.yaml (the package ships
docker-compose.yml) and is missing a quick manual Compose example; update the
filename to docker-compose.yml and insert a short bash snippet showing how to
start and stop the stack (from package root): a one-line docker compose up -d
and a docker compose down comment about preserving volumes (and that -v removes
volumes) so users can run the commands without leaving the doc.
| # NOTE: Only one of `CLP_ARCHIVE_OUTPUT_DIR_HOST` and `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` are | ||
| # set at a time, but since `./var/data` on the host is mounted into the container and both | ||
| # `CLP_ARCHIVE_OUTPUT_DIR_HOST` and `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` default to | ||
| # directories under `./var/data`, we need to use a hack to avoid having Docker create the | ||
| # unset directory on the host (as root). | ||
| # | ||
| # For example, let's say we use the following as the mount for staged archives: | ||
| # "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-empty}:/var/data/staged-archives". If | ||
| # `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` is unset, Docker will create | ||
| # `/var/data/staged-archives` in the container, but it will also create | ||
| # `./var/data/staged-archives` on the host as root; this is because `/var/data` in the | ||
| # container is bind mounted to `./var/data` on the host. | ||
| # | ||
| # The hack to avoid this is if one of `CLP_ARCHIVE_OUTPUT_DIR_HOST` or | ||
| # `CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST` is unset, we set the target for the corresponding mount | ||
| # to a path that's not under `/var/data` in the container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai please create an issue to track removal of this hack
an idea is to accept a var/tmp directory config from the user, and we generate all temporary runtime data in that directory instead of the current var/data directory. then we don't have to add the /var/data bind mount below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create GitHub issue to track removal of Docker Compose mount hack
gh issue create \
--title "Remove Docker Compose mount hack for archive/staged-archive directories" \
--body "## Context
Currently, the Docker Compose configuration contains a hack to avoid Docker creating directories as root on the host when mounting archive and staged-archive directories.
The issue arises because only one of \`CLP_ARCHIVE_OUTPUT_DIR_HOST\` and \`CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST\` is set at a time, but since \`./var/data\` on the host is mounted into the container and both variables default to directories under \`./var/data\`, Docker would create the unset directory on the host as root if we naively mount with a default path under \`/var/data\`.
## Current Hack
When one of the variables is unset, we set the target for the corresponding mount to a path that's not under \`/var/data\` in the container to prevent Docker from creating it on the host.
See: https://github.com/y-scope/clp/blob/docker-compose/tools/deployment/package/docker-compose.base.yaml#L267-L283
## Proposed Solution
Accept a \`var/tmp\` directory config from the user and generate all temporary runtime data in that directory instead of the current \`var/data\` directory. This would eliminate the need for the \`/var/data\` bind mount and remove the hack.
## References
- PR #1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r2455414543
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao I've created issue #1478 to track the removal of this Docker Compose mount hack. The issue describes the current workaround and includes your proposed solution of using a separate var/tmp directory for temporary runtime data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (10)
tools/deployment/package/docker-compose.base.yaml (10)
88-107: Add explicit user todb-table-creatorafter removing global default.Once the global
useris removed fromx-service-defaults, add an explicit user to this CLP-owned service:db-table-creator: <<: *service_defaults hostname: "db_table_creator" + user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" environment:Based on learnings
199-214: Add explicit user toresults-cache-indices-creatorafter removing global default.Once the global
useris removed fromx-service-defaults, add an explicit user to this CLP-owned service:results-cache-indices-creator: <<: *service_defaults hostname: "results_cache_indices_creator" + user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" environment:Based on learnings
4-9: 🔴 CRITICAL: Remove global user override; vendor services will fail to start with forced host UID/GID.The global
usersetting inx-service-defaults(line 9) forces all services—including vendor images (MySQL, RabbitMQ, Redis, MongoDB)—to run as the host UID/GID. This breaks their packaged entrypoints and causes permission/ownership errors. Based on learnings, only CLP-owned workloads should specify a user; vendor services must run with their image defaults.Remove line 9 entirely. Then add explicit
user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"only to CLP-owned services:db-table-creator,results-cache-indices-creator,compression-scheduler,compression-worker,webui, andgarbage-collector.x-service-defaults: &service_defaults image: "${CLP_PACKAGE_CONTAINER_IMAGE_REF:-clp-package}" logging: driver: "local" stop_grace_period: "60s" - user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}"
52-87: 🔴 CRITICAL: Remove user override from database service; vendor image requires packaged user.Line 56 forces the MySQL/MariaDB container to run as the host UID/GID, which breaks the vendor entrypoint. MySQL requires its packaged user (usually
mysql). Remove this line entirely.database: <<: *service_defaults image: "${CLP_DB_IMAGE:-mariadb:10-jammy}" hostname: "database" - user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}" environment:Additionally, the
stop_grace_period: "60s"inherited fromx-service-defaultsis appropriate for safe MySQL shutdown.
132-167: 🔴 CRITICAL: Remove user override from redis service; Redis requires its packaged user.Line 136 forces Redis to run as the host UID/GID, which breaks the vendor entrypoint. Remove this line:
redis: <<: *service_defaults image: "redis:7.2.4" hostname: "redis" - user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}" ports:
350-379: Add explicit user togarbage-collectorafter removing global default.Once the global
useris removed fromx-service-defaults, add an explicit user to this CLP-owned service:garbage-collector: <<: *service_defaults hostname: "garbage_collector" + user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" stop_grace_period: "10s" deploy:Based on learnings
215-247: Add explicit user tocompression-schedulerafter removing global default.Once the global
useris removed fromx-service-defaults, add an explicit user to this CLP-owned service:compression-scheduler: <<: *service_defaults hostname: "compression_scheduler" + user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" stop_grace_period: "300s" environment:Based on learnings
168-198: 🔴 CRITICAL: Remove user override and fix trailing comma in command array.Two issues:
- Line 172: Remove the vendor user override so MongoDB runs with its packaged user:
results-cache: <<: *service_defaults image: "mongo:7.0.1" hostname: "results_cache" - user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}" ports:
- Line 191: Fix the trailing comma in the command array (YAML syntax error):
command: [ "--config", "/etc/mongo/mongod.conf", - "--bind_ip", "0.0.0.0", + "--bind_ip", "0.0.0.0" ]
304-349: Add explicit user towebuiafter removing global default.Once the global
useris removed fromx-service-defaults, add an explicit user to this CLP-owned service:webui: <<: *service_defaults hostname: "webui" + user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" environment:Based on learnings
248-303: 🔴 CRITICAL: Add explicit user tocompression-workerand fix archive mount paths.Two issues:
- Add explicit user after removing global default:
compression-worker: <<: *service_defaults hostname: "compression_worker" + user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:1000}" environment:
- Fix archive mount path logic (lines 284–287). The conditional expansions
${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}and${CLP_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}incorrectly inject/tmpinto the container target paths. WhenCLP_ARCHIVE_OUTPUT_DIR_HOSTis set to/mnt/data, the mount becomes/mnt/data:/tmp/var/data/archivesinstead of/mnt/data:/var/data/archives. Use fixed container paths:- "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-empty}\ - :${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}/var/data/archives" + :/var/data/archives" - "${CLP_STAGED_ARCHIVE_OUTPUT_DIR_HOST:-empty}\ - :${CLP_ARCHIVE_OUTPUT_DIR_HOST:+/tmp}/var/data/staged-archives" + :/var/data/staged-archives"Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tools/deployment/package/docker-compose.base.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
tools/deployment/package/docker-compose.base.yaml
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
tools/deployment/package/docker-compose.base.yaml
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
PR: y-scope/clp#1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Applied to files:
tools/deployment/package/docker-compose.base.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
| queue: | ||
| <<: *service_defaults | ||
| image: "rabbitmq:3.9.8" | ||
| hostname: "queue" | ||
| user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}" | ||
| environment: | ||
| RABBITMQ_DEFAULT_PASS: "${CLP_QUEUE_PASS:?Please set a value.}" | ||
| RABBITMQ_DEFAULT_USER: "${CLP_QUEUE_USER:?Please set a value.}" | ||
| RABBITMQ_LOGS: "/var/log/rabbitmq/rabbitmq.log" | ||
| ports: | ||
| - host_ip: "${CLP_QUEUE_HOST:-127.0.0.1}" | ||
| published: "${CLP_QUEUE_PORT:-5672}" | ||
| target: 5672 | ||
| volumes: | ||
| - type: "bind" | ||
| source: "${CLP_QUEUE_LOGS_DIR_HOST:-./var/log/queue}" | ||
| target: "/var/log/rabbitmq" | ||
| healthcheck: | ||
| <<: *healthcheck_defaults | ||
| test: [ | ||
| "CMD", | ||
| "rabbitmq-diagnostics", "check_running" | ||
| ] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 CRITICAL: Remove user override from queue service; RabbitMQ requires its packaged user.
Line 112 forces RabbitMQ to run as the host UID/GID, which breaks the vendor entrypoint. RabbitMQ requires its packaged user. Remove this line:
queue:
<<: *service_defaults
image: "rabbitmq:3.9.8"
hostname: "queue"
- user: "${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}"
ports:Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose.base.yaml around lines 108 to 131,
remove the user override line in the queue service (the "user:
${CLP_THIRD_PARTY_SERVICE_UID_GID:-1000:1000}" entry) because RabbitMQ requires
its packaged user and the override breaks the vendor entrypoint; delete that
line and ensure no other uid/gid overrides are applied to this service (leave
environment, ports, volumes and healthcheck as-is).
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
docs/src/dev-docs/design-deployment-orchestration.md (3)
16-16: Fix missing conjunction; verify this was previously addressed.The sentence still reads "lists the services their functions" and is missing "and". Past review comments indicate this was addressed in commit ed25fc3, but the error remains. Please verify whether this fix was reverted or incompletely applied.
-[Table 1](#table-1) below lists the services their functions, while [Table 2](#table-2) lists the +[Table 1](#table-1) below lists the services and their functions, while [Table 2](#table-2) lists the
9-9: Add comma after parenthetical for clarity.The phrase reads more naturally with a comma between the parenthetical and the following phrase.
-[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology) in the CLP +[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology), in the CLP
153-153: Use "to" instead of "in order to" for conciseness.* `<Orchestrator>Controller` that implements (and/or overrides) any of the methods in `BaseController` (`<Orchestrator>` is a placeholder for the specific orchestrator for which the - class is being implemented in order to + class is being implemented to orchestrate a deployment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-docs/design-deployment-orchestration.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/dev-docs/design-deployment-orchestration.md
[uncategorized] ~9-~9: Possible missing comma found.
Context: ...in orchestrator terminology) in the CLP package as well as their dependencies. The CLP ...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~153-~153: Consider a more concise word here.
Context: ...specific derived classes must implement in order to orchestrate a deployment. * `<Orche...
(IN_ORDER_TO_PREMIUM)
🪛 markdownlint-cli2 (0.18.1)
docs/src/dev-docs/design-deployment-orchestration.md
9-9: Link fragments should be valid
(MD051, link-fragments)
16-16: Link fragments should be valid
(MD051, link-fragments)
16-16: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: lint-check (macos-15)
|
|
||
| ## Architecture | ||
|
|
||
| [Figure 1](#figure-1) shows the components (*services* in orchestrator terminology) in the CLP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Verify link fragment syntax for cross-references.
The Markdown link syntax (e.g., [Figure 1](#figure-1)) references anchors defined with MyST syntax (e.g., (figure-1)= at line 19). Consider using MyST cross-reference syntax ({ref}) instead, which integrates more reliably with the documentation build system.
-[Figure 1](#figure-1) shows the components (*services* in orchestrator terminology), in the CLP
+{ref}`figure-1` shows the components (*services* in orchestrator terminology), in the CLPSimilarly, update line 16 to use {ref} instead of Markdown link syntax.
Also applies to: 16-16
🧰 Tools
🪛 LanguageTool
[uncategorized] ~9-~9: Possible missing comma found.
Context: ...in orchestrator terminology) in the CLP package as well as their dependencies. The CLP ...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 markdownlint-cli2 (0.18.1)
9-9: Link fragments should be valid
(MD051, link-fragments)
🤖 Prompt for AI Agents
In docs/src/dev-docs/design-deployment-orchestration.md around line 9 (and also
update line 16), replace the Markdown anchor-style links like [Figure
1](#figure-1) with MyST cross-reference syntax using the defined anchor name
(figure-1) — e.g., change the Markdown link to {ref}`figure-1`; do the same for
the link on line 16 so all cross-references use {ref}`<anchor-name>` matching
the (figure-1)= anchor at line 19.
| subgraph Databases | ||
| database | ||
| queue | ||
| redis | ||
| results_cache | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider renaming "Databases" group to reflect its broader role.
Past review comments discussed whether this grouping should be renamed to "State persistence", "Shared data stores", or similar, since it contains not only a database (MySQL) but also a message broker (RabbitMQ), cache (Redis), and results storage (MongoDB). No final decision was recorded. If there is team consensus, update the label to avoid confusion about the role of these components.
🤖 Prompt for AI Agents
In docs/src/dev-docs/design-deployment-orchestration.md around lines 70 to 75,
the diagram subgraph labeled "Databases" groups heterogeneous components (MySQL,
RabbitMQ, Redis, MongoDB); update the label to reflect their broader role—e.g.,
"State persistence", "Shared data stores", or "Storage & Messaging"—to remove
ambiguity. Pick the team-consensus name (or propose one if none) and replace the
subgraph title text accordingly, then update any nearby explanatory text to use
the new term for consistency.
| CLP_DB_PASS: "${CLP_DB_PASS:?Please set a value.}" | ||
| CLP_DB_USER: "${CLP_DB_USER:?Please set a value.}" | ||
| CLP_LOGGING_LEVEL: "${CLP_QUERY_SCHEDULER_LOGGING_LEVEL:-INFO}" | ||
| CLP_LOGS_DIR: "/var/log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the PR wrongly changes the query scheduler's logs directory base to /var/log (it should be /var/log/query_scheduler). a similar issue is found in the compression scheduler.
@coderabbitai create an issue to track the fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the incorrect logs directory configuration
gh issue create \
--repo y-scope/clp \
--title "Fix incorrect logs directory base for schedulers in Docker Compose" \
--body "The PR #1178 incorrectly configures the logs directory base for query scheduler and compression scheduler.
**Issue:**
- Query scheduler's \`CLP_LOGS_DIR\` is set to \`/var/log\` but should be \`/var/log/query_scheduler\`
- Compression scheduler has a similar issue and should use \`/var/log/compression_scheduler\`
**Context:**
Reported by @junhaoliao in https://github.com/y-scope/clp/pull/1178#discussion_r2456746396
**Files affected:**
- \`tools/deployment/package/docker-compose.yaml\`
- \`tools/deployment/package/docker-compose.base.yaml\` (likely)" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1483 to track the incorrect logs directory configuration for the query scheduler and compression scheduler.
The issue documents that:
- Query scheduler's
CLP_LOGS_DIRshould be/var/log/query_schedulerinstead of/var/log - Compression scheduler has the same issue and should use
/var/log/compression_scheduler
The issue has been assigned to you and includes a link back to this discussion.
🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
|
When running compression workflow, I see the following logs at the very start of some compression worker: This is caused by the lack of dependency between compression worker and RabbitMQ. Compression worker starts before the RabbitMQ is fully setup, and thus cannot connect to it. Once the RabbitMQ is fully operational, everything works as normal. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 17710 🏁 Script executed: Length of output: 42 @sitaowang1998: I've opened issue #1516 to track this problem. After examining the Docker Compose configuration, I can confirm the root cause: the depends_on:
queue:
condition: "service_healthy"The The issue will track adding the missing dependencies for both workers and reviewing all other services for similar gaps. |
| directory: SerializablePath = CLP_DEFAULT_DATA_DIRECTORY_PATH / "archives" | ||
| directory: SerializablePath = CLP_DEFAULT_ARCHIVES_DIRECTORY_PATH | ||
|
|
||
| def transform_for_container(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "native" scripts - particularly native/dataset_manager.py - should have been updated to use the new in-container path (/var/data) instead of relying on the previous direct host-to-container path mapping. Otherwise, the dataset manager isn't able to remove archives at the moment.
@coderabbitai create an issue to track the archive deletion bug in the dataset manager
and another one to migrate the sbin tools to Docker Compose - or at least update their Docker mappings to align with the Package Compose project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create issue 1: Archive deletion bug in dataset manager
gh issue create \
--repo y-scope/clp \
--title "Archive deletion fails in dataset manager due to path mapping mismatch" \
--body "## Description
The native dataset manager script (\`native/dataset_manager.py\`) is currently unable to remove archives because it hasn't been updated to use the new in-container path (\`/var/data\`) introduced in the Docker Compose migration.
## Root Cause
The script still relies on the previous direct host-to-container path mapping instead of the new container path structure.
## Related
- PR: https://github.com/y-scope/clp/pull/1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r2476634906
- Reported by: @junhaoliao
## Expected Fix
Update \`native/dataset_manager.py\` and related native scripts to use the correct in-container paths that align with the Docker Compose deployment model." \
--assignee junhaoliao
echo "---"
# Create issue 2: Migrate sbin tools to Docker Compose
gh issue create \
--repo y-scope/clp \
--title "Migrate sbin tools to Docker Compose or align their Docker mappings" \
--body "## Description
The sbin tools need to be migrated to use Docker Compose orchestration, or at minimum, their Docker volume/path mappings should be updated to align with the Package Compose project structure.
## Context
Following the migration to Docker Compose in PR #1178, the sbin tools' container orchestration and path mappings are out of sync with the new deployment model.
## Scope
Either:
- Fully migrate sbin tools to Docker Compose orchestration, or
- Update their Docker run commands and volume mappings to align with the Package Compose project
## Related
- PR: https://github.com/y-scope/clp/pull/1178
- Comment: https://github.com/y-scope/clp/pull/1178#discussion_r2476634906
- Reported by: @junhaoliao" \
--assignee junhaoliao
Length of output: 89
@junhaoliao, I've created both issues to track these follow-up items:
-
Archive deletion bug: #1526 - Archive deletion fails in dataset manager due to path mapping mismatch
- Tracks the immediate bug where
native/dataset_manager.pycan't remove archives due to the path mapping mismatch with the new/var/datain-container path structure.
- Tracks the immediate bug where
-
Sbin tools migration: #1527 - Migrate sbin tools to Docker Compose or align their Docker mappings
- Tracks the broader task of either fully migrating sbin tools to Docker Compose or updating their Docker volume/path mappings to align with the Package Compose project.
Both issues are assigned to you and include backlinks to this PR and comment thread.
🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Learnt from: haiqi96
PR: y-scope/clp#1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved/absolute Path object because FsStorage.make_config_paths_absolute() processes the directory path during configuration loading, making additional .resolve() calls redundant.
Learnt from: haiqi96
PR: y-scope/clp#1144
File: components/clp-package-utils/clp_package_utils/scripts/native/dataset_manager.py:164-170
Timestamp: 2025-08-13T15:07:37.767Z
Learning: In the CLP codebase, the ArchiveOutput.get_directory() method already returns a resolved/absolute Path object because FsStorage.make_config_paths_absolute() processes the directory path during configuration loading, making additional .resolve() calls redundant.
Learnt from: haiqi96
PR: y-scope/clp#1144
File: components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py:106-114
Timestamp: 2025-08-13T14:48:49.020Z
Learning: For the dataset manager scripts in components/clp-package-utils/clp_package_utils/scripts/, the native script (native/dataset_manager.py) is designed to only be called through the wrapper script (dataset_manager.py), so dataset validation is only performed at the wrapper level rather than duplicating it in the native script.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: junhaoliao
PR: y-scope/clp#1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Learnt from: junhaoliao
PR: y-scope/clp#1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Learnt from: junhaoliao
PR: y-scope/clp#1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Description
docker runorchestration with a controller-driven Compose workflow that provisions env/config, writes a.env, and brings services up/down withdocker compose.chownhandling.BaseControllerandDockerComposeControllerto encapsulate provisioning, environment generation, and orchestrator-specific lifecycle commands while persisting shared container config for the UI..envsynthesis covers credentials, storage paths, concurrency, logging levels, AWS auth, and service endpoints._set_up_env_for_*method that validates directories/configs, pre-creates paths, and resolves hostnames for Compose.CLPConfig.transform_for_container()(and per-subconfig overrides) translate host settings into container-friendly service names and absolute paths;generate_docker_compose_container_config()returns the transformed copy used by Compose.start_clp.py/stop_clp.pynow validate config, ensure directories exist, and delegate lifecycle management toDockerComposeController.docker-compose.base.yaml,docker-compose.yaml) ship with the package,taskfile.yamlcopies them into releases, and new docs describe the design and prerequisites.Breaking Changes
sbin/start-clp.shandsbin/stop-clp.shno longer accept component names or partial targets. The new controller always spins up or tears down the full Compose project, so existing automation that started individual services (e.g.,start-clp.sh query_worker) will fail.--forceoption, which prevented removal of killed containers when not specified, is also removed from thesbin/stop-clp.sh.--num-workersoption, which specifies a concurrency count for targeted component launches (e.g., compression_worker and query_worker), is also removed as a result. We will bring back the configuration ability with a potentially different interface. See Revisit worker parallelism strategy after multi-container worker migration #1424 for more updates.clp-config.yml. Existing installations relying on<base>/<hostname>directories must update paths or pre-create per-host directories themselves.docker --version, instead ofcommand -v docker, is now used to check docker availability.clp_py_utils.clp_config.get_components_for_target()and related component group/target constants were deleted; external tooling importing these symbols will break.CLPConfig.get_runnable_components()replaced byCLPConfig.get_deployment_type(), altering how callers determine which services to launch.clp_package_utils.general.check_dependencies()replaced bycheck_docker_dependencies(...)(different signature/semantics); scripts calling the old helper need to migrate.Added
components/clp-package-utils/clp_package_utils/controller.py: controller abstraction, Compose-specific implementation, shared config dump,.envwriter, worker auto-sizing, instance ID persistence.tools/deployment/package/: base stack and full-stack YAML defining services, volumes, health checks, and deployment-type variants.docs/src/dev-docs/design-docker-compose.md, index link, build doc note, quick-start Docker version requirements.transform_for_container()overrides for DB, queue, Redis, Mongo, schedulers, reducers, FS/S3 storage, and log input configurations.check_docker_dependencies()verifiesdockeravailability and Compose project state before start/stop.Other Removed
start_clp.pyhelpers, Docker mount builders, manual docker exec/wait loops,get_components_for_targetAPI) in favour of Compose-driven lifecycle management.Documentation
Checklist
breaking change.
Validation performed
(to be updated with more details once they're all revalidated with the latest batch of refactoring changes). Previously:
Summary by CodeRabbit
New Features
Bug Fixes / Reliability
Documentation
Chores