@junhaoliao
Member

@junhaoliao junhaoliao commented Oct 13, 2025

Description

(Pending the bug fixes and code refactoring in #1413.)

This PR bundles the libmariadbcpp.so shared library into the clp-package Docker image. It also sets LD_LIBRARY_PATH in the image so that binaries resolve dynamic libraries from the CLP library directory (/opt/clp/lib).

Also, to avoid bloating the clp-package image, the build type of mariadb-connector-cpp is changed from the default RelWithDebInfo to Release, which reduces the size of libmariadbcpp.so from 23135472 bytes (23M) to 2329080 bytes (2.3M), roughly a 10x reduction.
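
As a quick sanity check on the figures above, the ratio can be computed directly from the two byte counts quoted in this description. This is only an illustrative sketch; the helper name `size_ratio` is made up for the example.

```shell
# Byte counts copied from the PR description above.
size_before=23135472   # RelWithDebInfo build
size_after=2329080     # Release build

# Hypothetical helper: prints before/after as a ratio with two decimals.
size_ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f", before / after }'
}

echo "libmariadbcpp.so shrinks by a factor of $(size_ratio "$size_before" "$size_after")"
```

The result comes out just under 10, which matches the "roughly 10x" wording.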

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Linkage fix

  1. task docker-images:package.
  2. junhao@ASUS-X870E:~/workspace/5-clp/tools/docker-images/clp-package$ docker run -it --rm "$(docker images --format '{{.Repository}}:{{.Tag}}' \
      | grep 'clp-package' | head -n1)" /bin/bash
    groups: cannot find name for group ID 1000
    I have no name!@53098ce876cf:/$ which spider_scheduler
    /opt/clp/bin/spider_scheduler
    I have no name!@53098ce876cf:/$ ldd /opt/clp/bin/spider_scheduler
            linux-vdso.so.1 (0x00007ffcab0b7000)
            libmariadbcpp.so => /opt/clp/lib/libmariadbcpp.so (0x00007d10b2966000)
            libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007d10b2738000)
            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007d10b2651000)
            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007d10b2631000)
            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007d10b2408000)
            /lib64/ld-linux-x86-64.so.2 (0x00007d10b2cae000)
            libmariadb.so.3 => /lib/x86_64-linux-gnu/libmariadb.so.3 (0x00007d10b23aa000)
            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007d10b238e000)
            libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007d10b22ea000)
            libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007d10b1ea6000)
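
To turn the manual `ldd` inspection above into a repeatable check, one option is to filter the output for unresolved entries. A minimal sketch, assuming the usual glibc `ldd` output format in which a missing library is printed as `name => not found`; the function name `unresolved_libs` is hypothetical.

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the name of
# every library that the dynamic loader could not resolve.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example usage inside the container (binary path taken from the output above):
#   ldd /opt/clp/bin/spider_scheduler | unresolved_libs
# An empty result means every dependency was resolved.
```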
    

Size reduction

junhao@ASUS-X870E:~/workspace/5-clp/build/deps/cpp/mariadb-connector-cpp-build$ git switch main > /dev/null && task deps:spider > /dev/null 2>&1 && cd - && cd - && ls -l libmariadbcpp.so 
Switched to branch 'main'
/home/junhao/workspace/5-clp/build/deps/cpp
/home/junhao/workspace/5-clp/build/deps/cpp/mariadb-connector-cpp-build
-rwxr-xr-x 1 junhao junhao 2329080 Oct 12 23:07 libmariadbcpp.so
junhao@ASUS-X870E:~/workspace/5-clp/build/deps/cpp/mariadb-connector-cpp-build$ git switch bundle-mariadb > /dev/null && task deps:spider > /dev/null 2>&1 && cd - && cd - && ls -l libmariadbcpp.so 
Switched to branch 'bundle-mariadb'
/home/junhao/workspace/5-clp/build/deps/cpp
/home/junhao/workspace/5-clp/build/deps/cpp/mariadb-connector-cpp-build
-rwxr-xr-x 1 junhao junhao 2329080 Oct 12 23:07 libmariadbcpp.so

Summary by CodeRabbit

  • New Features

    • Image now includes the MariaDB C++ client shared library and exposes it at runtime via the library path.
  • Chores

    • Standardized and consolidated environment variables (CLP_HOME, PATH including sbin, PYTHONPATH, LD_LIBRARY_PATH) and reordered setup for multi-stage builds.
    • Built the MariaDB C++ connector in Release mode for installation.

…r` and `spider_worker`) in the `clp-package` image; Set `mariadb-connector-cpp` build type to `Release` (fixes y-scope#1410).
@junhaoliao junhaoliao requested a review from a team as a code owner October 13, 2025 03:20
@coderabbitai
Contributor

coderabbitai bot commented Oct 13, 2025

Walkthrough

Adds a Release build flag to the mariadb-connector-cpp CMake generation and updates the clp-package Dockerfile to include libmariadbcpp.so in the image, set LD_LIBRARY_PATH, and reorder/consolidate final-stage ENV and USER declarations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Deps build configuration: taskfiles/deps/main.yaml | Adds -DCMAKE_BUILD_TYPE=Release to CMAKE_GEN_ARGS for the mariadb-connector-cpp installation step. |
| Docker image packaging: tools/docker-images/clp-package/Dockerfile | Copies libmariadbcpp.so into /opt/clp/lib, sets LD_LIBRARY_PATH to include that directory, consolidates and reorders ENV entries (CLP_HOME, PATH, PYTHONPATH, LD_LIBRARY_PATH) and moves USER/ENV declarations to the final stage. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Container as clp-package Container
    participant App as CLP App
    participant Loader as Dynamic Linker
    participant LibDir as /opt/clp/lib

    User->>Container: Run CLP app
    Container->>App: Start process (PATH resolves binary)
    App->>Loader: Request libmariadbcpp.so
    Note right of Loader #DDDDDD: LD_LIBRARY_PATH includes /opt/clp/lib
    Loader->>LibDir: Lookup libmariadbcpp.so
    LibDir-->>Loader: Return shared library
    Loader-->>App: Library loaded
    App-->>User: Continues execution

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Points to review:

  • Verify the CMAKE_GEN_ARGS addition doesn't conflict with other build configurations or intended debug builds.
  • Confirm libmariadbcpp.so source path and COPY --link semantics in the Dockerfile work across CI/build environments.
  • Validate LD_LIBRARY_PATH ordering and that runtime loader will locate other native deps as expected.

Possibly related issues

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title accurately describes the two primary changes in the changeset. It explicitly mentions both the inclusion of libmariadbcpp.so in the clp-package Docker image (which corresponds to the Dockerfile modifications adding the library copy and LD_LIBRARY_PATH updates) and the setting of mariadb-connector-cpp build type to Release (which corresponds to the CMAKE_GEN_ARGS addition in taskfiles/deps/main.yaml). The title is clear and specific enough for a reviewer scanning the history to understand the main changes, and it appropriately references the issue being fixed. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef046e5 and 7c028f1.

📒 Files selected for processing (2)
  • taskfiles/deps/main.yaml (1 hunks)
  • tools/docker-images/clp-package/Dockerfile (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-13T03:32:19.293Z
Learnt from: junhaoliao
PR: y-scope/clp#1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.293Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Applied to files:

  • tools/docker-images/clp-package/Dockerfile
📚 Learning: 2025-10-20T21:05:30.417Z
Learnt from: junhaoliao
PR: y-scope/clp#1413
File: tools/docker-images/clp-package/Dockerfile:22-24
Timestamp: 2025-10-20T21:05:30.417Z
Learning: In the clp repository's Dockerfiles, ENV directives should be consolidated into multi-line ENV statements when possible to reduce image layers. ENV statements should only be split into separate commands when consolidation is not possible due to dependencies (e.g., when later variables must reference earlier ones that need to be set first, or when PATH must be modified sequentially).

Applied to files:

  • tools/docker-images/clp-package/Dockerfile
🪛 Checkov (3.2.334)
tools/docker-images/clp-package/Dockerfile

[low] 1-32: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (3)
taskfiles/deps/main.yaml (1)

409-409: Release build configuration aligns with project patterns and PR objectives

Setting -DCMAKE_BUILD_TYPE=Release is consistent with the build strategy used across all other C++ dependencies in this file (boost, fmt, zstd, etc.) and directly addresses the PR objective to reduce libmariadbcpp.so size. The change is minimal and focused, as clarified in prior discussion.

tools/docker-images/clp-package/Dockerfile (2)

31-32: Architecture-agnostic library inclusion with SONAME symlink support

The glob pattern lib/*/libmariadbcpp.so* correctly:

  • Handles multi-arch Debian paths (e.g., x86_64-linux-gnu, aarch64-linux-gnu) without hardcoding architecture.
  • Captures SONAME symlinks (libmariadbcpp.so.1, etc.) via wildcard suffix, allowing the dynamic linker to correctly resolve dependencies.
  • Maintains consistent --chown=${UID} with the preceding COPY (line 30) to preserve ownership.

This fully addresses the prior concern about architecture-specific hardcoding and SONAME symlink omission.
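
The behavior of the `lib/*/libmariadbcpp.so*` pattern can be approximated with a shell glob. This is a sketch only: Docker's COPY uses Go-style pattern matching rather than the shell, and the versioned file name `libmariadbcpp.so.1` and the directory layout below are assumptions for illustration (the `ldd` output in this PR only shows the unversioned `libmariadbcpp.so`). The helper name `match_connector_libs` is hypothetical.

```shell
# Hypothetical helper: lists the files under "$1" that the pattern would match.
match_connector_libs() {
    (
        cd "$1" || exit 1
        for f in lib/*/libmariadbcpp.so*; do
            # An unmatched glob stays literal, so skip non-existent paths.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    )
}

# Build a throwaway tree that mimics a multi-arch install prefix.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/lib/x86_64-linux-gnu"
touch "$demo_root/lib/x86_64-linux-gnu/libmariadbcpp.so" \
      "$demo_root/lib/x86_64-linux-gnu/libmariadbcpp.so.1" \
      "$demo_root/lib/x86_64-linux-gnu/libother.so"

match_connector_libs "$demo_root"
rm -rf "$demo_root"
```

In this sketch the two connector files are listed and `libother.so` is not, regardless of the architecture directory name.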


20-24: Environment variable structure follows dependency-aware consolidation

The ENV directives correctly split CLP_HOME (line 20) as a separate statement since subsequent variables (LD_LIBRARY_PATH, PATH, PYTHONPATH) depend on it. This aligns with the established pattern: split when there are inter-variable dependencies, consolidate otherwise.

LD_LIBRARY_PATH set to ${CLP_HOME}/lib enables spider binaries to locate libmariadbcpp.so at runtime via the dynamic loader, fulfilling the PR objective.



@junhaoliao junhaoliao marked this pull request as draft October 13, 2025 03:20
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/docker-images/clp-package/Dockerfile (1)

1-27: Optional: add a simple HEALTHCHECK

A lightweight HEALTHCHECK (e.g., invoking a “--version” on a core binary) improves observability. Checkov flagged its absence.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e5ab12 and ef046e5.

📒 Files selected for processing (2)
  • taskfiles/deps/main.yaml (1 hunks)
  • tools/docker-images/clp-package/Dockerfile (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
tools/docker-images/clp-package/Dockerfile

[low] 1-27: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
tools/docker-images/clp-package/Dockerfile (1)

27-27: Non‑root runtime: verify permissions

Running as 1000:1000 is good, but please verify the user can read/execute binaries and any needed libs/configs under /opt/clp, and write to required dirs (logs, temp).

Comment on lines 21 to 25
ENV CLP_HOME="/opt/clp"
ENV LD_LIBRARY_PATH="${CLP_HOME}/lib" \
PATH="${CLP_HOME}/bin:${PATH}" \
PYTHONPATH="${CLP_HOME}/lib/python3/site-packages"
ENV PATH="${CLP_HOME}/sbin:${PATH}"
Contributor

@coderabbitai coderabbitai bot Oct 13, 2025

🧹 Nitpick | 🔵 Trivial

Consolidate ENV and avoid duplicate PATH edits

Combine into one ENV to reduce layers and keep order deterministic. Optionally preserve any pre-set LD_LIBRARY_PATH.

-ENV CLP_HOME="/opt/clp"
-ENV LD_LIBRARY_PATH="${CLP_HOME}/lib" \
-    PATH="${CLP_HOME}/bin:${PATH}" \
-    PYTHONPATH="${CLP_HOME}/lib/python3/site-packages"
-ENV PATH="${CLP_HOME}/sbin:${PATH}"
+ENV CLP_HOME=/opt/clp \
+    PATH="${CLP_HOME}/sbin:${CLP_HOME}/bin:${PATH}" \
+    PYTHONPATH="${CLP_HOME}/lib/python3/site-packages" \
+    LD_LIBRARY_PATH="${CLP_HOME}/lib:${LD_LIBRARY_PATH}"
🤖 Prompt for AI Agents
In tools/docker-images/clp-package/Dockerfile around lines 21 to 25, there are
two ENV directives that duplicate PATH modification and create extra image
layers; consolidate them into a single ENV that sets CLP_HOME, LD_LIBRARY_PATH
(preserving any existing value by appending :${LD_LIBRARY_PATH} if present),
PATH (prepend ${CLP_HOME}/sbin and ${CLP_HOME}/bin before existing ${PATH} to
keep deterministic order), and PYTHONPATH in one line so PATH is only modified
once and layer count is reduced.
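
One caveat with appending `:${LD_LIBRARY_PATH}` as suggested above: if the variable is unset in the base image, the result ends in a trailing colon, and ld.so treats a zero-length entry as the current working directory. A common way to avoid that is the `${VAR:+...}` expansion, sketched here in plain shell. The helper name `prepend_lib_dir` is hypothetical, and whether the base image sets LD_LIBRARY_PATH at all is not stated in this PR.

```shell
# Hypothetical helper: prepend directory "$1" to the search path "$2",
# adding the ":" separator only when "$2" is non-empty.
prepend_lib_dir() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}

prepend_lib_dir "/opt/clp/lib" ""          # existing path empty: no trailing colon
prepend_lib_dir "/opt/clp/lib" "/usr/lib"  # existing path set: colon-separated
```

The Dockerfile reference documents the same `${variable:+word}` form for ENV, so the idea should carry over, though that has not been tried against this image.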

Member Author

Splitting the ENV into separate lines is intentional; see #1413 (comment) for the rationale.
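
For readers following along, the effect of the split ENV lines on lookup order can be sketched in plain shell. The starting value `/usr/bin` is a stand-in for the base image's PATH, which this PR does not show. Separately, the Dockerfile reference states that variable substitution within a single instruction uses the values from before that instruction, which is one reason CLP_HOME needs its own earlier ENV line.

```shell
# Simulate the Dockerfile's sequential ENV lines with ordinary assignments.
# SIM_PATH is a stand-in variable so the real PATH is left untouched.
CLP_HOME="/opt/clp"
SIM_PATH="/usr/bin"                      # assumed base-image PATH

SIM_PATH="${CLP_HOME}/bin:${SIM_PATH}"   # ENV PATH="${CLP_HOME}/bin:${PATH}"
SIM_PATH="${CLP_HOME}/sbin:${SIM_PATH}"  # ENV PATH="${CLP_HOME}/sbin:${PATH}"

echo "$SIM_PATH"
```

Because each line prepends, the directory added last (`sbin`) is searched first, followed by `bin` and then the original entries.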

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the discussion/comment from PR #1413 to understand the rationale
gh pr view 1413 --repo y-scope/clp --json comments --jq '.comments[] | select(.id == "r2425132080" or .url | contains("r2425132080"))'

# Also try fetching review comments
gh api repos/y-scope/clp/pulls/1413/comments --jq '.[] | select(.id == 2425132080 or (.html_url | contains("r2425132080")))'

Length of output: 3377


@junhaoliao, thank you for the clarification! I understand now that the team has an established preference for splitting ENV directives (especially PATH modifications) into separate lines for readability, even at the cost of additional layers. This aligns with the previous discussion in #1166 and makes sense given the dependency between the ENV statements (later ones reference CLP_HOME defined first).

I'll remember this convention for future reviews of Dockerfiles in this repository.

ฅ(=^・ω・^=)ฅ


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp#1414
File: tools/docker-images/clp-package/Dockerfile:20-24
Timestamp: 2025-10-13T03:32:19.263Z
Learning: In the clp repository's Dockerfiles (e.g., tools/docker-images/clp-package/Dockerfile), ENV directives should be split into separate lines for readability rather than consolidated to reduce layer count. This is especially true for PATH modifications, as agreed upon in PR #1166. Later ENV settings may depend on earlier ones (e.g., referencing CLP_HOME).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

sitaowang1998 previously approved these changes Oct 20, 2025
@junhaoliao
Member Author

This draft PR depends on #1413 and should therefore be merged only after #1413 is merged.

(@sitaowang1998 do you want to take a look at #1413 as well?)

sitaowang1998 previously approved these changes Oct 23, 2025
FROM scratch
COPY --link --from=base / /

ARG UID=1000
Contributor

Suggested change
ARG UID=1000
ARG USER=clp-user
ARG UID=1000
ARG GID=1000
RUN groupadd -g "${GID}" "${USER}" \
    && useradd --create-home --base-dir /home/ --shell /bin/bash --gid "${GID}" --uid "${UID}" "${USER}"
USER "${USER}"
WORKDIR "/home/${USER}"

Use this if you want to mimic a real user's home working environment. I think it would be beneficial to pass in the user ID and group ID of the GitHub CI runner that builds and runs this package.
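
If the suggestion above were adopted, the runner's IDs could be passed at build time with `--build-arg`. A minimal sketch; the helper `user_build_args` and the image tag in the usage comment are hypothetical, and the `UID`/`GID` argument names follow the suggested Dockerfile change rather than the merged code.

```shell
# Hypothetical helper: prints docker build flags for a given UID and GID.
user_build_args() {
    printf '%s %s %s %s\n' '--build-arg' "UID=$1" '--build-arg' "GID=$2"
}

# Example usage with the invoking user's IDs (tag name is illustrative):
#   docker build $(user_build_args "$(id -u)" "$(id -g)") -t clp-package:dev .
user_build_args "$(id -u)" "$(id -g)"
```

Creating a matching group in the image would also address the "cannot find name for group ID 1000" message seen in the validation output earlier in this PR.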

Contributor

But if your system works as is, I don't really see a problem with it.

Member Author

Good point. The user-creation changes were actually made in another PR, #1413. Since this PR only aims to make the spider binaries work, let's handle the group creation in a separate PR.

@coderabbitai add an issue to track the group creation in the docker image

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track group creation in the docker image

gh issue create \
  --title "Improve user and group setup in clp-package Docker image" \
  --body "## Context

This issue tracks improvements to the user and group creation process in the \`clp-package\` Docker image, as discussed in PR #1414.

## Background

During review of PR #1414, it was suggested that the Docker image could benefit from proper group creation and a more realistic user home environment setup. The current implementation creates a user but does not create a dedicated group or set up a traditional home directory structure.

## Suggested Improvements

- Add group creation with configurable GID
- Create user with proper group assignment
- Set up a traditional home directory under \`/home/\`
- Make UID and GID configurable as build arguments

## References

- PR: https://github.com/y-scope/clp/pull/1414
- Comment: https://github.com/y-scope/clp/pull/1414#discussion_r2455360711
- Requested by: @junhaoliao

## Notes

These changes were deferred from PR #1414 to keep that PR focused on making the spider binaries work. PR #1413 already made some initial user creation changes." \
  --assignee junhaoliao

Length of output: 42


@junhaoliao I've created issue #1508 to track the group creation improvements for the Docker image: #1508

The issue has been assigned to you and includes references back to this PR and the discussion. Good call on keeping this PR focused on the spider binaries functionality!

# Conflicts:
#	tools/docker-images/clp-package/Dockerfile
@junhaoliao junhaoliao merged commit 1b92ca8 into y-scope:main Oct 27, 2025
31 checks passed
@junhaoliao junhaoliao deleted the bundle-mariadb branch October 27, 2025 13:33
3 participants