-
Notifications
You must be signed in to change notification settings - Fork 83
feat(clp-package)!: Containerize the scripts in sbin via Docker Compose; Remove redundant assets from the package (resolves #1358).
#1512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
00efbc1
ed3dde2
d307457
f871424
b03539a
1b65afe
3b4d44c
347b181
74a3921
885e9b5
0db2f30
695ed77
4f21e0f
a7c81a3
e6e8217
be2d200
d910944
163921b
3f5c302
65d421f
83c581c
e03a361
299795f
0ccd155
beacdcf
c7141e7
25c2450
0ac4501
479fc81
9921d48
a2dd5cf
8d6deb6
c6c5164
3b44945
4b6f337
bb90b69
ae825c0
9a66669
6983943
e372661
bee1b79
b0e8abf
5172963
bb5d8d2
95a8e1f
395e42f
3af2e4d
198f398
1d02051
3063e85
306997a
4f6c5b4
ef627c6
75ada35
969fde0
732dc92
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we need to make sure the @coderabbitai create an issue to track this
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've created issue #1558 to track this concern about ensuring You're right to flag this—the 🧬 Code Graph Analysis Results[content below identifies relevant snippets to address the user comment; do not perform the user comment analysis here]
Note: The user comment mentions ensuring .resolve() calls properly resolve relative and symlinks in host path space so that "mounted_path"s are valid inside containers. The above snippets illustrate:
🧠 Learnings used |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -286,7 +286,7 @@ def main(argv): | |
| container_clp_config, clp_config, get_container_config_filename(container_name) | ||
| ) | ||
|
|
||
| necessary_mounts = [mounts.clp_home, mounts.data_dir, mounts.logs_dir] | ||
| necessary_mounts = [mounts.data_dir, mounts.logs_dir] | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainMount AWS config into the container when configured. Without mounts.aws_config_dir, profile-based S3 auth will fail inside the job container. Apply this diff: - necessary_mounts = [mounts.data_dir, mounts.logs_dir]
+ necessary_mounts = [mounts.data_dir, mounts.logs_dir, mounts.aws_config_dir]🏁 Script executed: #!/bin/bash
# Check other scripts also include aws_config_dir when S3 is used
rg -nC2 'necessary_mounts\s*=\s*\[.*aws_config_dir' components/clp-package-utils/clp_package_utils/scriptsLength of output: 0 🏁 Script executed: #!/bin/bash
# 1. Check the current state at line 291 in compress_from_s3.py
echo "=== Current code around line 291 ==="
sed -n '285,295p' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== All necessary_mounts definitions ==="
rg -n 'necessary_mounts\s*=' components/clp-package-utils/clp_package_utils/scripts
echo ""
echo "=== Check for aws_config_dir references ==="
rg -n 'aws_config_dir' components/clp-package-utils/clp_package_utils/scripts
echo ""
echo "=== Check for S3 usage context in compress_from_s3.py ==="
rg -n 'S3\|s3\|aws' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py | head -20Length of output: 1638 🏁 Script executed: #!/bin/bash
# Check decompress.py line 115 context
echo "=== decompress.py:115 context ==="
sed -n '110,125p' components/clp-package-utils/clp_package_utils/scripts/decompress.py
echo ""
echo "=== dataset_manager.py:143 context (conditional logic) ==="
sed -n '135,150p' components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
echo ""
echo "=== Broader search for S3/credentials in compress_from_s3.py ==="
rg -n 'boto\|credentials\|profile\|AWS\|s3_' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Check if compress_from_s3.py references dataset_manager or similar patterns ==="
rg -n 'aws_config\|S3\|boto' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Check mounts object to understand aws_config_dir ==="
rg -n 'aws_config_dir' components/clp-package-utils/clp_package_utils -A2 -B2 | head -40Length of output: 5049 🏁 Script executed: #!/bin/bash
# Check if compress_from_s3.py calls generate_container_auth_options
echo "=== generate_container_auth_options in compress_from_s3.py ==="
rg -n 'generate_container_auth_options' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Full compress_from_s3.py file to understand S3 usage ==="
wc -l components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== First 50 lines to understand imports/purpose ==="
head -50 components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Lines around 291 with broader context (270-310) ==="
sed -n '270,310p' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Check for boto3, requests, or URL downloading ==="
grep -n 'boto3\|requests\|urlopen\|download\|fetch' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pyLength of output: 3602 🏁 Script executed: #!/bin/bash
# Check generate_container_auth_options to understand when aws_mount is needed
echo "=== generate_container_auth_options definition and logic ==="
rg -n 'def generate_container_auth_options' components/clp-package-utils/clp_package_utils -A 20
echo ""
echo "=== Check if compress_from_s3.py has any AWS config usage or should ==="
grep -n 'auth\|credential\|profile' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Search compress.py for comparison (similar compression script) ==="
sed -n '230,250p' components/clp-package-utils/clp_package_utils/scripts/compress.pyLength of output: 1323 🏁 Script executed: #!/bin/bash
# Check what happens after necessary_mounts is created - how are URLs handled?
echo "=== compress_from_s3.py lines 310-333 (end of file) ==="
sed -n '310,333p' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Check if container receives credentials through env vars instead ==="
rg -n 'AWS\|credential\|profile' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Verify: does dataset_manager call generate_container_auth_options and get aws_mount? ==="
sed -n '138,148p' components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py
echo ""
echo "=== Look for where generate_container_auth_options is defined ==="
rg -n 'def generate_container_auth_options' components/clp-package-utils/clp_package_utilsLength of output: 1401 🏁 Script executed: #!/bin/bash
# Search more broadly for generate_container_auth_options
echo "=== Search all Python files for generate_container_auth_options ==="
rg -n 'generate_container_auth_options' components/clp-package-utils/clp_package_utils
echo ""
echo "=== Check dataset_manager imports ==="
head -40 components/clp-package-utils/clp_package_utils/scripts/dataset_manager.py | grep -n import
echo ""
echo "=== Search in clp_py_utils for the function ==="
find components/clp-package-utils -name '*.py' -type f -exec grep -l 'generate_container_auth_options' {} \;Length of output: 859 🏁 Script executed: #!/bin/bash
# Check what _generate_url_list does and if it uses AWS credentials
echo "=== compress_from_s3.py _generate_url_list function ==="
sed -n '38,120p' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py
echo ""
echo "=== Check S3 constants ==="
rg -n 'S3_OBJECT_COMPRESSION\|S3_KEY_PREFIX_COMPRESSION' components/clp-package-utils/clp_package_utils/general.py
echo ""
echo "=== Check _validate_s3_object_args to see what S3 setup is needed ==="
sed -n '150,180p' components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.pyLength of output: 3886 Include AWS config directory mount AND environment variables when S3 is used, following the pattern in dataset_manager.py. The suggested fix adds The fix should:
🤖 Prompt for AI Agents |
||
|
|
||
| while True: | ||
| container_url_list_filename = f"{uuid.uuid4()}.txt" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,6 @@ | ||
| import argparse | ||
| import logging | ||
| import os | ||
| import pathlib | ||
| import shlex | ||
| import subprocess | ||
|
|
@@ -110,7 +111,6 @@ def handle_extract_file_cmd( | |
| extraction_dir.mkdir(exist_ok=True) | ||
| container_extraction_dir = pathlib.Path("/") / "mnt" / "extraction-dir" | ||
| necessary_mounts = [ | ||
| mounts.clp_home, | ||
| mounts.data_dir, | ||
| mounts.logs_dir, | ||
| mounts.archives_output_dir, | ||
|
|
@@ -205,7 +205,7 @@ def handle_extract_stream_cmd( | |
| generated_config_path_on_container, generated_config_path_on_host = dump_container_config( | ||
| container_clp_config, clp_config, get_container_config_filename(container_name) | ||
| ) | ||
| necessary_mounts = [mounts.clp_home, mounts.logs_dir] | ||
| necessary_mounts = [mounts.logs_dir] | ||
| extra_env_vars = { | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, | ||
|
|
@@ -298,8 +298,9 @@ def main(argv): | |
| file_extraction_parser.add_argument( | ||
| "-f", "--files-from", help="A file listing all files to extract." | ||
| ) | ||
| default_extraction_dir = pathlib.Path(os.environ.get("CLP_PWD_HOST", ".")) | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we don't explicitly set and read this, |
||
| file_extraction_parser.add_argument( | ||
| "-d", "--extraction-dir", metavar="DIR", default=".", help="Extract files into DIR." | ||
| "-d", "--extraction-dir", metavar="DIR", default=default_extraction_dir, help="Extract files into DIR." | ||
| ) | ||
|
|
||
| # IR extraction command parser | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
junhaoliao marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| package_root=$(readlink -f "$script_dir/..") | ||
|
|
||
| if [[ -n "${CLP_HOME:-}" ]]; then | ||
| export CLP_HOME="$CLP_HOME" | ||
| else | ||
| export CLP_HOME="$package_root" | ||
| fi | ||
|
|
||
| image_id_file="$CLP_HOME/clp-package-image.id" | ||
| version_file="$CLP_HOME/VERSION" | ||
|
|
||
| if [[ -f "$image_id_file" ]]; then | ||
| image_id="$(tr -d '[:space:]' < "$image_id_file")" | ||
junhaoliao marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| export CLP_PACKAGE_CONTAINER_IMAGE_REF="$image_id" | ||
| elif [[ -f "$version_file" ]]; then | ||
| version="$(tr -d '[:space:]' < "$version_file")" | ||
junhaoliao marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| export CLP_PACKAGE_CONTAINER_IMAGE_REF="clp-package:$version" | ||
| else | ||
| echo "Error: Neither clp-package-image.id nor VERSION file exists." >&2 | ||
| return 1 2>/dev/null || exit 1 | ||
| fi | ||
|
|
||
| uid="$(id --user 2>/dev/null || echo "1000")" | ||
| gid="$(getent group docker | cut -d: -f3 2>/dev/null || echo "999")" | ||
kirkrodrigues marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| export CLP_FIRST_PARTY_SERVICE_UID_GID="$uid:$gid" | ||
|
|
||
| export CLP_PWD_HOST="$(pwd)" | ||
coderabbitai[bot] marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| if [[ -z "${CLP_DOCKER_PLUGIN_DIR:-}" ]]; then | ||
| for dir in \ | ||
kirkrodrigues marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| "$HOME/.docker/cli-plugins" \ | ||
| "/mnt/wsl/docker-desktop/cli-tools/usr/local/lib/docker/cli-plugins" \ | ||
| "/usr/local/lib/docker/cli-plugins" \ | ||
| "/usr/libexec/docker/cli-plugins"; do | ||
|
|
||
| compose_plugin_path="$dir/docker-compose" | ||
| if [[ -f "$compose_plugin_path" ]]; then | ||
| export CLP_DOCKER_PLUGIN_DIR="$dir" | ||
| break | ||
| fi | ||
| done | ||
| if [[ -z "${CLP_DOCKER_PLUGIN_DIR:-}" ]]; then | ||
| echo "Warning: Docker plugin directory not found; Docker Compose may not work inside container." >&2 | ||
| fi | ||
| fi | ||
|
|
||
| socket="$(docker context inspect --format '{{.Endpoints.docker.Host}}' 2>/dev/null | sed -E 's|^unix://||')" | ||
| if [[ -S "$socket" ]]; then | ||
| export CLP_DOCKER_SOCK_PATH="$socket" | ||
| fi | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should or so the scripts early exit on any error or on any unset variables, and make a pipeline fail if any command in it fails @coderabbitai create an issue to track the improvement
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've created issue #1556 to track adding error handling flags to the shell scripts in the package template. The issue covers all the scripts in 🧠 Learnings used |
||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/../.." | ||
| common_env_path="$script_dir/../.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.archive_manager \ | ||
| "$@" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/../.." | ||
| common_env_path="$script_dir/../.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.dataset_manager \ | ||
| "$@" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/.." | ||
| common_env_path="$script_dir/.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.compress_from_s3 \ | ||
| "$@" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/.." | ||
| common_env_path="$script_dir/.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.compress \ | ||
| "$@" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/.." | ||
| common_env_path="$script_dir/.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.decompress \ | ||
| "$@" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/.." | ||
| common_env_path="$script_dir/.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.search \ | ||
| "$@" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/.." | ||
| common_env_path="$script_dir/.common-env.sh" | ||
| source "$common_env_path" | ||
junhaoliao marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.start_clp \ | ||
| "$@" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,10 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| package_root="$script_dir/.." | ||
| common_env_path="$script_dir/.common-env.sh" | ||
| source "$common_env_path" | ||
|
|
||
| PYTHONPATH=$(readlink -f "$package_root/lib/python3/site-packages") \ | ||
| docker compose -f "$CLP_HOME/docker-compose.runtime.yaml" run --rm clp-runtime \ | ||
| python3 \ | ||
| -m clp_package_utils.scripts.stop_clp \ | ||
| "$@" |
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we don't really have to use Docker Compose for this container definition, though Docker Compose helps it keep structured and avoids duplicate code.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. host network mode should be used or the start_clp.py script cannot properly check what IP addresses it can bind to on the host. @coderabbitai file an issue
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: Length of output: 42 @junhaoliao, I've created issue #1560 to track enabling host network mode for the 🧠 Learnings used |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| x-service-defaults: &service_defaults | ||
| image: "${CLP_PACKAGE_CONTAINER_IMAGE_REF:-clp-package}" | ||
| logging: | ||
| driver: "local" | ||
| stop_grace_period: "60s" | ||
|
|
||
| services: | ||
| clp-runtime: | ||
| <<: *service_defaults | ||
| hostname: "clp_runtime" | ||
| user: "${CLP_FIRST_PARTY_SERVICE_UID_GID:-1000:999}" | ||
| environment: | ||
| CLP_HOME: "${CLP_HOME}" | ||
| CLP_PWD_HOST: "${CLP_PWD_HOST:-${PWD}}" | ||
| volumes: | ||
kirkrodrigues marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| - "${CLP_DOCKER_PLUGIN_DIR}:/usr/local/lib/docker/cli-plugins:ro" | ||
| - "${CLP_DOCKER_SOCK_PATH}:/var/run/docker.sock" | ||
| - "${CLP_HOME}:${CLP_HOME}" | ||
| - "/usr/bin/docker:/usr/bin/docker:ro" | ||
| stdin_open: true | ||
| tty: true | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mounts like this have to be removed, or they conflict with the contents inside the docker image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, can you explain more?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, that comment was so unclear that it took me time to recall why it was done -
on the host, clp_home is essentially the clp-package directory root. previously we were mounting the package root into the container so the python modules in
lib/python3/site-packagescan be accessed in the container. however, we have now removed such contents from the package (recall "now the package is only 180K"); if we mount an empty directory into/opt/clpin the container, anything in/opt/clpis essentially gone. we should just use the assets in the docker image and remove such mounts then