Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 0 additions & 71 deletions .github/workflows/clp-execution-image-build.yaml

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -427,7 +427,7 @@ def load_config_file(
clp_config = CLPConfig()

clp_config.make_config_paths_absolute(clp_home)
clp_config.load_execution_container_name()
clp_config.load_container_image_ref()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

LGTM! Central configuration loading updated to new container image reference mechanism.

The change from load_execution_container_name() to load_container_image_ref() centralizes the container image reference loading. Based on the related code snippet, this method reads from either clp-package-image.id or falls back to the VERSION file to construct the image reference.

Verify that load_container_image_ref() handles missing files appropriately:


🏁 Script executed:

#!/bin/bash
# Description: Verify error handling in load_container_image_ref for missing image ID and VERSION files

# Search for the implementation of load_container_image_ref
ast-grep --pattern $'def load_container_image_ref($$$):
  $$$'

# Check for file existence checks and error handling
rg -nP -A10 'def load_container_image_ref' --type=py

Length of output: 2391


Add error handling for missing VERSION file in load_container_image_ref
The method guards against a missing container-image ID file but unconditionally opens the version file, which will raise a FileNotFoundError if the VERSION file is absent. Add an existence check or wrap the version-file open in exception handling to provide a clear error message or fallback.
components/clp-py-utils/clp_py_utils/clp_config.py Lines 1019–1025

🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/general.py around line 430: the
call to clp_config.load_container_image_ref() can raise a FileNotFoundError
because the function opens the VERSION file unconditionally; update
load_container_image_ref (components/clp-py-utils/clp_py_utils/clp_config.py
lines ~1019–1025) to check for the VERSION file's existence before opening it or
wrap the open call in a try/except that catches FileNotFoundError and either
returns a sensible fallback (e.g., None or a default version) or raises a new
exception with a clear, descriptive error message that includes the path checked
so callers can handle missing VERSION files gracefully.


validate_path_for_container_mount(clp_config.data_directory)
validate_path_for_container_mount(clp_config.logs_directory)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -229,7 +229,7 @@ def main(argv: List[str]) -> int:
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd: List[str] = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
container_name, necessary_mounts, clp_config.container_image_ref, extra_env_vars
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,7 @@ def main(argv):
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
container_name, necessary_mounts, clp_config.container_image_ref, extra_env_vars
)
compress_cmd = _generate_compress_cmd(
parsed_args, dataset, generated_config_path_on_container, logs_list_path_on_container
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ def main(argv: List[str]) -> int:
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
container_name, necessary_mounts, clp_config.container_image_ref, extra_env_vars
)

if len(aws_env_vars) != 0:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ def handle_extract_file_cmd(
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
container_name, necessary_mounts, clp_config.container_image_ref, extra_env_vars
)

# fmt: off
Expand Down Expand Up @@ -211,7 +211,7 @@ def handle_extract_stream_cmd(
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
container_name, necessary_mounts, clp_config.container_image_ref, extra_env_vars
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ def main(argv):
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
container_name, necessary_mounts, clp_config.container_image_ref, extra_env_vars
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -261,7 +261,7 @@ def create_db_tables(
mounts.generated_config_file,
]
append_docker_options(container_start_cmd, necessary_mounts, env_vars)
container_start_cmd.append(clp_config.execution_container)
container_start_cmd.append(clp_config.container_image_ref)

clp_py_utils_dir = clp_site_packages_dir / "clp_py_utils"
# fmt: off
Expand Down Expand Up @@ -306,7 +306,7 @@ def create_results_cache_indices(
env_vars = [f"PYTHONPATH={clp_site_packages_dir}"]
necessary_mounts = [mounts.clp_home, mounts.data_dir, mounts.logs_dir]
append_docker_options(container_start_cmd, necessary_mounts, env_vars)
container_start_cmd.append(clp_config.execution_container)
container_start_cmd.append(clp_config.container_image_ref)

clp_py_utils_dir = clp_site_packages_dir / "clp_py_utils"
# fmt: off
Expand Down Expand Up @@ -627,7 +627,7 @@ def generic_start_scheduler(
):
necessary_mounts.append(mounts.input_logs_dir)
append_docker_options(container_start_cmd, necessary_mounts, env_vars)
container_start_cmd.append(clp_config.execution_container)
container_start_cmd.append(clp_config.container_image_ref)

# fmt: off
scheduler_cmd = [
Expand Down Expand Up @@ -762,7 +762,7 @@ def generic_start_worker(
env_vars.extend(aws_env_vars)

append_docker_options(container_start_cmd, necessary_mounts, env_vars)
container_start_cmd.append(clp_config.execution_container)
container_start_cmd.append(clp_config.container_image_ref)

worker_cmd = [
"python3",
Expand Down Expand Up @@ -969,7 +969,7 @@ def start_webui(
necessary_mounts.append(mounts.stream_output_dir)

append_docker_options(container_cmd, necessary_mounts, env_vars)
container_cmd.append(clp_config.execution_container)
container_cmd.append(clp_config.container_image_ref)

node_cmd = [
str(CONTAINER_CLP_HOME / "bin" / "node-22"),
Expand Down Expand Up @@ -1023,7 +1023,7 @@ def start_reducer(
mounts.generated_config_file,
]
append_docker_options(container_start_cmd, necessary_mounts, env_vars)
container_start_cmd.append(clp_config.execution_container)
container_start_cmd.append(clp_config.container_image_ref)

# fmt: off
reducer_cmd = [
Expand Down Expand Up @@ -1101,7 +1101,7 @@ def start_garbage_collector(
env_vars.extend(aws_env_vars)

append_docker_options(container_start_cmd, necessary_mounts, env_vars)
container_start_cmd.append(clp_config.execution_container)
container_start_cmd.append(clp_config.container_image_ref)

# fmt: off
garbage_collector_cmd = [
Expand Down
37 changes: 19 additions & 18 deletions components/clp-py-utils/clp_py_utils/clp_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,14 +83,13 @@
COMPRESSION_JOBS_TABLE_NAME = "compression_jobs"
COMPRESSION_TASKS_TABLE_NAME = "compression_tasks"

OS_RELEASE_FILE_PATH = pathlib.Path("etc") / "os-release"

CLP_DEFAULT_CREDENTIALS_FILE_PATH = pathlib.Path("etc") / "credentials.yml"
CLP_DEFAULT_DATA_DIRECTORY_PATH = pathlib.Path("var") / "data"
CLP_DEFAULT_DATASET_NAME = "default"
CLP_METADATA_TABLE_PREFIX = "clp_"
CLP_PACKAGE_CONTAINER_IMAGE_ID_PATH = pathlib.Path("clp-package-image.id")
CLP_SHARED_CONFIG_FILENAME = ".clp-config.yml"

CLP_VERSION_FILE_PATH = pathlib.Path("VERSION")

# Environment variable names
CLP_DB_USER_ENV_VAR_NAME = "CLP_DB_USER"
Expand Down Expand Up @@ -876,7 +875,7 @@ def _get_env_var(name: str) -> str:


class CLPConfig(BaseModel):
execution_container: Optional[str] = None
container_image_ref: Optional[str] = None

logs_input: Union[FsIngestionConfig, S3IngestionConfig] = FsIngestionConfig()

Expand All @@ -902,7 +901,10 @@ class CLPConfig(BaseModel):
logs_directory: pathlib.Path = pathlib.Path("var") / "log"
aws_config_directory: Optional[pathlib.Path] = None

_os_release_file_path: pathlib.Path = PrivateAttr(default=OS_RELEASE_FILE_PATH)
_container_image_id_path: pathlib.Path = PrivateAttr(
default=CLP_PACKAGE_CONTAINER_IMAGE_ID_PATH
)
_version_file_path: pathlib.Path = PrivateAttr(default=CLP_VERSION_FILE_PATH)

@field_validator("aws_config_directory")
@classmethod
Expand All @@ -919,7 +921,10 @@ def make_config_paths_absolute(self, clp_home: pathlib.Path):
self.stream_output.storage.make_config_paths_absolute(clp_home)
self.data_directory = make_config_path_absolute(clp_home, self.data_directory)
self.logs_directory = make_config_path_absolute(clp_home, self.logs_directory)
self._os_release_file_path = make_config_path_absolute(clp_home, self._os_release_file_path)
self._container_image_id_path = make_config_path_absolute(
clp_home, self._container_image_id_path
)
self._version_file_path = make_config_path_absolute(clp_home, self._version_file_path)

def validate_logs_input_config(self):
logs_input_type = self.logs_input.type
Expand Down Expand Up @@ -1007,22 +1012,18 @@ def validate_aws_config_dir(self):
"aws_config_directory should not be set when profile authentication is not used"
)

def load_execution_container_name(self):
if self.execution_container is not None:
def load_container_image_ref(self):
if self.container_image_ref is not None:
# Accept configured value for debug purposes
return

os_release = dotenv_values(self._os_release_file_path)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have removed the dotenv_values import and related dependency installation in this PR. @coderabbitai file an issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if dotenv_values is used anywhere else in the codebase
rg -n 'dotenv_values' --type=py

# Check for python-dotenv dependency in requirements files
fd -e txt -e in -e toml -e cfg --exec grep -Hn 'python-dotenv' {}

# Also check for pyproject.toml dependencies
fd pyproject.toml --exec grep -A 20 'dependencies' {}

Length of output: 1741


🏁 Script executed:

#!/bin/bash
# Check for dotenv usage specifically in the clp-py-utils component
rg -n 'dotenv' components/clp-py-utils/ --type=py

Length of output: 85


🏁 Script executed:

#!/bin/bash
# File an issue to track the removal of unused dotenv_values import and dependency
gh issue create \
  --title "Remove unused python-dotenv dependency from clp-py-utils" \
  --body "## Context
After PR #1357, the \`dotenv_values\` import is no longer used in \`components/clp-py-utils/clp_py_utils/clp_config.py\`.

## Tasks
- [ ] Remove the unused import: \`from dotenv import dotenv_values\` from \`components/clp-py-utils/clp_py_utils/clp_config.py\` (line 6)
- [ ] Remove the \`python-dotenv\` dependency from \`components/clp-py-utils/pyproject.toml\`

## Notes
- The \`python-dotenv\` dependency should remain in \`tools/deployment/presto-clp/scripts/requirements.txt\` as it's still used by \`tools/deployment/presto-clp/scripts/init.py\`

## References
- PR: https://github.com/y-scope/clp/pull/1357
- Comment: https://github.com/y-scope/clp/pull/1357#discussion_r2442446179
- Requested by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 42


@junhaoliao, I've filed issue #1443 to track the removal of the unused python-dotenv dependency and import.

Summary:

  • The dotenv_values import in components/clp-py-utils/clp_py_utils/clp_config.py (line 6) is no longer used after this PR
  • The python-dotenv dependency in components/clp-py-utils/pyproject.toml can be removed
  • The dependency should remain in tools/deployment/presto-clp/scripts/requirements.txt since it's still actively used by tools/deployment/presto-clp/scripts/init.py

The issue has been assigned to you with the tasks and context needed for the cleanup.


🧬 Code Graph Analysis Results

components/clp-py-utils/clp_py_utils/clp_config.py

[Snippet 1]

  • Filename: components/clp-py-utils/clp_py_utils/clp_config.py

  • Line(s): 6-6

  • Content:
    from dotenv import dotenv_values

  • Rationale: This import is directly related to the user comment about removing dotenv_values usage and the related dependency installation.

if "ubuntu" == os_release["ID"]:
self.execution_container = (
f"clp-execution-x86-{os_release['ID']}-{os_release['VERSION_CODENAME']}:main"
)
if self._container_image_id_path.exists():
with open(self._container_image_id_path) as image_id_file:
self.container_image_ref = image_id_file.read().strip()
else:
raise NotImplementedError(
f"Unsupported OS {os_release['ID']} in {OS_RELEASE_FILE_PATH}"
)

self.execution_container = "ghcr.io/y-scope/clp/" + self.execution_container
with open(self._version_file_path) as version_file:
clp_package_version = version_file.read().strip()
self.container_image_ref = f"ghcr.io/y-scope/clp/clp-package:{clp_package_version}"

def get_shared_config_file_path(self) -> pathlib.Path:
return self.logs_directory / CLP_SHARED_CONFIG_FILENAME
Expand Down
19 changes: 0 additions & 19 deletions docs/src/dev-docs/tooling-containers.md
Original file line number Diff line number Diff line change
Expand Up @@ -129,24 +129,6 @@ Jammy x86 environment.
components/core/tools/docker-images/clp-core-ubuntu-jammy
```

## clp-execution-x86-ubuntu-jammy

An image containing the dependencies necessary to run the CLP package in an Ubuntu Jammy x86
environment.

* [GitHub Packages page][exe-ubuntu-jammy]
* Pull command:

```bash
docker pull ghcr.io/y-scope/clp/clp-execution-x86-ubuntu-jammy:main
```

* Path:

```text
tools/docker-images/clp-execution-base-ubuntu-jammy
```

## clp-package

An image containing the CLP package.
Expand All @@ -162,7 +144,6 @@ An image containing the CLP package.
[core-deps-musllinux_1_2-x86_64]: https://github.com/y-scope/clp/pkgs/container/clp%2Fclp-core-dependencies-x86-musllinux_1_2
[core-deps-ubuntu-jammy]: https://github.com/y-scope/clp/pkgs/container/clp%2Fclp-core-dependencies-x86-ubuntu-jammy
[core-ubuntu-jammy]: https://github.com/y-scope/clp/pkgs/container/clp%2Fclp-core-x86-ubuntu-jammy
[exe-ubuntu-jammy]: https://github.com/y-scope/clp/pkgs/container/clp%2Fclp-execution-x86-ubuntu-jammy
[gh-packages]: https://github.com/orgs/y-scope/packages?repo_name=clp
[manylinux_2_28]: https://github.com/pypa/manylinux?tab=readme-ov-file#manylinux_2_28-almalinux-8-based
[musllinux_1_2]: https://github.com/pypa/manylinux?tab=readme-ov-file#musllinux_1_2-alpine-linux-322-based-313-compatible
5 changes: 0 additions & 5 deletions docs/src/dev-docs/tooling-gh-workflows.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,11 +88,6 @@ need it for one OS since users can use the container on any OS.

This workflow builds CLP-core on macOS and runs its unit tests.

## clp-execution-image-build

This workflow builds a container image that contains the dependencies necessary to run the CLP
package.

## clp-lint

This workflow runs linting checks on the codebase.
2 changes: 1 addition & 1 deletion taskfile.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ vars:

tasks:
default:
deps: ["package"]
deps: ["docker-images:package"]

clean:
cmds:
Expand Down
1 change: 1 addition & 0 deletions taskfiles/docker-images.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,4 @@ tasks:
- ":package"
cmds:
- "./build.sh"
- "rsync --archive '{{.G_BUILD_DIR}}/clp-package-image.id' '{{.G_PACKAGE_BUILD_DIR}}'"
16 changes: 0 additions & 16 deletions tools/docker-images/clp-execution-base-ubuntu-jammy/Dockerfile

This file was deleted.

38 changes: 0 additions & 38 deletions tools/docker-images/clp-execution-base-ubuntu-jammy/build.sh

This file was deleted.

Loading
Loading