Skip to content
Merged
Show file tree
Hide file tree
Changes from 55 commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
00efbc1
feat(deployment): Containerize CLP tools via Docker Compose.
junhaoliao Oct 28, 2025
ed3dde2
revert unrelated change
junhaoliao Oct 28, 2025
d307457
fix(docker): Simplify clp-runtime service configuration in Docker Com…
junhaoliao Oct 28, 2025
f871424
debloat the package by avoiding duplicating assets already in the con…
junhaoliao Oct 28, 2025
b03539a
fix(env): Update container image reference format in .common-env.sh
junhaoliao Oct 28, 2025
1b65afe
feat(core): Add resolve_host_path utility for path resolution in cont…
junhaoliao Oct 29, 2025
3b4d44c
lint
junhaoliao Oct 29, 2025
347b181
Merge branch 'main' into containerize-tools
junhaoliao Oct 29, 2025
74a3921
refactor(utils): Simplify path validation by removing host mount para…
junhaoliao Oct 29, 2025
885e9b5
revert unrelated change
junhaoliao Oct 29, 2025
0db2f30
fix package docker workflow
junhaoliao Oct 29, 2025
695ed77
lint
junhaoliao Oct 29, 2025
4f21e0f
docs(dev-docs): Update task command name for building packages.
junhaoliao Oct 29, 2025
a7c81a3
docs: Remove Python section and streamline Docker Desktop requirements
junhaoliao Oct 29, 2025
e6e8217
Merge branch 'main' into containerize-tools
junhaoliao Oct 29, 2025
be2d200
refactor(clp-py-utils): Remove redundant debug print in config direct…
junhaoliao Oct 29, 2025
d910944
chore(scripts): Add shellcheck directive comments for sourcing common…
junhaoliao Oct 29, 2025
163921b
fix SC2155
junhaoliao Oct 29, 2025
3f5c302
fix(scripts): Ensure CLP_DOCKER_SOCK_PATH is set only when undefined.
junhaoliao Oct 29, 2025
65d421f
fix(init): invert CLP_HOME check to use -z for unset detection
junhaoliao Oct 29, 2025
83c581c
fix(scripts): Remove exit 1; Remove redundant stderr redirection in …
junhaoliao Oct 29, 2025
e03a361
chore(clp-py-utils): Remove unused os import from core module.
junhaoliao Oct 29, 2025
299795f
Update volume definitions in docker-compose.runtime.yaml to use the e…
junhaoliao Oct 29, 2025
0ccd155
fix(deployment): Add default values and validations in docker-compose…
junhaoliao Oct 29, 2025
beacdcf
fix(scripts): Adjust formatting for better readability in .common-env…
junhaoliao Oct 29, 2025
c7141e7
move `>&2` before echo
junhaoliao Oct 29, 2025
25c2450
fix(scripts): Resolve host paths for config files during load operati…
junhaoliao Oct 29, 2025
0ac4501
fix(clp-config): Add `use_host_mount` parameter to validation methods…
junhaoliao Oct 29, 2025
479fc81
lint
junhaoliao Oct 29, 2025
9921d48
fix(ci): Update package build step name to indicate the package image…
junhaoliao Oct 30, 2025
a2dd5cf
refactor(clp-py-utils): Rename CONTAINER_HOST_ROOT_DIR -> CONTAINER_D…
junhaoliao Oct 30, 2025
8d6deb6
refactor(clp-py-utils): Rename resolve_host_path -> resolve_host_path…
junhaoliao Oct 30, 2025
c6c5164
Update docstring to clarify `path` translation return value
junhaoliao Oct 30, 2025
3b44945
refactor(clp-py-utils): Rename path -> host_path and resolved -> tran…
junhaoliao Oct 30, 2025
4b6f337
refactor(taskfile): Move NODE_ENV definition to webui task.
junhaoliao Oct 30, 2025
bb90b69
refactor(taskfile): Reorder package-build-deps dependencies for clarity.
junhaoliao Oct 30, 2025
ae825c0
refactor(taskfile): Change "{{.G_BUILD_DIR}}/{{.TASK}}.md5" -> G_WEBU…
junhaoliao Oct 30, 2025
9a66669
Add back package-template to package dependencies.
junhaoliao Oct 30, 2025
6983943
fix(clp-py-utils): Explain in docs that only single-level symlink are…
junhaoliao Oct 30, 2025
e372661
Merge branch 'main' into containerize-tools
junhaoliao Oct 30, 2025
bee1b79
Reorder dependencies
junhaoliao Oct 30, 2025
b0e8abf
shfmt - Apply suggestions from code review
junhaoliao Oct 31, 2025
5172963
avoid hardcoding names - Apply suggestions from code review
junhaoliao Oct 31, 2025
bb5d8d2
shfmt
junhaoliao Nov 3, 2025
95a8e1f
Use standard form for stderr redirection
junhaoliao Nov 3, 2025
395e42f
shfmt
junhaoliao Nov 3, 2025
3af2e4d
Rename dir -> compose_plugin_dir to avoid overlap with the command dir
junhaoliao Nov 3, 2025
198f398
add docs for resolve_host_path_in_container
junhaoliao Nov 3, 2025
1d02051
Merge branch 'main' into containerize-tools
junhaoliao Nov 4, 2025
3063e85
format - Apply suggestions from code review
junhaoliao Nov 4, 2025
306997a
split volumes into groups and add docs
junhaoliao Nov 4, 2025
4f6c5b4
Move resolved webui settings paths declarations closer to their usages
junhaoliao Nov 4, 2025
ef627c6
format - Apply suggestions from code review
junhaoliao Nov 4, 2025
75ada35
Merge branch 'main' into containerize-tools
junhaoliao Nov 4, 2025
969fde0
Clarify `HOME` forwarding into the container - Apply suggestions from…
junhaoliao Nov 4, 2025
732dc92
Merge branch 'main' into containerize-tools
junhaoliao Nov 4, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions .github/workflows/clp-artifact-build.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -538,7 +538,7 @@ jobs:
shell: "bash"
run: "chown $(id -u):$(id -g) -R ."

- name: "Build the package"
- name: "Build the package without the package image"
uses: "./.github/actions/run-on-image"
env:
OS_NAME: "ubuntu-jammy"
Expand All @@ -548,7 +548,8 @@ jobs:
${{needs.filter-relevant-changes.outputs.ubuntu_jammy_image_changed == 'false'
|| (github.event_name != 'pull_request' && github.ref == 'refs/heads/main')}}
run_command: >-
CLP_CPP_MAX_PARALLELISM_PER_BUILD_TASK=$(getconf _NPROCESSORS_ONLN) task package
CLP_CPP_MAX_PARALLELISM_PER_BUILD_TASK=$(getconf _NPROCESSORS_ONLN)
task package-build-deps

- uses: "./.github/actions/clp-build-runtime-image"
with:
Expand Down
80 changes: 54 additions & 26 deletions components/clp-package-utils/clp_package_utils/controller.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
get_datasets_table_name,
get_files_table_name,
)
from clp_py_utils.core import resolve_host_path_in_container

from clp_package_utils.general import (
check_docker_dependencies,
Expand Down Expand Up @@ -119,9 +120,12 @@ def _set_up_env_for_database(self) -> EnvVarsDict:
logs_dir = self._clp_config.logs_directory / component_name
validate_db_config(self._clp_config, conf_logging_file, data_dir, logs_dir)

data_dir.mkdir(exist_ok=True, parents=True)
logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(data_dir, logs_dir)
resolved_data_dir = resolve_host_path_in_container(data_dir)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)

resolved_data_dir.mkdir(exist_ok=True, parents=True)
resolved_logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(resolved_data_dir, resolved_logs_dir)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -166,8 +170,9 @@ def _set_up_env_for_queue(self) -> EnvVarsDict:
logs_dir = self._clp_config.logs_directory / component_name
validate_queue_config(self._clp_config, logs_dir)

logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(logs_dir)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(resolved_logs_dir)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -204,9 +209,12 @@ def _set_up_env_for_redis(self) -> EnvVarsDict:
logs_dir = self._clp_config.logs_directory / component_name
validate_redis_config(self._clp_config, conf_file, data_dir, logs_dir)

data_dir.mkdir(exist_ok=True, parents=True)
logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(data_dir, logs_dir)
resolved_data_dir = resolve_host_path_in_container(data_dir)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)

resolved_data_dir.mkdir(exist_ok=True, parents=True)
resolved_logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(resolved_data_dir, resolved_logs_dir)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -252,9 +260,12 @@ def _set_up_env_for_results_cache(self) -> EnvVarsDict:
logs_dir = self._clp_config.logs_directory / component_name
validate_results_cache_config(self._clp_config, conf_file, data_dir, logs_dir)

data_dir.mkdir(exist_ok=True, parents=True)
logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(data_dir, logs_dir)
resolved_data_dir = resolve_host_path_in_container(data_dir)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)

resolved_data_dir.mkdir(exist_ok=True, parents=True)
resolved_logs_dir.mkdir(exist_ok=True, parents=True)
_chown_paths_if_root(resolved_data_dir, resolved_logs_dir)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -291,7 +302,8 @@ def _set_up_env_for_compression_scheduler(self) -> EnvVarsDict:
logger.info(f"Setting up environment for {component_name}...")

logs_dir = self._clp_config.logs_directory / component_name
logs_dir.mkdir(parents=True, exist_ok=True)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand All @@ -314,7 +326,8 @@ def _set_up_env_for_query_scheduler(self) -> EnvVarsDict:
logger.info(f"Setting up environment for {component_name}...")

logs_dir = self._clp_config.logs_directory / component_name
logs_dir.mkdir(parents=True, exist_ok=True)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand All @@ -336,7 +349,8 @@ def _set_up_env_for_compression_worker(self, num_workers: int) -> EnvVarsDict:
logger.info(f"Setting up environment for {component_name}...")

logs_dir = self._clp_config.logs_directory / component_name
logs_dir.mkdir(parents=True, exist_ok=True)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -365,7 +379,8 @@ def _set_up_env_for_query_worker(self, num_workers: int) -> EnvVarsDict:
logger.info(f"Setting up environment for {component_name}...")

logs_dir = self._clp_config.logs_directory / component_name
logs_dir.mkdir(parents=True, exist_ok=True)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand All @@ -392,7 +407,8 @@ def _set_up_env_for_reducer(self, num_workers: int) -> EnvVarsDict:
logger.info(f"Setting up environment for {component_name}...")

logs_dir = self._clp_config.logs_directory / component_name
logs_dir.mkdir(parents=True, exist_ok=True)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -427,7 +443,9 @@ def _set_up_env_for_webui(self, container_clp_config: CLPConfig) -> EnvVarsDict:
self._clp_home / "var" / "www" / "webui" / "server" / "dist" / "settings.json"
)
validate_webui_config(
self._clp_config, client_settings_json_path, server_settings_json_path
self._clp_config,
client_settings_json_path,
server_settings_json_path,
)

# Read, update, and write back client's and server's settings.json
Expand All @@ -454,10 +472,13 @@ def _set_up_env_for_webui(self, container_clp_config: CLPConfig) -> EnvVarsDict:
"SqlDbClpTablePrefix": table_prefix,
"SqlDbCompressionJobsTableName": COMPRESSION_JOBS_TABLE_NAME,
}
resolved_client_settings_json_path = resolve_host_path_in_container(
client_settings_json_path
)
client_settings_json = self._read_and_update_settings_json(
client_settings_json_path, client_settings_json_updates
resolved_client_settings_json_path, client_settings_json_updates
)
with open(client_settings_json_path, "w") as client_settings_json_file:
with open(resolved_client_settings_json_path, "w") as client_settings_json_file:
client_settings_json_file.write(json.dumps(client_settings_json))
Comment on lines +479 to 482
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Prefer json.dump with indentation for readability.

Small DX improvement for troubleshooting settings.

-        with open(resolved_client_settings_json_path, "w") as client_settings_json_file:
-            client_settings_json_file.write(json.dumps(client_settings_json))
+        with open(resolved_client_settings_json_path, "w") as client_settings_json_file:
+            json.dump(client_settings_json, client_settings_json_file, indent=2, sort_keys=True)
@@
-        with open(resolved_server_settings_json_path, "w") as settings_json_file:
-            settings_json_file.write(json.dumps(server_settings_json))
+        with open(resolved_server_settings_json_path, "w") as settings_json_file:
+            json.dump(server_settings_json, settings_json_file, indent=2, sort_keys=True)

Also applies to: 536-537

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/controller.py around lines
479-482 (and also apply the same change at lines ~536-537), replace the manual
json.dumps write with json.dump using an indent parameter for readability; open
the file as currently done but call json.dump(client_settings_json,
client_settings_json_file, indent=2) (or another suitable indent) so the written
JSON is pretty-printed and easier to troubleshoot.


server_settings_json_updates = {
Expand Down Expand Up @@ -509,10 +530,13 @@ def _set_up_env_for_webui(self, container_clp_config: CLPConfig) -> EnvVarsDict:
server_settings_json_updates["PrestoHost"] = None
server_settings_json_updates["PrestoPort"] = None

resolved_server_settings_json_path = resolve_host_path_in_container(
server_settings_json_path
)
server_settings_json = self._read_and_update_settings_json(
server_settings_json_path, server_settings_json_updates
resolved_server_settings_json_path, server_settings_json_updates
)
with open(server_settings_json_path, "w") as settings_json_file:
with open(resolved_server_settings_json_path, "w") as settings_json_file:
settings_json_file.write(json.dumps(server_settings_json))

env_vars = EnvVarsDict()
Expand Down Expand Up @@ -544,7 +568,9 @@ def _set_up_env_for_mcp_server(self) -> EnvVarsDict:

logs_dir = self._clp_config.logs_directory / component_name
validate_mcp_server_config(self._clp_config, logs_dir)
logs_dir.mkdir(parents=True, exist_ok=True)

resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -581,7 +607,8 @@ def _set_up_env_for_garbage_collector(self) -> EnvVarsDict:
logger.info(f"Setting up environment for {component_name}...")

logs_dir = self._clp_config.logs_directory / component_name
logs_dir.mkdir(parents=True, exist_ok=True)
resolved_logs_dir = resolve_host_path_in_container(logs_dir)
resolved_logs_dir.mkdir(parents=True, exist_ok=True)

env_vars = EnvVarsDict()

Expand Down Expand Up @@ -812,13 +839,14 @@ def get_or_create_instance_id(clp_config: CLPConfig) -> str:
:return: The instance ID.
"""
instance_id_file_path = clp_config.logs_directory / "instance-id"
resolved_instance_id_file_path = resolve_host_path_in_container(instance_id_file_path)

if instance_id_file_path.exists():
with open(instance_id_file_path, "r") as f:
if resolved_instance_id_file_path.exists():
with open(resolved_instance_id_file_path, "r") as f:
instance_id = f.readline()
else:
instance_id = str(uuid.uuid4())[-4:]
with open(instance_id_file_path, "w") as f:
with open(resolved_instance_id_file_path, "w") as f:
f.write(instance_id)

return instance_id
Expand Down
35 changes: 19 additions & 16 deletions components/clp-package-utils/clp_package_utils/general.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@
get_config_value,
make_config_path_absolute,
read_yaml_config_file,
resolve_host_path_in_container,
validate_path_could_be_dir,
)
from strenum import KebabCaseStrEnum
Expand Down Expand Up @@ -355,7 +356,8 @@ def dump_container_config(
"""
config_file_path_on_host = clp_config.logs_directory / config_filename
config_file_path_on_container = container_clp_config.logs_directory / config_filename
with open(config_file_path_on_host, "w") as f:
resolved_config_file_path_on_host = resolve_host_path_in_container(config_file_path_on_host)
with open(resolved_config_file_path_on_host, "w") as f:
yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f)

return config_file_path_on_container, config_file_path_on_host
Expand Down Expand Up @@ -463,16 +465,17 @@ def validate_credentials_file_path(
clp_config: CLPConfig, clp_home: pathlib.Path, generate_default_file: bool
):
credentials_file_path = clp_config.credentials_file_path
if not credentials_file_path.exists():
resolved_credentials_file_path = resolve_host_path_in_container(credentials_file_path)
if not resolved_credentials_file_path.exists():
if (
make_config_path_absolute(clp_home, CLP_DEFAULT_CREDENTIALS_FILE_PATH)
== credentials_file_path
and generate_default_file
):
generate_credentials_file(credentials_file_path)
generate_credentials_file(resolved_credentials_file_path)
else:
Comment on lines +469 to 476
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Also mkdir for generated default credentials file.

Generating the default file will fail if its parent dir does not exist.

Apply this diff:

-    resolved_credentials_file_path = resolve_host_path(credentials_file_path)
+    resolved_credentials_file_path = resolve_host_path(credentials_file_path)
@@
-            generate_credentials_file(resolved_credentials_file_path)
+            resolved_credentials_file_path.parent.mkdir(parents=True, exist_ok=True)
+            generate_credentials_file(resolved_credentials_file_path)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not resolved_credentials_file_path.exists():
if (
make_config_path_absolute(clp_home, CLP_DEFAULT_CREDENTIALS_FILE_PATH)
== credentials_file_path
and generate_default_file
):
generate_credentials_file(credentials_file_path)
generate_credentials_file(resolved_credentials_file_path)
else:
if not resolved_credentials_file_path.exists():
if (
make_config_path_absolute(clp_home, CLP_DEFAULT_CREDENTIALS_FILE_PATH)
== credentials_file_path
and generate_default_file
):
resolved_credentials_file_path.parent.mkdir(parents=True, exist_ok=True)
generate_credentials_file(resolved_credentials_file_path)
else:
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 471 to
478, the code generates a default credentials file but doesn't ensure its parent
directory exists; before calling
generate_credentials_file(resolved_credentials_file_path) create the parent
directory for resolved_credentials_file_path (e.g.,
resolved_credentials_file_path.parent.mkdir(parents=True, exist_ok=True)) so
file creation won't fail when the directory is missing.

raise ValueError(f"Credentials file path '{credentials_file_path}' does not exist.")
elif not credentials_file_path.is_file():
elif not resolved_credentials_file_path.is_file():
raise ValueError(f"Credentials file path '{credentials_file_path}' is not a file.")


Expand Down Expand Up @@ -503,7 +506,8 @@ def validate_db_config(
data_dir: pathlib.Path,
logs_dir: pathlib.Path,
):
if not component_config.exists():
resolved_component_config = resolve_host_path_in_container(component_config)
if not resolved_component_config.exists():
raise ValueError(f"{DB_COMPONENT_NAME} configuration file missing: '{component_config}'.")
_validate_data_directory(data_dir, DB_COMPONENT_NAME)
_validate_log_directory(logs_dir, DB_COMPONENT_NAME)
Expand All @@ -523,7 +527,8 @@ def validate_redis_config(
data_dir: pathlib.Path,
logs_dir: pathlib.Path,
):
if not component_config.exists():
resolved_component_config = resolve_host_path_in_container(component_config)
if not resolved_component_config.exists():
raise ValueError(
f"{REDIS_COMPONENT_NAME} configuration file missing: '{component_config}'."
)
Expand All @@ -550,7 +555,8 @@ def validate_results_cache_config(
data_dir: pathlib.Path,
logs_dir: pathlib.Path,
):
if not component_config.exists():
resolved_component_config = resolve_host_path_in_container(component_config)
if not resolved_component_config.exists():
raise ValueError(
f"{RESULTS_CACHE_COMPONENT_NAME} configuration file missing: '{component_config}'."
)
Expand All @@ -564,13 +570,9 @@ def validate_results_cache_config(
)


def validate_logs_input_config(clp_config: CLPConfig) -> None:
clp_config.validate_logs_input_config()


def validate_output_storage_config(clp_config: CLPConfig) -> None:
clp_config.validate_archive_output_config()
clp_config.validate_stream_output_config()
clp_config.validate_archive_output_config(True)
clp_config.validate_stream_output_config(True)
Comment on lines 571 to +575
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Pass boolean parameters by keyword for clarity.

The positional True arguments reduce readability and trigger Ruff FBT003 warnings.

Apply this diff:

-    clp_config.validate_archive_output_config(True)
-    clp_config.validate_stream_output_config(True)
+    clp_config.validate_archive_output_config(use_host_mount=True)
+    clp_config.validate_stream_output_config(use_host_mount=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def validate_output_storage_config(clp_config: CLPConfig) -> None:
clp_config.validate_archive_output_config()
clp_config.validate_stream_output_config()
clp_config.validate_archive_output_config(True)
clp_config.validate_stream_output_config(True)
def validate_output_storage_config(clp_config: CLPConfig) -> None:
clp_config.validate_archive_output_config(use_host_mount=True)
clp_config.validate_stream_output_config(use_host_mount=True)
🧰 Tools
🪛 Ruff (0.14.2)

575-575: Boolean positional value in function call

(FBT003)


576-576: Boolean positional value in function call

(FBT003)

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 574 to
576, two calls pass True positionally which reduces clarity and triggers Ruff
FBT003; change them to pass the boolean as a keyword argument using the
parameter name defined on those methods (for example,
clp_config.validate_archive_output_config(required=True) and
clp_config.validate_stream_output_config(required=True) — replace "required"
with the actual parameter name if it differs).


validate_path_for_container_mount(clp_config.archive_output.get_directory())
validate_path_for_container_mount(clp_config.stream_output.get_directory())
Expand All @@ -582,7 +584,8 @@ def validate_webui_config(
server_settings_json_path: pathlib.Path,
):
for path in [client_settings_json_path, server_settings_json_path]:
if not path.exists():
resolved_path = resolve_host_path_in_container(path)
if not resolved_path.exists():
raise ValueError(f"{WEBUI_COMPONENT_NAME} {path} is not a valid path to settings.json")

validate_port(f"{WEBUI_COMPONENT_NAME}.port", clp_config.webui.host, clp_config.webui.port)
Expand Down Expand Up @@ -764,7 +767,7 @@ def _is_docker_compose_project_running(project_name: str) -> bool:

def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None:
try:
validate_path_could_be_dir(data_dir)
validate_path_could_be_dir(resolve_host_path_in_container(data_dir))
except ValueError as ex:
raise ValueError(f"{component_name} data directory is invalid: {ex}")

Expand All @@ -778,6 +781,6 @@ def _validate_log_directory(logs_dir: pathlib.Path, component_name: str):
:raise ValueError: If the path is invalid or can't be a directory.
"""
try:
validate_path_could_be_dir(logs_dir)
validate_path_could_be_dir(resolve_host_path_in_container(logs_dir))
except ValueError as ex:
raise ValueError(f"{component_name} logs directory is invalid: {ex}")
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
StorageEngine,
StorageType,
)
from clp_py_utils.core import resolve_host_path_in_container

from clp_package_utils.general import (
CLPConfig,
Expand Down Expand Up @@ -171,9 +172,11 @@ def main(argv: List[str]) -> int:
try:
config_file_path: Path = Path(parsed_args.config)
clp_config: CLPConfig = load_config_file(
config_file_path, default_config_file_path, clp_home
resolve_host_path_in_container(config_file_path),
resolve_host_path_in_container(default_config_file_path),
clp_home,
)
Comment on lines +175 to 178
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Guard config path resolution to preserve container-local defaults.

The current implementation unconditionally translates both config_file_path and default_config_file_path to host-mounted paths. This breaks when the default config exists only inside the container (e.g., /opt/clp/etc/clp-config.yml), because translating it yields a non-existent /mnt/host/opt/clp/etc/clp-config.yml path, causing load_config_file to fail.

For config_file_path: check whether the path exists in the container before resolving; if it exists, use it as-is (supports both container-local defaults and host overrides).
For default_config_file_path: apply the same pattern—preserve the path if it exists in the container, otherwise resolve it.

Apply this diff:

+    import os
+
     # Validate and load config file
     try:
         config_file_path: Path = Path(parsed_args.config)
+        # Check if config_file_path exists in container; if not, resolve it as a host path
+        if config_file_path.exists():
+            resolved_config_file_path = config_file_path
+        else:
+            resolved_config_file_path = resolve_host_path_in_container(config_file_path)
+
+        # For default config, use container path if it exists; otherwise resolve as host path
+        if default_config_file_path.exists():
+            resolved_default_config_file_path = default_config_file_path
+        else:
+            resolved_default_config_file_path = resolve_host_path_in_container(
+                default_config_file_path
+            )
+
         clp_config: CLPConfig = load_config_file(
-            resolve_host_path_in_container(config_file_path),
-            resolve_host_path_in_container(default_config_file_path),
+            resolved_config_file_path,
+            resolved_default_config_file_path,
             clp_home,
         )

Based on learnings.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
around lines 175 to 178, the code always calls resolve_host_path_in_container on
both config_file_path and default_config_file_path which breaks when a default
config only exists inside the container; instead, check whether each path exists
in the container first (e.g., os.path.exists(container_path)); if it exists,
pass the original container-local path through, otherwise call
resolve_host_path_in_container to convert it to the host-mounted path; apply
this logic to both config_file_path and default_config_file_path before passing
them (and leave clp_home unchanged).

clp_config.validate_logs_dir()
clp_config.validate_logs_dir(True)

Comment on lines +179 to 180
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Pass use_host_mount as a keyword.

Keeps intent clear and avoids FBT003.

-        clp_config.validate_logs_dir(True)
+        clp_config.validate_logs_dir(use_host_mount=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
clp_config.validate_logs_dir(True)
clp_config.validate_logs_dir(use_host_mount=True)
🧰 Tools
🪛 Ruff (0.14.2)

179-179: Boolean positional value in function call

(FBT003)

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py
around lines 179 to 180, the call clp_config.validate_logs_dir(True) should pass
the boolean as a keyword to make intent explicit and avoid FBT003; change the
call to pass use_host_mount=True (i.e.,
clp_config.validate_logs_dir(use_host_mount=True)) so the parameter name is
explicit.

# Validate and load necessary credentials
validate_and_load_db_credentials_file(clp_config, clp_home, False)
Expand Down Expand Up @@ -220,7 +223,6 @@ def main(argv: List[str]) -> int:
)

necessary_mounts: List[Optional[DockerMount]] = [
mounts.clp_home,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mounts like this have to be removed, or they conflict with the contents inside the docker image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, can you explain more?

Copy link
Member Author

@junhaoliao junhaoliao Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, that comment was so unclear that it took me time to recall why it was done -

on the host, clp_home is essentially the clp-package directory root. previously we were mounting the package root into the container so the python modules in lib/python3/site-packages can be accessed in the container. however, we have now removed such contents from the package (recall "now the package is only 180K"); if we mount an empty directory into /opt/clp in the container, anything in /opt/clp is essentially gone. we should just use the assets in the docker image and remove such mounts then

mounts.logs_dir,
mounts.archives_output_dir,
]
Expand Down Expand Up @@ -281,7 +283,10 @@ def main(argv: List[str]) -> int:
logger.debug(f"Docker command failed: {shlex.join(cmd)}")

# Remove generated files
generated_config_path_on_host.unlink()
resolved_generated_config_path_on_host = resolve_host_path_in_container(
generated_config_path_on_host
)
resolved_generated_config_path_on_host.unlink()

return ret_code

Expand Down
Loading
Loading