-
Notifications
You must be signed in to change notification settings - Fork 83
refactor(package): Standardize how configuration and credentials are passed to containers (resolves #1149). #1152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 64 commits
b143237
6cddeda
f9d5b59
489476f
0e3f412
05283d5
34c5e8f
842efac
05715fb
87d07c6
73334ed
866334d
dff9bef
00ceeb8
cc9a7c7
4e1c272
7d82bec
1b7c70c
781bb16
bf63d6e
469bc65
8654a56
03881cc
0c9a035
6f8879c
3850b2d
96735b1
365b5d6
c84836c
04c5c98
33e11ac
6e06740
6ef77dd
36af52d
d85085b
e371aa8
5fa14f8
9eba523
312445e
335f25e
3cc3d44
02f393d
8bcae09
4a9bf99
fc27205
40e387d
ed58dcc
89f001f
95b55ef
fa81a9d
4743566
0a8de41
cbdd9fb
67d048a
f852ca4
0acb960
d21efbf
d446e52
c0949cc
f723512
38c75ee
a11b314
4124b60
3fe8368
ce143c6
e15547e
4bba4ed
6ab2f80
cd4437d
14e0219
33bc068
ae8f581
0e52cb1
f063bd9
04f81b7
34c430f
31342c3
ca2db62
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -9,11 +9,12 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import typing | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import uuid | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from enum import auto | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from typing import List, Optional, Tuple | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from typing import Dict, List, Optional, Tuple | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import yaml | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from clp_py_utils.clp_config import ( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CLP_DEFAULT_CREDENTIALS_FILE_PATH, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CLP_GENERATED_CONFIG_FILENAME, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CLPConfig, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DB_COMPONENT_NAME, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| QueryEngine, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -95,6 +96,7 @@ def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path): | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.archives_output_dir: typing.Optional[DockerMount] = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.stream_output_dir: typing.Optional[DockerMount] = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.aws_config_dir: typing.Optional[DockerMount] = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.generated_config_file: typing.Optional[DockerMount] = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) LGTM: generated_config_file mount slot added Good addition. Consider marking the file mount read-only when created to prevent accidental in-container edits. 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+99
to
100
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Nit: align typing style (Optional over typing.Optional) You already import Optional; keep usage consistent. - self.generated_config_file: typing.Optional[DockerMount] = None
+ self.generated_config_file: Optional[DockerMount] = None📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -285,6 +287,18 @@ def generate_container_config( | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container_clp_config.stream_output.get_directory(), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if not is_path_already_mounted( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| clp_home, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| CONTAINER_CLP_HOME, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| clp_config.get_generated_config_file_path(), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container_clp_config.get_generated_config_file_path(), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| docker_mounts.generated_config_file = DockerMount( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DockerMountType.BIND, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| clp_config.get_generated_config_file_path(), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container_clp_config.get_generated_config_file_path(), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Only create the mount if the directory exists | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if clp_config.aws_config_directory is not None: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container_clp_config.aws_config_directory = CONTAINER_AWS_CONFIG_DIRECTORY | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -308,33 +322,57 @@ def generate_worker_config(clp_config: CLPConfig) -> WorkerConfig: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return worker_config | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def dump_container_config( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container_clp_config: CLPConfig, clp_config: CLPConfig, container_name: str | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ) -> Tuple[pathlib.Path, pathlib.Path]: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def dump_config(container_clp_config: CLPConfig, clp_config: CLPConfig, config_filename: str): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Writes the given config to the logs directory so that it's accessible in the container. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Writes the given container config to the logs directory, so that it's accessible in the | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| :param container_clp_config: The config to write. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| :param clp_config: The corresponding config on the host (used to determine the logs directory). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| :param container_name: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| :param config_filename: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| :return: The path to the config file in the container and on the host. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| container_config_filename = f".{container_name}-config.yml" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| config_file_path_on_host = clp_config.logs_directory / container_config_filename | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| config_file_path_on_container = container_clp_config.logs_directory / container_config_filename | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| config_file_path_on_host = clp_config.logs_directory / config_filename | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| config_file_path_on_container = container_clp_config.logs_directory / config_filename | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| with open(config_file_path_on_host, "w") as f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return config_file_path_on_container, config_file_path_on_host | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def dump_config(container_clp_config: CLPConfig, clp_config: CLPConfig, config_filename: str): | |
| """ | |
| Writes the given config to the logs directory so that it's accessible in the container. | |
| Writes the given container config to the logs directory, so that it's accessible in the | |
| container. | |
| :param container_clp_config: The config to write. | |
| :param clp_config: The corresponding config on the host (used to determine the logs directory). | |
| :param container_name: | |
| :param config_filename: | |
| :return: The path to the config file in the container and on the host. | |
| """ | |
| container_config_filename = f".{container_name}-config.yml" | |
| config_file_path_on_host = clp_config.logs_directory / container_config_filename | |
| config_file_path_on_container = container_clp_config.logs_directory / container_config_filename | |
| config_file_path_on_host = clp_config.logs_directory / config_filename | |
| config_file_path_on_container = container_clp_config.logs_directory / config_filename | |
| with open(config_file_path_on_host, "w") as f: | |
| yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f) | |
| return config_file_path_on_container, config_file_path_on_host | |
| def dump_config( | |
| container_clp_config: CLPConfig, | |
| clp_config: CLPConfig, | |
| config_filename: str, | |
| ) -> Tuple[pathlib.Path, pathlib.Path]: | |
| """ | |
| Writes the given container config to the logs directory, so that it's accessible in the | |
| container. | |
| :param container_clp_config: The config to write. | |
| :param clp_config: The corresponding config on the host (used to determine the logs directory). | |
| :param config_filename: | |
| :return: The path to the config file in the container and on the host. | |
| """ | |
| config_file_path_on_host = clp_config.logs_directory / config_filename | |
| config_file_path_on_container = container_clp_config.logs_directory / config_filename | |
| config_file_path_on_host.parent.mkdir(parents=True, exist_ok=True) | |
| with open(config_file_path_on_host, "w") as f: | |
| yaml.safe_dump(container_clp_config.dump_to_primitive_dict(), f) | |
| return config_file_path_on_container, config_file_path_on_host |
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 325 to
340, ensure the target logs directory exists before opening the file and add an
explicit return type annotation for the function; to fix, call
mkdir(parents=True, exist_ok=True) on clp_config.logs_directory (or the parent
directory of config_file_path_on_host) before opening the file to write, and
update the function signature to include the return type -> Tuple[Path, Path]
(and import Tuple and Path from typing/pathlib if not already imported).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the parent path doesn't exist, it's probably fine to let it fail to show any implementation error during development
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junhaoliao That's a fair point about letting it fail fast to catch implementation errors during development. The explicit failure when the directory doesn't exist will indeed help identify setup issues early rather than masking them with automatic directory creation.
The return type annotation suggestion is still valid though, as it's separate from the directory creation behavior and improves type safety:
def dump_config(
container_clp_config: CLPConfig,
clp_config: CLPConfig,
config_filename: str,
) -> Tuple[pathlib.Path, pathlib.Path]:✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/general.py:0-0
Timestamp: 2025-08-19T14:41:28.856Z
Learning: In the CLP codebase, prefer explicit failures over automatic directory creation in utility functions like dump_config. The user junhaoliao prefers to let file operations fail when parent directories don't exist, as this helps catch implementation errors during development rather than masking setup issues with automatic directory creation.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
kirkrodrigues marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
kirkrodrigues marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
junhaoliao marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the above "-e", f"PYTHONPATH={clp_site_packages_dir}", line was not removed, as a mistake possibly during branch syncs with main, which causes the environment variable to be defined twice. The duplication isn't harmful in the correctness though.
@coderabbitai create an issue to track the removal of the deplicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create GitHub issue to track duplicate PYTHONPATH removal
gh issue create \
--title "Remove duplicate PYTHONPATH environment variable in generate_container_start_cmd" \
--body "## Description
In \`components/clp-package-utils/clp_package_utils/general.py\`, the \`generate_container_start_cmd\` function defines the PYTHONPATH environment variable twice:
1. Directly in the \`container_start_cmd\` array: \`\"-e\", f\"PYTHONPATH={clp_site_packages_dir}\"\`
2. In the \`env_vars\` dictionary, which is then iterated to add environment variables to the command
This duplication occurred during branch syncs and should be cleaned up. While not harmful to correctness, only one definition is necessary.
## References
- PR: https://github.com/y-scope/clp/pull/1152
- Comment: https://github.com/y-scope/clp/pull/1152#discussion_r2468365115
- Reported by: @junhaoliao" \
--assignee junhaoliao \
--repo y-scope/clp
Length of output: 42
@junhaoliao, I've created issue #1513 to track the removal of the duplicate PYTHONPATH environment variable definition in generate_container_start_cmd. The issue has been assigned to you.
🧬 Code Graph Analysis Results
[clp-package-utils/general.py] Summary: generate_container_start_cmd
- Description: Creates the docker run command to start a container with mounts, environment variables, and a name.
- Parameters:
- container_name: str
- container_mounts: List[Optional[DockerMount]]
- container_image: str
- extra_env_vars: Optional[Dict[str, str]]
- Returns: List[str] representing the docker run command.
- Key behavior:
- Defines clp_site_packages_dir as CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages".
- Includes an environment variable "-e PYTHONPATH={clp_site_packages_dir}" unconditionally.
- Merges extra_env_vars (if provided) into the command as additional -e KEY=VALUE pairs.
- Appends mount arguments for non-None mounts.
- Important details:
- The command explicitly sets PYTHONPATH in the container startup command, which may duplicate with other environment-variable provisioning (e.g., a separate list of environment vars that also sets PYTHONPATH).
- This function is central to how the container’s environment is prepared and could interact with other env var provisioning logic.
[clp-package-utils/general.py] Summary: get_common_env_vars_list
- Description: Returns a list of common environment variables for Docker containers.
- Parameters:
- include_clp_home_env_var: bool (default True)
- Returns: List[str] where each item is in the form KEY=VALUE.
- Key behavior:
- Always includes PYTHONPATH={CONTAINER_CLP_HOME}/lib/python3/site-packages.
- If include_clp_home_env_var is True, also includes CLP_HOME={CONTAINER_CLP_HOME}.
- Important details:
- This list provides a standard set of env vars for containers.
- Note: PYTHONPATH is defined here, which can interact with the PYTHONPATH set in generate_container_start_cmd (potential duplication if both methods are used).
coderabbitai[bot] marked this conversation as resolved.
Show resolved
Hide resolved
kirkrodrigues marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
kirkrodrigues marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
URL-encode credentials in Celery connection strings
Credentials containing special characters (e.g., @, :, /) will break the connection URLs.
Apply URL encoding to credentials before embedding them in connection strings:
+from urllib.parse import quote
def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List[str]:
"""
:param container_clp_config:
:return: A list of Celery connection environment variables for Docker containers, in the format
"KEY=VALUE".
"""
+ queue_user = quote(container_clp_config.queue.username, safe='')
+ queue_pass = quote(container_clp_config.queue.password, safe='')
+ redis_pass = quote(container_clp_config.redis.password, safe='')
env_vars = [
f"BROKER_URL=amqp://"
- f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
+ f"{queue_user}:{queue_pass}@"
f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
- f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
+ f"RESULT_BACKEND=redis://default:{redis_pass}@"
f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}",
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| env_vars = [ | |
| f"BROKER_URL=amqp://" | |
| f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@" | |
| f"{container_clp_config.queue.host}:{container_clp_config.queue.port}", | |
| f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@" | |
| f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/" | |
| f"{container_clp_config.redis.query_backend_database}", | |
| from urllib.parse import quote | |
| def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List[str]: | |
| """ | |
| :param container_clp_config: | |
| :return: A list of Celery connection environment variables for Docker containers, in the format | |
| "KEY=VALUE". | |
| """ | |
| queue_user = quote(container_clp_config.queue.username, safe='') | |
| queue_pass = quote(container_clp_config.queue.password, safe='') | |
| redis_pass = quote(container_clp_config.redis.password, safe='') | |
| env_vars = [ | |
| f"BROKER_URL=amqp://" | |
| f"{queue_user}:{queue_pass}@" | |
| f"{container_clp_config.queue.host}:{container_clp_config.queue.port}", | |
| f"RESULT_BACKEND=redis://default:{redis_pass}@" | |
| f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/" | |
| f"{container_clp_config.redis.query_backend_database}", | |
| ] | |
| return env_vars |
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 695 to
701, the Celery BROKER_URL and RESULT_BACKEND strings embed raw usernames and
passwords which will break when credentials contain special characters;
URL-encode the queue username and password and the redis password (using
urllib.parse.quote_plus or quote) before interpolating them into the connection
URLs so special characters are percent-encoded, and update the env_vars
construction to use the encoded credential variables.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,6 +6,8 @@ | |
| from typing import Final, List, Optional | ||
|
|
||
| from clp_py_utils.clp_config import ( | ||
| CLP_DB_PASS_ENV_VAR_NAME, | ||
| CLP_DB_USER_ENV_VAR_NAME, | ||
| CLP_DEFAULT_DATASET_NAME, | ||
| StorageEngine, | ||
| StorageType, | ||
|
|
@@ -220,8 +222,12 @@ def main(argv: List[str]) -> int: | |
| mounts.logs_dir, | ||
| mounts.archives_output_dir, | ||
| ] | ||
| extra_env_vars = { | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, | ||
| } | ||
| container_start_cmd: List[str] = generate_container_start_cmd( | ||
| container_name, necessary_mounts, clp_config.execution_container | ||
| container_name, necessary_mounts, clp_config.execution_container, extra_env_vars | ||
| ) | ||
|
Comment on lines
+226
to
232
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Do not pass secrets on the Docker command line; provide values via subprocess env Passing -e KEY=VALUE exposes creds via process args and logs. Pass only the keys to the container CLI and inject the values via subprocess.run env. - extra_env_vars = {
- CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
- CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
- }
- container_start_cmd: List[str] = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
- )
+ extra_env_vars = {
+ CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
+ CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
+ }
+ # Provide only names to the docker CLI; inject values via subprocess env.
+ container_start_cmd: List[str] = generate_container_start_cmd(
+ container_name,
+ necessary_mounts,
+ clp_config.execution_container,
+ list(extra_env_vars.keys()),
+ )Apply outside-range updates to complete the change: # At top-level imports
import os # add
# Where the command is executed
proc = subprocess.run(cmd, env={**os.environ, **extra_env_vars}) |
||
|
|
||
| # fmt: off | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,6 +7,8 @@ | |
| from typing import List, Optional | ||
|
|
||
| from clp_py_utils.clp_config import ( | ||
| CLP_DB_PASS_ENV_VAR_NAME, | ||
| CLP_DB_USER_ENV_VAR_NAME, | ||
| CLP_DEFAULT_DATASET_NAME, | ||
| StorageEngine, | ||
| ) | ||
|
|
@@ -222,8 +224,12 @@ def main(argv): | |
|
|
||
| _generate_logs_list(clp_config.logs_input.type, container_logs_list_path, parsed_args) | ||
|
|
||
| extra_env_vars = { | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, | ||
| } | ||
| container_start_cmd = generate_container_start_cmd( | ||
| container_name, necessary_mounts, clp_config.execution_container | ||
| container_name, necessary_mounts, clp_config.execution_container, extra_env_vars | ||
| ) | ||
|
Comment on lines
+228
to
234
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Verification agent 🧩 Analysis chainStop leaking DB credentials on the Docker CLI; pass values via subprocess env extra_env_vars currently flows into the docker argv as -e KEY=VALUE, exposing secrets via ps and logs. Emit only -e KEY and inject the values via subprocess env. Action:
Apply within this hunk: - container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
- )
+ # NOTE: Requires general.generate_container_start_cmd to emit '-e KEY' (no '=VALUE')
+ container_start_cmd = generate_container_start_cmd(
+ container_name,
+ necessary_mounts,
+ clp_config.execution_container,
+ list(extra_env_vars.keys()),
+ )And update the subprocess invocation to inject secrets via env (outside this hunk): # at top of file
import os- subprocess.run(cmd, check=True)
+ subprocess.run(cmd, check=True, env={**os.environ, **extra_env_vars})Run to locate other CLI injections that still embed secrets: 🏁 Script executed: #!/bin/bash
rg -n -C2 -P '\bgenerate_container_start_cmd\s*\(' --glob '!**/venv/**'
rg -n -C2 -P 'append_docker_options\(.+,\s*env_vars\)' --glob '!**/venv/**'Length of output: 9124 Stop leaking DB credentials via Docker CLI flags Exposing secrets with “-e KEY=VALUE” leaks credentials in process listings and logs. To secure this: • Refactor Add at top of each affected file: import osExample diff for compress.py: - extra_env_vars = {
- CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
- CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
- }
- container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
- )
+ extra_env_vars = {
+ CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
+ CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
+ }
+ # Emit only “-e KEY” and inject values via env
+ container_start_cmd = generate_container_start_cmd(
+ container_name,
+ necessary_mounts,
+ clp_config.execution_container,
+ list(extra_env_vars.keys()),
+ )
@@ -240,7 +245,9 @@ def run_compress(...):
- subprocess.run(container_start_cmd, check=True)
+ subprocess.run(
+ container_start_cmd,
+ check=True,
+ env={**os.environ, **extra_env_vars},
+ )🤖 Prompt for AI Agents |
||
| compress_cmd = _generate_compress_cmd( | ||
| parsed_args, dataset, generated_config_path_on_container, logs_list_path_on_container | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,6 +6,8 @@ | |
| from typing import Optional | ||
|
|
||
| from clp_py_utils.clp_config import ( | ||
| CLP_DB_PASS_ENV_VAR_NAME, | ||
| CLP_DB_USER_ENV_VAR_NAME, | ||
| CLP_DEFAULT_DATASET_NAME, | ||
| CLPConfig, | ||
| StorageEngine, | ||
|
|
@@ -122,8 +124,13 @@ def handle_extract_file_cmd( | |
| container_paths_to_extract_file_path, | ||
| ) | ||
| ) | ||
|
|
||
| extra_env_vars = { | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, | ||
| } | ||
|
Comment on lines
+129
to
+132
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) DRY: factor out DB env var assembly The extra_env_vars construction is duplicated. Extract a tiny helper to keep it consistent and reduce future drift. Example: def _build_db_env(clp_config: CLPConfig) -> Dict[str, str]:
return {
CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}Then: extra_env_vars = _build_db_env(clp_config)Also applies to: 202-205 🤖 Prompt for AI Agents |
||
| container_start_cmd = generate_container_start_cmd( | ||
| container_name, necessary_mounts, clp_config.execution_container | ||
| container_name, necessary_mounts, clp_config.execution_container, extra_env_vars | ||
| ) | ||
|
Comment on lines
+129
to
135
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Do not print DB creds in docker argv; pass via env Emit only the variable names on the docker command line and inject actual values via subprocess env. Apply: - container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
- )
+ # NOTE: Requires general.generate_container_start_cmd to emit '-e KEY'
+ container_start_cmd = generate_container_start_cmd(
+ container_name,
+ necessary_mounts,
+ clp_config.execution_container,
+ list(extra_env_vars.keys()),
+ )And later (outside this hunk): import os # at top if missing
# ...
subprocess.run(cmd, check=True, env={**os.environ, **extra_env_vars}) |
||
|
|
||
| # fmt: off | ||
|
|
@@ -194,8 +201,12 @@ def handle_extract_stream_cmd( | |
| container_clp_config, clp_config, container_name | ||
| ) | ||
| necessary_mounts = [mounts.clp_home, mounts.logs_dir] | ||
| extra_env_vars = { | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, | ||
| } | ||
| container_start_cmd = generate_container_start_cmd( | ||
| container_name, necessary_mounts, clp_config.execution_container | ||
| container_name, necessary_mounts, clp_config.execution_container, extra_env_vars | ||
| ) | ||
|
Comment on lines
+205
to
211
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Apply the same secret-handling fix to the stream extraction path Mirror the recommended approach here as well: pass only names to Docker and set values in subprocess env to avoid leaking credentials via argv. Apply this localized change (after updating the builder as described above): - container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
- )
+ container_start_cmd = generate_container_start_cmd(
+ container_name,
+ necessary_mounts,
+ clp_config.execution_container,
+ env_var_names=list(extra_env_vars.keys()),
+ )And ensure: env = {**os.environ, **extra_env_vars}
subprocess.run(cmd, check=True, env=env) |
||
|
|
||
| # fmt: off | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,6 +10,8 @@ | |
| from typing import Optional | ||
|
|
||
| from clp_py_utils.clp_config import ( | ||
| CLP_DB_PASS_ENV_VAR_NAME, | ||
| CLP_DB_USER_ENV_VAR_NAME, | ||
| CLPConfig, | ||
| Database, | ||
| ) | ||
|
|
@@ -189,6 +191,7 @@ def validate_and_load_config_file( | |
| clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) | ||
| clp_config.validate_archive_output_config() | ||
| clp_config.validate_logs_dir() | ||
| clp_config.database.load_credentials_from_env() | ||
| return clp_config | ||
|
Comment on lines
+194
to
195
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Surface missing DB creds clearly when loading from env
Minimal change: try:
clp_config.database.load_credentials_from_env()
except ValueError as e:
logger.error(f"Database credentials not set in environment: {e}")
return NoneThis keeps stack traces clean and aids operators during setup. 🤖 Prompt for AI Agents |
||
| except Exception: | ||
| logger.exception("Failed to load config.") | ||
|
|
@@ -244,8 +247,8 @@ def handle_extract_file_cmd( | |
| # fmt: on | ||
| extract_env = { | ||
| **os.environ, | ||
| "CLP_DB_USER": clp_db_connection_params["username"], | ||
| "CLP_DB_PASS": clp_db_connection_params["password"], | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_db_connection_params["username"], | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_db_connection_params["password"], | ||
| } | ||
|
|
||
| files_to_extract_list_path = None | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -294,6 +294,7 @@ def main(argv): | |
| config_file_path = pathlib.Path(parsed_args.config) | ||
| clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) | ||
| clp_config.validate_logs_dir() | ||
| clp_config.database.load_credentials_from_env() | ||
| except: | ||
|
Comment on lines
+297
to
298
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) Log a targeted error when DB creds are missing As with other scripts, catching the ValueError from Suggested pattern: try:
clp_config.database.load_credentials_from_env()
except ValueError as e:
logger.error(f"Database credentials not set in environment: {e}")
return -1🤖 Prompt for AI Agents |
||
| logger.exception("Failed to load config.") | ||
| return -1 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,13 +1,12 @@ | ||
| import argparse | ||
| import logging | ||
| import os | ||
| import pathlib | ||
| import subprocess | ||
| import sys | ||
| import uuid | ||
|
|
||
| import yaml | ||
| from clp_py_utils.clp_config import ( | ||
| CLP_DB_PASS_ENV_VAR_NAME, | ||
| CLP_DB_USER_ENV_VAR_NAME, | ||
| CLP_DEFAULT_DATASET_NAME, | ||
| StorageEngine, | ||
| StorageType, | ||
|
|
@@ -117,10 +116,13 @@ def main(argv): | |
| generated_config_path_on_container, generated_config_path_on_host = dump_container_config( | ||
| container_clp_config, clp_config, container_name | ||
| ) | ||
|
|
||
| necessary_mounts = [mounts.clp_home, mounts.logs_dir] | ||
| extra_env_vars = { | ||
| CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username, | ||
| CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password, | ||
| } | ||
| container_start_cmd = generate_container_start_cmd( | ||
| container_name, necessary_mounts, clp_config.execution_container | ||
| container_name, necessary_mounts, clp_config.execution_container, extra_env_vars | ||
| ) | ||
|
Comment on lines
+121
to
127
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Avoid exposing DB secrets via “-e KEY=VALUE” on docker argv Like compress.py, this leaks via ps. Emit only env var names on the CLI and provide values via subprocess env. Apply: - container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
- )
+ # NOTE: Requires general.generate_container_start_cmd to emit '-e KEY' (no '=VALUE')
+ container_start_cmd = generate_container_start_cmd(
+ container_name,
+ necessary_mounts,
+ clp_config.execution_container,
+ list(extra_env_vars.keys()),
+ )Also inject env at the call site (outside this hunk): import os # if not already imported
# ...
subprocess.run(cmd, check=True, env={**os.environ, **extra_env_vars})🤖 Prompt for AI Agents |
||
|
|
||
| # fmt: off | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Unify typing style (avoid mixing
typing.OptionalwithOptional).You import
Optional,List, etc., but still usetyping.Optionalelsewhere. Prefer one style for consistency; since you already import from typing, useOptionalconsistently.Apply this refactor in the class where it appears:
class CLPDockerMounts: def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path): - self.input_logs_dir: typing.Optional[DockerMount] = None + self.input_logs_dir: Optional[DockerMount] = None self.clp_home: typing.Optional[DockerMount] = DockerMount( DockerMountType.BIND, clp_home, docker_clp_home ) - self.data_dir: typing.Optional[DockerMount] = None - self.logs_dir: typing.Optional[DockerMount] = None - self.archives_output_dir: typing.Optional[DockerMount] = None - self.stream_output_dir: typing.Optional[DockerMount] = None - self.aws_config_dir: typing.Optional[DockerMount] = None + self.data_dir: Optional[DockerMount] = None + self.logs_dir: Optional[DockerMount] = None + self.archives_output_dir: Optional[DockerMount] = None + self.stream_output_dir: Optional[DockerMount] = None + self.aws_config_dir: Optional[DockerMount] = None📝 Committable suggestion
🤖 Prompt for AI Agents