Merged
Changes from 53 commits
78 commits
b143237
feat: Enhance database configuration by loading credentials from envi…
junhaoliao Aug 3, 2025
6cddeda
refactor
junhaoliao Aug 3, 2025
f9d5b59
lint
junhaoliao Aug 3, 2025
489476f
fix
junhaoliao Aug 3, 2025
0e3f412
fix issues caused by refactoring
junhaoliao Aug 3, 2025
05283d5
fix: Update exception handling to catch ValueError instead of KeyError
junhaoliao Aug 5, 2025
34c5e8f
refactor: Remove unused logger initialization in general.py
junhaoliao Aug 7, 2025
842efac
reorder generated_config_file_path
junhaoliao Aug 7, 2025
05715fb
fix: Update type hint for extra_env_vars to use Dict instead of dict
junhaoliao Aug 7, 2025
87d07c6
refactor: make extra_env_vars optional
junhaoliao Aug 7, 2025
73334ed
refactor: rename generate_common_environment_variables to generate_co…
junhaoliao Aug 7, 2025
866334d
improve environment variable generation for Docker containers
junhaoliao Aug 7, 2025
dff9bef
Set default value for extra_env_vars to None for immutability instead…
junhaoliao Aug 7, 2025
00ceeb8
Rename 'load_and_validate_config_file' to 'validate_and_load_config_f…
junhaoliao Aug 7, 2025
cc9a7c7
Add get_generated_config_file_path method to dynamically generate the…
junhaoliao Aug 7, 2025
4e1c272
extract environment variable retrieval logic
junhaoliao Aug 7, 2025
7d82bec
lint
junhaoliao Aug 7, 2025
1b7c70c
update error logging in query scheduler
junhaoliao Aug 7, 2025
781bb16
Remove unused import of load_worker_config from job_orchestration.exe…
junhaoliao Aug 8, 2025
bf63d6e
Use more specific error messages in exception logs
junhaoliao Aug 8, 2025
469bc65
Remove unused import of generate_worker_config function
junhaoliao Aug 8, 2025
8654a56
Remove container_clp_config argument from generate_common_environment…
junhaoliao Aug 8, 2025
03881cc
remove unused import of logging module
junhaoliao Aug 8, 2025
0c9a035
Merge branch 'main' into db-config-file
junhaoliao Aug 8, 2025
6f8879c
rename generated_config_dir to generated_config_file
junhaoliao Aug 8, 2025
3850b2d
revert empty line before return
junhaoliao Aug 8, 2025
96735b1
Check for None explicitly - Apply suggestions from code review
junhaoliao Aug 8, 2025
365b5d6
Fix indent mistake caused by merging code - Apply suggestions from co…
junhaoliao Aug 8, 2025
c84836c
Add `dump_to_primitive_dict` method to Database, Redis, and Reducer c…
junhaoliao Aug 8, 2025
04c5c98
Merge remote-tracking branch 'junhao/db-config-file' into db-config-file
junhaoliao Aug 8, 2025
33e11ac
fix lint
junhaoliao Aug 8, 2025
6e06740
Merge remote-tracking branch 'origin/main' into db-config-file
junhaoliao Aug 14, 2025
6ef77dd
lint
junhaoliao Aug 14, 2025
36af52d
fix(package): Update `native/decompress.py` to use CLI args and env v…
junhaoliao Aug 14, 2025
d85085b
refactor(clp-py-utils): simplify sensitive information handling in CL…
junhaoliao Aug 14, 2025
e371aa8
refactor(clp-py-utils): Use os.getenv() instead of os.environ[] for m…
junhaoliao Aug 14, 2025
5fa14f8
refactor(clp-config): add environment variable constants for credentials
junhaoliao Aug 14, 2025
9eba523
refactor(clp-py-utils): move credential loading to respective classes
junhaoliao Aug 14, 2025
312445e
refactor(clp-package-utils): Replace manual environment variable read…
junhaoliao Aug 14, 2025
335f25e
lint
junhaoliao Aug 14, 2025
3cc3d44
add type annotation for _get_env_var function parameter
junhaoliao Aug 14, 2025
02f393d
Merge branch 'main' into db-config-file
junhaoliao Aug 14, 2025
8bcae09
set default values as None for optional connection fields
junhaoliao Aug 14, 2025
4a9bf99
remove unused import of os module
junhaoliao Aug 14, 2025
fc27205
Docs - Apply suggestions from code review
junhaoliao Aug 15, 2025
40e387d
Remove unused os import - Apply suggestions from code review
junhaoliao Aug 15, 2025
ed58dcc
update environment_variables generator function return type descriptions
junhaoliao Aug 15, 2025
89f001f
Rename parameter `include_clp_home` to `include_clp_home_env_var`
junhaoliao Aug 15, 2025
95b55ef
replace hardcoded env var names with constants
junhaoliao Aug 15, 2025
fa81a9d
move necessary_mounts definition
junhaoliao Aug 15, 2025
4743566
Merge branch 'main' into db-config-file
junhaoliao Aug 15, 2025
0a8de41
remove credential environment variables for reducer from start_clp.py
junhaoliao Aug 15, 2025
cbdd9fb
docs: Add missing documentation for `load_credentials_from_env` metho…
junhaoliao Aug 17, 2025
67d048a
fix(job-orchestration): update garbage collector configuration handling
junhaoliao Aug 18, 2025
f852ca4
refactor: Add CLP_GENERATED_CONFIG_FILE_NAME constant in clp_config.p…
junhaoliao Aug 18, 2025
0acb960
refactor(clp-package-utils): rename environment variable generation f…
junhaoliao Aug 18, 2025
d21efbf
shift order of CLPConfig field serialization
junhaoliao Aug 18, 2025
d446e52
refactor(scheduler): enhance error logging and component naming consi…
junhaoliao Aug 18, 2025
c0949cc
refactor(clp-package-utils): improve container environment variable h…
junhaoliao Aug 18, 2025
f723512
remove unused imports
junhaoliao Aug 18, 2025
38c75ee
lint
junhaoliao Aug 18, 2025
a11b314
Merge branch 'main' into db-config-file
junhaoliao Aug 18, 2025
4124b60
use named constant instead of magic string for ".clp-config.yml" -…
junhaoliao Aug 19, 2025
3fe8368
docs - Apply suggestions from code review
junhaoliao Aug 19, 2025
ce143c6
refactor(clp-py-utils): Optimize the serialization process by excludi…
junhaoliao Aug 19, 2025
e15547e
remove unused variable clp_site_packages_dir
junhaoliao Aug 19, 2025
4bba4ed
add missing mount to start_garbage_collector
junhaoliao Aug 19, 2025
6ab2f80
Update function docstrings `raise` to use consistent colon placement
junhaoliao Aug 19, 2025
cd4437d
refactor(clp-package-utils): rename environment variable generation f…
junhaoliao Aug 19, 2025
14e0219
refactor(clp-package-utils): rename `dump_shared_config` function to …
junhaoliao Aug 19, 2025
33bc068
refactor(clp-package-utils): rename and restructure configuration dum…
junhaoliao Aug 19, 2025
ae8f581
refactor(config): rename generated config file to shared config file
junhaoliao Aug 19, 2025
0e52cb1
Merge branch 'main' into db-config-file
junhaoliao Aug 19, 2025
f063bd9
lint
junhaoliao Aug 19, 2025
04f81b7
remove redundant comment - Apply suggestions from code review
junhaoliao Aug 19, 2025
34c430f
order constants - Apply suggestions from code review
junhaoliao Aug 19, 2025
31342c3
refactor(clp-package-utils): extract container config filename genera…
junhaoliao Aug 19, 2025
ca2db62
Merge branch 'main' into db-config-file
junhaoliao Aug 19, 2025
102 changes: 96 additions & 6 deletions components/clp-package-utils/clp_package_utils/general.py
@@ -9,7 +9,7 @@
 import typing
 import uuid
 from enum import auto
-from typing import List, Optional, Tuple
+from typing import Dict, List, Optional, Tuple
Contributor

🧹 Nitpick (assertive)

Unify typing style (avoid mixing typing.Optional with Optional).

You import Optional, List, etc., but still use typing.Optional elsewhere. Prefer one style for consistency; since you already import from typing, use Optional consistently.

Apply this refactor in the class where it appears:

 class CLPDockerMounts:
     def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
-        self.input_logs_dir: typing.Optional[DockerMount] = None
+        self.input_logs_dir: Optional[DockerMount] = None
         self.clp_home: typing.Optional[DockerMount] = DockerMount(
             DockerMountType.BIND, clp_home, docker_clp_home
         )
-        self.data_dir: typing.Optional[DockerMount] = None
-        self.logs_dir: typing.Optional[DockerMount] = None
-        self.archives_output_dir: typing.Optional[DockerMount] = None
-        self.stream_output_dir: typing.Optional[DockerMount] = None
-        self.aws_config_dir: typing.Optional[DockerMount] = None
+        self.data_dir: Optional[DockerMount] = None
+        self.logs_dir: Optional[DockerMount] = None
+        self.archives_output_dir: Optional[DockerMount] = None
+        self.stream_output_dir: Optional[DockerMount] = None
+        self.aws_config_dir: Optional[DockerMount] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import Dict, List, Optional, Tuple
class CLPDockerMounts:
    def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
        self.input_logs_dir: Optional[DockerMount] = None
        self.clp_home: typing.Optional[DockerMount] = DockerMount(
            DockerMountType.BIND, clp_home, docker_clp_home
        )
        self.data_dir: Optional[DockerMount] = None
        self.logs_dir: Optional[DockerMount] = None
        self.archives_output_dir: Optional[DockerMount] = None
        self.stream_output_dir: Optional[DockerMount] = None
        self.aws_config_dir: Optional[DockerMount] = None
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around line 12,
unify typing style by replacing any uses of typing.Optional in the class (and
elsewhere in this file) with the already-imported Optional; ensure Optional is
present in the from typing import ... list (add it if missing), remove any
redundant direct imports of typing if they are only used for Optional, and run a
quick search/replace in the file to convert typing.List/typing.Dict/etc. to the
imported aliases (List, Dict) for consistent style.


import yaml
from clp_py_utils.clp_config import (
@@ -95,6 +95,7 @@ def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
         self.archives_output_dir: typing.Optional[DockerMount] = None
         self.stream_output_dir: typing.Optional[DockerMount] = None
         self.aws_config_dir: typing.Optional[DockerMount] = None
+        self.generated_config_file: typing.Optional[DockerMount] = None
Contributor

🧹 Nitpick (assertive)

LGTM: generated_config_file mount slot added

Good addition. Consider marking the file mount read-only when created to prevent accidental in-container edits.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around line 97, the
new generated_config_file mount should be created as read-only to prevent
accidental in-container edits; when assigning/creating the DockerMount for
self.generated_config_file, set the mount mode/read_only flag to read-only
(e.g., mode or read_only parameter) so the resulting mount is mounted as ro
rather than writable.
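
A minimal sketch of the read-only suggestion, assuming a simplified stand-in for `DockerMount`. The class `MountSpec`, its `read_only` field, and the paths are hypothetical and are not the project's API; the only thing taken from Docker itself is that `--mount` accepts a `readonly` option for bind mounts.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MountSpec:
    """Hypothetical stand-in for DockerMount; not the project's class."""

    src: Path
    dst: Path
    read_only: bool = False

    def to_mount_arg(self) -> str:
        # Render the value that would follow `docker run --mount`.
        parts = ["type=bind", f"src={self.src}", f"dst={self.dst}"]
        if self.read_only:
            # `readonly` makes the bind mount non-writable inside the container.
            parts.append("readonly")
        return ",".join(parts)


# Illustrative paths only.
config_mount = MountSpec(
    Path("/opt/clp/var/.clp-config.yml"),
    Path("/opt/clp/.clp-config.yml"),
    read_only=True,
)
mount_arg = config_mount.to_mount_arg()
```

Defaulting `read_only` to `False` keeps existing writable mounts (logs, archives) unchanged, so only the config file mount would need to opt in.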


Comment on lines +99 to 100
Contributor

🧹 Nitpick (assertive)

Nit: align typing style (Optional over typing.Optional)

You already import Optional; keep usage consistent.

-        self.generated_config_file: typing.Optional[DockerMount] = None
+        self.generated_config_file: Optional[DockerMount] = None
📝 Committable suggestion

Suggested change
-        self.generated_config_file: typing.Optional[DockerMount] = None
+        self.generated_config_file: Optional[DockerMount] = None
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 98 to
99, the typing annotation uses typing.Optional[DockerMount] while Optional is
already imported; change typing.Optional[DockerMount] to Optional[DockerMount]
to keep typing style consistent across the file.


def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None:
@@ -285,6 +286,18 @@ def generate_container_config(
             container_clp_config.stream_output.get_directory(),
         )

+    if not is_path_already_mounted(
+        clp_home,
+        CONTAINER_CLP_HOME,
+        clp_config.get_generated_config_file_path(),
+        container_clp_config.get_generated_config_file_path(),
+    ):
+        docker_mounts.generated_config_file = DockerMount(
+            DockerMountType.BIND,
+            clp_config.get_generated_config_file_path(),
+            container_clp_config.get_generated_config_file_path(),
+        )
+
     # Only create the mount if the directory exists
     if clp_config.aws_config_directory is not None:
         container_clp_config.aws_config_directory = CONTAINER_AWS_CONFIG_DIRECTORY
@@ -328,13 +341,19 @@ def dump_container_config(


 def generate_container_start_cmd(
-    container_name: str, container_mounts: List[Optional[DockerMount]], container_image: str
+    container_name: str,
+    container_mounts: List[Optional[DockerMount]],
+    container_image: str,
+    extra_env_vars: Optional[Dict[str, str]] = None,
 ) -> List[str]:
     """
-    Generates the command to start a container with the given mounts and name.
+    Generates the command to start a container with the given mounts, environment variables, and
+    name.

     :param container_name:
     :param container_mounts:
     :param container_image:
+    :param extra_env_vars:
     :return: The command.
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Security: avoid adding KEY=VALUE secrets to docker cmd

generate_container_start_cmd currently emits -e KEY=VALUE, which leaks secrets via ps and logs. Switch to passing only keys (i.e., -e KEY) and require callers to provide values via subprocess.run(..., env=...).

 def generate_container_start_cmd(
     container_name: str,
     container_mounts: List[Optional[DockerMount]],
     container_image: str,
-    extra_env_vars: Optional[Dict[str, str]] = None,
+    extra_env_vars: Optional[Dict[str, str]] = None,
 ) -> List[str]:
     """
-    Generates the command to start a container with the given mounts, environment variables, and
-    name.
+    Generates the command to start a container with the given mounts, environment variables, and
+    name.
 
     :param container_name:
     :param container_mounts:
     :param container_image:
-    :param extra_env_vars:
+    :param extra_env_vars: Additional environment variables as a mapping. Only the keys are emitted
+                           with '-e KEY'. Callers must pass actual values via subprocess.run(env=...).
     :return: The command.
     """
@@
-    if extra_env_vars is not None:
-        for key, value in extra_env_vars.items():
-            container_start_cmd.extend(["-e", f"{key}={value}"])
+    if extra_env_vars is not None:
+        for key in extra_env_vars.keys():
+            container_start_cmd.extend(["-e", key])

Follow through in call sites by merging env dicts when invoking subprocess.run.

Also applies to: 371-375

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines
343-357 (and also 371-375), the function generate_container_start_cmd must stop
embedding environment variable values into the docker command (avoid "-e
KEY=VALUE"); change it to emit only "-e", "KEY" for each env entry so secrets
are not visible in process lists or logs, and update callers to pass the merged
environment mapping to subprocess.run via its env parameter (e.g., merge
os.environ with extra env dict before invoking subprocess.run). Ensure
extra_env_vars may be None, iterate its keys only when building the command, and
update all call sites that start containers to combine existing env and
extra_env_vars into a single dict passed to subprocess.run instead of embedding
values in the CLI.
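
The approach described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `build_start_cmd_and_env`, the container name, the image name, and the password are all hypothetical. It relies on Docker's documented behaviour that `-e KEY` without a value copies `KEY` from the environment of the `docker` process.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_start_cmd_and_env(
    container_name: str,
    container_image: str,
    extra_env_vars: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: returns the docker command and the env to run it with."""
    cmd = ["docker", "run", "-di", "--name", container_name]
    # Start from the caller's environment so PATH etc. stay available to docker.
    env = dict(os.environ)
    if extra_env_vars is not None:
        for key, value in extra_env_vars.items():
            # `-e KEY` (no value) tells docker to copy KEY from its own environment,
            # so the secret never appears in the argument list.
            cmd.extend(["-e", key])
            env[key] = value
    cmd.append(container_image)
    return cmd, env


cmd, env = build_start_cmd_and_env(
    "clp-example", "example-image:latest", {"CLP_DB_PASS": "s3cret"}
)
# The caller would then run: subprocess.run(cmd, env=env, check=True)
```

Returning the command and the environment together makes it harder for a call site to forget the `env=` argument, which would otherwise start the container with the variable unset.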

"""
Contributor

🧹 Nitpick (assertive)

Docstring update is fine; consider clarifying “extra_env_vars” in the docstring

The new arg is correct; adding a note that keys should not include reserved variables (e.g., PYTHONPATH/CLP_HOME) would avoid misuse.

     """
-    Generates the command to start a container with the given mounts, environment variables, and
+    Generates the command to start a container with the given mounts, environment variables, and
     name.
 
     :param container_name:
     :param container_mounts:
     :param container_image:
-    :param extra_env_vars:
+    :param extra_env_vars: Additional environment variables as a mapping. Must not include
+                           reserved keys like PYTHONPATH or CLP_HOME.
     :return: The command.
     """
📝 Committable suggestion

Suggested change
def generate_container_start_cmd(
    container_name: str, container_mounts: List[Optional[DockerMount]], container_image: str
    container_name: str,
    container_mounts: List[Optional[DockerMount]],
    container_image: str,
    extra_env_vars: Optional[Dict[str, str]] = None,
) -> List[str]:
    """
    Generates the command to start a container with the given mounts and name.
    Generates the command to start a container with the given mounts, environment variables, and
    name.
    :param container_name:
    :param container_mounts:
    :param container_image:
    :param extra_env_vars:
    :return: The command.
    """
def generate_container_start_cmd(
    container_name: str,
    container_mounts: List[Optional[DockerMount]],
    container_image: str,
    extra_env_vars: Optional[Dict[str, str]] = None,
) -> List[str]:
    """
    Generates the command to start a container with the given mounts, environment variables, and
    name.
    :param container_name:
    :param container_mounts:
    :param container_image:
    :param extra_env_vars: Additional environment variables as a mapping. Must not include
                           reserved keys like PYTHONPATH or CLP_HOME.
    :return: The command.
    """
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 342 to
357, the docstring for generate_container_start_cmd lacks guidance on
extra_env_vars; update the docstring to clarify that extra_env_vars is a mapping
of additional environment variable names to values, and explicitly warn that
callers must not override reserved/container-managed variables (for example
PYTHONPATH, CLP_HOME, and any other variables listed elsewhere in the module),
and show the expected format (Dict[str, str]) and behavior when keys conflict
(e.g., ignored or raise). Ensure the note is brief, placed in the :param
extra_env_vars: section, and consistent with existing docstring style.
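
If the "raise on conflict" behaviour mentioned above were chosen, a check could look roughly like this. It is a sketch only: `RESERVED_ENV_VARS` and `env_flags` are hypothetical names, and the reserved set simply repeats the two variables named in the comment. The `KEY=VALUE` flag format mirrors what the reviewed function emits at this point in the PR.

```python
from typing import Dict, List, Optional

# Assumed set of container-managed variables, taken from the review comment.
RESERVED_ENV_VARS = frozenset({"PYTHONPATH", "CLP_HOME"})


def env_flags(extra_env_vars: Optional[Dict[str, str]] = None) -> List[str]:
    """Hypothetical helper that rejects reserved keys before emitting -e flags."""
    if extra_env_vars is None:
        return []
    conflicts = sorted(RESERVED_ENV_VARS.intersection(extra_env_vars))
    if conflicts:
        raise ValueError(f"Reserved environment variables: {', '.join(conflicts)}")
    flags: List[str] = []
    for key, value in extra_env_vars.items():
        flags.extend(["-e", f"{key}={value}"])
    return flags
```

Raising instead of silently ignoring a reserved key surfaces the mistake at the call site rather than inside the container.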

clp_site_packages_dir = CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages"
@@ -350,6 +369,9 @@ def generate_container_start_cmd(
         "--name", container_name,
         "--log-driver", "local"
     ]
+    if extra_env_vars is not None:
+        for key, value in extra_env_vars.items():
+            container_start_cmd.extend(["-e", f"{key}={value}"])
     for mount in container_mounts:
         if mount:
             container_start_cmd.append("--mount")
@@ -428,21 +450,21 @@ def validate_and_load_db_credentials_file(
     clp_config: CLPConfig, clp_home: pathlib.Path, generate_default_file: bool
 ):
     validate_credentials_file_path(clp_config, clp_home, generate_default_file)
-    clp_config.load_database_credentials_from_file()
+    clp_config.database.load_credentials_from_file(clp_config.credentials_file_path)


 def validate_and_load_queue_credentials_file(
     clp_config: CLPConfig, clp_home: pathlib.Path, generate_default_file: bool
 ):
     validate_credentials_file_path(clp_config, clp_home, generate_default_file)
-    clp_config.load_queue_credentials_from_file()
+    clp_config.queue.load_credentials_from_file(clp_config.credentials_file_path)


 def validate_and_load_redis_credentials_file(
     clp_config: CLPConfig, clp_home: pathlib.Path, generate_default_file: bool
 ):
     validate_credentials_file_path(clp_config, clp_home, generate_default_file)
-    clp_config.load_redis_credentials_from_file()
+    clp_config.redis.load_credentials_from_file(clp_config.credentials_file_path)


 def validate_db_config(clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path):
@@ -599,3 +621,71 @@ def is_retention_period_configured(clp_config: CLPConfig) -> bool:
         return True

     return False
+
+
+def generate_common_environment_variables(
+    include_clp_home_env_var=True,
+) -> List[str]:
+    """
+    Generate a list of common environment variables for Docker containers.
+
+    :param include_clp_home_env_var:
+    :return: A list of common environment variables for Docker containers in the format "KEY=VALUE".
+    """
+    clp_site_packages_dir = CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages"
+    env_vars = [f"PYTHONPATH={clp_site_packages_dir}"]
+
+    if include_clp_home_env_var:
+        env_vars.append(f"CLP_HOME={CONTAINER_CLP_HOME}")
+
+    return env_vars
+
+
+def generate_credential_environment_variables(
+    container_clp_config: CLPConfig,
+    include_db_credentials=False,
+    include_queue_credentials=False,
+    include_redis_credentials=False,
+) -> List[str]:
+    """
+    Generates a list of credential environment variables for Docker containers.
+
+    :param container_clp_config:
+    :param include_db_credentials:
+    :param include_queue_credentials:
+    :param include_redis_credentials:
+    :return: A list of common environment variables for Docker containers in the format "KEY=VALUE".
+    """
+    env_vars = []
+
+    if include_db_credentials:
+        env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
+        env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")
+
+    if include_queue_credentials:
+        env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
+        env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")
+
+    if include_redis_credentials:
+        env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")
+
+    return env_vars
+
+
+def generate_celery_connection_environment_variables(container_clp_config: CLPConfig) -> List[str]:
+    """
+    Generate a list of Celery connection environment variables for Docker containers.
+
+    :param container_clp_config:
+    :return: A list of common environment variables for Docker containers in the format "KEY=VALUE".
+    """
+    env_vars = [
+        f"BROKER_URL=amqp://"
+        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
+        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
+        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
+        f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
+        f"{container_clp_config.redis.query_backend_database}",
Comment on lines +701 to +707
Contributor

🛠️ Refactor suggestion

URL-encode credentials in Celery connection strings

Credentials containing special characters (e.g., @, :, /) will break the connection URLs.

Apply URL encoding to credentials before embedding them in connection strings:

+from urllib.parse import quote

 def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List[str]:
     """
     :param container_clp_config:
     :return: A list of Celery connection environment variables for Docker containers, in the format
     "KEY=VALUE".
     """
+    queue_user = quote(container_clp_config.queue.username, safe='')
+    queue_pass = quote(container_clp_config.queue.password, safe='')
+    redis_pass = quote(container_clp_config.redis.password, safe='')
     env_vars = [
         f"BROKER_URL=amqp://"
-        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
+        f"{queue_user}:{queue_pass}@"
         f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
-        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
+        f"RESULT_BACKEND=redis://default:{redis_pass}@"
         f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
         f"{container_clp_config.redis.query_backend_database}",
     ]
📝 Committable suggestion

Suggested change
-    env_vars = [
-        f"BROKER_URL=amqp://"
-        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
-        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
-        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
-        f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
-        f"{container_clp_config.redis.query_backend_database}",
+from urllib.parse import quote
+
+def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List[str]:
+    """
+    :param container_clp_config:
+    :return: A list of Celery connection environment variables for Docker containers, in the format
+    "KEY=VALUE".
+    """
+    queue_user = quote(container_clp_config.queue.username, safe='')
+    queue_pass = quote(container_clp_config.queue.password, safe='')
+    redis_pass = quote(container_clp_config.redis.password, safe='')
+    env_vars = [
+        f"BROKER_URL=amqp://"
+        f"{queue_user}:{queue_pass}@"
+        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
+        f"RESULT_BACKEND=redis://default:{redis_pass}@"
+        f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
+        f"{container_clp_config.redis.query_backend_database}",
+    ]
+    return env_vars
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 695 to
701, the Celery BROKER_URL and RESULT_BACKEND strings embed raw usernames and
passwords which will break when credentials contain special characters;
URL-encode the queue username and password and the redis password (using
urllib.parse.quote_plus or quote) before interpolating them into the connection
URLs so special characters are percent-encoded, and update the env_vars
construction to use the encoded credential variables.
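
To make the encoding point concrete, here is a small self-contained sketch using only the standard library. The username, password, host, and port are illustrative values, not ones from the PR. It percent-encodes the credentials with `quote(..., safe="")` and then checks that the URL still parses to the intended host.

```python
from urllib.parse import quote, unquote, urlsplit

username = "clp-user"
password = "p@ss:w/rd"  # illustrative password with URL-reserved characters

# safe="" makes quote() percent-encode "/" as well, which it leaves alone by default.
broker_url = (
    f"amqp://{quote(username, safe='')}:{quote(password, safe='')}@queue-host:5672"
)

# Parsing the encoded URL recovers the intended host, port, and password.
parts = urlsplit(broker_url)
decoded_password = unquote(parts.password)
```

Without the encoding, the `/` in the password would end the authority section early and the `@` would be read as the user/host separator, so a parser would see a different host.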

+    ]
+
+    return env_vars
Contributor

⚠️ Potential issue

Celery env: parameterise Redis DB and URL-encode credentials

Fix correctness (DB index) and robustness (special characters). With optional Redis auth, omit “:pass@” when password is missing.

-def generate_celery_connection_environment_variables(container_clp_config: CLPConfig) -> List[str]:
+def generate_celery_connection_environment_variables(
+    container_clp_config: CLPConfig,
+    redis_database: Optional[int] = None,
+) -> List[str]:
@@
-    env_vars = [
-        f"BROKER_URL=amqp://"
-        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
-        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
-        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
-        f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
-        f"{container_clp_config.redis.query_backend_database}",
-    ]
+    from urllib.parse import quote
+    q_user = "" if container_clp_config.queue.username is None else quote(container_clp_config.queue.username, safe="")
+    q_pass = "" if container_clp_config.queue.password is None else quote(container_clp_config.queue.password, safe="")
+    r_pass = container_clp_config.redis.password
+    redis_db = (
+        container_clp_config.redis.query_backend_database
+        if redis_database is None
+        else redis_database
+    )
+    broker = f"amqp://{q_user}:{q_pass}@{container_clp_config.queue.host}:{container_clp_config.queue.port}"
+    if r_pass:
+        backend = (
+            "redis://default:"
+            f"{quote(r_pass, safe='')}@{container_clp_config.redis.host}:{container_clp_config.redis.port}/{redis_db}"
+        )
+    else:
+        backend = f"redis://{container_clp_config.redis.host}:{container_clp_config.redis.port}/{redis_db}"
+    env_vars = [f"BROKER_URL={broker}", f"RESULT_BACKEND={backend}"]
 
     return env_vars

Update call sites (schedulers/workers) to pass redis_database as shown in start_clp.py comments.

📝 Committable suggestion

Suggested change
-def generate_celery_connection_environment_variables(container_clp_config: CLPConfig) -> List[str]:
-    """
-    Generate a list of Celery connection environment variables for Docker containers.
-    :param container_clp_config:
-    :return: A list of common environment variables for Docker containers in the format "KEY=VALUE".
-    """
-    env_vars = [
-        f"BROKER_URL=amqp://"
-        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
-        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
-        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
-        f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
-        f"{container_clp_config.redis.query_backend_database}",
-    ]
-    return env_vars
+def generate_celery_connection_environment_variables(
+    container_clp_config: CLPConfig,
+    redis_database: Optional[int] = None,
+) -> List[str]:
+    """
+    Generate a list of Celery connection environment variables for Docker containers.
+    :param container_clp_config: Configuration with queue and redis credentials/hosts.
+    :param redis_database: Optional Redis database index to use; defaults to
+        container_clp_config.redis.query_backend_database.
+    :return: A list of environment variable strings in the format "KEY=VALUE".
+    """
+    from urllib.parse import quote
+    q_user = "" if container_clp_config.queue.username is None else quote(
+        container_clp_config.queue.username, safe=""
+    )
+    q_pass = "" if container_clp_config.queue.password is None else quote(
+        container_clp_config.queue.password, safe=""
+    )
+    r_pass = container_clp_config.redis.password
+    redis_db = (
+        container_clp_config.redis.query_backend_database
+        if redis_database is None
+        else redis_database
+    )
+    broker = (
+        f"amqp://{q_user}:{q_pass}@"
+        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}"
+    )
+    if r_pass:
+        backend = (
+            "redis://default:"
+            f"{quote(r_pass, safe='')}@"
+            f"{container_clp_config.redis.host}:"
+            f"{container_clp_config.redis.port}/{redis_db}"
+        )
+    else:
+        backend = (
+            f"redis://{container_clp_config.redis.host}:"
+            f"{container_clp_config.redis.port}/{redis_db}"
+        )
+    env_vars = [f"BROKER_URL={broker}", f"RESULT_BACKEND={backend}"]
+    return env_vars
🤖 Prompt for AI Agents
components/clp-package-utils/clp_package_utils/general.py lines 685-701: the
current Celery env builder hardcodes the Redis DB index and embeds credentials
without URL-encoding or handling optional auth; update it to read the DB index
from container_clp_config.redis.redis_database (per start_clp.py call-site
change), URL-encode any username/password (use urllib.parse.quote_plus) for both
BROKER_URL and RESULT_BACKEND, and when Redis password is empty/None omit the
":password@" portion entirely so the URL becomes redis://host:port/db; adjust
call sites (schedulers/workers) to pass redis_database as required.
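
The optional-auth branching in this suggestion can be isolated into a small function, which also makes it easy to test. This is a sketch under assumptions: `build_redis_url` is a hypothetical helper rather than project code, and the host names and database indices in the usage are made up.

```python
from typing import Optional
from urllib.parse import quote


def build_redis_url(
    host: str, port: int, database: int, password: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the branching in the suggestion above."""
    if password:
        # "default" is the user name used in the reviewed RESULT_BACKEND URL.
        auth = f"default:{quote(password, safe='')}@"
    else:
        # No password configured: omit the userinfo part entirely.
        auth = ""
    return f"redis://{auth}{host}:{port}/{database}"


no_auth_url = build_redis_url("redis-host", 6379, 0)
auth_url = build_redis_url("redis-host", 6379, 1, "p@ss")
```

Taking the database index as a parameter lets each scheduler or worker pass its own index instead of sharing one hardcoded value.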

Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from clp_py_utils.clp_config import (
CLP_DEFAULT_DATASET_NAME,
StorageEngine,
- StorageType,
+ StorageType, CLP_DB_USER_ENV_VAR_NAME, CLP_DB_PASS_ENV_VAR_NAME,
)

from clp_package_utils.general import (
@@ -214,14 +214,17 @@ def main(argv: typing.List[str]) -> int:
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)

necessary_mounts: typing.List[CLPDockerMounts] = [
Member

While we're in the area:

Suggested change
- necessary_mounts: typing.List[CLPDockerMounts] = [
+ necessary_mounts: typing.List[DockerMount] = [

Contributor

I actually have the fix in #1186

mounts.clp_home,
mounts.logs_dir,
mounts.archives_output_dir,
]
extra_env_vars = {
CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd: typing.List[str] = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container
+ container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Do not pass secrets on the docker command line; use process env

Including DB creds as -e KEY=VALUE exposes them via ps and (on failure) logs. Emit only -e KEY and provide values via subprocess.run(..., env=...).

Apply:

-    extra_env_vars = {
-        CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
-        CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
-    }
-    container_start_cmd: typing.List[str] = generate_container_start_cmd(
-        container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
-    )
+    extra_env_vars = {
+        CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
+        CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
+    }
+    # Pass only the keys on the CLI; provide values via subprocess env.
+    container_start_cmd: typing.List[str] = generate_container_start_cmd(
+        container_name,
+        necessary_mounts,
+        clp_config.execution_container,
+        list(extra_env_vars.keys()),
+    )
@@
-    proc = subprocess.run(cmd)
+    # Merge secrets into the spawned process environment.
+    import os  # ensure imported at top if not already
+    proc = subprocess.run(cmd, env={**os.environ, **extra_env_vars})

Also consider avoiding logging full command strings anywhere secrets may appear. With the change above, the command string no longer contains DB secrets.

Also applies to: 270-275
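
The pattern recommended above can be sketched without Docker. `docker run -e KEY` (no `=VALUE`) propagates the named variable from the Docker client's own environment, so the values only need to be present in the environment passed to `subprocess.run`. In this sketch a Python child process stands in for `docker run`; the helper `env_name_flags` and the credential values are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List


def env_name_flags(names: List[str]) -> List[str]:
    """Hypothetical helper: emit '-e KEY' pairs with no values."""
    flags: List[str] = []
    for name in names:
        flags.extend(["-e", name])
    return flags


# Made-up credentials for illustration only.
secrets: Dict[str, str] = {"CLP_DB_USER": "clp-user", "CLP_DB_PASS": "s3cret"}
flags = env_name_flags(list(secrets.keys()))

# A Python child process stands in for `docker run` so the sketch runs anywhere.
child_cmd = [sys.executable, "-c", "import os; print(os.environ['CLP_DB_PASS'])"]
proc = subprocess.run(
    child_cmd,
    env={**os.environ, **secrets},  # values travel in the environment, not in argv
    capture_output=True,
    text=True,
    check=True,
)
child_output = proc.stdout.strip()
```

The child sees the secret through its environment, while neither `flags` nor `child_cmd` contains the value, so it does not show up in `ps` output or in a logged command string.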


# fmt: off
@@ -8,7 +8,7 @@

from clp_py_utils.clp_config import (
CLP_DEFAULT_DATASET_NAME,
- StorageEngine,
+ StorageEngine, CLP_DB_USER_ENV_VAR_NAME, CLP_DB_PASS_ENV_VAR_NAME,
)
from job_orchestration.scheduler.job_config import InputType

@@ -222,8 +222,12 @@ def main(argv):

_generate_logs_list(clp_config.logs_input.type, container_logs_list_path, parsed_args)

extra_env_vars = {
CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container
+ container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)
Comment on lines +228 to 234
Contributor

💡 Verification agent

🧩 Analysis chain

Stop leaking DB credentials on the Docker CLI; pass values via subprocess env

extra_env_vars currently flows into the docker argv as -e KEY=VALUE, exposing secrets via ps and logs. Emit only -e KEY and inject the values via subprocess env.

Action:

  • Update generate_container_start_cmd to accept a list of env var names and emit -e KEY (no value).
  • Pass only the keys here and provide the values to subprocess.run(env=...).

Apply within this hunk:

-    container_start_cmd = generate_container_start_cmd(
-        container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
-    )
+    # NOTE: Requires general.generate_container_start_cmd to emit '-e KEY' (no '=VALUE')
+    container_start_cmd = generate_container_start_cmd(
+        container_name,
+        necessary_mounts,
+        clp_config.execution_container,
+        list(extra_env_vars.keys()),
+    )

And update the subprocess invocation to inject secrets via env (outside this hunk):

# at top of file
import os
-    subprocess.run(cmd, check=True)
+    subprocess.run(cmd, check=True, env={**os.environ, **extra_env_vars})

Run to locate other CLI injections that still embed secrets:


🏁 Script executed:

#!/bin/bash
rg -n -C2 -P '\bgenerate_container_start_cmd\s*\(' --glob '!**/venv/**'
rg -n -C2 -P 'append_docker_options\(.+,\s*env_vars\)' --glob '!**/venv/**'

Length of output: 9124


Stop leaking DB credentials via Docker CLI flags

Exposing secrets with “-e KEY=VALUE” leaks credentials in process listings and logs. To secure this:

• Refactor generate_container_start_cmd (in components/clp-package-utils/clp_package_utils/general.py) to
– accept env_var_names: List[str] instead of a name-value dict
– emit only ["-e", KEY] for each name
• Refactor append_docker_options similarly to take names only
• In every script that currently passes extra_env_vars as a dict, change to pass only the keys and inject the values via subprocess.run(env=…):
– components/clp-package-utils/clp_package_utils/scripts/search.py (lines 122–126)
– components/clp-package-utils/clp_package_utils/scripts/compress.py (229–233)
– components/clp-package-utils/clp_package_utils/scripts/decompress.py (132–134, 206–210)
– components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (227–231)
– all append_docker_options(...) calls in start_clp.py (multiple locations)

Add at top of each affected file:

import os

Example diff for compress.py:

-    extra_env_vars = {
-        CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
-        CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
-    }
-    container_start_cmd = generate_container_start_cmd(
-        container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
-    )
+    extra_env_vars = {
+        CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
+        CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
+    }
+    # Emit only “-e KEY” and inject values via env
+    container_start_cmd = generate_container_start_cmd(
+        container_name,
+        necessary_mounts,
+        clp_config.execution_container,
+        list(extra_env_vars.keys()),
+    )
@@ -240,7 +245,9 @@ def run_compress(...):
-    subprocess.run(container_start_cmd, check=True)
+    subprocess.run(
+        container_start_cmd,
+        check=True,
+        env={**os.environ, **extra_env_vars},
+    )
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/compress.py around
lines 227 to 233, the code currently builds extra_env_vars as a dict containing
DB credentials and passes it to generate_container_start_cmd, which leaks
secrets via Docker CLI; update the call to pass only the environment variable
NAMES (e.g., a list of CLP_DB_USER_ENV_VAR_NAME and CLP_DB_PASS_ENV_VAR_NAME)
instead of a name->value dict, ensure generate_container_start_cmd and
append_docker_options in
components/clp-package-utils/clp_package_utils/general.py are refactored to
accept List[str] and emit only ["-e", KEY] for each key, and change the
subprocess invocation that launches Docker to inject the actual secret values
via subprocess.run(..., env={**os.environ, **extra_env_vars}) (add import os at
top of file) so values are set in the child process environment rather than on
the CLI.
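
A names-only command builder along the lines described above might look like the following sketch. The function name, its parameters, and the exact set of `docker run` flags are assumptions for illustration; the real `generate_container_start_cmd` in general.py may take different arguments and emit different options.

```python
from typing import List, Optional


def generate_container_start_cmd_sketch(
    container_name: str,
    bind_mounts: List[str],
    image: str,
    env_var_names: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical names-only variant; the real function's signature may differ."""
    cmd = ["docker", "run", "-i", "--rm", "--name", container_name]
    for name in env_var_names or []:
        if "=" in name:
            # Reject 'KEY=VALUE' so a value can never reach argv by accident.
            key = name.split("=", 1)[0]
            raise ValueError(f"Expected a variable name, got an assignment for {key}")
        cmd.extend(["-e", name])
    for mount in bind_mounts:
        cmd.extend(["--mount", mount])
    cmd.append(image)
    return cmd


cmd = generate_container_start_cmd_sketch(
    "clp-compress",
    ["type=bind,src=/tmp/logs,dst=/mnt/logs"],
    "ubuntu:22.04",
    ["CLP_DB_USER", "CLP_DB_PASS"],
)
```

Rejecting any entry that contains `=` is a design choice of this sketch: it turns an accidental `KEY=VALUE` argument into an immediate error instead of a silent leak.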

compress_cmd = _generate_compress_cmd(
parsed_args, dataset, generated_config_path_on_container, logs_list_path_on_container
@@ -9,7 +9,7 @@
CLP_DEFAULT_DATASET_NAME,
CLPConfig,
StorageEngine,
- StorageType,
+ StorageType, CLP_DB_USER_ENV_VAR_NAME, CLP_DB_PASS_ENV_VAR_NAME,
)

from clp_package_utils.general import (
@@ -122,8 +122,13 @@ def handle_extract_file_cmd(
container_paths_to_extract_file_path,
)
)

extra_env_vars = {
CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
Comment on lines +129 to +132
Contributor

🧹 Nitpick (assertive)

DRY: factor out DB env var assembly

The extra_env_vars construction is duplicated. Extract a tiny helper to keep it consistent and reduce future drift.

Example:

def _build_db_env(clp_config: CLPConfig) -> Dict[str, str]:
    return {
        CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
        CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
    }

Then:

extra_env_vars = _build_db_env(clp_config)

Also applies to: 202-205

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/decompress.py around
lines 126-129 (and also apply the same change at lines ~202-205), the DB env var
dict is duplicated; extract a small helper function _build_db_env(clp_config:
CLPConfig) that returns the dict mapping CLP_DB_USER_ENV_VAR_NAME and
CLP_DB_PASS_ENV_VAR_NAME to clp_config.database.username/password, then replace
both inline constructions with a call to _build_db_env(clp_config) so both sites
reuse the same logic and avoid drift.

container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container
+ container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)
Comment on lines +129 to 135
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Do not print DB creds in docker argv; pass via env

Emit only the variable names on the docker command line and inject actual values via subprocess env.

Apply:

-    container_start_cmd = generate_container_start_cmd(
-        container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
-    )
+    # NOTE: Requires general.generate_container_start_cmd to emit '-e KEY'
+    container_start_cmd = generate_container_start_cmd(
+        container_name,
+        necessary_mounts,
+        clp_config.execution_container,
+        list(extra_env_vars.keys()),
+    )

And later (outside this hunk):

import os  # at top if missing
# ...
subprocess.run(cmd, check=True, env={**os.environ, **extra_env_vars})


# fmt: off
@@ -194,8 +199,12 @@ def handle_extract_stream_cmd(
container_clp_config, clp_config, container_name
)
necessary_mounts = [mounts.clp_home, mounts.logs_dir]
extra_env_vars = {
CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container
+ container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)
Comment on lines +205 to 211
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Apply the same secret-handling fix to the stream extraction path

Mirror the recommended approach here as well: pass only names to Docker and set values in subprocess env to avoid leaking credentials via argv.

Apply this localized change (after updating the builder as described above):

-    container_start_cmd = generate_container_start_cmd(
-        container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
-    )
+    container_start_cmd = generate_container_start_cmd(
+        container_name,
+        necessary_mounts,
+        clp_config.execution_container,
+        env_var_names=list(extra_env_vars.keys()),
+    )

And ensure:

env = {**os.environ, **extra_env_vars}
subprocess.run(cmd, check=True, env=env)


# fmt: off
@@ -194,6 +194,7 @@ def main(argv: typing.List[str]) -> int:
config_file_path, default_config_file_path, clp_home
)
clp_config.validate_logs_dir()
clp_config.database.load_credentials_from_env()
except:
logger.exception("Failed to load config.")
return -1
@@ -226,6 +226,7 @@ def main(argv):
clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
clp_config.validate_logs_input_config()
clp_config.validate_logs_dir()
clp_config.database.load_credentials_from_env()
except:
logger.exception("Failed to load config.")
return -1
@@ -11,7 +11,7 @@

from clp_py_utils.clp_config import (
CLPConfig,
- Database,
+ Database, CLP_DB_USER_ENV_VAR_NAME, CLP_DB_PASS_ENV_VAR_NAME,
)
from clp_py_utils.clp_metadata_db_utils import get_files_table_name
from clp_py_utils.sql_adapter import SQL_Adapter
@@ -189,6 +189,7 @@ def validate_and_load_config_file(
clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
clp_config.validate_archive_output_config()
clp_config.validate_logs_dir()
clp_config.database.load_credentials_from_env()
return clp_config
Comment on lines +194 to 195
Contributor

🧹 Nitpick (assertive)

Surface missing DB creds clearly when loading from env

load_credentials_from_env() raises on missing env vars; the caller logs “Failed to load config.” which hides the root cause. Catch ValueError to log a targeted message.

Minimal change:

try:
    clp_config.database.load_credentials_from_env()
except ValueError as e:
    logger.error(f"Database credentials not set in environment: {e}")
    return None

This keeps stack traces clean and aids operators during setup.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
around lines 192 to 193, calling clp_config.database.load_credentials_from_env()
can raise a ValueError when env vars are missing but the caller only logs a
generic "Failed to load config." — wrap the call in a try/except that catches
ValueError, log a clear targeted message like "Database credentials not set in
environment: {e}" via the module logger, and return None to surface the real
cause without leaking a full stack trace.
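
The targeted handling described above can be sketched end to end as follows. The `Database` class here is a stand-in, not the real `clp_py_utils.clp_config.Database`; the optional `env` parameter, the `_get_env_var` helper, and the `load_db_credentials` wrapper are assumptions added so the sketch is testable. The environment variable names are taken from the string literals that the new constants replace in this PR.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

CLP_DB_USER_ENV_VAR_NAME = "CLP_DB_USER"
CLP_DB_PASS_ENV_VAR_NAME = "CLP_DB_PASS"


def _get_env_var(name: str, env: Mapping[str, str]) -> str:
    """Hypothetical helper: return the variable's value or raise ValueError if unset."""
    value = env.get(name)
    if value is None:
        raise ValueError(f"Missing environment variable: {name}")
    return value


class Database:
    """Stand-in for the real config class; only the credential fields are modeled."""

    def __init__(self) -> None:
        self.username: Optional[str] = None
        self.password: Optional[str] = None

    def load_credentials_from_env(self, env: Optional[Mapping[str, str]] = None) -> None:
        # The `env` parameter is an assumption of this sketch, added for testability.
        env = os.environ if env is None else env
        self.username = _get_env_var(CLP_DB_USER_ENV_VAR_NAME, env)
        self.password = _get_env_var(CLP_DB_PASS_ENV_VAR_NAME, env)


def load_db_credentials(database: Database, env: Optional[Mapping[str, str]] = None) -> bool:
    """Log a targeted message (no stack trace) when a credential variable is missing."""
    try:
        database.load_credentials_from_env(env)
    except ValueError as e:
        logger.error("Database credentials not set in environment: %s", e)
        return False
    return True
```

Because the error message names only the missing variable, the log line tells the operator what to set without printing any credential value.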

except Exception:
logger.exception("Failed to load config.")
@@ -244,8 +245,8 @@ def handle_extract_file_cmd(
# fmt: on
extract_env = {
**os.environ,
- "CLP_DB_USER": clp_db_connection_params["username"],
- "CLP_DB_PASS": clp_db_connection_params["password"],
+ CLP_DB_USER_ENV_VAR_NAME: clp_db_connection_params["username"],
+ CLP_DB_PASS_ENV_VAR_NAME: clp_db_connection_params["password"],
}

files_to_extract_list_path = None
@@ -294,6 +294,7 @@ def main(argv):
config_file_path = pathlib.Path(parsed_args.config)
clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
clp_config.validate_logs_dir()
clp_config.database.load_credentials_from_env()
except:
Comment on lines +297 to 298
Contributor

🧹 Nitpick (assertive)

Log a targeted error when DB creds are missing

As with other scripts, catching the ValueError from load_credentials_from_env() and logging the missing var helps users quickly fix configuration issues.

Suggested pattern:

try:
    clp_config.database.load_credentials_from_env()
except ValueError as e:
    logger.error(f"Database credentials not set in environment: {e}")
    return -1
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/native/search.py
around lines 297 to 298, replace the bare except that follows
clp_config.database.load_credentials_from_env() with a targeted except
ValueError that logs a clear error about missing DB credentials (including the
exception message) using the module logger and then returns -1; do not use a
broad except so other exceptions propagate.

logger.exception("Failed to load config.")
return -1
@@ -10,7 +10,7 @@
from clp_py_utils.clp_config import (
CLP_DEFAULT_DATASET_NAME,
StorageEngine,
- StorageType,
+ StorageType, CLP_DB_PASS_ENV_VAR_NAME, CLP_DB_USER_ENV_VAR_NAME,
)

from clp_package_utils.general import (
@@ -117,10 +117,13 @@ def main(argv):
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)

necessary_mounts = [mounts.clp_home, mounts.logs_dir]
extra_env_vars = {
CLP_DB_USER_ENV_VAR_NAME: clp_config.database.username,
CLP_DB_PASS_ENV_VAR_NAME: clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
- container_name, necessary_mounts, clp_config.execution_container
+ container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)
Comment on lines +121 to 127
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Avoid exposing DB secrets via “-e KEY=VALUE” on docker argv

Like compress.py, this leaks via ps. Emit only env var names on the CLI and provide values via subprocess env.

Apply:

-    container_start_cmd = generate_container_start_cmd(
-        container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
-    )
+    # NOTE: Requires general.generate_container_start_cmd to emit '-e KEY' (no '=VALUE')
+    container_start_cmd = generate_container_start_cmd(
+        container_name,
+        necessary_mounts,
+        clp_config.execution_container,
+        list(extra_env_vars.keys()),
+    )

Also inject env at the call site (outside this hunk):

import os  # if not already imported

# ...
subprocess.run(cmd, check=True, env={**os.environ, **extra_env_vars})
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/search.py around lines
120 to 126, the code currently places DB credentials directly into the docker
CLI via "-e KEY=VALUE", which leaks secrets via process listings; change the
container start command generation so it emits only environment variable names
(e.g., "-e KEY") without values, and do not inline secret values into the CLI
string; then at the subprocess invocation site (outside this hunk) import os if
necessary and call subprocess.run(...) with an env parameter that merges the
current environment and the secret values (env={**os.environ, **extra_env_vars})
so secrets are supplied via the process environment instead of appearing in
argv.


# fmt: off