Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
78 commits
Select commit Hold shift + click to select a range
b143237
feat: Enhance database configuration by loading credentials from envi…
junhaoliao Aug 3, 2025
6cddeda
refactor
junhaoliao Aug 3, 2025
f9d5b59
lint
junhaoliao Aug 3, 2025
489476f
fix
junhaoliao Aug 3, 2025
0e3f412
fix issues caused by refactoring
junhaoliao Aug 3, 2025
05283d5
fix: Update exception handling to catch ValueError instead of KeyError
junhaoliao Aug 5, 2025
34c5e8f
refactor: Remove unused logger initialization in general.py
junhaoliao Aug 7, 2025
842efac
reorder generated_config_file_path
junhaoliao Aug 7, 2025
05715fb
fix: Update type hint for extra_env_vars to use Dict instead of dict
junhaoliao Aug 7, 2025
87d07c6
refactor: make extra_env_vars optional
junhaoliao Aug 7, 2025
73334ed
refactor: rename generate_common_environment_variables to generate_co…
junhaoliao Aug 7, 2025
866334d
improve environment variable generation for Docker containers
junhaoliao Aug 7, 2025
dff9bef
Set default value for extra_env_vars to None for immutability instead…
junhaoliao Aug 7, 2025
00ceeb8
Rename 'load_and_validate_config_file' to 'validate_and_load_config_f…
junhaoliao Aug 7, 2025
cc9a7c7
Add get_generated_config_file_path method to dynamically generate the…
junhaoliao Aug 7, 2025
4e1c272
extract environment variable retrieval logic
junhaoliao Aug 7, 2025
7d82bec
lint
junhaoliao Aug 7, 2025
1b7c70c
update error logging in query scheduler
junhaoliao Aug 7, 2025
781bb16
Remove unused import of load_worker_config from job_orchestration.exe…
junhaoliao Aug 8, 2025
bf63d6e
Use more specific error messages in exception logs
junhaoliao Aug 8, 2025
469bc65
Remove unused import of generate_worker_config function
junhaoliao Aug 8, 2025
8654a56
Remove container_clp_config argument from generate_common_environment…
junhaoliao Aug 8, 2025
03881cc
remove unused import of logging module
junhaoliao Aug 8, 2025
0c9a035
Merge branch 'main' into db-config-file
junhaoliao Aug 8, 2025
6f8879c
rename generated_config_dir to generated_config_file
junhaoliao Aug 8, 2025
3850b2d
revert empty line before return
junhaoliao Aug 8, 2025
96735b1
Check for None explicitly - Apply suggestions from code review
junhaoliao Aug 8, 2025
365b5d6
Fix indent mistake caused by merging code - Apply suggestions from co…
junhaoliao Aug 8, 2025
c84836c
Add `dump_to_primitive_dict` method to Database, Redis, and Reducer c…
junhaoliao Aug 8, 2025
04c5c98
Merge remote-tracking branch 'junhao/db-config-file' into db-config-file
junhaoliao Aug 8, 2025
33e11ac
fix lint
junhaoliao Aug 8, 2025
6e06740
Merge remote-tracking branch 'origin/main' into db-config-file
junhaoliao Aug 14, 2025
6ef77dd
lint
junhaoliao Aug 14, 2025
36af52d
fix(package): Update `native/decompress.py` to use CLI args and env v…
junhaoliao Aug 14, 2025
d85085b
refactor(clp-py-utils): simplify sensitive information handling in CL…
junhaoliao Aug 14, 2025
e371aa8
refactor(clp-py-utils): Use os.getenv() instead of os.environ[] for m…
junhaoliao Aug 14, 2025
5fa14f8
refactor(clp-config): add environment variable constants for credentials
junhaoliao Aug 14, 2025
9eba523
refactor(clp-py-utils): move credential loading to respective classes
junhaoliao Aug 14, 2025
312445e
refactor(clp-package-utils): Replace manual environment variable read…
junhaoliao Aug 14, 2025
335f25e
lint
junhaoliao Aug 14, 2025
3cc3d44
add type annotation for _get_env_var function parameter
junhaoliao Aug 14, 2025
02f393d
Merge branch 'main' into db-config-file
junhaoliao Aug 14, 2025
8bcae09
set default values as None for optional connection fields
junhaoliao Aug 14, 2025
4a9bf99
remove unused import of os module
junhaoliao Aug 14, 2025
fc27205
Docs - Apply suggestions from code review
junhaoliao Aug 15, 2025
40e387d
Remove unused os import - Apply suggestions from code review
junhaoliao Aug 15, 2025
ed58dcc
update environment_variables generator function return type descriptions
junhaoliao Aug 15, 2025
89f001f
Rename parameter `include_clp_home` to `include_clp_home_env_var`
junhaoliao Aug 15, 2025
95b55ef
replace hardcoded env var names with constants
junhaoliao Aug 15, 2025
fa81a9d
move necessary_mounts definition
junhaoliao Aug 15, 2025
4743566
Merge branch 'main' into db-config-file
junhaoliao Aug 15, 2025
0a8de41
remove credential environment variables for reducer from start_clp.py
junhaoliao Aug 15, 2025
cbdd9fb
docs: Add missing documentation for `load_credentials_from_env` metho…
junhaoliao Aug 17, 2025
67d048a
fix(job-orchestration): update garbage collector configuration handling
junhaoliao Aug 18, 2025
f852ca4
refactor: Add CLP_GENERATED_CONFIG_FILE_NAME constant in clp_config.p…
junhaoliao Aug 18, 2025
0acb960
refactor(clp-package-utils): rename environment variable generation f…
junhaoliao Aug 18, 2025
d21efbf
shift order of CLPConfig field serialization
junhaoliao Aug 18, 2025
d446e52
refactor(scheduler): enhance error logging and component naming consi…
junhaoliao Aug 18, 2025
c0949cc
refactor(clp-package-utils): improve container environment variable h…
junhaoliao Aug 18, 2025
f723512
remove unused imports
junhaoliao Aug 18, 2025
38c75ee
lint
junhaoliao Aug 18, 2025
a11b314
Merge branch 'main' into db-config-file
junhaoliao Aug 18, 2025
4124b60
use named constant insteaad of magical string for ".clp-config.yml" -…
junhaoliao Aug 19, 2025
3fe8368
docs - Apply suggestions from code review
junhaoliao Aug 19, 2025
ce143c6
refactor(clp-py-utils): Optimize the serialization process by excludi…
junhaoliao Aug 19, 2025
e15547e
remove unused variable clp_site_packages_dir
junhaoliao Aug 19, 2025
4bba4ed
add missing mount to start_garbage_collector
junhaoliao Aug 19, 2025
6ab2f80
Update function docstrings `raise` to use consistent colon placement
junhaoliao Aug 19, 2025
cd4437d
refactor(clp-package-utils): rename environment variable generation f…
junhaoliao Aug 19, 2025
14e0219
refactor(clp-package-utils): rename `dump_shared_config` function to …
junhaoliao Aug 19, 2025
33bc068
refactor(clp-package-utils): rename and restructure configuration dum…
junhaoliao Aug 19, 2025
ae8f581
refactor(config): rename generated config file to shared config file
junhaoliao Aug 19, 2025
0e52cb1
Merge branch 'main' into db-config-file
junhaoliao Aug 19, 2025
f063bd9
lint
junhaoliao Aug 19, 2025
04f81b7
remove redundant comment - Apply suggestions from code review
junhaoliao Aug 19, 2025
34c430f
order constants - Apply suggestions from code review
junhaoliao Aug 19, 2025
31342c3
refactor(clp-package-utils): extract container config filename genera…
junhaoliao Aug 19, 2025
ca2db62
Merge branch 'main' into db-config-file
junhaoliao Aug 19, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 85 additions & 1 deletion components/clp-package-utils/clp_package_utils/general.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import enum
import errno
import logging
import os
import pathlib
import re
Expand Down Expand Up @@ -36,6 +37,8 @@
)
from strenum import KebabCaseStrEnum

logger = logging.getLogger(__file__)

# CONSTANTS
EXTRACT_FILE_CMD = "x"
EXTRACT_IR_CMD = "i"
Expand Down Expand Up @@ -94,6 +97,7 @@ def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
self.archives_output_dir: typing.Optional[DockerMount] = None
self.stream_output_dir: typing.Optional[DockerMount] = None
self.aws_config_dir: typing.Optional[DockerMount] = None
self.generated_config_dir: typing.Optional[DockerMount] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess first, this should be generated_config_file instead of dir.

And I guess it might be better to just mount the parent of generated_config_file? (so mount a dir).
In our current code, it won't do anything since generated_config_file.parent would be the same as logs_dir, but it can prevent some silly future error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree mounting the parent directory is better because Docker's binding mount would create a directory when the specified path doesn't exist, causing issues in debugging.

However, i just tested that is_path_already_mounted() doesn't behave as expected and we can end up requesting to mount the logs_dir twice, which gives below error:

docker: Error response from daemon: Duplicate mount point: /home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE
...
subprocess.CalledProcessError: Command '['docker', 'run', '-i', '--network', 'host', '--rm', '--name', 'clp-database-table-creator-8367', '--log-driver', 'local', '-u', '1000:1000', '--mount', 'type=bind,src=/home/junhao/workspa
ce/clp/build/clp-package,dst=/opt/clp', '--mount', 'type=bind,src=/home/junhao/workspace/clp/build/clp-package/var/data/GIGABYTE,dst=/home/junhao/workspace/clp/build/clp-package/var/data/GIGABYTE', '--mount', 'type=bind,src=/hom
e/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE,dst=/home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE', '--mount', 'type=bind,src=/home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE,dst=/home/junh
ao/workspace/clp/build/clp-package/var/log/GIGABYTE', '-e', 'PYTHONPATH=/opt/clp/lib/python3/site-packages', '-e', 'CLP_HOME=/opt/clp', '-e', 'CLP_DB_USER=clp-user', '-e', 'CLP_DB_PASS=WuWM7ka3J70', 'ghcr.io/y-scope/clp/clp-exec
ution-x86-ubuntu-jammy:main', 'python3', '/opt/clp/lib/python3/site-packages/clp_py_utils/create-db-tables.py', '--config', '/home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE/.clp-config.yml', '--storage-engine', 'clp']' returned non-zero exit status 125

maybe we should just correct the name of this field to be generated_config_file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, i just renamed it to generated_config_file



def get_clp_home():
Expand Down Expand Up @@ -278,6 +282,19 @@ def generate_container_config(
clp_config.aws_config_directory,
container_clp_config.aws_config_directory,
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with a new line, but in case if it's not intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me revert since it's not strictly related to the PR

if not is_path_already_mounted(
clp_home,
CONTAINER_CLP_HOME,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
):
docker_mounts.generated_config_dir = DockerMount(
DockerMountType.BIND,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
)

Copy link
Contributor

@coderabbitai coderabbitai bot Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Mount the parent directory of the config file, not the file itself.

The code attempts to mount generated_config_file_path which is a file path, but Docker requires mounting directories. You should mount the parent directory instead.

Apply this diff to fix the mounting issue:

 if not is_path_already_mounted(
     clp_home,
     CONTAINER_CLP_HOME,
-    clp_config.generated_config_file_path,
-    container_clp_config.generated_config_file_path,
+    clp_config.generated_config_file_path.parent,
+    container_clp_config.generated_config_file_path.parent,
 ):
     docker_mounts.generated_config_dir = DockerMount(
         DockerMountType.BIND,
-        clp_config.generated_config_file_path,
-        container_clp_config.generated_config_file_path,
+        clp_config.generated_config_file_path.parent,
+        container_clp_config.generated_config_file_path.parent,
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not is_path_already_mounted(
clp_home,
CONTAINER_CLP_HOME,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
):
docker_mounts.generated_config_dir = DockerMount(
DockerMountType.BIND,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
)
if not is_path_already_mounted(
clp_home,
CONTAINER_CLP_HOME,
clp_config.generated_config_file_path.parent,
container_clp_config.generated_config_file_path.parent,
):
docker_mounts.generated_config_dir = DockerMount(
DockerMountType.BIND,
clp_config.generated_config_file_path.parent,
container_clp_config.generated_config_file_path.parent,
)
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 275 to
286, the code mounts the config file path directly, but Docker requires mounting
directories. Modify the mount source and target paths to use the parent
directory of the config file paths instead of the file paths themselves,
ensuring the mount points are directories.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using --mount which seems to work for file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haiqi96 You're right—Docker bind mounts support files with --mount as long as the source exists. No change is required to switch to the parent directory.

Two small hardening tweaks you might consider:

  • Make the config bind read-only to avoid accidental writes:
    • pass is_read_only=True when constructing the DockerMount for the generated config file.
  • Optionally skip the file-level mount if the logs directory is already mounted (the file will be available via that directory), to avoid redundant mounts.

Otherwise, keeping the file mount is fine given dump_container_config runs before container start.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.409Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

return container_clp_config, docker_mounts


Expand Down Expand Up @@ -313,11 +330,15 @@ def dump_container_config(


def generate_container_start_cmd(
container_name: str, container_mounts: List[Optional[DockerMount]], container_image: str
container_name: str,
extra_env_vars: dict[str, str],
container_mounts: List[Optional[DockerMount]],
container_image: str,
) -> List[str]:
"""
Generates the command to start a container with the given mounts and name.
:param container_name:
:param extra_env_vars:
:param container_mounts:
:param container_image:
:return: The command.
Expand All @@ -335,6 +356,8 @@ def generate_container_start_cmd(
"--name", container_name,
"--log-driver", "local"
]
for key, value in extra_env_vars.items():
container_start_cmd.extend(["-e", f"{key}={value}"])
for mount in container_mounts:
if mount:
container_start_cmd.append("--mount")
Expand Down Expand Up @@ -377,6 +400,11 @@ def load_config_file(
hostname = socket.gethostname()
clp_config.data_directory /= hostname
clp_config.logs_directory /= hostname
clp_config.generated_config_file_path = (
clp_config.generated_config_file_path.parent
/ hostname
/ clp_config.generated_config_file_path.name
)

return clp_config

Expand Down Expand Up @@ -590,3 +618,59 @@ def validate_dataset_name(clp_table_prefix: str, dataset_name: str) -> None:
f"Invalid dataset name: `{dataset_name}`. Names can only be a maximum of"
f" {dataset_name_max_len} characters long."
)


def generate_common_environment_variables(
container_clp_config,
include_clp_home=True,
include_db_credentials=False,
include_queue_credentials=False,
include_redis_credentials=False,
include_celery_connection_params=False,
extra_vars=None,
) -> List[str]:
"""
Generate a list of common environment variables for Docker containers.

:param container_clp_config: The CLPConfig object for the container.
:param include_clp_home:
:param include_db_credentials:
:param include_queue_credentials:
:param include_redis_credentials:
:param include_celery_connection_params:
:param extra_vars: Additional environment variables to include in the format "KEY=VALUE".
:return: List of environment variable strings in the format "KEY=VALUE".
"""
clp_site_packages_dir = CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages"
env_vars = [f"PYTHONPATH={clp_site_packages_dir}"]

if include_clp_home:
env_vars.append(f"CLP_HOME={CONTAINER_CLP_HOME}")

if include_db_credentials:
env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")

if include_queue_credentials:
env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")

if include_redis_credentials:
env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")

if include_celery_connection_params:
env_vars.append(
f"BROKER_URL=amqp://"
f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
f"{container_clp_config.queue.host}:{container_clp_config.queue.port}"
)
env_vars.append(
f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}"
)

if extra_vars:
env_vars.extend(extra_vars)

return env_vars
Original file line number Diff line number Diff line change
Expand Up @@ -206,14 +206,17 @@ def main(argv: typing.List[str]) -> int:
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)

extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
necessary_mounts: typing.List[CLPDockerMounts] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in the area:

Suggested change
necessary_mounts: typing.List[CLPDockerMounts] = [
necessary_mounts: typing.List[DockerMount] = [

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have the fix in #1186

mounts.clp_home,
mounts.logs_dir,
mounts.archives_output_dir,
]
container_start_cmd: typing.List[str] = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, extra_env_vars, necessary_mounts, clp_config.execution_container
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,11 @@ def main(argv):
container_clp_config, clp_config, container_name
)

extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}

necessary_mounts = [mounts.clp_home, mounts.data_dir, mounts.logs_dir]
if InputType.FS == input_type:
necessary_mounts.append(mounts.input_logs_dir)
Expand All @@ -223,7 +228,7 @@ def main(argv):
_generate_logs_list(clp_config.logs_input.type, container_logs_list_path, parsed_args)

container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, extra_env_vars, necessary_mounts, clp_config.execution_container
)
compress_cmd = _generate_compress_cmd(
parsed_args, dataset, generated_config_path_on_container, logs_list_path_on_container
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,11 @@ def handle_extract_file_cmd(
container_clp_config, clp_config, container_name
)

extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}

# Set up mounts
extraction_dir.mkdir(exist_ok=True)
container_extraction_dir = pathlib.Path("/") / "mnt" / "extraction-dir"
Expand All @@ -123,7 +128,7 @@ def handle_extract_file_cmd(
)
)
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, extra_env_vars, necessary_mounts, clp_config.execution_container
)

# fmt: off
Expand Down Expand Up @@ -203,9 +208,13 @@ def handle_extract_stream_cmd(
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)
extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
necessary_mounts = [mounts.clp_home, mounts.logs_dir]
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, extra_env_vars, necessary_mounts, clp_config.execution_container
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import argparse
import logging
import os
import shutil
import sys
import typing
Expand Down Expand Up @@ -188,6 +189,12 @@ def main(argv: typing.List[str]) -> int:
except:
logger.exception("Failed to load config.")
return -1
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Standardize credential loading and exception handling

Use CLPConfig.load_database_credentials_from_env() for consistency with the rest of the codebase and to get proper missing/empty var validation. Catch ValueError accordingly.

-    try:
-        clp_config.database.username = os.environ["CLP_DB_USER"]
-        clp_config.database.password = os.environ["CLP_DB_PASS"]
-    except KeyError as e:
-        logger.error(f"Missing environment variable: {e}")
-        return -1
+    try:
+        clp_config.load_database_credentials_from_env()
+    except ValueError as err:
+        logger.error(err)
+        return -1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1
try:
clp_config.load_database_credentials_from_env()
except ValueError as err:
logger.error(err)
return -1
🤖 Prompt for AI Agents
In
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
around lines 191 to 196, replace the direct os.environ lookups and KeyError
handling with the standardized CLPConfig.load_database_credentials_from_env()
call to populate clp_config.database (so missing or empty vars are validated
consistently); catch ValueError from that loader instead of KeyError, log the
error message with logger.error(...), and return -1 on failure.


database_config: Database = clp_config.database
dataset = parsed_args.dataset
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import argparse
import datetime
import logging
import os
import pathlib
import sys
import time
Expand Down Expand Up @@ -229,6 +230,12 @@ def main(argv):
except:
logger.exception("Failed to load config.")
return -1
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1

comp_jobs_dir = clp_config.logs_directory / "comp-jobs"
comp_jobs_dir.mkdir(parents=True, exist_ok=True)
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import argparse
import asyncio
import logging
import os
import pathlib
import subprocess
import sys
Expand Down Expand Up @@ -112,11 +113,18 @@ def handle_extract_stream_cmd(
:return: 0 on success, -1 otherwise.
"""
# Validate and load config file
clp_config = validate_and_load_config_file(
clp_config = load_and_validate_config_file(
clp_home, pathlib.Path(parsed_args.config), default_config_file_path
)
if clp_config is None:
return -1
# Override credentials with environment variables
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we can put this logic into native/utils.py since I see it duplicated for a few times.
Note this would require us to log within a method in native/utils.py, there is an example of how we do similar things but I would like to confirm with Kirk.

@kirkrodrigues Can we create a helper method in utils which takes in a logger as argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending @kirkrodrigues 's comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i think we can just replace this with

clp_config.load_database_credentials_from_env()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then check if the credentials are loaded here. if not, log a message those aren't set


command = parsed_args.command

Expand Down Expand Up @@ -222,6 +230,13 @@ def handle_extract_file_cmd(
)
if clp_config is None:
return -1
# Override credentials with environment variables
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1

paths = parsed_args.paths
list_path = parsed_args.files_from
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import asyncio
import ipaddress
import logging
import os
import pathlib
import socket
import sys
Expand Down Expand Up @@ -297,6 +298,12 @@ def main(argv):
except:
logger.exception("Failed to load config.")
return -1
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1

database_config: Database = clp_config.database
dataset = parsed_args.dataset
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,10 +117,13 @@ def main(argv):
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)

extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
necessary_mounts = [mounts.clp_home, mounts.logs_dir]
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, extra_env_vars, necessary_mounts, clp_config.execution_container
)

# fmt: off
Expand Down
Loading
Loading