Skip to content
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
78 commits
Select commit Hold shift + click to select a range
b143237
feat: Enhance database configuration by loading credentials from envi…
junhaoliao Aug 3, 2025
6cddeda
refactor
junhaoliao Aug 3, 2025
f9d5b59
lint
junhaoliao Aug 3, 2025
489476f
fix
junhaoliao Aug 3, 2025
0e3f412
fix issues caused by refactoring
junhaoliao Aug 3, 2025
05283d5
fix: Update exception handling to catch ValueError instead of KeyError
junhaoliao Aug 5, 2025
34c5e8f
refactor: Remove unused logger initialization in general.py
junhaoliao Aug 7, 2025
842efac
reorder generated_config_file_path
junhaoliao Aug 7, 2025
05715fb
fix: Update type hint for extra_env_vars to use Dict instead of dict
junhaoliao Aug 7, 2025
87d07c6
refactor: make extra_env_vars optional
junhaoliao Aug 7, 2025
73334ed
refactor: rename generate_common_environment_variables to generate_co…
junhaoliao Aug 7, 2025
866334d
improve environment variable generation for Docker containers
junhaoliao Aug 7, 2025
dff9bef
Set default value for extra_env_vars to None for immutability instead…
junhaoliao Aug 7, 2025
00ceeb8
Rename 'load_and_validate_config_file' to 'validate_and_load_config_f…
junhaoliao Aug 7, 2025
cc9a7c7
Add get_generated_config_file_path method to dynamically generate the…
junhaoliao Aug 7, 2025
4e1c272
extract environment variable retrieval logic
junhaoliao Aug 7, 2025
7d82bec
lint
junhaoliao Aug 7, 2025
1b7c70c
update error logging in query scheduler
junhaoliao Aug 7, 2025
781bb16
Remove unused import of load_worker_config from job_orchestration.exe…
junhaoliao Aug 8, 2025
bf63d6e
Use more specific error messages in exception logs
junhaoliao Aug 8, 2025
469bc65
Remove unused import of generate_worker_config function
junhaoliao Aug 8, 2025
8654a56
Remove container_clp_config argument from generate_common_environment…
junhaoliao Aug 8, 2025
03881cc
remove unused import of logging module
junhaoliao Aug 8, 2025
0c9a035
Merge branch 'main' into db-config-file
junhaoliao Aug 8, 2025
6f8879c
rename generated_config_dir to generated_config_file
junhaoliao Aug 8, 2025
3850b2d
revert empty line before return
junhaoliao Aug 8, 2025
96735b1
Check for None explicitly - Apply suggestions from code review
junhaoliao Aug 8, 2025
365b5d6
Fix indent mistake caused by merging code - Apply suggestions from co…
junhaoliao Aug 8, 2025
c84836c
Add `dump_to_primitive_dict` method to Database, Redis, and Reducer c…
junhaoliao Aug 8, 2025
04c5c98
Merge remote-tracking branch 'junhao/db-config-file' into db-config-file
junhaoliao Aug 8, 2025
33e11ac
fix lint
junhaoliao Aug 8, 2025
6e06740
Merge remote-tracking branch 'origin/main' into db-config-file
junhaoliao Aug 14, 2025
6ef77dd
lint
junhaoliao Aug 14, 2025
36af52d
fix(package): Update `native/decompress.py` to use CLI args and env v…
junhaoliao Aug 14, 2025
d85085b
refactor(clp-py-utils): simplify sensitive information handling in CL…
junhaoliao Aug 14, 2025
e371aa8
refactor(clp-py-utils): Use os.getenv() instead of os.environ[] for m…
junhaoliao Aug 14, 2025
5fa14f8
refactor(clp-config): add environment variable constants for credentials
junhaoliao Aug 14, 2025
9eba523
refactor(clp-py-utils): move credential loading to respective classes
junhaoliao Aug 14, 2025
312445e
refactor(clp-package-utils): Replace manual environment variable read…
junhaoliao Aug 14, 2025
335f25e
lint
junhaoliao Aug 14, 2025
3cc3d44
add type annotation for _get_env_var function parameter
junhaoliao Aug 14, 2025
02f393d
Merge branch 'main' into db-config-file
junhaoliao Aug 14, 2025
8bcae09
set default values as None for optional connection fields
junhaoliao Aug 14, 2025
4a9bf99
remove unused import of os module
junhaoliao Aug 14, 2025
fc27205
Docs - Apply suggestions from code review
junhaoliao Aug 15, 2025
40e387d
Remove unused os import - Apply suggestions from code review
junhaoliao Aug 15, 2025
ed58dcc
update environment_variables generator function return type descriptions
junhaoliao Aug 15, 2025
89f001f
Rename parameter `include_clp_home` to `include_clp_home_env_var`
junhaoliao Aug 15, 2025
95b55ef
replace hardcoded env var names with constants
junhaoliao Aug 15, 2025
fa81a9d
move necessary_mounts definition
junhaoliao Aug 15, 2025
4743566
Merge branch 'main' into db-config-file
junhaoliao Aug 15, 2025
0a8de41
remove credential environment variables for reducer from start_clp.py
junhaoliao Aug 15, 2025
cbdd9fb
docs: Add missing documentation for `load_credentials_from_env` metho…
junhaoliao Aug 17, 2025
67d048a
fix(job-orchestration): update garbage collector configuration handling
junhaoliao Aug 18, 2025
f852ca4
refactor: Add CLP_GENERATED_CONFIG_FILE_NAME constant in clp_config.p…
junhaoliao Aug 18, 2025
0acb960
refactor(clp-package-utils): rename environment variable generation f…
junhaoliao Aug 18, 2025
d21efbf
shift order of CLPConfig field serialization
junhaoliao Aug 18, 2025
d446e52
refactor(scheduler): enhance error logging and component naming consi…
junhaoliao Aug 18, 2025
c0949cc
refactor(clp-package-utils): improve container environment variable h…
junhaoliao Aug 18, 2025
f723512
remove unused imports
junhaoliao Aug 18, 2025
38c75ee
lint
junhaoliao Aug 18, 2025
a11b314
Merge branch 'main' into db-config-file
junhaoliao Aug 18, 2025
4124b60
use named constant insteaad of magical string for ".clp-config.yml" -…
junhaoliao Aug 19, 2025
3fe8368
docs - Apply suggestions from code review
junhaoliao Aug 19, 2025
ce143c6
refactor(clp-py-utils): Optimize the serialization process by excludi…
junhaoliao Aug 19, 2025
e15547e
remove unused variable clp_site_packages_dir
junhaoliao Aug 19, 2025
4bba4ed
add missing mount to start_garbage_collector
junhaoliao Aug 19, 2025
6ab2f80
Update function docstrings `raise` to use consistent colon placement
junhaoliao Aug 19, 2025
cd4437d
refactor(clp-package-utils): rename environment variable generation f…
junhaoliao Aug 19, 2025
14e0219
refactor(clp-package-utils): rename `dump_shared_config` function to …
junhaoliao Aug 19, 2025
33bc068
refactor(clp-package-utils): rename and restructure configuration dum…
junhaoliao Aug 19, 2025
ae8f581
refactor(config): rename generated config file to shared config file
junhaoliao Aug 19, 2025
0e52cb1
Merge branch 'main' into db-config-file
junhaoliao Aug 19, 2025
f063bd9
lint
junhaoliao Aug 19, 2025
04f81b7
remove redundant comment - Apply suggestions from code review
junhaoliao Aug 19, 2025
34c430f
order constants - Apply suggestions from code review
junhaoliao Aug 19, 2025
31342c3
refactor(clp-package-utils): extract container config filename genera…
junhaoliao Aug 19, 2025
ca2db62
Merge branch 'main' into db-config-file
junhaoliao Aug 19, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
99 changes: 97 additions & 2 deletions components/clp-package-utils/clp_package_utils/general.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import enum
import errno
import logging
import os
import pathlib
import re
Expand All @@ -9,7 +10,7 @@
import typing
import uuid
from enum import auto
from typing import List, Optional, Tuple
from typing import Dict, List, Optional, Tuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Unify typing style (avoid mixing typing.Optional with Optional).

You import Optional, List, etc., but still use typing.Optional elsewhere. Prefer one style for consistency; since you already import from typing, use Optional consistently.

Apply this refactor in the class where it appears:

 class CLPDockerMounts:
     def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
-        self.input_logs_dir: typing.Optional[DockerMount] = None
+        self.input_logs_dir: Optional[DockerMount] = None
         self.clp_home: typing.Optional[DockerMount] = DockerMount(
             DockerMountType.BIND, clp_home, docker_clp_home
         )
-        self.data_dir: typing.Optional[DockerMount] = None
-        self.logs_dir: typing.Optional[DockerMount] = None
-        self.archives_output_dir: typing.Optional[DockerMount] = None
-        self.stream_output_dir: typing.Optional[DockerMount] = None
-        self.aws_config_dir: typing.Optional[DockerMount] = None
+        self.data_dir: Optional[DockerMount] = None
+        self.logs_dir: Optional[DockerMount] = None
+        self.archives_output_dir: Optional[DockerMount] = None
+        self.stream_output_dir: Optional[DockerMount] = None
+        self.aws_config_dir: Optional[DockerMount] = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import Dict, List, Optional, Tuple
class CLPDockerMounts:
def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
self.input_logs_dir: Optional[DockerMount] = None
self.clp_home: typing.Optional[DockerMount] = DockerMount(
DockerMountType.BIND, clp_home, docker_clp_home
)
self.data_dir: Optional[DockerMount] = None
self.logs_dir: Optional[DockerMount] = None
self.archives_output_dir: Optional[DockerMount] = None
self.stream_output_dir: Optional[DockerMount] = None
self.aws_config_dir: Optional[DockerMount] = None
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around line 12,
unify typing style by replacing any uses of typing.Optional in the class (and
elsewhere in this file) with the already-imported Optional; ensure Optional is
present in the from typing import ... list (add it if missing), remove any
redundant direct imports of typing if they are only used for Optional, and run a
quick search/replace in the file to convert typing.List/typing.Dict/etc. to the
imported aliases (List, Dict) for consistent style.


import yaml
from clp_py_utils.clp_config import (
Expand Down Expand Up @@ -94,6 +95,7 @@ def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
self.archives_output_dir: typing.Optional[DockerMount] = None
self.stream_output_dir: typing.Optional[DockerMount] = None
self.aws_config_dir: typing.Optional[DockerMount] = None
self.generated_config_dir: typing.Optional[DockerMount] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess first, this should be generated_config_file instead of dir.

And I guess it might be better to just mount the parent of generated_config_file? (so mount a dir).
In our current code, it won't do anything since generated_config_file.parent would be the same as logs_dir, but it can prevent some silly future error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree mounting the parent directory is better because Docker's binding mount would create a directory when the specified path doesn't exist, causing issues in debugging.

However, i just tested that is_path_already_mounted() doesn't behave as expected and we can end up requesting to mount the logs_dir twice, which gives below error:

docker: Error response from daemon: Duplicate mount point: /home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE
...
subprocess.CalledProcessError: Command '['docker', 'run', '-i', '--network', 'host', '--rm', '--name', 'clp-database-table-creator-8367', '--log-driver', 'local', '-u', '1000:1000', '--mount', 'type=bind,src=/home/junhao/workspa
ce/clp/build/clp-package,dst=/opt/clp', '--mount', 'type=bind,src=/home/junhao/workspace/clp/build/clp-package/var/data/GIGABYTE,dst=/home/junhao/workspace/clp/build/clp-package/var/data/GIGABYTE', '--mount', 'type=bind,src=/hom
e/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE,dst=/home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE', '--mount', 'type=bind,src=/home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE,dst=/home/junh
ao/workspace/clp/build/clp-package/var/log/GIGABYTE', '-e', 'PYTHONPATH=/opt/clp/lib/python3/site-packages', '-e', 'CLP_HOME=/opt/clp', '-e', 'CLP_DB_USER=clp-user', '-e', 'CLP_DB_PASS=WuWM7ka3J70', 'ghcr.io/y-scope/clp/clp-exec
ution-x86-ubuntu-jammy:main', 'python3', '/opt/clp/lib/python3/site-packages/clp_py_utils/create-db-tables.py', '--config', '/home/junhao/workspace/clp/build/clp-package/var/log/GIGABYTE/.clp-config.yml', '--storage-engine', 'clp']' returned non-zero exit status 125

maybe we should just correct the name of this field to be generated_config_file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, i just renamed it to generated_config_file



def get_clp_home():
Expand Down Expand Up @@ -270,6 +272,18 @@ def generate_container_config(
container_clp_config.stream_output.get_directory(),
)

if not is_path_already_mounted(
clp_home,
CONTAINER_CLP_HOME,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
):
docker_mounts.generated_config_dir = DockerMount(
DockerMountType.BIND,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
)

Copy link
Contributor

@coderabbitai coderabbitai bot Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Mount the parent directory of the config file, not the file itself.

The code attempts to mount generated_config_file_path which is a file path, but Docker requires mounting directories. You should mount the parent directory instead.

Apply this diff to fix the mounting issue:

 if not is_path_already_mounted(
     clp_home,
     CONTAINER_CLP_HOME,
-    clp_config.generated_config_file_path,
-    container_clp_config.generated_config_file_path,
+    clp_config.generated_config_file_path.parent,
+    container_clp_config.generated_config_file_path.parent,
 ):
     docker_mounts.generated_config_dir = DockerMount(
         DockerMountType.BIND,
-        clp_config.generated_config_file_path,
-        container_clp_config.generated_config_file_path,
+        clp_config.generated_config_file_path.parent,
+        container_clp_config.generated_config_file_path.parent,
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not is_path_already_mounted(
clp_home,
CONTAINER_CLP_HOME,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
):
docker_mounts.generated_config_dir = DockerMount(
DockerMountType.BIND,
clp_config.generated_config_file_path,
container_clp_config.generated_config_file_path,
)
if not is_path_already_mounted(
clp_home,
CONTAINER_CLP_HOME,
clp_config.generated_config_file_path.parent,
container_clp_config.generated_config_file_path.parent,
):
docker_mounts.generated_config_dir = DockerMount(
DockerMountType.BIND,
clp_config.generated_config_file_path.parent,
container_clp_config.generated_config_file_path.parent,
)
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 275 to
286, the code mounts the config file path directly, but Docker requires mounting
directories. Modify the mount source and target paths to use the parent
directory of the config file paths instead of the file paths themselves,
ensuring the mount points are directories.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using --mount which seems to work for file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haiqi96 You're right—Docker bind mounts support files with --mount as long as the source exists. No change is required to switch to the parent directory.

Two small hardening tweaks you might consider:

  • Make the config bind read-only to avoid accidental writes:
    • pass is_read_only=True when constructing the DockerMount for the generated config file.
  • Optionally skip the file-level mount if the logs directory is already mounted (the file will be available via that directory), to avoid redundant mounts.

Otherwise, keeping the file mount is fine given dump_container_config runs before container start.


🧠 Learnings used
Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.409Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

# Only create the mount if the directory exists
if clp_config.aws_config_directory is not None:
container_clp_config.aws_config_directory = CONTAINER_AWS_CONFIG_DIRECTORY
Expand All @@ -278,6 +292,7 @@ def generate_container_config(
clp_config.aws_config_directory,
container_clp_config.aws_config_directory,
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with a new line, but in case if it's not intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me revert since it's not strictly related to the PR

return container_clp_config, docker_mounts


Expand Down Expand Up @@ -313,13 +328,17 @@ def dump_container_config(


def generate_container_start_cmd(
container_name: str, container_mounts: List[Optional[DockerMount]], container_image: str
container_name: str,
container_mounts: List[Optional[DockerMount]],
container_image: str,
extra_env_vars: Optional[Dict[str, str]] = None
) -> List[str]:
"""
Generates the command to start a container with the given mounts and name.
:param container_name:
:param container_mounts:
:param container_image:
:param extra_env_vars:
:return: The command.
"""
clp_site_packages_dir = CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages"
Expand All @@ -335,6 +354,9 @@ def generate_container_start_cmd(
"--name", container_name,
"--log-driver", "local"
]
if extra_env_vars:
for key, value in extra_env_vars.items():
container_start_cmd.extend(["-e", f"{key}={value}"])
for mount in container_mounts:
if mount:
container_start_cmd.append("--mount")
Expand Down Expand Up @@ -377,6 +399,11 @@ def load_config_file(
hostname = socket.gethostname()
clp_config.data_directory /= hostname
clp_config.logs_directory /= hostname
clp_config.generated_config_file_path = (
clp_config.generated_config_file_path.parent
/ hostname
/ clp_config.generated_config_file_path.name
)

return clp_config

Expand Down Expand Up @@ -590,3 +617,71 @@ def validate_dataset_name(clp_table_prefix: str, dataset_name: str) -> None:
f"Invalid dataset name: `{dataset_name}`. Names can only be a maximum of"
f" {dataset_name_max_len} characters long."
)


def generate_common_environment_variables(
include_clp_home=True,
) -> List[str]:
Copy link
Contributor

@coderabbitai coderabbitai bot Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Function signature mismatch with usage in start_clp.py.

The function generate_common_environment_variables doesn't accept container_clp_config as a parameter, but it's being called with this argument at line 1002 in start_clp.py. This will cause a TypeError at runtime.


🏁 Script executed:

#!/bin/bash
# Verify all calls to generate_common_environment_variables
rg -A 2 "generate_common_environment_variables\(" --type py

Length of output: 2317


We should confirm the exact locations in start_clp.py where the incorrect calls occur so we can update them precisely. Let’s list those lines with context:


🏁 Script executed:

#!/bin/bash
rg -n -C3 "generate_common_environment_variables" components/clp-package-utils/clp_package_utils/scripts/start_clp.py

Length of output: 1838


Fix incorrect calls to generate_common_environment_variables in start_clp.py.

The function signature in components/clp-package-utils/clp_package_utils/general.py is

def generate_common_environment_variables(
    include_clp_home: bool = True,
) -> List[str]:
    ...

It does not accept a container_clp_config argument, yet in start_clp.py it’s being called like this:

• start_clp.py:247
*generate_common_environment_variables(container_clp_config),

• start_clp.py:608
*generate_common_environment_variables(container_clp_config),

• start_clp.py:929
*generate_common_environment_variables(container_clp_config),

• start_clp.py:1002
*generate_common_environment_variables(container_clp_config, include_clp_home=True),

To resolve the TypeError at runtime, update these invocations so they only pass the include_clp_home flag (or rely on its default), for example:

- *generate_common_environment_variables(container_clp_config),
+ *generate_common_environment_variables(),

- *generate_common_environment_variables(container_clp_config, include_clp_home=True),
+ *generate_common_environment_variables(include_clp_home=True),

Please apply these changes at the locations above.

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py at lines 622 to
624, the function generate_common_environment_variables only accepts the
include_clp_home boolean argument. In start_clp.py at lines 247, 608, 929, and
1002, calls to this function incorrectly pass container_clp_config as an
argument. To fix the TypeError, update these calls to remove
container_clp_config and only pass include_clp_home if needed, or rely on its
default value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check the code again? i don't think we are calling generate_common_environment_variables with container_clp_config

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -e

echo "== Definition =="
rg -n -C1 $'^def generate_common_environment_variables\\(' components

echo ""
echo "== All call sites =="
rg -n -C3 "generate_common_environment_variables\\(" --type py

echo ""
echo "== start_clp.py call sites =="
fd start_clp.py components | xargs -I{} rg -n -C3 "generate_common_environment_variables\\(" {}

Length of output: 6350


@coderabbit-user Thanks, @haiqi96 — I rechecked the code with a repo-wide search. All calls to generate_common_environment_variables are correct; none passes container_clp_config.

Verified call sites in start_clp.py:

  • Line 246: *generate_common_environment_variables()
  • Line 607: *generate_common_environment_variables()
  • Line 738: *generate_common_environment_variables(include_clp_home=True)
  • Line 930: *generate_common_environment_variables()
  • Line 1005: *generate_common_environment_variables(include_clp_home=True)

Conclusion: No TypeError risk here. The earlier note was incorrect; no changes needed.


🧬 Code Graph Analysis Results

Relevant snippet from components/clp-package-utils/clp_package_utils/general.py (lines 313-346):

def generate_common_environment_variables(
    include_clp_home=True,
) -> List[str]:
    """
    Generate a list of common environment variables for Docker containers.

    :param include_clp_home:
    :return: List of environment variable strings in the format "KEY=VALUE".
    """
    clp_site_packages_dir = CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages"
    env_vars = [f"PYTHONPATH={clp_site_packages_dir}"]

    if include_clp_home:
        env_vars.append(f"CLP_HOME={CONTAINER_CLP_HOME}")

    return env_vars

This function generate_common_environment_variables does not take any container_clp_config parameter. It only optionally includes the CLP_HOME environment variable and always includes PYTHONPATH.


Relevant snippet from components/clp-package-utils/clp_package_utils/general.py (lines 348-374):

def generate_credential_environment_variables(
    container_clp_config: CLPConfig,
    include_db_credentials=False,
    include_queue_credentials=False,
    include_redis_credentials=False,
) -> List[str]:
    """
    Generate a list of credential environment variables for Docker containers.

    :param container_clp_config:
    :param include_db_credentials:
    :param include_queue_credentials:
    :param include_redis_credentials:
    :return: List of environment variable strings in the format "KEY=VALUE".
    """
    env_vars = []

    if include_db_credentials:
        env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
        env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")

    if include_queue_credentials:
        env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
        env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")

    if include_redis_credentials:
        env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")

    return env_vars

This function generate_credential_environment_variables explicitly takes container_clp_config as a parameter and uses it to generate environment variables related to credentials.


Summary of generate_container_config function (lines 180-236):

This function takes a clp_config and clp_home and returns a tuple (container_clp_config, docker_mounts) where:

  • container_clp_config is a deep copy of clp_config with paths adjusted for container usage.
  • docker_mounts is an instance of CLPDockerMounts containing DockerMount objects for various directories.

This function prepares the container configuration but does not call generate_common_environment_variables.


Summary of generate_container_start_cmd function (lines 270-300):

This function generates the Docker run command line as a list of strings. It accepts:

  • container_name
  • container_mounts (list of DockerMount or None)
  • container_image
  • extra_env_vars (optional dictionary of environment variables)

It sets some environment variables like PYTHONPATH and CLP_HOME internally, but it does not call generate_common_environment_variables with a container_clp_config argument.


Summary of generate_common_environment_variables usage:

  • The function generate_common_environment_variables does not accept a container_clp_config argument.
  • It only optionally includes CLP_HOME and always includes PYTHONPATH.
  • Credential environment variables are generated separately by generate_credential_environment_variables which does accept container_clp_config.

Summary of CLPConfig class (from clp_py_utils/clp_config.py):

  • CLPConfig is a configuration class with attributes like database, queue, redis, etc.
  • It has methods to load credentials from files or environment variables.
  • It is used as the type for container_clp_config in the environment variable generation functions.

These snippets and summaries provide context on the function signatures and usage of generate_common_environment_variables and related functions in the file general.py. They clarify that generate_common_environment_variables is not called with container_clp_config anywhere in this file.


🧠 Learnings used
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: junhaoliao
PR: y-scope/clp#1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.409Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

"""
Generate a list of common environment variables for Docker containers.

:param include_clp_home:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
clp_site_packages_dir = CONTAINER_CLP_HOME / "lib" / "python3" / "site-packages"
env_vars = [f"PYTHONPATH={clp_site_packages_dir}"]

if include_clp_home:
env_vars.append(f"CLP_HOME={CONTAINER_CLP_HOME}")

return env_vars


def generate_credential_environment_variables(
container_clp_config: CLPConfig,
include_db_credentials=False,
include_queue_credentials=False,
include_redis_credentials=False,
) -> List[str]:
"""
Generate a list of credential environment variables for Docker containers.

:param container_clp_config:
:param include_db_credentials:
:param include_queue_credentials:
:param include_redis_credentials:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
env_vars = []

if include_db_credentials:
env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")

if include_queue_credentials:
env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")

if include_redis_credentials:
env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")

return env_vars

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Assert credentials are loaded before emitting env vars

Defensive checks avoid propagating "None" into env vars if a loader wasn’t called. Use existing ensure_credentials_loaded for DB; check None for queue/redis.

 def generate_credential_environment_variables(
     container_clp_config: CLPConfig,
     include_db_credentials=False,
     include_queue_credentials=False,
     include_redis_credentials=False,
 ) -> List[str]:
@@
-    if include_db_credentials:
+    if include_db_credentials:
+        container_clp_config.database.ensure_credentials_loaded()
         env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
         env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")
 
-    if include_queue_credentials:
+    if include_queue_credentials:
+        if (
+            container_clp_config.queue.username is None
+            or container_clp_config.queue.password is None
+        ):
+            raise ValueError("Queue credentials not loaded.")
         env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
         env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")
 
-    if include_redis_credentials:
+    if include_redis_credentials:
+        if container_clp_config.redis.password is None:
+            raise ValueError("Redis password not loaded.")
         env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")
 
     return env_vars
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def generate_credential_environment_variables(
container_clp_config: CLPConfig,
include_db_credentials=False,
include_queue_credentials=False,
include_redis_credentials=False,
) -> List[str]:
"""
Generate a list of credential environment variables for Docker containers.
:param container_clp_config:
:param include_db_credentials:
:param include_queue_credentials:
:param include_redis_credentials:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
env_vars = []
if include_db_credentials:
env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")
if include_queue_credentials:
env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")
if include_redis_credentials:
env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")
return env_vars
def generate_credential_environment_variables(
container_clp_config: CLPConfig,
include_db_credentials=False,
include_queue_credentials=False,
include_redis_credentials=False,
) -> List[str]:
"""
Generate a list of credential environment variables for Docker containers.
:param container_clp_config:
:param include_db_credentials:
:param include_queue_credentials:
:param include_redis_credentials:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
env_vars = []
if include_db_credentials:
container_clp_config.database.ensure_credentials_loaded()
env_vars.append(f"CLP_DB_USER={container_clp_config.database.username}")
env_vars.append(f"CLP_DB_PASS={container_clp_config.database.password}")
if include_queue_credentials:
if (
container_clp_config.queue.username is None
or container_clp_config.queue.password is None
):
raise ValueError("Queue credentials not loaded.")
env_vars.append(f"CLP_QUEUE_USER={container_clp_config.queue.username}")
env_vars.append(f"CLP_QUEUE_PASS={container_clp_config.queue.password}")
if include_redis_credentials:
if container_clp_config.redis.password is None:
raise ValueError("Redis password not loaded.")
env_vars.append(f"CLP_REDIS_PASS={container_clp_config.redis.password}")
return env_vars
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 632 to
661, call the existing ensure_credentials_loaded helper for the database before
emitting CLP_DB_USER/CLP_DB_PASS and for queue and redis perform defensive None
checks on username/password (and skip emitting any env var whose value is None)
so you never insert "None" into environment strings; if desired, log or raise a
clear error when required credentials are missing rather than appending them.


def generate_celery_connection_environment_variables(container_clp_config: CLPConfig) -> List[str]:
"""
Generate a list of Celery connection environment variables for Docker containers.

:param container_clp_config:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
env_vars = [
f"BROKER_URL=amqp://"
f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}",
Comment on lines +701 to +707
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

URL-encode credentials in Celery connection strings

Credentials containing special characters (e.g., @, :, /) will break the connection URLs.

Apply URL encoding to credentials before embedding them in connection strings:

+from urllib.parse import quote

 def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List[str]:
     """
     :param container_clp_config:
     :return: A list of Celery connection environment variables for Docker containers, in the format
     "KEY=VALUE".
     """
+    queue_user = quote(container_clp_config.queue.username, safe='')
+    queue_pass = quote(container_clp_config.queue.password, safe='')
+    redis_pass = quote(container_clp_config.redis.password, safe='')
     env_vars = [
         f"BROKER_URL=amqp://"
-        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
+        f"{queue_user}:{queue_pass}@"
         f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
-        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
+        f"RESULT_BACKEND=redis://default:{redis_pass}@"
         f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
         f"{container_clp_config.redis.query_backend_database}",
     ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
env_vars = [
f"BROKER_URL=amqp://"
f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}",
from urllib.parse import quote
def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List[str]:
"""
:param container_clp_config:
:return: A list of Celery connection environment variables for Docker containers, in the format
"KEY=VALUE".
"""
queue_user = quote(container_clp_config.queue.username, safe='')
queue_pass = quote(container_clp_config.queue.password, safe='')
redis_pass = quote(container_clp_config.redis.password, safe='')
env_vars = [
f"BROKER_URL=amqp://"
f"{queue_user}:{queue_pass}@"
f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
f"RESULT_BACKEND=redis://default:{redis_pass}@"
f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}",
]
return env_vars
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 695 to
701, the Celery BROKER_URL and RESULT_BACKEND strings embed raw usernames and
passwords which will break when credentials contain special characters;
URL-encode the queue username and password and the redis password (using
urllib.parse.quote_plus or quote) before interpolating them into the connection
URLs so special characters are percent-encoded, and update the env_vars
construction to use the encoded credential variables.

]

return env_vars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

URL-encode credentials in connection URLs

If credentials contain reserved characters, the URLs can break. Quote user/pass to be safe.

+    from urllib.parse import quote
@@
-    env_vars = [
-        f"BROKER_URL=amqp://"
-        f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
-        f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
-        f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
-        f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
-        f"{container_clp_config.redis.query_backend_database}",
-    ]
+    q_user = "" if container_clp_config.queue.username is None else quote(container_clp_config.queue.username, safe="")
+    q_pass = "" if container_clp_config.queue.password is None else quote(container_clp_config.queue.password, safe="")
+    r_pass = "" if container_clp_config.redis.password is None else quote(container_clp_config.redis.password, safe="")
+    env_vars = [
+        (
+            "BROKER_URL="
+            f"amqp://{q_user}:{q_pass}@{container_clp_config.queue.host}:{container_clp_config.queue.port}"
+        ),
+        (
+            "RESULT_BACKEND="
+            f"redis://default:{r_pass}@{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
+            f"{container_clp_config.redis.query_backend_database}"
+        ),
+    ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def generate_celery_connection_environment_variables(container_clp_config: CLPConfig) -> List[str]:
"""
Generate a list of Celery connection environment variables for Docker containers.
:param container_clp_config:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
env_vars = [
f"BROKER_URL=amqp://"
f"{container_clp_config.queue.username}:{container_clp_config.queue.password}@"
f"{container_clp_config.queue.host}:{container_clp_config.queue.port}",
f"RESULT_BACKEND=redis://default:{container_clp_config.redis.password}@"
f"{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}",
]
return env_vars
def generate_celery_connection_environment_variables(container_clp_config: CLPConfig) -> List[str]:
"""
Generate a list of Celery connection environment variables for Docker containers.
:param container_clp_config:
:return: List of environment variable strings in the format "KEY=VALUE".
"""
from urllib.parse import quote
q_user = "" if container_clp_config.queue.username is None else quote(container_clp_config.queue.username, safe="")
q_pass = "" if container_clp_config.queue.password is None else quote(container_clp_config.queue.password, safe="")
r_pass = "" if container_clp_config.redis.password is None else quote(container_clp_config.redis.password, safe="")
env_vars = [
(
"BROKER_URL="
f"amqp://{q_user}:{q_pass}@{container_clp_config.queue.host}:{container_clp_config.queue.port}"
),
(
"RESULT_BACKEND="
f"redis://default:{r_pass}@{container_clp_config.redis.host}:{container_clp_config.redis.port}/"
f"{container_clp_config.redis.query_backend_database}"
),
]
return env_vars
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 663 to
679, the generated connection URLs embed raw usernames and passwords which can
break when they contain reserved characters; update the function to URL-encode
the queue.username, queue.password, redis.password (and any other credential
parts) before interpolating into the broker and redis URLs (use
urllib.parse.quote_plus or quote for encoding), add the necessary import at the
top of the file, and replace the direct attribute usage in the f-strings with
the encoded variables so the resulting BROKER_URL and RESULT_BACKEND are safe.

Original file line number Diff line number Diff line change
Expand Up @@ -206,14 +206,17 @@ def main(argv: typing.List[str]) -> int:
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)

necessary_mounts: typing.List[CLPDockerMounts] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in the area:

Suggested change
necessary_mounts: typing.List[CLPDockerMounts] = [
necessary_mounts: typing.List[DockerMount] = [

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have the fix in #1186

mounts.clp_home,
mounts.logs_dir,
mounts.archives_output_dir,
]
extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
container_start_cmd: typing.List[str] = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -222,8 +222,12 @@ def main(argv):

_generate_logs_list(clp_config.logs_input.type, container_logs_list_path, parsed_args)

extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)
compress_cmd = _generate_compress_cmd(
parsed_args, dataset, generated_config_path_on_container, logs_list_path_on_container
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -122,8 +122,13 @@ def handle_extract_file_cmd(
container_paths_to_extract_file_path,
)
)

extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)

# fmt: off
Expand Down Expand Up @@ -203,9 +208,13 @@ def handle_extract_stream_cmd(
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)
extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
necessary_mounts = [mounts.clp_home, mounts.logs_dir]
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)

# fmt: off
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import argparse
import logging
import os
import shutil
import sys
import typing
Expand Down Expand Up @@ -188,6 +189,12 @@ def main(argv: typing.List[str]) -> int:
except:
logger.exception("Failed to load config.")
return -1
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Standardize credential loading and exception handling

Use CLPConfig.load_database_credentials_from_env() for consistency with the rest of the codebase and to get proper missing/empty var validation. Catch ValueError accordingly.

-    try:
-        clp_config.database.username = os.environ["CLP_DB_USER"]
-        clp_config.database.password = os.environ["CLP_DB_PASS"]
-    except KeyError as e:
-        logger.error(f"Missing environment variable: {e}")
-        return -1
+    try:
+        clp_config.load_database_credentials_from_env()
+    except ValueError as err:
+        logger.error(err)
+        return -1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1
try:
clp_config.load_database_credentials_from_env()
except ValueError as err:
logger.error(err)
return -1
🤖 Prompt for AI Agents
In
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py
around lines 191 to 196, replace the direct os.environ lookups and KeyError
handling with the standardized CLPConfig.load_database_credentials_from_env()
call to populate clp_config.database (so missing or empty vars are validated
consistently); catch ValueError from that loader instead of KeyError, log the
error message with logger.error(...), and return -1 on failure.


database_config: Database = clp_config.database
dataset = parsed_args.dataset
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import argparse
import datetime
import logging
import os
import pathlib
import sys
import time
Expand Down Expand Up @@ -229,6 +230,12 @@ def main(argv):
except:
logger.exception("Failed to load config.")
return -1
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1

comp_jobs_dir = clp_config.logs_directory / "comp-jobs"
comp_jobs_dir.mkdir(parents=True, exist_ok=True)
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import argparse
import asyncio
import logging
import os
import pathlib
import subprocess
import sys
Expand Down Expand Up @@ -117,6 +118,13 @@ def handle_extract_stream_cmd(
)
if clp_config is None:
return -1
# Override credentials with environment variables
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we can put this logic into native/utils.py since I see it duplicated for a few times.
Note this would require us to log within a method in native/utils.py, there is an example of how we do similar things but I would like to confirm with Kirk.

@kirkrodrigues Can we create a helper method in utils which takes in a logger as argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending @kirkrodrigues 's comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i think we can just replace this with

clp_config.load_database_credentials_from_env()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then check if the credentials are loaded here. if not, log a message those aren't set


command = parsed_args.command

Expand Down Expand Up @@ -222,6 +230,13 @@ def handle_extract_file_cmd(
)
if clp_config is None:
return -1
# Override credentials with environment variables
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1

paths = parsed_args.paths
list_path = parsed_args.files_from
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import asyncio
import ipaddress
import logging
import os
import pathlib
import socket
import sys
Expand Down Expand Up @@ -297,6 +298,12 @@ def main(argv):
except:
logger.exception("Failed to load config.")
return -1
try:
clp_config.database.username = os.environ["CLP_DB_USER"]
clp_config.database.password = os.environ["CLP_DB_PASS"]
except KeyError as e:
logger.error(f"Missing environment variable: {e}")
return -1

database_config: Database = clp_config.database
dataset = parsed_args.dataset
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,10 +117,13 @@ def main(argv):
generated_config_path_on_container, generated_config_path_on_host = dump_container_config(
container_clp_config, clp_config, container_name
)

necessary_mounts = [mounts.clp_home, mounts.logs_dir]
extra_env_vars = {
"CLP_DB_USER": clp_config.database.username,
"CLP_DB_PASS": clp_config.database.password,
}
container_start_cmd = generate_container_start_cmd(
container_name, necessary_mounts, clp_config.execution_container
container_name, necessary_mounts, clp_config.execution_container, extra_env_vars
)

# fmt: off
Expand Down
Loading