Skip to content

Conversation

@junhaoliao
Copy link
Owner

@junhaoliao junhaoliao commented Oct 25, 2025

Summary

  • enforce explicit handling of missing and empty coordinator config entries when loading env vars
  • remove config.properties fallbacks now that worker memory env vars are always populated

Testing

  • ./bin/task lint:fix-py
  • python -m compileall tools/deployment/presto-clp/scripts/init.py

https://chatgpt.com/codex/tasks/task_e_68fcd8c77ea48327afeeddb4baf320f9

Summary by CodeRabbit

  • New Features

    • Workers auto-detect system memory and set worker memory parameters for better defaults.
    • Detected memory values are persisted in worker configuration for clearer control.
  • Configuration

    • Deployment validates required coordinator and memory settings and halts gracefully if values are missing.
  • Dependencies

    • Added psutil to enable system memory detection.

@coderabbitai
Copy link

coderabbitai bot commented Oct 25, 2025

Walkthrough

Adds automatic worker memory detection and env var configuration: init.py now uses psutil to compute and export query/system memory values, validates required coordinator keys, and updates discovery URI; requirements and templates were updated to include new memory variables.

Changes

Cohort / File(s) Summary
Worker init script
tools/deployment/presto-clp/scripts/init.py
Added _get_required_env_var helper; refactored _add_worker_env_vars to use it and return False on missing keys; introduced _configure_worker_memory_env_vars using psutil and math to compute and set PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB, PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB, PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB; discovery URI now built from resolved coordinator service name and port.
Runtime dependencies
tools/deployment/presto-clp/scripts/requirements.txt
Added psutil.
Worker environment and config templates
tools/deployment/presto-clp/worker.env, tools/deployment/presto-clp/worker/config-template/config.properties
Added memory-related env vars and corresponding config.properties entries: system-memory-gb, query-memory-gb, system-mem-limit-gb; updated worker.env default values.

Sequence Diagram(s)

sequenceDiagram
    participant Init as init.py
    participant Ps as psutil
    participant Env as Env / Filesystem

    Init->>Init: start worker env setup
    Init->>Init: _get_required_env_var(PRESTO_COORDINATOR_SERVICENAME)
    alt key missing
        Init-->>Env: log error, return False (abort setup)
    else key present
        Init->>Init: _get_required_env_var(PRESTO_COORDINATOR_HTTPPORT)
        Init->>Ps: get total system memory
        Ps-->>Init: total memory bytes
        rect rgb(220,240,200)
        Note over Init: compute query/system memory GB\nand system mem limit (math)
        end
        Init->>Env: set PRESTO_WORKER_CONFIGPROPERTIES_* env vars
        Init->>Init: build PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI
        Init-->>Env: write/update worker.env and templates
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to memory calculation and rounding in _configure_worker_memory_env_vars.
  • Verify _get_required_env_var behavior and early-return semantics in _add_worker_env_vars.
  • Check discovery URI formatting and port/service name resolution.
  • Confirm psutil is acceptable for target deployment environments.

"I hopped through bytes of RAM at dawn,
Counting gigabytes across the lawn.
I set three vars with a twitchy grin,
So presto workers know where to begin.
🐇✨"

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Refine Presto worker config validation" is directly related to the PR's stated objectives, which focus on enforcing explicit handling of missing coordinator config entries and removing fallbacks. The changes include introducing helper functions (_get_required_env_var) that improve error handling and configuration validation, returning False on missing keys instead of propagating exceptions, and restructuring how required environment variables are fetched. While the PR also adds memory configuration logic (psutil integration, new environment variables, and config properties), the overarching theme centers on improving configuration handling and validation, which the title appropriately captures. The title is specific enough to convey the purpose without being overly broad or vague.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/address-issue-1503-with-comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
tools/deployment/presto-clp/worker.env (1)

3-5: Avoid hardcoding generated memory values in template env.

Since init.py always computes these, keep them unset or commented to prevent drift/confusion across hosts.

Proposed diff:

+## Memory values are auto-generated by scripts/init.py based on the host/container limit.
+## Leave unset unless you intend to override.
-PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB=8
-PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB=5
+# PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB=
+# PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB=
 ...
-PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB=8
+# PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB=

Please confirm the runtime actually sources the init.py–generated env file (not worker.env) so these can remain commented.

Also applies to: 8-8

tools/deployment/presto-clp/scripts/init.py (3)

3-4: Guard psutil import to fail fast with a clear message.

Top-level import will crash before logging if psutil isn’t installed. Make it explicit and check at use-site.

Proposed diff:

-import math
-import os
+import math
+import os
@@
-import psutil
+try:
+    import psutil  # type: ignore[import-not-found]
+except ImportError:
+    psutil = None  # type: ignore[assignment]

And at the start of _configure_worker_memory_env_vars:

 def _configure_worker_memory_env_vars(env_vars: Dict[str, str]) -> bool:
@@
-    try:
+    if psutil is None:
+        logger.error(
+            "psutil is required to detect system memory. Install with: pip install -r tools/deployment/presto-clp/scripts/requirements.txt"
+        )
+        return False
+    try:
         total_memory_bytes = int(psutil.virtual_memory().total)

Also applies to: 9-9


236-243: Validate coordinator port value before composing discovery URI.

Ensure the port is numeric and within [1, 65535] to catch misconfig early.

Proposed diff:

-    env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = (
-        f"http://{coordinator_service_name}:{coordinator_http_port}"
-    )
+    try:
+        port_int = int(coordinator_http_port)
+        if not (1 <= port_int <= 65535):
+            raise ValueError
+    except ValueError:
+        logger.error(
+            "Invalid PRESTO_COORDINATOR_HTTPPORT '%s' in '%s'",
+            coordinator_http_port,
+            coordinator_common_env_file_path,
+        )
+        return False
+    env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = (
+        f"http://{coordinator_service_name}:{port_int}"
+    )

305-314: Treat whitespace-only values as empty for required env keys.

dotenv entries like KEY=" " should fail validation, not pass.

Proposed diff:

-    value = config.get(key)
-    if value is None:
+    raw_value = config.get(key)
+    if raw_value is None:
         logger.error("Missing required key %s in '%s'", key, env_file_path)
         raise KeyError(key)
-
-    if value == "":
+    value = raw_value.strip()
+    if value == "":
         logger.error("Empty value for required key %s in '%s'", key, env_file_path)
         raise KeyError(key)

Consider applying the same empty-string check to _get_required_config_value for parity with file-based configs.

tools/deployment/presto-clp/scripts/requirements.txt (1)

1-1: Pin versions for reproducible, secure installs with correct major version bounds.

Unpinned packages risk breakage and supply-chain drift. The proposed diff needs a correction: psutil's upper bound should allow major version 7, which is now stable.

Proposed diff:

-psutil
-python-dotenv
-PyYAML
+psutil>=5.9,<8
+python-dotenv>=1.0,<2
+PyYAML>=6.0,<7

psutil 7.1.2 (released Oct 25, 2025) is the latest stable version, so the original constraint <7 was too restrictive.

Optionally manage dependencies via pip-tools (constraints/lock).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb7f0d5 and 5c10f26.

📒 Files selected for processing (4)
  • tools/deployment/presto-clp/scripts/init.py (2 hunks)
  • tools/deployment/presto-clp/scripts/requirements.txt (1 hunks)
  • tools/deployment/presto-clp/worker.env (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/config.properties (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
tools/deployment/presto-clp/scripts/init.py

257-257: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
tools/deployment/presto-clp/worker/config-template/config.properties (1)

6-8: LGTM: property placeholders match new env vars.

The three properties line up with init.py outputs. No issues spotted.

Comment on lines 254 to 275
try:
total_memory_bytes = int(psutil.virtual_memory().total)
except (AttributeError, ValueError) as exc:
logger.error("Unable to determine total system memory: %s", exc)
return False

total_memory_gb = total_memory_bytes / (1024**3)
total_memory_gb_floor = max(1, math.floor(total_memory_gb))

system_memory_gb = min(
total_memory_gb_floor,
max(1, math.floor(total_memory_gb * 0.90)),
)
query_memory_gb = min(
system_memory_gb,
max(1, math.floor(system_memory_gb * (2 / 3))),
)
system_mem_limit_gb = min(
total_memory_gb_floor,
max(system_memory_gb, max(1, math.floor(total_memory_gb * 0.94))),
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make memory detection cgroup-aware; switch to exception logging.

psutil reports host RAM, not container limits. In containers, this can over-allocate Presto and cause OOM. Also, Ruff flags TRY400 here.

Apply these changes:

  • Detect Linux cgroup limits (v2: /sys/fs/cgroup/memory.max; v1: /sys/fs/cgroup/memory/memory.limit_in_bytes) and cap to the lower of psutil total and cgroup limit.
  • Use logger.exception when memory detection fails.

Patch inside function:

-    try:
-        total_memory_bytes = int(psutil.virtual_memory().total)
-    except (AttributeError, ValueError) as exc:
-        logger.error("Unable to determine total system memory: %s", exc)
-        return False
+    try:
+        total_memory_bytes = int(psutil.virtual_memory().total)
+    except Exception:
+        logger.exception("Unable to determine total system memory")
+        return False
+
+    # Respect container memory limits when present (cgroup v2/v1)
+    cgroup_limit_bytes = _read_cgroup_memory_limit_bytes()
+    if cgroup_limit_bytes and cgroup_limit_bytes > 0:
+        total_memory_bytes = min(total_memory_bytes, cgroup_limit_bytes)

Add this helper (outside the function) to keep things tidy:

from pathlib import Path

def _read_cgroup_memory_limit_bytes() -> Optional[int]:
    # cgroup v2
    v2 = Path("/sys/fs/cgroup/memory.max")
    # cgroup v1
    v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
    for p in (v2, v1):
        try:
            with open(p, "r", encoding="utf-8") as f:
                raw = f.read().strip()
            if raw and raw.lower() != "max":
                val = int(raw)
                if val > 0:
                    return val
        except FileNotFoundError:
            continue
        except Exception as exc:
            logger.debug("Ignoring cgroup memory read error from %s: %s", p, exc)
    return None

This prevents overshooting container RAM and aligns with typical Presto deployment patterns.

🧰 Tools
🪛 Ruff (0.14.1)

257-257: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 254 to 275,
psutil.virtual_memory() is being used alone which returns host RAM and can
over-allocate inside containers; change the logic to detect cgroup limits and
cap total_memory_bytes to the lower of psutil total and any cgroup memory limit
(check cgroup v2 /sys/fs/cgroup/memory.max and cgroup v1
/sys/fs/cgroup/memory/memory.limit_in_bytes, ignore values of "max" or
non-positive, and treat missing files as no limit). Add a helper function
(placed outside the current function) that reads those two paths robustly,
returns an Optional[int] cgroup limit, and logs debug on read errors; then use
that helper to set total_memory_bytes = min(psutil_total, cgroup_limit) when
cgroup_limit is present. Also switch the existing except handler to
logger.exception(...) so exceptions during memory detection are logged with
traceback.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)

251-286: Container memory limits are not respected; missing error handling.

This is the same concern raised in a previous review: psutil reports host RAM, which can cause over-allocation in containers leading to OOM kills. Additionally, there's no error handling for the psutil call.

As suggested in the previous review:

  1. Detect cgroup limits (v2: /sys/fs/cgroup/memory.max; v1: /sys/fs/cgroup/memory/memory.limit_in_bytes) and cap to the lower of psutil total and cgroup limit
  2. Add try-except with logger.exception for the memory detection call

Based on learnings

🧹 Nitpick comments (1)
tools/deployment/presto-clp/scripts/init.py (1)

258-261: Simplify redundant min() operation.

The outer min(total_memory_gb_floor, ...) is redundant since math.floor(total_memory_gb * 0.90) will always be ≤ total_memory_gb_floor.

Apply this diff:

-    system_memory_gb = min(
-        total_memory_gb_floor,
-        max(1, math.floor(total_memory_gb * 0.90)),
-    )
+    system_memory_gb = max(1, math.floor(total_memory_gb * 0.90))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c10f26 and 49fe994.

📒 Files selected for processing (2)
  • tools/deployment/presto-clp/scripts/init.py (2 hunks)
  • tools/deployment/presto-clp/worker.env (1 hunks)
🔇 Additional comments (4)
tools/deployment/presto-clp/worker.env (1)

3-5: Clarify the purpose of these hardcoded memory values.

The init.py script now dynamically computes these memory environment variables using psutil. Are these hardcoded values intended as:

  • Fallback defaults if init.py fails?
  • Example values for documentation?
  • Temporary values that get overwritten by the init script output?

If they're meant to be computed dynamically, consider adding a comment explaining their purpose to avoid confusion.

tools/deployment/presto-clp/scripts/init.py (3)

3-4: LGTM!

The new imports are correctly added to support memory calculation (math) and system memory detection (psutil).

Also applies to: 9-9


232-246: LGTM!

The refactored validation logic correctly uses the new _get_required_env_var helper to enforce explicit handling of missing/empty values, and properly constructs the discovery URI.


289-309: LGTM!

The helper correctly enforces explicit validation for both missing and empty environment variable values, with appropriate error logging and exception handling.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)

251-287: Add error handling and cgroup-awareness to memory detection.

The function has two critical issues:

  1. No error handling: If psutil.virtual_memory() fails, the function will crash. Wrap the call in a try-except block.

  2. Container limits not respected: psutil.virtual_memory().total reports host RAM, not container cgroup limits. In containerized Presto deployments, this can allocate more memory than the container has, causing OOM kills.

Apply the solution from the previous review:

Add a helper function to read cgroup limits:

from pathlib import Path

def _read_cgroup_memory_limit_bytes() -> Optional[int]:
    """Reads container memory limit from cgroup v2 or v1."""
    # cgroup v2
    v2 = Path("/sys/fs/cgroup/memory.max")
    # cgroup v1
    v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
    for p in (v2, v1):
        try:
            with open(p, "r", encoding="utf-8") as f:
                raw = f.read().strip()
            if raw and raw.lower() != "max":
                val = int(raw)
                if val > 0:
                    return val
        except FileNotFoundError:
            continue
        except Exception as exc:
            logger.debug("Ignoring cgroup memory read error from %s: %s", p, exc)
    return None

Then update _configure_worker_memory_env_vars:

 def _configure_worker_memory_env_vars(env_vars: Dict[str, str]) -> bool:
     """Adds memory-related environment variables for the worker."""
 
-    total_memory_bytes = psutil.virtual_memory().total
+    try:
+        total_memory_bytes = int(psutil.virtual_memory().total)
+    except Exception:
+        logger.exception("Unable to determine total system memory")
+        return False
+
+    # Respect container memory limits when present (cgroup v2/v1)
+    cgroup_limit_bytes = _read_cgroup_memory_limit_bytes()
+    if cgroup_limit_bytes and cgroup_limit_bytes > 0:
+        total_memory_bytes = min(total_memory_bytes, cgroup_limit_bytes)
+
     total_memory_gb = total_memory_bytes / (1024**3)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49fe994 and 67a3ded.

📒 Files selected for processing (2)
  • tools/deployment/presto-clp/scripts/init.py (2 hunks)
  • tools/deployment/presto-clp/worker.env (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/deployment/presto-clp/worker.env
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
tools/deployment/presto-clp/scripts/init.py (2)

3-3: LGTM!

The new imports are properly utilized in the memory configuration logic.

Also applies to: 9-9


232-246: LGTM!

The refactoring properly enforces explicit handling of missing coordinator config entries. The try-except pattern cleanly propagates validation failures by returning False.

Comment on lines +289 to +306
def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str:
"""
Fetches a required entry from an environment variables dictionary.
:param config:
:param key: The key that must exist in `config` with a non-empty value.
:param env_file_path:
:return: The string value associated with `key`.
:raises KeyError: If `key` is missing.
"""

value = config.get(key)
if value is None:
logger.error("Missing required key %s in '%s'", key, env_file_path)
raise KeyError(key)

return value

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consider validating empty strings to match the docstring.

The docstring states the function requires a "non-empty value," but the implementation only checks for None. Empty strings would pass validation but might be invalid for coordinator settings.

If empty strings should be rejected, apply this diff:

 def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str:
     """
     Fetches a required entry from an environment variables dictionary.
 
     :param config:
     :param key: The key that must exist in `config` with a non-empty value.
     :param env_file_path:
     :return: The string value associated with `key`.
     :raises KeyError: If `key` is missing.
     """
 
     value = config.get(key)
-    if value is None:
+    if not value:
         logger.error("Missing required key %s in '%s'", key, env_file_path)
         raise KeyError(key)
 
     return value

Alternatively, update the docstring to clarify that only None is checked:

-    :param key: The key that must exist in `config` with a non-empty value.
+    :param key: The key that must exist in `config` with a non-None value.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str:
"""
Fetches a required entry from an environment variables dictionary.
:param config:
:param key: The key that must exist in `config` with a non-empty value.
:param env_file_path:
:return: The string value associated with `key`.
:raises KeyError: If `key` is missing.
"""
value = config.get(key)
if value is None:
logger.error("Missing required key %s in '%s'", key, env_file_path)
raise KeyError(key)
return value
def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str:
"""
Fetches a required entry from an environment variables dictionary.
:param config:
:param key: The key that must exist in `config` with a non-empty value.
:param env_file_path:
:return: The string value associated with `key`.
:raises KeyError: If `key` is missing.
"""
value = config.get(key)
if not value:
logger.error("Missing required key %s in '%s'", key, env_file_path)
raise KeyError(key)
return value
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 289 to 306, the
function claims to require a "non-empty value" but only checks for None; change
the guard to reject empty or whitespace-only strings (e.g., treat value is None
or value.strip() == "" as missing), keep the same logger.error and raise
KeyError(key) behavior, and return the value otherwise; if you instead prefer
only to disallow None, update the docstring to remove "non-empty" wording.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants