-
Notifications
You must be signed in to change notification settings - Fork 0
Refine Presto worker config validation #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds automatic worker memory detection and env var configuration: init.py now uses psutil to compute and export query/system memory values, validates required coordinator keys, and updates discovery URI; requirements and templates were updated to include new memory variables. Changes
Sequence Diagram(s)sequenceDiagram
participant Init as init.py
participant Ps as psutil
participant Env as Env / Filesystem
Init->>Init: start worker env setup
Init->>Init: _get_required_env_var(PRESTO_COORDINATOR_SERVICENAME)
alt key missing
Init-->>Env: log error, return False (abort setup)
else key present
Init->>Init: _get_required_env_var(PRESTO_COORDINATOR_HTTPPORT)
Init->>Ps: get total system memory
Ps-->>Init: total memory bytes
rect rgb(220,240,200)
Note over Init: compute query/system memory GB\nand system mem limit (math)
end
Init->>Env: set PRESTO_WORKER_CONFIGPROPERTIES_* env vars
Init->>Init: build PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI
Init-->>Env: write/update worker.env and templates
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
tools/deployment/presto-clp/worker.env (1)
3-5: Avoid hardcoding generated memory values in template env.Since init.py always computes these, keep them unset or commented to prevent drift/confusion across hosts.
Proposed diff:
+## Memory values are auto-generated by scripts/init.py based on the host/container limit. +## Leave unset unless you intend to override. -PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB=8 -PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB=5 +# PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEMORY_GB= +# PRESTO_WORKER_CONFIGPROPERTIES_QUERY_MEMORY_GB= ... -PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB=8 +# PRESTO_WORKER_CONFIGPROPERTIES_SYSTEM_MEM_LIMIT_GB=Please confirm the runtime actually sources the init.py–generated env file (not worker.env) so these can remain commented.
Also applies to: 8-8
tools/deployment/presto-clp/scripts/init.py (3)
3-4: Guard psutil import to fail fast with a clear message.Top-level import will crash before logging if psutil isn’t installed. Make it explicit and check at use-site.
Proposed diff:
-import math -import os +import math +import os @@ -import psutil +try: + import psutil # type: ignore[import-not-found] +except ImportError: + psutil = None # type: ignore[assignment]And at the start of
_configure_worker_memory_env_vars:def _configure_worker_memory_env_vars(env_vars: Dict[str, str]) -> bool: @@ - try: + if psutil is None: + logger.error( + "psutil is required to detect system memory. Install with: pip install -r tools/deployment/presto-clp/scripts/requirements.txt" + ) + return False + try: total_memory_bytes = int(psutil.virtual_memory().total)Also applies to: 9-9
236-243: Validate coordinator port value before composing discovery URI.Ensure the port is numeric and within [1, 65535] to catch misconfig early.
Proposed diff:
- env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = ( - f"http://{coordinator_service_name}:{coordinator_http_port}" - ) + try: + port_int = int(coordinator_http_port) + if not (1 <= port_int <= 65535): + raise ValueError + except ValueError: + logger.error( + "Invalid PRESTO_COORDINATOR_HTTPPORT '%s' in '%s'", + coordinator_http_port, + coordinator_common_env_file_path, + ) + return False + env_vars["PRESTO_COORDINATOR_CONFIGPROPERTIES_DISCOVERY_URI"] = ( + f"http://{coordinator_service_name}:{port_int}" + )
305-314: Treat whitespace-only values as empty for required env keys.dotenv entries like KEY=" " should fail validation, not pass.
Proposed diff:
- value = config.get(key) - if value is None: + raw_value = config.get(key) + if raw_value is None: logger.error("Missing required key %s in '%s'", key, env_file_path) raise KeyError(key) - - if value == "": + value = raw_value.strip() + if value == "": logger.error("Empty value for required key %s in '%s'", key, env_file_path) raise KeyError(key)Consider applying the same empty-string check to
_get_required_config_valuefor parity with file-based configs.tools/deployment/presto-clp/scripts/requirements.txt (1)
1-1: Pin versions for reproducible, secure installs with correct major version bounds.Unpinned packages risk breakage and supply-chain drift. The proposed diff needs a correction: psutil's upper bound should allow major version 7, which is now stable.
Proposed diff:
-psutil -python-dotenv -PyYAML +psutil>=5.9,<8 +python-dotenv>=1.0,<2 +PyYAML>=6.0,<7psutil 7.1.2 (released Oct 25, 2025) is the latest stable version, so the original constraint
<7was too restrictive.Optionally manage dependencies via pip-tools (constraints/lock).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tools/deployment/presto-clp/scripts/init.py(2 hunks)tools/deployment/presto-clp/scripts/requirements.txt(1 hunks)tools/deployment/presto-clp/worker.env(1 hunks)tools/deployment/presto-clp/worker/config-template/config.properties(1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.1)
tools/deployment/presto-clp/scripts/init.py
257-257: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
tools/deployment/presto-clp/worker/config-template/config.properties (1)
6-8: LGTM: property placeholders match new env vars.The three properties line up with init.py outputs. No issues spotted.
| try: | ||
| total_memory_bytes = int(psutil.virtual_memory().total) | ||
| except (AttributeError, ValueError) as exc: | ||
| logger.error("Unable to determine total system memory: %s", exc) | ||
| return False | ||
|
|
||
| total_memory_gb = total_memory_bytes / (1024**3) | ||
| total_memory_gb_floor = max(1, math.floor(total_memory_gb)) | ||
|
|
||
| system_memory_gb = min( | ||
| total_memory_gb_floor, | ||
| max(1, math.floor(total_memory_gb * 0.90)), | ||
| ) | ||
| query_memory_gb = min( | ||
| system_memory_gb, | ||
| max(1, math.floor(system_memory_gb * (2 / 3))), | ||
| ) | ||
| system_mem_limit_gb = min( | ||
| total_memory_gb_floor, | ||
| max(system_memory_gb, max(1, math.floor(total_memory_gb * 0.94))), | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make memory detection cgroup-aware; switch to exception logging.
psutil reports host RAM, not container limits. In containers, this can over-allocate Presto and cause OOM. Also, Ruff flags TRY400 here.
Apply these changes:
- Detect Linux cgroup limits (v2: /sys/fs/cgroup/memory.max; v1: /sys/fs/cgroup/memory/memory.limit_in_bytes) and cap to the lower of psutil total and cgroup limit.
- Use logger.exception when memory detection fails.
Patch inside function:
- try:
- total_memory_bytes = int(psutil.virtual_memory().total)
- except (AttributeError, ValueError) as exc:
- logger.error("Unable to determine total system memory: %s", exc)
- return False
+ try:
+ total_memory_bytes = int(psutil.virtual_memory().total)
+ except Exception:
+ logger.exception("Unable to determine total system memory")
+ return False
+
+ # Respect container memory limits when present (cgroup v2/v1)
+ cgroup_limit_bytes = _read_cgroup_memory_limit_bytes()
+ if cgroup_limit_bytes and cgroup_limit_bytes > 0:
+ total_memory_bytes = min(total_memory_bytes, cgroup_limit_bytes)Add this helper (outside the function) to keep things tidy:
from pathlib import Path
def _read_cgroup_memory_limit_bytes() -> Optional[int]:
# cgroup v2
v2 = Path("/sys/fs/cgroup/memory.max")
# cgroup v1
v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
for p in (v2, v1):
try:
with open(p, "r", encoding="utf-8") as f:
raw = f.read().strip()
if raw and raw.lower() != "max":
val = int(raw)
if val > 0:
return val
except FileNotFoundError:
continue
except Exception as exc:
logger.debug("Ignoring cgroup memory read error from %s: %s", p, exc)
return NoneThis prevents overshooting container RAM and aligns with typical Presto deployment patterns.
🧰 Tools
🪛 Ruff (0.14.1)
257-257: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 254 to 275,
psutil.virtual_memory() is being used alone which returns host RAM and can
over-allocate inside containers; change the logic to detect cgroup limits and
cap total_memory_bytes to the lower of psutil total and any cgroup memory limit
(check cgroup v2 /sys/fs/cgroup/memory.max and cgroup v1
/sys/fs/cgroup/memory/memory.limit_in_bytes, ignore values of "max" or
non-positive, and treat missing files as no limit). Add a helper function
(placed outside the current function) that reads those two paths robustly,
returns an Optional[int] cgroup limit, and logs debug on read errors; then use
that helper to set total_memory_bytes = min(psutil_total, cgroup_limit) when
cgroup_limit is present. Also switch the existing except handler to
logger.exception(...) so exceptions during memory detection are logged with
traceback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)
251-286: Container memory limits are not respected; missing error handling.This is the same concern raised in a previous review: psutil reports host RAM, which can cause over-allocation in containers leading to OOM kills. Additionally, there's no error handling for the psutil call.
As suggested in the previous review:
- Detect cgroup limits (v2:
/sys/fs/cgroup/memory.max; v1:/sys/fs/cgroup/memory/memory.limit_in_bytes) and cap to the lower of psutil total and cgroup limit- Add try-except with
logger.exceptionfor the memory detection callBased on learnings
🧹 Nitpick comments (1)
tools/deployment/presto-clp/scripts/init.py (1)
258-261: Simplify redundant min() operation.The outer
min(total_memory_gb_floor, ...)is redundant sincemath.floor(total_memory_gb * 0.90)will always be ≤total_memory_gb_floor.Apply this diff:
- system_memory_gb = min( - total_memory_gb_floor, - max(1, math.floor(total_memory_gb * 0.90)), - ) + system_memory_gb = max(1, math.floor(total_memory_gb * 0.90))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tools/deployment/presto-clp/scripts/init.py(2 hunks)tools/deployment/presto-clp/worker.env(1 hunks)
🔇 Additional comments (4)
tools/deployment/presto-clp/worker.env (1)
3-5: Clarify the purpose of these hardcoded memory values.The init.py script now dynamically computes these memory environment variables using psutil. Are these hardcoded values intended as:
- Fallback defaults if init.py fails?
- Example values for documentation?
- Temporary values that get overwritten by the init script output?
If they're meant to be computed dynamically, consider adding a comment explaining their purpose to avoid confusion.
tools/deployment/presto-clp/scripts/init.py (3)
3-4: LGTM!The new imports are correctly added to support memory calculation (math) and system memory detection (psutil).
Also applies to: 9-9
232-246: LGTM!The refactored validation logic correctly uses the new
_get_required_env_varhelper to enforce explicit handling of missing/empty values, and properly constructs the discovery URI.
289-309: LGTM!The helper correctly enforces explicit validation for both missing and empty environment variable values, with appropriate error logging and exception handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tools/deployment/presto-clp/scripts/init.py (1)
251-287: Add error handling and cgroup-awareness to memory detection.The function has two critical issues:
No error handling: If
psutil.virtual_memory()fails, the function will crash. Wrap the call in a try-except block.Container limits not respected:
psutil.virtual_memory().totalreports host RAM, not container cgroup limits. In containerized Presto deployments, this can allocate more memory than the container has, causing OOM kills.Apply the solution from the previous review:
Add a helper function to read cgroup limits:
from pathlib import Path def _read_cgroup_memory_limit_bytes() -> Optional[int]: """Reads container memory limit from cgroup v2 or v1.""" # cgroup v2 v2 = Path("/sys/fs/cgroup/memory.max") # cgroup v1 v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes") for p in (v2, v1): try: with open(p, "r", encoding="utf-8") as f: raw = f.read().strip() if raw and raw.lower() != "max": val = int(raw) if val > 0: return val except FileNotFoundError: continue except Exception as exc: logger.debug("Ignoring cgroup memory read error from %s: %s", p, exc) return NoneThen update
_configure_worker_memory_env_vars:def _configure_worker_memory_env_vars(env_vars: Dict[str, str]) -> bool: """Adds memory-related environment variables for the worker.""" - total_memory_bytes = psutil.virtual_memory().total + try: + total_memory_bytes = int(psutil.virtual_memory().total) + except Exception: + logger.exception("Unable to determine total system memory") + return False + + # Respect container memory limits when present (cgroup v2/v1) + cgroup_limit_bytes = _read_cgroup_memory_limit_bytes() + if cgroup_limit_bytes and cgroup_limit_bytes > 0: + total_memory_bytes = min(total_memory_bytes, cgroup_limit_bytes) + total_memory_gb = total_memory_bytes / (1024**3)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tools/deployment/presto-clp/scripts/init.py(2 hunks)tools/deployment/presto-clp/worker.env(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/deployment/presto-clp/worker.env
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
tools/deployment/presto-clp/scripts/init.py (2)
3-3: LGTM!The new imports are properly utilized in the memory configuration logic.
Also applies to: 9-9
232-246: LGTM!The refactoring properly enforces explicit handling of missing coordinator config entries. The try-except pattern cleanly propagates validation failures by returning
False.
| def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str: | ||
| """ | ||
| Fetches a required entry from an environment variables dictionary. | ||
| :param config: | ||
| :param key: The key that must exist in `config` with a non-empty value. | ||
| :param env_file_path: | ||
| :return: The string value associated with `key`. | ||
| :raises KeyError: If `key` is missing. | ||
| """ | ||
|
|
||
| value = config.get(key) | ||
| if value is None: | ||
| logger.error("Missing required key %s in '%s'", key, env_file_path) | ||
| raise KeyError(key) | ||
|
|
||
| return value | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider validating empty strings to match the docstring.
The docstring states the function requires a "non-empty value," but the implementation only checks for None. Empty strings would pass validation but might be invalid for coordinator settings.
If empty strings should be rejected, apply this diff:
def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str:
"""
Fetches a required entry from an environment variables dictionary.
:param config:
:param key: The key that must exist in `config` with a non-empty value.
:param env_file_path:
:return: The string value associated with `key`.
:raises KeyError: If `key` is missing.
"""
value = config.get(key)
- if value is None:
+ if not value:
logger.error("Missing required key %s in '%s'", key, env_file_path)
raise KeyError(key)
return valueAlternatively, update the docstring to clarify that only None is checked:
- :param key: The key that must exist in `config` with a non-empty value.
+ :param key: The key that must exist in `config` with a non-None value.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str: | |
| """ | |
| Fetches a required entry from an environment variables dictionary. | |
| :param config: | |
| :param key: The key that must exist in `config` with a non-empty value. | |
| :param env_file_path: | |
| :return: The string value associated with `key`. | |
| :raises KeyError: If `key` is missing. | |
| """ | |
| value = config.get(key) | |
| if value is None: | |
| logger.error("Missing required key %s in '%s'", key, env_file_path) | |
| raise KeyError(key) | |
| return value | |
| def _get_required_env_var(config: Dict[str, Optional[str]], key: str, env_file_path: Path) -> str: | |
| """ | |
| Fetches a required entry from an environment variables dictionary. | |
| :param config: | |
| :param key: The key that must exist in `config` with a non-empty value. | |
| :param env_file_path: | |
| :return: The string value associated with `key`. | |
| :raises KeyError: If `key` is missing. | |
| """ | |
| value = config.get(key) | |
| if not value: | |
| logger.error("Missing required key %s in '%s'", key, env_file_path) | |
| raise KeyError(key) | |
| return value |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/init.py around lines 289 to 306, the
function claims to require a "non-empty value" but only checks for None; change
the guard to reject empty or whitespace-only strings (e.g., treat value is None
or value.strip() == "" as missing), keep the same logger.error and raise
KeyError(key) behavior, and return the value otherwise; if you instead prefer
only to disallow None, update the docstring to remove "non-empty" wording.
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68fcd8c77ea48327afeeddb4baf320f9
Summary by CodeRabbit
New Features
Configuration
Dependencies