fix: Sanitize environment variable injection in script execution#153
fix: Sanitize environment variable injection in script execution#153
Conversation
|
Cursor Agent can help with this pull request. Just |
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 1049.3s
🟡 4 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-977fad02
b2b5786 to
17bde6f
Compare
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 184.1s
🟡 1 warnings, 💡 3 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-9c9f297f
| if not ENV_VAR_NAME_PATTERN.match(sanitized): | ||
| return None | ||
|
|
||
| return sanitized |
There was a problem hiding this comment.
💡 Value sanitization allows control characters that could enable log injection
Only null bytes are removed from values; carriage return (\r) and other control characters could be used for terminal/log injection attacks when values are logged or displayed.
Suggested fix:
Consider stripping additional control characters (ASCII 0-31 except \n and \t) or at minimum \r to prevent log injection.
17bde6f to
6c32f27
Compare
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 93% | Review time: 135.9s
🟡 2 warnings, 💡 2 suggestions, 📝 1 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-bf2a88e4
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 83% | Review time: 247.3s
🟡 1 warnings, 💡 4 suggestions, 📝 1 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-2d54fa30
|
|
||
| # Convert to uppercase and replace common special characters | ||
| sanitized = name.upper().replace("-", "_").replace(".", "_") | ||
|
|
There was a problem hiding this comment.
🟡 Potential key collision causing silent data loss
Different input keys can sanitize to the same env var name (e.g., 'my-var' and 'my.var' both become 'MY_VAR'), causing the second value to silently overwrite the first.
Suggested fix:
Track seen sanitized names and either warn on collision, skip duplicates with a warning, or use a deterministic tie-breaker.
| # An attacker might try to inject a null byte followed by a command | ||
| malicious = "legitimate_value\x00; rm -rf /" | ||
| result = _sanitize_env_var_value(malicious) | ||
| assert result == "legitimate_value; rm -rf /" |
There was a problem hiding this comment.
💡 Misleading test name implies shell commands are sanitized
The test 'test_injection_attempt_with_null_and_command' and its docstring suggest the command is sanitized, but only the null byte is removed—the ; rm -rf / remains in the value, which could give false confidence about the function's protection scope.
Suggested fix:
Rename to `test_null_byte_removed_from_value` and update docstring to clarify that only null bytes are stripped, not shell metacharacters (which is intentional design).
|
|
||
| # Convert to uppercase and replace common special characters | ||
| sanitized = name.upper().replace("-", "_").replace(".", "_") | ||
|
|
There was a problem hiding this comment.
💡 Key collision allows value overwriting
Different input names like my-var and my.var both map to MY_VAR, allowing an attacker who controls multiple dynamic_values keys to overwrite intended values with malicious ones.
Suggested fix:
Consider logging a warning when a sanitized key already exists in the environment, or maintain a set of seen sanitized keys to detect and skip collisions.
| from merobox.commands.bootstrap.steps.base import BaseStep | ||
| from merobox.commands.utils import console | ||
|
|
||
| # Constants for environment variable sanitization |
There was a problem hiding this comment.
💡 Document rationale for limit constants
The magic numbers 256 and 131072 would benefit from brief comments explaining why these values were chosen (e.g., POSIX limits, security thresholds).
Suggested fix:
Add inline comments like `# POSIX typically allows up to 32KB, 256 is conservative` or similar rationale.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
…njection--ml7pwci8
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 293.8s
🟡 2 warnings, 💡 3 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-82b7b45b
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
…njection--ml7pwci8
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 95% | Review time: 226.1s
🟡 1 warnings, 💡 4 suggestions, 📝 2 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-e5b57d6d
| CONTROL_CHARS_PATTERN = re.compile(r"[\x00-\x08\x0b-\x1f]") | ||
|
|
||
| # Blocklist of sensitive environment variable names that should not be overwritten | ||
| # These could be used for privilege escalation or code execution if modified |
There was a problem hiding this comment.
🟡 Missing BASH_ENV and other shell initialization variables from blocklist
BASH_ENV is executed by bash in non-interactive mode before running commands, making it a privilege escalation vector; also missing ENV (used by sh/dash), CDPATH, SHELLOPTS, and BASH_OPTS.
Suggested fix:
Add 'BASH_ENV', 'ENV', 'CDPATH', 'SHELLOPTS', 'BASH_OPTS', 'POSIXLY_CORRECT' to SENSITIVE_ENV_VARS.
| CONTROL_CHARS_PATTERN = re.compile(r"[\x00-\x08\x0b-\x1f]") | ||
|
|
||
| # Blocklist of sensitive environment variable names that should not be overwritten | ||
| # These could be used for privilege escalation or code execution if modified |
There was a problem hiding this comment.
💡 Consider blocking all LD_ and DYLD_ prefixes rather than specific variables**
Only specific LD_* variables are blocked but many others exist (LD_BIND_NOW, LD_AOUT_PRELOAD, LD_WARN, etc.); new ones may be added to linkers over time.
Suggested fix:
Add a prefix check in _sanitize_env_var_name to reject any name starting with 'LD_' or 'DYLD_' instead of maintaining a specific list.
| @@ -263,11 +405,20 @@ async def _execute_local( | |||
| # This allows scripts to access workflow variables via $VAR_NAME | |||
There was a problem hiding this comment.
💡 Silent overwrites when different keys sanitize to the same name
If two input keys like 'my-var' and 'my.var' both sanitize to 'MY_VAR', the second silently overwrites the first, which could cause subtle bugs.
Suggested fix:
Track seen sanitized names and log a warning when a collision is detected before overwriting.
| @@ -0,0 +1,314 @@ | |||
| """ | |||
There was a problem hiding this comment.
💡 Missing test for name collision scenario
No test verifies behavior when multiple input keys sanitize to the same environment variable name.
Suggested fix:
Add an integration test that passes dynamic_values like `{'my-var': 'a', 'my.var': 'b'}` and verifies expected behavior (either last-wins or detection).
| CONTROL_CHARS_PATTERN = re.compile(r"[\x00-\x08\x0b-\x1f]") | ||
|
|
||
| # Blocklist of sensitive environment variable names that should not be overwritten | ||
| # These could be used for privilege escalation or code execution if modified |
There was a problem hiding this comment.
💡 Missing language-specific path variables from blocklist
NODE_PATH, PERL5LIB, RUBYLIB, GEM_PATH, and similar variables can be used to load attacker-controlled code in polyglot environments.
Suggested fix:
Add NODE_PATH, NODE_OPTIONS, PERL5LIB, PERL5OPT, RUBYLIB, RUBYOPT, GEM_PATH, GEM_HOME to SENSITIVE_ENV_VARS.
| should properly quote variables (e.g., "$VAR" instead of $VAR) to prevent | ||
| command injection. | ||
| """ | ||
| if value is None: |
There was a problem hiding this comment.
📝 Nit: Redundant null byte removal
The explicit .replace("\x00", "") is redundant since CONTROL_CHARS_PATTERN already includes \x00 in the range \x00-\x08.
Suggested fix:
Remove line 140-141 (`str_value = str_value.replace("\x00", "")`) since the regex on line 143-144 will handle null bytes.
| "LD_PROFILE", | ||
| "TMPDIR", | ||
| "TMP", | ||
| "TEMP", |
There was a problem hiding this comment.
📝 Nit: Consider using a NamedTuple for the return type
The tuple return (sanitized_name, rejection_reason) could be more self-documenting as a NamedTuple like SanitizeResult.
Suggested fix:
Define `SanitizeResult = NamedTuple('SanitizeResult', [('name', str | None), ('rejection_reason', str | None)])` for clearer API.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| "TMP", | ||
| "TEMP", | ||
| } | ||
| ) |
There was a problem hiding this comment.
Blocklist missing BASH_ENV allows arbitrary code execution
High Severity
The SENSITIVE_ENV_VARS blocklist is missing BASH_ENV, which is a critical variable for this context. When bash is invoked non-interactively (as with subprocess.run(["/bin/sh", ...], env=env)), it reads and executes the file specified by BASH_ENV before running the script. On systems where /bin/sh is bash (RHEL, CentOS, Fedora, older macOS), an attacker who controls a dynamic value keyed bash_env could point BASH_ENV to an attacker-controlled file and achieve arbitrary code execution, completely bypassing the sanitization this PR intends to provide.
Additional Locations (1)
…njection--ml7pwci8
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 94% | Review time: 203.9s
🟡 2 warnings, 💡 3 suggestions, 📝 1 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-ed0600c3
| CONTROL_CHARS_PATTERN = re.compile(r"[\x00-\x08\x0b-\x1f]") | ||
|
|
||
| # Blocklist of sensitive environment variable names that should not be overwritten | ||
| # These could be used for privilege escalation or code execution if modified |
There was a problem hiding this comment.
🟡 Missing critical code execution variables from blocklist
BASH_ENV, ENV, and PROMPT_COMMAND are not blocked but can trigger arbitrary code execution when bash/sh starts or displays a prompt.
Suggested fix:
Add `BASH_ENV`, `ENV`, `PROMPT_COMMAND`, `SHELLOPTS`, `BASHOPTS`, and `PS4` to SENSITIVE_ENV_VARS.
| @@ -263,11 +405,20 @@ async def _execute_local( | |||
| # This allows scripts to access workflow variables via $VAR_NAME | |||
There was a problem hiding this comment.
🟡 Name collisions can silently overwrite earlier values
Different input keys can sanitize to the same env var name (e.g., 'my-var', 'my.var', 'my_var' all become 'MY_VAR'), causing silent data loss when the second overwrites the first.
Suggested fix:
Track seen sanitized names and either skip duplicates with a warning, or suffix with a counter to avoid collision.
| CONTROL_CHARS_PATTERN = re.compile(r"[\x00-\x08\x0b-\x1f]") | ||
|
|
||
| # Blocklist of sensitive environment variable names that should not be overwritten | ||
| # These could be used for privilege escalation or code execution if modified |
There was a problem hiding this comment.
💡 Consider blocking language-specific module path variables
NODE_OPTIONS, NODE_PATH, PERL5LIB, RUBYLIB, and GEM_PATH can be exploited to load malicious code in their respective runtimes.
Suggested fix:
Add these variables to SENSITIVE_ENV_VARS if Node.js, Perl, or Ruby scripts may be executed.
| # Convert to string | ||
| str_value = str(value) | ||
|
|
||
| # Remove null bytes which can cause issues |
There was a problem hiding this comment.
💡 Silent truncation of large values without logging
Values exceeding MAX_ENV_VAR_VALUE_LENGTH are silently truncated, which could cause hard-to-debug issues when legitimate large values are cut off.
Suggested fix:
Add a warning log when truncation occurs, similar to how rejected names are logged.
| # Ensure it doesn't start with a digit | ||
| if sanitized and sanitized[0].isdigit(): | ||
| sanitized = "_" + sanitized | ||
|
|
There was a problem hiding this comment.
💡 Silent truncation of long names may cause debugging confusion
Long variable names are silently truncated without any warning; users may not realize their variable was shortened, leading to unexpected mismatches.
Suggested fix:
Consider logging a warning (similar to the rejection case) when truncation occurs, so users can identify potential issues.
| may be needed for legitimate use cases. Scripts consuming these values | ||
| should properly quote variables (e.g., "$VAR" instead of $VAR) to prevent | ||
| command injection. | ||
| """ |
There was a problem hiding this comment.
📝 Nit: Redundant null byte removal before regex
The explicit replace("\x00", "") is redundant since CONTROL_CHARS_PATTERN already matches null bytes in the [\x00-\x08] range.
Suggested fix:
Remove the explicit null byte replacement and rely solely on the regex pattern.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| "TMP", | ||
| "TEMP", | ||
| } | ||
| ) |
There was a problem hiding this comment.
Sensitive env var blocklist missing BASH_ENV and ENV
Medium Severity
The SENSITIVE_ENV_VARS blocklist is missing BASH_ENV and ENV, which are well-known code execution vectors. BASH_ENV causes bash to execute the specified file when invoked non-interactively, and ENV serves a similar role for POSIX shells. While the direct /bin/sh invocation may not honor BASH_ENV in POSIX mode, any sub-scripts or commands that invoke bash directly would inherit and respect it, enabling arbitrary code execution. These are critical omissions for a blocklist explicitly designed to prevent "privilege escalation or code execution."


Sanitize environment variable names and values in script execution to prevent injection attacks.
Dynamic workflow values were previously injected as environment variables with only basic character replacement, posing a risk for environment variable injection. This PR introduces dedicated sanitization functions for both variable names and values, enforcing valid characters, length limits, and removing potentially malicious content like null bytes. New unit tests verify the effectiveness of these sanitization rules.
Note
Medium Risk
Touches script execution by changing how workflow
dynamic_valuesbecome environment variables; could break scripts that relied on unsanitized names/overwriting sensitive vars, but changes are localized and covered by unit tests.Overview
Hardens
ScriptSteplocal script execution by sanitizingdynamic_valuesbefore exporting them as environment variables, including strict name normalization/validation, a protected blocklist (e.g.,PATH,LD_PRELOAD, proxy vars), and value cleaning/truncation to strip null/control characters.Adds comprehensive unit tests covering name/value sanitization behavior, rejection reasons, sensitive-var blocking, and realistic workflow-variable patterns.
Written by Cursor Bugbot for commit 9ffc7a9. This will update automatically on new commits. Configure here.