feat: Add secure Python code execution with llm-sandbox support #217

0xCUB3 · 2025-10-25T00:15:46Z

Flexible Python code execution validation system with three execution modes:

- Safe mode (default): Syntax validation only
- Unsafe execution: Direct subprocess execution with warnings
- Sandbox execution: Secure Docker-based execution via llm-sandbox
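A minimal sketch of what "syntax validation only" can look like, assuming the check is done with Python's built-in `compile()`. The helper name `validate_syntax` is hypothetical and is not the PR's `SafeBackend`, which per the commit message also validates imports. `compile()` builds a code object without running it, so it also catches compile-time errors such as `return` outside a function, which `ast.parse` alone would accept.

```python
from typing import Optional, Tuple


def validate_syntax(code: str) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, error_message) without running `code` (hypothetical helper)."""
    try:
        # compile() parses and compiles to a code object but never executes it,
        # so it also catches compile-time errors such as `return` outside a function.
        compile(code, "<candidate>", "exec")
    except SyntaxError as exc:  # IndentationError and TabError are subclasses
        return False, f"SyntaxError (line {exc.lineno}): {exc.msg}"
    except ValueError as exc:
        # e.g. source containing null bytes on some Python versions
        return False, f"ValueError: {exc}"
    return True, None


print(validate_syntax("x = 1 + 2"))  # (True, None)
print(validate_syntax("def f(:\n    pass"))
```

Because nothing is executed, this mode cannot detect runtime errors such as a `NameError`; that is the trade-off the unsafe and sandbox modes address.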

Elements

- Abstract backend architecture with pluggable execution strategies
- Import restrictions as an additional security layer (AST-based analysis)
- Configurable timeouts for both unsafe and sandbox execution
- Safety warnings when using unsafe execution mode
- Fallbacks when llm-sandbox is not available
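AST-based import restriction can be sketched with the standard `ast` module: walk the tree, collect the top-level package of every `import` / `from ... import` statement, and report those not on the allow-list. This is an illustrative sketch, not the PR's implementation; `disallowed_imports` is a hypothetical name. Static analysis cannot see dynamic imports such as `__import__("socket")` or `importlib.import_module(...)`, which is why it works only as an extra layer on top of a sandbox rather than a replacement for one.

```python
import ast
from typing import Iterable, List


def disallowed_imports(code: str, allowed: Iterable[str]) -> List[str]:
    """Return top-level modules imported by `code` that are not in `allowed` (hypothetical helper).

    Assumes `code` already passed a syntax check; ast.parse raises SyntaxError otherwise.
    """
    allowed_set = set(allowed)
    found: List[str] = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            # `import a.b.c` is attributed to the top-level package "a"
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (`from . import x`) cannot resolve in a standalone
            # snippet, so this sketch skips them
            if node.level or node.module is None:
                continue
            names = [node.module.split(".")[0]]
        else:
            continue
        for name in names:
            if name not in allowed_set and name not in found:
                found.append(name)
    return found


print(disallowed_imports("import os\nfrom subprocess import run", ["os", "sys", "json"]))
# ['subprocess']
```

Calling `disallowed_imports('__import__("socket")', ["os"])` returns an empty list, which illustrates the dynamic-import blind spot described above.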

API

```python
# Safe mode (default) - validation only
req = PythonExecutesWithoutError()

# Unsafe execution with warning
req = PythonExecutesWithoutError(allow_unsafe_execution=True, timeout=10)

# Secure sandbox execution
req = PythonExecutesWithoutError(use_sandbox=True, timeout=10)

# With import restrictions
req = PythonExecutesWithoutError(
    use_sandbox=True,
    allowed_imports=["os", "sys", "json"],
    timeout=10
)
```
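The flags above imply a dispatch step that turns `use_sandbox` and `allow_unsafe_execution` into one backend. The sketch below is a guess at that logic, not the PR's code: the helper name `select_backend`, the precedence (sandbox wins when both flags are set), and the fallback target when llm-sandbox is missing (safe validation plus a warning) are all assumptions. It only checks whether the `llm_sandbox` package is importable and does not call any llm-sandbox API.

```python
import importlib.util
import warnings


def select_backend(use_sandbox: bool = False, allow_unsafe_execution: bool = False) -> str:
    """Map the two opt-in flags to a mode name (hypothetical helper, assumed precedence)."""
    if use_sandbox:
        # llm-sandbox is optional; its import name is `llm_sandbox`
        if importlib.util.find_spec("llm_sandbox") is not None:
            return "sandbox"
        # Assumed fallback: degrade to syntax-only validation rather than raise
        warnings.warn(
            "llm-sandbox is not installed; falling back to syntax-only validation",
            RuntimeWarning,
            stacklevel=2,
        )
        return "safe"
    if allow_unsafe_execution:
        # Explicit opt-in still emits a warning every time
        warnings.warn(
            "allow_unsafe_execution=True runs untrusted code directly on this machine",
            RuntimeWarning,
            stacklevel=2,
        )
        return "unsafe"
    return "safe"


print(select_backend())  # safe
```

Keeping the default path warning-free and free of execution matches the PR's stated goal of no breaking changes for existing `PythonExecutesWithoutError()` callers.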

Dependencies

- Adds llm-sandbox[docker]
- Requires Docker for sandbox functionality
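Since sandbox mode needs both the Python package and a Docker installation, a quick prerequisite report can help explain why sandbox tests are skipped or why a fallback kicked in. This is a hypothetical helper, not part of the PR: it checks that `llm_sandbox` is importable and that a `docker` executable is on `PATH`. Finding the CLI does not prove the Docker daemon is running.

```python
import importlib.util
import shutil
from typing import Dict


def sandbox_prerequisites() -> Dict[str, bool]:
    """Report which sandbox prerequisites are present (hypothetical helper)."""
    return {
        # Python package, e.g. installed with `pip install "llm-sandbox[docker]"`
        "llm_sandbox": importlib.util.find_spec("llm_sandbox") is not None,
        # Docker CLI on PATH; this does not check that the daemon is reachable
        "docker_cli": shutil.which("docker") is not None,
    }


print(sandbox_prerequisites())
```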

Testing

```bash
# Run core tests (no Docker required)
python -m pytest test/stdlib_basics/test_reqlib_python.py -k "not sandbox"

# Run all tests including sandbox (requires Docker)
python -m pytest test/stdlib_basics/test_reqlib_python.py
```

No breaking changes. Existing code using PythonExecutesWithoutError() continues to work with safe validation mode.

TODO: Documentation, and possibly more rigorous testing on larger code chunks

- Add PythonExecutesWithoutError requirement with three execution backends:
  - SafeBackend: Validates syntax and imports without execution (default)
  - UnsafeBackend: Direct subprocess execution with warnings
  - LLMSandboxBackend: Docker-based execution using llm-sandbox
- Implement allow_unsafe_execution flag with explicit opt-in and warnings
- Add import restriction support for defense-in-depth security
- Support use_sandbox flag for secure Docker-based execution
- Include comprehensive test suite with 21 test cases
- Maintain backward compatibility while defaulting to safe mode
- Add llm-sandbox[docker] dependency for optional sandbox functionality

Improves code formatting and readability in python.py by splitting long lines, adding whitespace, and updating argument formatting. Also updates test import order in test_reqlib_python.py for consistency.
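The commit message describes UnsafeBackend as direct subprocess execution with warnings and a configurable timeout. A minimal sketch of that idea with the standard library, assuming the code is run in a fresh interpreter via `sys.executable -c`; the name `run_unsafe` and its return shape are hypothetical. A child process isolates crashes and can be killed on timeout (`subprocess.run` kills the child when `timeout` expires), but it has full access to the host's files and network, which is why the sandbox mode exists.

```python
import subprocess
import sys
import warnings
from typing import Tuple


def run_unsafe(code: str, timeout: float = 10) -> Tuple[bool, str]:
    """Run `code` in a child Python process; return (succeeded, stderr or reason). Hypothetical helper."""
    warnings.warn(
        "Executing untrusted code without isolation; prefer use_sandbox=True",
        RuntimeWarning,
        stacklevel=2,
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    # A non-zero exit code means an uncaught exception or explicit sys.exit(n)
    return proc.returncode == 0, proc.stderr


ok, err = run_unsafe("raise ValueError('boom')")
print(ok)  # False
```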

mergify · 2025-10-25T00:16:20Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
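The mergify rule applies a regular-expression match to the PR title. The pattern can be checked locally with Python's `re` module; the pattern string below is copied from the rule, while `is_conventional` and the two extra example titles are illustrative only.

```python
import re

# Pattern copied verbatim from the mergify rule above
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:"
)


def is_conventional(title: str) -> bool:
    """True if `title` starts with an allowed type, an optional (scope), then a colon."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("feat: Add secure Python code execution with llm-sandbox support #217"))  # True
print(is_conventional("feat(reqlib): add sandbox backend"))  # True
print(is_conventional("Add sandbox backend"))  # False
```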

0xCUB3 added 2 commits October 24, 2025 14:03

Refactor Python execution backends and formatting

c38e456


nrfulton self-requested a review October 29, 2025 23:05

0xCUB3 mentioned this pull request Nov 7, 2025

feat: Convert legacy verifiers to mellea reqlib generative-computing/mellea-contribs#8

Open
