@0xCUB3 0xCUB3 commented Nov 7, 2025

Converts the imperative legacy verifiers into declarative mellea Requirements.

Precursor

The feat/llm-sandbox-execution branch that I already pushed to the main mellea repo provides the safe/unsafe execution structure with llm-sandbox.

Implementation

Seven auto-fixing requirements:

  • python_files_accessible - Creates missing data/image/text files automatically
  • python_imports_resolved - Adds missing import statements with nickname support
  • python_columns_accessible - Adds missing DataFrame columns with dummy data
  • python_code_formatted - Fixes indentation and formatting with autopep8
  • python_packages_installed - Auto-installs missing packages with correct mapping
  • python_paths_fixed - Fixes file path issues (./ prefixes, missing data/)
  • python_auto_fix - Iterative pipeline combining all fixes
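To illustrate the idea behind python_imports_resolved, here is a minimal sketch of import repair with nickname support. It is not the code in this PR: the alias table, find_undefined_names, and add_missing_imports are hypothetical names, and the name analysis is deliberately rough (it ignores scoping).

```python
import ast
import builtins

# Hypothetical alias table for illustration; the PR's actual mapping is not shown here.
COMMON_ALIASES = {
    "np": "import numpy as np",
    "pd": "import pandas as pd",
    "plt": "import matplotlib.pyplot as plt",
}


def find_undefined_names(code: str) -> set[str]:
    """Return names that are read but never bound anywhere in the module.

    This is a scope-unaware approximation: a name bound anywhere counts as defined.
    """
    tree = ast.parse(code)
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Load context means the name is read; Store/Del means it is bound.
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    return used - bound


def add_missing_imports(code: str) -> str:
    """Prepend import statements for undefined names that match a known alias."""
    missing = find_undefined_names(code)
    lines = sorted(COMMON_ALIASES[name] for name in missing if name in COMMON_ALIASES)
    return "\n".join(lines + [code]) if lines else code
```

Working on the AST rather than on raw text avoids false matches inside strings and comments, at the cost of requiring the snippet to parse.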

Three utility modules support these requirements:

  • data_generators.py - Random data generation for all data types
  • file_utils.py - File type predicates / I/O operations
  • metadata_utils.py - Directory structure conversion and reconstruction
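As a rough picture of what "directory structure conversion and reconstruction" can mean, the sketch below turns a directory tree into a nested dict and rebuilds it with empty placeholder files. The function names and the dict layout are assumptions for illustration, not the contents of metadata_utils.py.

```python
from pathlib import Path


def directory_to_metadata(root: Path) -> dict:
    """Convert a directory tree into a nested dict.

    Sub-dicts represent folders; None marks a file.
    """
    return {
        entry.name: directory_to_metadata(entry) if entry.is_dir() else None
        for entry in sorted(root.iterdir())
    }


def metadata_to_directory(metadata: dict, root: Path) -> None:
    """Recreate the tree under root, creating empty placeholder files."""
    root.mkdir(parents=True, exist_ok=True)
    for name, child in metadata.items():
        if child is None:
            (root / name).touch()
        else:
            metadata_to_directory(child, root / name)
```

A round trip through both functions should reproduce the original layout, which makes the pair easy to test against a temporary directory.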

API

# Mellea execution (re-exported)
from mellea_contribs.reqlib import python_executable, python_executable_sandbox, python_syntax_valid

# Auto-fixing requirements
from mellea_contribs.reqlib import (
    python_files_accessible, python_imports_resolved, python_columns_accessible,
    python_code_formatted, python_packages_installed, python_paths_fixed, python_auto_fix
)

# Usage
auto_req = python_auto_fix(max_iterations=5, use_sandbox=True)
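For readers unfamiliar with the iterative approach, the loop behind a max_iterations setting can be sketched roughly as follows. This is an assumption about the general shape, not the PR's implementation: try_execute, auto_fix, and fix_missing_math_import are hypothetical, and plain exec() stands in for sandboxed execution.

```python
from typing import Callable, Optional

# A fixer inspects the code and the last error, returning repaired code
# or None if it does not apply. (Hypothetical interface.)
Fixer = Callable[[str, Exception], Optional[str]]


def try_execute(code: str) -> Optional[Exception]:
    """Run code in a fresh namespace; return the exception raised, or None on success.

    Stand-in for sandboxed execution: plain exec() is NOT safe for untrusted code.
    """
    try:
        exec(compile(code, "<snippet>", "exec"), {"__name__": "__snippet__"})
        return None
    except Exception as exc:
        return exc


def auto_fix(code: str, fixers: list[Fixer], max_iterations: int = 5) -> tuple[str, bool]:
    """Apply the first applicable fixer after each failure until the code runs."""
    for _ in range(max_iterations):
        error = try_execute(code)
        if error is None:
            return code, True
        for fixer in fixers:
            repaired = fixer(code, error)
            if repaired is not None and repaired != code:
                code = repaired
                break
        else:
            # No fixer changed the code, so further iterations cannot help.
            return code, False
    # Iteration budget exhausted; report whether the last repair worked.
    return code, try_execute(code) is None


def fix_missing_math_import(code: str, error: Exception) -> Optional[str]:
    """Example fixer: add `import math` when the failure is a NameError on `math`."""
    if isinstance(error, NameError) and "'math'" in str(error):
        return "import math\n" + code
    return None
```

Stopping as soon as no fixer changes the code keeps the loop from burning its whole budget on an error that none of the fixers understands.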

Testing

  • Added 18 tests
  • Tested on ~50 Mellea-generated snippets of varying length and complexity

Replaces the imperative auto-fixing script with the declarative Requirements pattern.
Handles all 5 error types from the original script with zero code duplication.

Features:
- Auto-creates missing files, imports, DataFrame columns
- Fixes code formatting and installs packages
- Iterative pipeline until code executes successfully
- Full test coverage with 18 tests