feat: add DeepEval + E2B sandbox SWE evaluation pipeline example #2430

Open
Ayush7614 wants to merge 1 commit into confident-ai:main from Ayush7614:ayush1

@Ayush7614

What does this PR do?

This PR adds a complete end-to-end example demonstrating how to evaluate LLM-generated code using:

  • OpenAI for code & unit test generation
  • E2B sandbox for secure code execution
  • DeepEval (GEval) for SWE-style evaluation

It shows a full pipeline:
Prompt → LLM → Sandbox Execution → Unit Tests → DeepEval Metrics
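The flow above can be sketched as a small orchestration function. This is only an illustration of the stage ordering, not the code in `test_swe_pipeline.py`: the names `run_pipeline`, `PipelineResult`, `generate`, `execute`, and `score` are hypothetical, and the LLM, sandbox, and metric are passed in as plain callables so the sketch runs without API keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    """Artifacts collected from one pass through the pipeline (hypothetical shape)."""
    code: str
    code_stdout: str
    tests: str
    tests_stdout: str
    score: float


def run_pipeline(
    prompt: str,
    generate: Callable[[str], str],      # stand-in for the LLM call
    execute: Callable[[str], str],       # stand-in for sandbox execution
    score: Callable[[str, str], float],  # stand-in for the evaluation metric
) -> PipelineResult:
    # Stage 1: ask the LLM for a solution to the task.
    code = generate(f"Write Python code for this task:\n{prompt}")
    # Stage 2: run the generated code in isolation and capture its output.
    code_stdout = execute(code)
    # Stage 3: ask the LLM for unit tests targeting that code.
    tests = generate(f"Write unit tests for this code:\n{code}")
    # Stage 4: run the code and its tests together in the sandbox.
    tests_stdout = execute(code + "\n\n" + tests)
    # Stage 5: score the observed behaviour against the original prompt.
    return PipelineResult(
        code=code,
        code_stdout=code_stdout,
        tests=tests,
        tests_stdout=tests_stdout,
        score=score(prompt, tests_stdout),
    )
```

Keeping the three external services behind callables is a design choice made for this sketch: it lets the stage ordering be exercised with stubs before any real OpenAI, E2B, or DeepEval call is wired in.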

What's included?

  • test_swe_pipeline.py

    • Generates Python code using an LLM
    • Executes it safely inside an E2B sandbox
    • Generates unit tests via the LLM
    • Runs the generated tests in the sandbox
    • Evaluates the outputs using DeepEval's GEval metric
  • tasks.json

    • List of SWE-style tasks and expected outputs
  • requirements.txt

    • Minimal dependencies required to run the pipeline
  • README.md

    • Setup instructions
    • API key configuration (OpenAI + E2B)
    • Run commands and explanation of workflow
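As a rough illustration of how a task entry might feed a GEval metric, here is a minimal sketch. The field names `prompt` and `expected_output` are assumptions about the shape of `tasks.json`, which this description does not spell out, and `load_tasks` and `score_with_geval` are hypothetical helpers rather than functions from the PR. The DeepEval import is deferred so the loader can be used without the package installed; calling `score_with_geval` is expected to need `deepeval` and a configured evaluation model (by default an OpenAI API key).

```python
import json


def load_tasks(text: str) -> list[dict]:
    """Parse a JSON array of tasks and check the (assumed) required fields."""
    tasks = json.loads(text)
    if not isinstance(tasks, list):
        raise ValueError("expected a JSON array of tasks")
    for i, task in enumerate(tasks):
        # "prompt" and "expected_output" are assumed field names.
        missing = {"prompt", "expected_output"} - set(task)
        if missing:
            raise ValueError(f"task {i} is missing fields: {sorted(missing)}")
    return tasks


def score_with_geval(task: dict, actual_output: str) -> float:
    """Score sandbox output against a task's expected output with GEval."""
    # Imported lazily: only this function needs deepeval and model credentials.
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCase, LLMTestCaseParams

    metric = GEval(
        name="Correctness",
        # The criteria text is illustrative, not taken from the PR.
        criteria="Does the actual output match the expected output for the task?",
        evaluation_params=[
            LLMTestCaseParams.INPUT,
            LLMTestCaseParams.ACTUAL_OUTPUT,
            LLMTestCaseParams.EXPECTED_OUTPUT,
        ],
    )
    test_case = LLMTestCase(
        input=task["prompt"],
        actual_output=actual_output,
        expected_output=task["expected_output"],
    )
    metric.measure(test_case)
    return metric.score
```

Validating the task file up front means a malformed entry fails before any LLM or sandbox call is made.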

Why is this useful?

This example helps users:

  • Understand how to combine DeepEval with real code execution
  • Validate LLM-generated code safely
  • Add CI-style evaluation for agent / code workflows
  • Use DeepEval beyond simple text evaluation

@vercel

vercel bot commented Jan 12, 2026

@Ayush7614 is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Contributor

greptile-apps bot commented Jan 12, 2026

PR author is not in the allowed authors list.

@Ayush7614
Author

cc: @penguine-ip
