feat: add DeepEval + E2B sandbox SWE evaluation pipeline example by Ayush7614 · Pull Request #2430 · confident-ai/deepeval

Ayush7614 · 2026-01-12T13:08:24Z

What does this PR do?

This PR adds a complete end-to-end example showing how to evaluate LLM-generated code using:

- OpenAI for code and unit-test generation
- E2B sandbox for secure code execution
- DeepEval (GEval) for SWE-style evaluation

It shows a full pipeline:
Prompt → LLM → Sandbox Execution → Unit Tests → DeepEval Metrics
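The pipeline above can be outlined as plain Python, with each stage passed in as a function. This is a hypothetical sketch, not the code in test_swe_pipeline.py. The names run_pipeline, run_locally, ExecResult and PipelineResult are illustrative. The real example calls OpenAI for generation, E2B for execution and GEval for scoring. Here a local subprocess stands in for the sandbox and a pass/fail check stands in for GEval, so the sketch runs without API keys. The local subprocess is NOT isolated and is not a substitute for E2B.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    returncode: int


@dataclass
class PipelineResult:
    code: str
    run: ExecResult
    tests: str
    test_run: ExecResult
    score: float


def run_locally(code: str, timeout: float = 10.0) -> ExecResult:
    """Hypothetical stand-in for the E2B sandbox step.

    Runs `code` in a fresh local Python subprocess. This is NOT a security
    boundary; the PR's pipeline sends the code to an E2B sandbox instead.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snippet.py"
        path.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=timeout,
        )
    return ExecResult(proc.stdout, proc.stderr, proc.returncode)


def run_pipeline(
    prompt: str,
    generate_code: Callable[[str], str],
    generate_tests: Callable[[str, str], str],
    evaluate: Callable[[str, str, ExecResult], float],
    execute: Callable[[str], ExecResult] = run_locally,
) -> PipelineResult:
    code = generate_code(prompt)               # Prompt -> LLM
    run = execute(code)                        # Sandbox execution
    tests = generate_tests(prompt, code)       # LLM writes unit tests
    test_run = execute(code + "\n\n" + tests)  # Tests run against the code
    score = evaluate(prompt, code, test_run)   # Metric (GEval in the PR)
    return PipelineResult(code, run, tests, test_run, score)


if __name__ == "__main__":
    # Stub stages in place of real LLM and GEval calls.
    def fake_code(prompt: str) -> str:
        return "def add(a, b):\n    return a + b\nprint(add(2, 3))"

    def fake_tests(prompt: str, code: str) -> str:
        return "assert add(2, 3) == 5"

    def pass_fail(prompt: str, code: str, result: ExecResult) -> float:
        return 1.0 if result.returncode == 0 else 0.0

    result = run_pipeline("Write add(a, b)", fake_code, fake_tests, pass_fail)
    print(result.run.stdout.strip(), result.score)  # prints: 5 1.0
```

Injecting the stages keeps the orchestration testable offline; swapping `execute` for an E2B-backed function and `evaluate` for a GEval-backed one would leave `run_pipeline` unchanged.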

What's included?

test_swe_pipeline.py
- Generates Python code with an LLM
- Executes it safely inside an E2B sandbox
- Generates unit tests with the LLM
- Runs the generated tests in the sandbox
- Evaluates the outputs with DeepEval's GEval metric
tasks.json
- List of SWE-style tasks and their expected outputs
requirements.txt
- Minimal dependencies required to run the pipeline
README.md
- Setup instructions
- API key configuration (OpenAI + E2B)
- Run commands and explanation of workflow
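The description says only that tasks.json lists SWE-style tasks and expected outputs, so the schema below (`id`, `prompt`, `expected_output`) is an assumption, as are the helper names `load_tasks` and `matches_expected`. The sketch shows one way to load such a file and add a cheap exact-match check on sandbox stdout alongside the GEval score.

```python
import json
from dataclasses import dataclass


@dataclass
class Task:
    id: str
    prompt: str
    expected_output: str


def load_tasks(text: str) -> list[Task]:
    """Parse a tasks.json payload (assumed schema: a JSON list of objects)."""
    return [
        Task(id=item["id"], prompt=item["prompt"], expected_output=item["expected_output"])
        for item in json.loads(text)
    ]


def matches_expected(task: Task, stdout: str) -> bool:
    """Deterministic check: trimmed sandbox stdout equals the expected output."""
    return stdout.strip() == task.expected_output.strip()


# Hypothetical tasks.json contents, for illustration only.
EXAMPLE = """
[
  {"id": "add", "prompt": "Write add(a, b) and print add(2, 3).", "expected_output": "5"},
  {"id": "rev", "prompt": "Reverse 'abc' and print it.", "expected_output": "cba"}
]
"""

if __name__ == "__main__":
    tasks = load_tasks(EXAMPLE)
    print([t.id for t in tasks])              # ['add', 'rev']
    print(matches_expected(tasks[0], "5\n"))  # True
```

In DeepEval, the prompt, the generated output and the expected output map onto an `LLMTestCase` (`input`, `actual_output`, `expected_output`) that a GEval metric can score; whether the PR's script wires them up exactly that way is not stated in the description.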

Why is this useful?

This example helps users:

- Understand how to combine DeepEval with real code execution
- Validate LLM-generated code safely
- Add CI-style evaluation for agent and code-generation workflows
- Use DeepEval beyond simple text evaluation

vercel · 2026-01-12T13:08:29Z

@Ayush7614 is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

greptile-apps · 2026-01-12T13:08:42Z

PR author is not in the allowed authors list.

Ayush7614 · 2026-01-12T13:12:35Z

cc: @penguine-ip

feat: add DeepEval + E2B sandbox SWE evaluation pipeline

b5ba699

Ayush7614 wants to merge 1 commit into confident-ai:main from Ayush7614:ayush1