
feat: gepa algorithm#502

Draft
lspinheiro wants to merge 8 commits into microsoft:main from lspinheiro:feat/lpinheiro/gepa-algorithm

Conversation


@lspinheiro lspinheiro commented Mar 10, 2026

Description

Agent Lightning already supports weight-level training (VERL) and beam-search prompt optimization (APO). This PR adds GEPA — an evolutionary prompt optimizer that fills an important gap: fast, inference-only prompt improvement that tracks per-example performance via a Pareto frontier.

Unlike APO's beam search, GEPA evolves prompt candidates through reflective mutations: it examines execution traces, identifies where prompts fall short on specific examples, and proposes targeted improvements. This makes it particularly effective for tasks with diverse failure modes, where a single "best prompt" metric can hide regressions on individual cases.
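The per-example Pareto tracking described above can be sketched in a few lines. This is a minimal illustration of the idea, not GEPA's actual implementation; the function name and data layout are made up for the example:

```python
def pareto_front(scores: dict[str, list[float]]) -> set[str]:
    """Keep every candidate that achieves the best score on at least one example.

    scores maps a candidate id to its per-example scores. A candidate survives
    if it ties the best score on some individual example, so an improvement on
    one case cannot silently erase a win on another.
    """
    n = len(next(iter(scores.values())))
    best = [max(s[i] for s in scores.values()) for i in range(n)]
    return {c for c, s in scores.items() if any(s[i] == best[i] for i in range(n))}

# "avg" has the best mean score but is never the per-example best,
# so it falls off the frontier while the two specialists survive.
front = pareto_front({"a": [1.0, 0.0], "b": [0.0, 1.0], "avg": [0.6, 0.6]})
print(sorted(front))  # -> ['a', 'b']
```

This is exactly why a frontier beats a single aggregate score: the averaged candidate looks strongest overall, yet keeping only it would regress on both individual examples.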

What's included

  1. Algorithm integration (agentlightning/algorithm/gepa/) — GEPA plugs into the standard Trainer workflow just like APO and VERL. It bridges GEPA's synchronous optimizer with AGL's async store, converts between AGL resources and GEPA candidates, and supports W&B experiment tracking out of the box.
  2. Room-booking example (examples/gepa/) — a complete walkthrough: a tool-calling agent picks meeting rooms, an LLM judge scores the choices, and GEPA optimizes the prompt across 57 scenarios. The example supports Azure Entra ID, Azure API key, and plain OpenAI as backends (via the --provider flag or the LLM_PROVIDER env var), so contributors without Azure access can run it with just an OpenAI key.
  3. Test suite (tests/algorithm/gepa/) — covers config, resource codec, adapter, interface lifecycle, and callbacks.
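The sync-to-async bridge from point 1 can be sketched with the standard library alone. Everything here is a stand-in — AsyncStore, sync_optimizer, and the metric plumbing are illustrative, not the real AGL or GEPA interfaces — but the technique (run the blocking optimizer in a worker thread, and schedule store coroutines back onto the main event loop) is the general pattern:

```python
import asyncio

class AsyncStore:
    """Stand-in for an async store that collects rollouts for a candidate."""
    async def evaluate(self, candidate: str) -> int:
        await asyncio.sleep(0.01)  # simulate async rollout collection
        return len(candidate)      # toy score: longer prompt wins

def sync_optimizer(metric, candidates):
    # A synchronous optimizer loop that knows nothing about asyncio;
    # it just calls a plain blocking metric function.
    return max(candidates, key=metric)

async def main() -> str:
    store = AsyncStore()
    loop = asyncio.get_running_loop()

    def metric(candidate: str) -> int:
        # Bridge: from the worker thread, schedule the coroutine on the
        # main loop and block until its result is ready.
        fut = asyncio.run_coroutine_threadsafe(store.evaluate(candidate), loop)
        return fut.result()

    # Run the blocking optimizer off the loop so the loop stays free
    # to service the store's coroutines.
    return await asyncio.to_thread(sync_optimizer, metric, ["a", "bb", "ccc"])

print(asyncio.run(main()))  # -> ccc
```

The key design point is that the event loop must never be blocked by the optimizer itself; moving the optimizer into a thread keeps the loop available to execute the coroutines that `run_coroutine_threadsafe` submits.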

How GEPA compares to the existing algorithms

GEPA is a good fit when you want to improve prompts without touching model weights, and you care about not regressing on specific inputs while improving overall performance.
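To make the hidden-regression point concrete, here is a hypothetical check (names invented for illustration) contrasting an aggregate metric with a per-example comparison:

```python
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

def regressed_examples(old: list[float], new: list[float]) -> list[int]:
    # Indices where the new prompt scores strictly worse than the old one.
    return [i for i, (o, n) in enumerate(zip(old, new)) if n < o]

old = [0.9, 0.2, 0.3]
new = [0.5, 0.8, 0.9]
print(mean(new) > mean(old))         # True: the aggregate says "better"
print(regressed_examples(old, new))  # [0]: but example 0 got worse
```

A single "best prompt" score would accept this change outright; per-example tracking surfaces the regression on example 0 so it can be weighed explicitly.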

Example W&B logs

agent-lightning/examples/gepa$ uv run python room_selector_gepa.py --wandb
GEPA Optimization:  86%|███████████████████████████████████████████████████████████████████████████████████▍             | 215/250 [08:58<01:27,  2.51s/rollouts]
wandb: 
wandb: Run history:
wandb:      agl/best_candidate_valset_score ▁▁▁██████
wandb:                   agl/num_candidates ▁▁▁▃▃▃▃▆█
wandb:                 agl/pareto_front_agg ▁▁▁▆▆▆▆▇█
wandb:               agl/total_metric_calls ▁▂▂▄▄▅▅▇█
wandb:       base_program_full_valset_score ▁
wandb:            base_program_val_coverage ▁
wandb: best_program_as_per_agg_score_valset ▁▁▁
wandb:                 best_score_on_valset ▁▁▁
wandb:                best_valset_agg_score ▁▁▁
wandb:                            iteration ▁▂▃▃▄▅▆▆▇█
wandb:                                  +10 ...
wandb: 
wandb: Run summary:
wandb:      agl/best_candidate_valset_score 0.39655
wandb:                   agl/num_candidates 4
wandb:                 agl/pareto_front_agg 0.55172
wandb:                agl/proposal_accepted True
wandb:               agl/total_metric_calls 260
wandb:       base_program_full_valset_score 0.17759
wandb:            base_program_val_coverage 29
wandb: best_program_as_per_agg_score_valset 1
wandb:                 best_score_on_valset 0.39655
wandb:                best_valset_agg_score 0.39655
wandb:                                  +12 ...
wandb: 

