CITATION.cff
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  GEPA: Reflective Prompt Evolution Can Outperform
  Reinforcement Learning
message: >-
  If you use this work, please cite it using the metadata
  from this file.
type: software
authors:
  - given-names: Lakshya A
    family-names: Agrawal
  - given-names: Shangyin
    family-names: Tan
  - given-names: Dilara
    family-names: Soylu
  - given-names: Noah
    family-names: Ziems
  - given-names: Rishi
    family-names: Khare
  - given-names: Krista
    family-names: Opsahl-Ong
  - given-names: Arnav
    family-names: Singhvi
  - given-names: Herumb
    family-names: Shandilya
  - given-names: Michael J
    family-names: Ryan
  - given-names: Meng
    family-names: Jiang
  - given-names: Christopher
    family-names: Potts
  - given-names: Koushik
    family-names: Sen
  - given-names: Alexandros G.
    family-names: Dimakis
  - given-names: Ion
    family-names: Stoica
  - given-names: Dan
    family-names: Klein
  - given-names: Matei
    family-names: Zaharia
  - given-names: Omar
    family-names: Khattab
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2507.19457'
abstract: >-
  Large language models (LLMs) are increasingly adapted to
  downstream tasks via reinforcement learning (RL) methods
  like Group Relative Policy Optimization (GRPO), which
  often require thousands of rollouts to learn new tasks. We
  argue that the interpretable nature of language can often
  provide a much richer learning medium for LLMs, compared
  with policy gradients derived from sparse, scalar rewards.
  To test this, we introduce GEPA (Genetic-Pareto), a prompt
  optimizer that thoroughly incorporates natural language
  reflection to learn high-level rules from trial and error.
  Given any AI system containing one or more LLM prompts,
  GEPA samples system-level trajectories (e.g., reasoning,
  tool calls, and tool outputs) and reflects on them in
  natural language to diagnose problems, propose and test
  prompt updates, and combine complementary lessons from the
  Pareto frontier of its own attempts. As a result of GEPA's
  design, it can often turn even just a few rollouts into a
  large quality gain. Across four tasks, GEPA outperforms
  GRPO by 10% on average and by up to 20%, while using up to
  35x fewer rollouts. GEPA also outperforms the leading
  prompt optimizer, MIPROv2, by over 10% across two LLMs,
  and demonstrates promising results as an inference-time
  search strategy for code optimization.
keywords:
  - language models
  - prompt optimization
  - reinforcement learning
  - LLMs
  - natural language processing
license: MIT
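
The abstract above describes an iterative loop: sample rollouts, reflect on them to propose a prompt edit, and keep the Pareto frontier of candidate prompts across tasks. The Python sketch below illustrates that loop in miniature under stated assumptions; it is not the released GEPA implementation, and evaluate_prompt and reflect_and_mutate are hypothetical stand-ins for the LLM-backed evaluation and natural-language reflection steps.

# Minimal sketch of a GEPA-style loop: evolve prompts by reflective
# mutation and keep only candidates that are Pareto-optimal across
# per-task scores. evaluate_prompt and reflect_and_mutate are
# hypothetical stand-ins, not the authors' code.
import random

TASKS = ["task_a", "task_b", "task_c"]

def evaluate_prompt(prompt: str) -> dict:
    # Stand-in scorer: real GEPA runs the AI system and collects rollouts
    # (reasoning, tool calls, tool outputs) to score each task.
    rng = random.Random(sum(map(ord, prompt)))
    return {t: round(rng.random(), 3) for t in TASKS}

def reflect_and_mutate(prompt: str, scores: dict) -> str:
    # Stand-in reflection: real GEPA asks an LLM to diagnose failures in
    # natural language and propose a targeted edit to the prompt.
    weakest = min(scores, key=scores.get)
    return prompt + f" [refined for {weakest}]"

def dominates(a: dict, b: dict) -> bool:
    # a dominates b if it scores >= on every task and > on at least one.
    return all(a[t] >= b[t] for t in TASKS) and any(a[t] > b[t] for t in TASKS)

def pareto_front(candidates):
    # Keep candidates whose score vector is not dominated by any other.
    return [c for c in candidates
            if not any(dominates(o[1], c[1]) for o in candidates if o is not c)]

seed_prompt = "You are a helpful assistant."
frontier = [(seed_prompt, evaluate_prompt(seed_prompt))]
for _ in range(10):
    parent_prompt, parent_scores = random.choice(frontier)
    child = reflect_and_mutate(parent_prompt, parent_scores)
    if all(child != p for p, _ in frontier):
        frontier = pareto_front(frontier + [(child, evaluate_prompt(child))])

for prompt, scores in frontier:
    print(scores, "<-", prompt)

The Pareto-frontier bookkeeping is the design choice the abstract highlights: rather than keeping a single best prompt, the optimizer retains every candidate that is best at something, so complementary lessons from different attempts can later be combined.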