CITATION.cff
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  GEPA: Reflective Prompt Evolution Can Outperform
  Reinforcement Learning
message: >-
  If you use this work, please cite it using the metadata
  from this file.
type: software
authors:
  - given-names: Lakshya A
    family-names: Agrawal
  - given-names: Shangyin
    family-names: Tan
  - given-names: Dilara
    family-names: Soylu
  - given-names: Noah
    family-names: Ziems
  - given-names: Rishi
    family-names: Khare
  - given-names: Krista
    family-names: Opsahl-Ong
  - given-names: Arnav
    family-names: Singhvi
  - given-names: Herumb
    family-names: Shandilya
  - given-names: Michael J
    family-names: Ryan
  - given-names: Meng
    family-names: Jiang
  - given-names: Christopher
    family-names: Potts
  - given-names: Koushik
    family-names: Sen
  - given-names: Alexandros G.
    family-names: Dimakis
  - given-names: Ion
    family-names: Stoica
  - given-names: Dan
    family-names: Klein
  - given-names: Matei
    family-names: Zaharia
  - given-names: Omar
    family-names: Khattab
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2507.19457'
abstract: >-
  Large language models (LLMs) are increasingly adapted to
  downstream tasks via reinforcement learning (RL) methods
  like Group Relative Policy Optimization (GRPO), which
  often require thousands of rollouts to learn new tasks. We
  argue that the interpretable nature of language can often
  provide a much richer learning medium for LLMs, compared
  with policy gradients derived from sparse, scalar rewards.
  To test this, we introduce GEPA (Genetic-Pareto), a prompt
  optimizer that thoroughly incorporates natural language
  reflection to learn high-level rules from trial and error.
  Given any AI system containing one or more LLM prompts,
  GEPA samples system-level trajectories (e.g., reasoning,
  tool calls, and tool outputs) and reflects on them in
  natural language to diagnose problems, propose and test
  prompt updates, and combine complementary lessons from the
  Pareto frontier of its own attempts. As a result of GEPA's
  design, it can often turn even just a few rollouts into a
  large quality gain. Across four tasks, GEPA outperforms
  GRPO by 10% on average and by up to 20%, while using up to
  35x fewer rollouts. GEPA also outperforms the leading
  prompt optimizer, MIPROv2, by over 10% across two LLMs,
  and demonstrates promising results as an inference-time
  search strategy for code optimization.
keywords:
  - language models
  - prompt optimization
  - reinforcement learning
  - LLMs
  - natural language processing
license: MIT
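
The abstract above describes an iterative loop: sample rollouts, reflect on them to propose a prompt edit, and keep the Pareto frontier of candidate prompts across tasks. The Python sketch below illustrates that loop in miniature under stated assumptions; it is not the released GEPA implementation, and evaluate_prompt and reflect_and_mutate are hypothetical stand-ins for the LLM-backed evaluation and natural-language reflection steps.

# Minimal sketch of a GEPA-style loop: evolve prompts by reflective
# mutation and keep only candidates that are Pareto-optimal across
# per-task scores. evaluate_prompt and reflect_and_mutate are
# hypothetical stand-ins, not the authors' code.
import random

TASKS = ["task_a", "task_b", "task_c"]

def evaluate_prompt(prompt: str) -> dict:
    # Stand-in scorer: real GEPA runs the AI system and collects rollouts
    # (reasoning, tool calls, tool outputs) to score each task.
    rng = random.Random(sum(map(ord, prompt)))
    return {t: round(rng.random(), 3) for t in TASKS}

def reflect_and_mutate(prompt: str, scores: dict) -> str:
    # Stand-in reflection: real GEPA asks an LLM to diagnose failures in
    # natural language and propose a targeted edit to the prompt.
    weakest = min(scores, key=scores.get)
    return prompt + f" [refined for {weakest}]"

def dominates(a: dict, b: dict) -> bool:
    # a dominates b if it scores >= on every task and > on at least one.
    return all(a[t] >= b[t] for t in TASKS) and any(a[t] > b[t] for t in TASKS)

def pareto_front(candidates):
    # Keep candidates whose score vector is not dominated by any other.
    return [c for c in candidates
            if not any(dominates(o[1], c[1]) for o in candidates if o is not c)]

seed_prompt = "You are a helpful assistant."
frontier = [(seed_prompt, evaluate_prompt(seed_prompt))]
for _ in range(10):
    parent_prompt, parent_scores = random.choice(frontier)
    child = reflect_and_mutate(parent_prompt, parent_scores)
    if all(child != p for p, _ in frontier):
        frontier = pareto_front(frontier + [(child, evaluate_prompt(child))])

for prompt, scores in frontier:
    print(scores, "<-", prompt)

The Pareto-frontier bookkeeping is the design choice the abstract highlights: rather than keeping a single best prompt, the optimizer retains every candidate that is best at something, so complementary lessons from different attempts can later be combined.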