Commit c05c94b

Update README.md
1 parent: 6154749

1 file changed: +0 additions, -6 deletions

README.md

Lines changed: 0 additions & 6 deletions
@@ -207,12 +207,6 @@ The following plot from the independent [Research Engineering Benchmark (RE-Benc
 <img src="https://github.com/user-attachments/assets/ff0e471d-2f50-4e2d-b718-874862f533df" alt="RE-Bench Performance Across Time" width="60%"/>
 </p>
 
-<div align="center">
-
-*(Source: METR RE-Bench, AIDE (o1-preview) vs. Human Expert Percentiles)*
-
-</div>
-
 As shown, AIDE demonstrates strong performance gains over time, surpassing lower human expert percentiles within hours and continuing to improve. This highlights the potential of evaluation-driven optimization but also indicates that reaching high levels of performance comparable to human experts on difficult benchmarks can take considerable time (tens of hours in this specific benchmark, corresponding to many `--steps` in the Weco CLI). Factor this into your planning when setting the number of `--steps` for your optimization runs.
 
 ---