Eval Framework? #20
Unanswered
stobias123 asked this question in Q&A
Replies: 1 comment
Doing some tinkering and absolutely love it so far. Trying to learn more, and I'm curious if you guys have any docs on how you evaluate new functions / agents.
-
I have been building GAIA and SWE-bench Lite benchmark runners to evaluate the agent's overall ability. Individual prompts have so far just been tweaked manually based on observations. Evaluation and meta-prompting is an area I'm looking into (DSPy, etc.) to get a feel for what exists and what would be most suitable to integrate or build on. I have a few ideas I'd like to play with, so building evaluation datasets will be an important part. SWE-bench includes an "oracle" dataset that lists the files that need to be edited, which gives us a dataset for evaluating the functionality in selectFilesToEdit.ts; a sketch of how that might look is below.
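Since the oracle dataset pairs each task with the files its gold patch touches, one way to turn that into an eval is to score the file-selection step directly. Here is a minimal sketch in TypeScript; the oracle JSON shape (`instance_id`, `problem_statement`, `gold_files`) and the `selectFilesToEdit` import path and signature are assumptions for illustration, not the repo's actual API:

```ts
// Minimal eval sketch: score selectFilesToEdit against SWE-bench oracle file lists.
// ASSUMPTIONS (not from the repo): the oracle data has been exported to a JSON
// array of { instance_id, problem_statement, gold_files } records, and
// selectFilesToEdit has the signature (task: string) => Promise<string[]>.
import { readFileSync } from "fs";
import { selectFilesToEdit } from "./selectFilesToEdit"; // assumed import path/export

interface OracleCase {
  instance_id: string;
  problem_statement: string;
  gold_files: string[]; // files touched by the gold patch
}

async function evaluate(oraclePath: string): Promise<void> {
  const cases: OracleCase[] = JSON.parse(readFileSync(oraclePath, "utf8"));
  let meanRecall = 0;
  let meanPrecision = 0;

  for (const c of cases) {
    const predicted = await selectFilesToEdit(c.problem_statement);
    const gold = new Set(c.gold_files);
    const hits = predicted.filter((f) => gold.has(f)).length;
    // Recall: how many of the oracle's files did we select?
    const recall = gold.size > 0 ? hits / gold.size : 1;
    // Precision: how much of what we selected was actually needed?
    const precision = predicted.length > 0 ? hits / predicted.length : 0;
    meanRecall += recall / cases.length;
    meanPrecision += precision / cases.length;
    console.log(`${c.instance_id}: recall=${recall.toFixed(2)} precision=${precision.toFixed(2)}`);
  }

  console.log(`mean recall=${meanRecall.toFixed(3)} mean precision=${meanPrecision.toFixed(3)}`);
}

evaluate("swebench_oracle.json").catch(console.error);
```

In practice you would want to normalize paths before comparing, and recall is probably the metric to optimize: missing a required file is fatal to the downstream edit, while selecting an extra file is usually recoverable.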