CoTARAG needs to determine which benchmarks are most appropriate for evaluating agents, so that we have some objective notion of "progress". There are a few considerations to keep in mind when doing so.
- Benchmarks are a moving goalpost.
- The modularity and complexity of AcceleRAG and the CoTAEngine make it difficult to predict whether a small tweak to the current implementation will yield significant performance gains.
My current plan is as follows:
- Have users submit proposed benchmarks. When proposing an existing benchmark, include a brief explanation of why it fits the goals of this repository.
- Conduct a poll to vote on proposed benchmarks.
- Add the most popular ones into the repository and include them in CI.
- Revisit benchmarks every 3 months, swap out old ones as needed.
If time permits, add a public leaderboard of CoTARAG-designed agents and compare them against other AI agents.
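To make the "include them in CI" step concrete, here is a minimal sketch of what a benchmark harness could look like. Everything here is hypothetical: `run_agent`, `BenchmarkCase`, and the threshold are placeholders, not CoTARAG's actual API, and a real suite would load cases from files rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkCase:
    """One benchmark item: a question and its expected answer (hypothetical schema)."""
    question: str
    expected: str

def run_agent(question: str) -> str:
    # Placeholder for an AcceleRAG/CoTAEngine agent call; the real
    # invocation would go through CoTARAG's agent interface.
    return question.upper()

def score(cases: list[BenchmarkCase]) -> float:
    """Return the fraction of cases whose agent answer exactly matches."""
    hits = sum(1 for c in cases if run_agent(c.question) == c.expected)
    return hits / len(cases)

if __name__ == "__main__":
    suite = [
        BenchmarkCase("a", "A"),
        BenchmarkCase("b", "B"),
    ]
    accuracy = score(suite)
    # A CI job could fail the build if accuracy regresses below a threshold.
    assert accuracy >= 0.9, f"benchmark regression: accuracy={accuracy:.2f}"
    print(f"accuracy={accuracy:.2f}")
```

Because benchmarks are swapped out every few months under the plan above, keeping each suite behind a uniform `score`-style interface would let CI jobs stay unchanged when the underlying benchmark rotates.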