CoTARAG needs to determine which benchmarks are most appropriate for evaluating agents, so that we have some objective notion of "progress". There are a few considerations to keep in mind when doing so.
- Benchmarks are a moving goalpost.
- The modularity and complexity of AcceleRAG and the CoTAEngine make it difficult to predict whether a small tweak to the current implementation will yield significant performance gains.
My current plan is as follows:
- Have users submit proposed benchmarks. When proposing an existing benchmark, include a brief explanation of why it fits the goals of this repository.
- Conduct a poll to vote on proposed benchmarks.
- Add the most popular ones into the repository and include them in CI.
- Revisit benchmarks every 3 months, swap out old ones as needed.
If time permits, add a public leaderboard of CoTARAG-designed agents and compare them against other AI agents.
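To make the "include them in CI" step concrete, here is a minimal sketch of what a benchmark harness could look like. Everything here is hypothetical: `run_agent`, `BenchmarkCase`, and the threshold are placeholders, not CoTARAG's actual API, and a real suite would load cases from files rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkCase:
    """One benchmark item: a question and its expected answer (hypothetical schema)."""
    question: str
    expected: str

def run_agent(question: str) -> str:
    # Placeholder for an AcceleRAG/CoTAEngine agent call; the real
    # invocation would go through CoTARAG's agent interface.
    return question.upper()

def score(cases: list[BenchmarkCase]) -> float:
    """Return the fraction of cases whose agent answer exactly matches."""
    hits = sum(1 for c in cases if run_agent(c.question) == c.expected)
    return hits / len(cases)

if __name__ == "__main__":
    suite = [
        BenchmarkCase("a", "A"),
        BenchmarkCase("b", "B"),
    ]
    accuracy = score(suite)
    # A CI job could fail the build if accuracy regresses below a threshold.
    assert accuracy >= 0.9, f"benchmark regression: accuracy={accuracy:.2f}"
    print(f"accuracy={accuracy:.2f}")
```

Because benchmarks are swapped out every few months under the plan above, keeping each suite behind a uniform `score`-style interface would let CI jobs stay unchanged when the underlying benchmark rotates.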