Benchmarks #4

@Kernel-Dirichlet

Description

CoTARAG needs to determine which benchmarks are most appropriate for evaluating agents so that we have an objective notion of "progress". There are a few considerations to keep in mind when doing so.

  1. Benchmarks are a moving goalpost.
  2. The modularity and complexity of AcceleRAG and the CoTAEngine make it difficult to know whether a small tweak to the current implementation will result in significant performance gains.

My current plan is as follows:

  1. Have users submit proposed benchmarks. If proposing an existing one, please provide a brief explanation of why it is appropriate for the goals of this repository.
  2. Conduct a poll to vote on the proposed benchmarks.
  3. Add the most popular ones to the repository and include them in CI (see the sketch after this list).
  4. Revisit benchmarks every 3 months, swapping out old ones as needed.
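
As a rough illustration of step 3, the sketch below shows one way a benchmark runner could be wired into CI: the job invokes a script that scores the agent on each registered benchmark and exits non-zero on regression. The file path, benchmark registry, `score_agent` helper, and thresholds are all assumptions for illustration, not the actual repository layout.

```python
# benchmarks/run_benchmarks.py -- hypothetical CI entry point (sketch only).
# Module paths, benchmark names, and thresholds are placeholders, not the
# real CoTARAG layout; actual benchmarks would come from the community poll.
import json
import sys
from pathlib import Path

# Each benchmark is a (name, dataset_path, minimum_acceptable_score) triple.
BENCHMARKS = [
    ("toy_retrieval_qa", Path("benchmarks/data/toy_retrieval_qa.jsonl"), 0.70),
]


def score_agent(dataset_path: Path) -> float:
    """Run the agent over one benchmark dataset and return accuracy in [0, 1].

    Placeholder scoring loop: a real implementation would build an agent via
    AcceleRAG / the CoTAEngine and compare its answers against references.
    """
    lines = dataset_path.read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        return 0.0
    correct = sum(1 for r in records if r.get("expected") == r.get("predicted"))
    return correct / len(records)


def main() -> int:
    failures = []
    for name, dataset_path, threshold in BENCHMARKS:
        score = score_agent(dataset_path)
        status = "PASS" if score >= threshold else "FAIL"
        print(f"{name}: score={score:.3f} threshold={threshold:.2f} [{status}]")
        if score < threshold:
            failures.append(name)
    # A non-zero exit code makes the CI job fail when any benchmark regresses.
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```

A CI step would then simply run `python benchmarks/run_benchmarks.py`, so swapping benchmarks in or out (step 4) only means editing the registry, not the pipeline.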

If time permits, add a public leaderboard of CoTARAG-designed agents and compare them against other AI agents.
