Paper errata & discussion of GAIA experiment details #226

@Rooooyy

Description

Image

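The description here is very confusing. The full GAIA benchmark has 466 questions, but the evaluation here is text-only, so the paper should state the number of text-only questions. Did the experiments use the validation split or the test split? Given that a text-only subset could be filtered out, I assume it is validation by default?

Also, it looks like the text-only results are reported only for Youtu-Agent, with no comparison against other models, which makes it hard to support this sentence from the abstract: “state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models”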
