Hi,
The Tevatron 2.0 paper (arXiv:2505.02466) introduces a new unified data format, shown below:
query:

    {
      "query_id": "<query id>",
      "query_text": "<query text>",
      "query_image": "<query image>",
      "query_video": "<path to video>",
      "query_audio": "<path to audio>",
      "positive_document_ids": ["<document id>", ...],
      "negative_document_ids": ["<document id>", ...],
    }

corpus:

    {
      "docid": "<document id>",
      "document_text": "<document text>",
      "document_image": "<document image>",
      "document_video": "<document video path>",
      "document_audio": "<document audio path>",
    }
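To make the question concrete, here is a minimal Python sketch of how I would build records in this format. Only the field names come from the paper. The example values, the use of `None` for unused modalities, the one-JSON-object-per-line serialization, and the helper `to_jsonl_line` are my own assumptions, not anything taken from the Tevatron codebase.

```python
import json

# Field names as listed in the paper's unified format.
QUERY_FIELDS = {
    "query_id", "query_text", "query_image", "query_video", "query_audio",
    "positive_document_ids", "negative_document_ids",
}
CORPUS_FIELDS = {
    "docid", "document_text", "document_image",
    "document_video", "document_audio",
}

# Hypothetical records: all ids and text are made up for illustration.
# Using None for modalities that are not present is an assumption.
query = {
    "query_id": "q1",
    "query_text": "what is dense retrieval?",
    "query_image": None,
    "query_video": None,
    "query_audio": None,
    "positive_document_ids": ["d1"],
    "negative_document_ids": ["d2"],
}
corpus_entry = {
    "docid": "d1",
    "document_text": "Dense retrieval encodes queries and documents as vectors.",
    "document_image": None,
    "document_video": None,
    "document_audio": None,
}


def to_jsonl_line(record, expected_fields):
    """Check that every expected field is present, then serialize to one JSON line.

    Hypothetical helper, not part of Tevatron.
    """
    missing = expected_fields - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return json.dumps(record)


print(to_jsonl_line(query, QUERY_FIELDS))
print(to_jsonl_line(corpus_entry, CORPUS_FIELDS))
```

If the main branch expects a different layout (for example, different key names or omitted keys instead of nulls), that is exactly the detail I'm hoping to have clarified.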
Is this new format already supported on the current main branch?
I checked the documentation, but it doesn't seem to mention the updated format. Could you clarify its current status?
Thanks!