Question about using QA datasets to evaluate retrieval

Hi all,

I have a question about using QA datasets to evaluate retrieval tasks.

Some QA datasets in MMTEB (e.g. [XPQARetrieval](https://huggingface.co/datasets/jinaai/xpqa), [WebFaqRetrieval](https://huggingface.co/datasets/PaDaS-Lab/webfaq-retrieval)) do not seem to evaluate retrieval well.

Here are examples from the two datasets:

- XPQARetrieval (original Korean query-positive pairs, translated to English by gemini-2.5-pro)
```
Example 1)
Query: Is it new/unopened?
Document: No. It is a renewed product.

Example 2)
Query: Does it come with two batteries?
Document: No. Another buyer said, "I wish it came with two batteries."
```
- WebFaqRetrieval (original English query-positive pairs)
```
Example 1)
Query: How do I turn on Walk Assist?
Document: Hold thw (-) button on display, until walking figure appears on display screen.

Example 2)
Query: Can I freeze this macaroni salad?
Document: No. The mayonnaise does not freeze well and will separate when frozen.
```

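I think these datasets are query-positive sets for evaluating QA tasks, not retrieval tasks. In the examples above, the positive document is often a short answer that only makes sense next to its question, rather than a self-contained passage.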
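
One rough way to check this concern on a larger sample would be to count how many positive documents open with a bare yes/no answer. The sketch below is only an illustrative heuristic: the function names and the regex are hypothetical and not part of MTEB, and it runs on the four example pairs quoted above rather than on the full datasets.

```python
import re

# Query-positive pairs copied verbatim from the examples in this issue.
PAIRS = [
    ("Is it new/unopened?", "No. It is a renewed product."),
    ("Does it come with two batteries?",
     'No. Another buyer said, "I wish it came with two batteries."'),
    ("How do I turn on Walk Assist?",
     "Hold thw (-) button on display, until walking figure appears on display screen."),
    ("Can I freeze this macaroni salad?",
     "No. The mayonnaise does not freeze well and will separate when frozen."),
]

# Assumption: a document that starts with a standalone "yes" or "no"
# is an answer that depends on its question for context.
YES_NO = re.compile(r"^\s*(yes|no)\b", re.IGNORECASE)


def starts_with_yes_no(document: str) -> bool:
    """Return True if the document opens with a bare yes/no answer."""
    return YES_NO.match(document) is not None


def yes_no_fraction(pairs) -> float:
    """Fraction of positive documents that open with a bare yes/no answer."""
    flagged = sum(starts_with_yes_no(doc) for _, doc in pairs)
    return flagged / len(pairs)


print(f"{yes_no_fraction(PAIRS):.2f} of the example positives start with yes/no")
```

A high fraction on a real sample would not prove the dataset is unsuitable for retrieval, but it could help decide whether a closer manual review is worthwhile.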

What are your thoughts?
Tagging @KennethEnevoldsen and @Samoed.

Thank you.

- Youngjoon Jang

Question for using QA datasets to evaluate retrieval #3083
