Conversation

@roipony (Contributor) commented Sep 29, 2025

  • I have outlined why this dataset is filling an existing gap in mteb

  • I have tested that the dataset runs with the mteb package.

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command (a programmatic sketch follows this checklist).

    • ibm-granite/granite-vision-3.3-2b-embedding
    • jinaai/jina-embeddings-v4
  • I have checked that the performance is neither trivial (both models achieve close to perfect scores) nor random (both models achieve close to random scores).

  • I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
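For reference, a minimal sketch of running one of the listed models on one of the new tasks via the Python API (task and model names as above; the exact API surface may differ between mteb versions):

import mteb

# Sketch: evaluate one of the listed models on one of the new REAL-MM-RAG tasks.
# This mirrors `mteb run -m <model_name> -t <task_name>` on the command line.
tasks = mteb.get_tasks(tasks=["RealMMRagFinReportRetrieval"])
model = mteb.get_model("ibm-granite/granite-vision-3.3-2b-embedding")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")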

@Samoed requested a review from isaac-chung September 29, 2025 20:05
@Samoed changed the title from "Add REAL_MM_RAG benchmark" to "dataset: Add REAL_MM_RAG benchmark" Sep 29, 2025
@Samoed added the "new benchmark" label (Issues related to adding a new benchmark) Sep 29, 2025
@Samoed (Member) commented Sep 29, 2025

can you run make format-citations?

@KennethEnevoldsen (Contributor) left a comment:


Hi, great to see a PR, and congratulations on the paper release!

I think the main thing missing at the moment is documentation; I have put a few pointers below.

Note: this will be influenced by #3222 (if that PR is merged, we can move this down to the retrieval section).

"RealMMRagTechSlidesRetrieval",
],
),
description="Realistic and multi-modal document retrieval benchmark.",
Review comment (Contributor):

This description is too short. Why should I prefer this over another VDR benchmark?

class RealMMRagFinReportRetrieval(AbsTaskAny2AnyRetrieval):
metadata = TaskMetadata(
name="RealMMRagFinReportRetrieval",
description="Retrieve associated pages according to questions.",
Review comment (Contributor):

This description is too vague: it should be clear from the description what queries and corpus the task contains, as well as the retrieval goal. Please fix this for all tasks.
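As an illustration only (not the actual dataset description), a description along these lines would make the setup clearer:

description=(
    "Retrieve the financial report page (stored as a page image) that answers a "
    "natural-language question. Queries are questions about financial report pages; "
    "the corpus consists of the corresponding page screenshots."
)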

Review comment (Contributor):

@Samoed should we snake-case the filename (easier to merge with v2)?

Review comment (Member):

Agree, that will be better

"MIEB(Img)",
"VisualDocumentRetrieval",
"JinaVDR",
"REAL_MM_RAG"
Review comment (Contributor):

Let us not add it to a benchmark yet due to #3222 (this means we can merge this PR without depending on the other one, and once both are merged we can add both).

@Samoed added the "image" label (The image extension of MTEB) Oct 19, 2025
@roipony (Contributor, Author) commented Nov 2, 2025

@KennethEnevoldsen @Samoed
Let me know if there’s anything else I should do to help move the pull request forward.

Review comment (Member):

Since the v2 release, you need to move your tasks into the retrieval/eng/ folder.

Comment on lines 1 to 6
from __future__ import annotations

from datasets import load_dataset

from mteb.abstasks.Image.AbsTaskAny2AnyRetrieval import AbsTaskAny2AnyRetrieval
from mteb.abstasks.TaskMetadata import TaskMetadata
Review comment (Member):

This would also become:

Suggested change
-from __future__ import annotations
-from datasets import load_dataset
-from mteb.abstasks.Image.AbsTaskAny2AnyRetrieval import AbsTaskAny2AnyRetrieval
-from mteb.abstasks.TaskMetadata import TaskMetadata
+from datasets import load_dataset
+from mteb.abstasks.retrieval import AbsTaskRetrieval
+from mteb.abstasks.task_metadata import TaskMetadata

Comment on lines 32 to 33
"image": None,
"modality": "text",
Review comment (Member):

You shouldn't add columns filled with None, and you don't need the modality column.
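A minimal sketch of dropping such columns when building the queries split, assuming a Hugging Face datasets loader (the dataset path and configuration name below are placeholders):

from datasets import load_dataset

# Hypothetical sketch: drop placeholder columns (an all-None "image" column and
# the "modality" column) instead of adding them to the text queries.
queries = load_dataset("org/real-mm-rag-finreport", "queries", split="test")  # path is a placeholder
drop = [c for c in ("image", "modality") if c in queries.column_names]
queries = queries.remove_columns(drop)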

Comment on lines 107 to 118
prompt={"query": "Find a screenshot that relevant to the user's question."},
descriptive_stats={
"n_samples": None,
"avg_character_length": {
"test": {
"average_document_length": 141.5,
"num_documents": 19,
"num_queries": 853,
"average_relevant_docs_per_query": 1.0,
}
},
},
Review comment (Member):

We don't have descriptive_stats in the task metadata. You need to use task.calculate_descriptive_statistics() instead.
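A rough sketch of generating the statistics instead (method name as suggested above; the exact signature may vary between mteb versions):

import mteb

# Sketch: compute descriptive statistics for the task rather than hard-coding
# them into TaskMetadata.
task = mteb.get_task("RealMMRagFinReportRetrieval")
stats = task.calculate_descriptive_statistics()
print(stats)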

Review comment (Member):

Suggested change
-prompt={"query": "Find a screenshot that relevant to the user's question."},
-descriptive_stats={
-    "n_samples": None,
-    "avg_character_length": {
-        "test": {
-            "average_document_length": 141.5,
-            "num_documents": 19,
-            "num_queries": 853,
-            "average_relevant_docs_per_query": 1.0,
-        }
-    },
-},
+prompt={"query": "Find a screenshot that relevant to the user's question."},

from mteb.abstasks.TaskMetadata import TaskMetadata


def _load_data(
Review comment (Member):

You can re-upload your tasks using task.push_dataset_to_hub() to use our format.
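A rough sketch of that re-upload flow (the target repository name is a placeholder, and the exact signature may differ between mteb versions):

import mteb

# Hypothetical sketch: load the task with its current loader, then push the data
# to the Hub in mteb's standard format.
task = mteb.get_task("RealMMRagFinReportRetrieval")
task.load_data()
task.push_dataset_to_hub("your-org/real-mm-rag-finreport")  # placeholder repo name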

@Samoed (Member) commented Nov 3, 2025

Can you merge main to resolve conflicts?

@roipony (Contributor, Author) commented Nov 3, 2025

> Can you merge main to resolve conflicts?

I'm working on it. Once finished, I'll ping you.
