[WIP] Adding Google SVQ dataset + some refactoring #3529

silky1708 · 2025-11-06T03:15:28Z

I have outlined why this dataset is filling an existing gap in mteb
I have tested that the dataset runs with the mteb package.
I have run the following models on the task (adding the results to the PR). These can be run using the mteb run -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
I have considered the size of the dataset and reduced it if it is too big (2048 examples is typically large enough for most tasks)
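The two quantitative checklist items above, "neither trivial nor random" and "reduce it if it is too big", can be made concrete with a small helper. This is a minimal sketch under stated assumptions, not mteb's own logic: the 0.05 tolerance, the random-baseline value, the seed, and the function names `task_difficulty` and `downsample` are all hypothetical, chosen only for illustration.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def task_difficulty(
    scores: Sequence[float],
    random_baseline: float,
    perfect: float = 1.0,
    tolerance: float = 0.05,  # hypothetical: how close counts as "close to"
) -> str:
    """Classify a task from the main-metric scores of the checked models.

    Mirrors the checklist wording: "trivial" if *all* models score close to
    perfect, "random" if *all* models score close to the random baseline.
    """
    if all(s >= perfect - tolerance for s in scores):
        return "trivial"
    if all(abs(s - random_baseline) <= tolerance for s in scores):
        return "random"
    return "ok"


def downsample(examples: Sequence[T], max_size: int = 2048, seed: int = 42) -> list[T]:
    """Return at most max_size examples, sampled without replacement.

    The seed is fixed so the reduced dataset is reproducible.
    """
    if len(examples) <= max_size:
        return list(examples)
    rng = random.Random(seed)
    return rng.sample(list(examples), max_size)


# Illustrative placeholder scores, NOT results from this PR.
print(task_difficulty([0.62, 0.48], random_baseline=0.1))
```

For a retrieval task, the sampling would more likely apply to queries, with the corpus kept consistent with the relevance judgments of the retained queries; the sketch above does not attempt that.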

If you add a model or a dataset, please add the corresponding checklist:

Samoed · 2025-11-13T10:13:05Z

mteb/tasks/audio/any_2_any_retrieval/multilingual/google_svq.py

+}
+
+
+class GoogleSVQA2TRetrieval(AbsTaskAny2AnyRetrieval):


After #3528, you need to inherit from AbsTaskRetrieval instead of AbsTaskAny2AnyRetrieval.
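The requested change only swaps the base class; the task body stays the same. The sketch below illustrates this with local placeholder classes standing in for mteb's real `AbsTaskRetrieval` and `AbsTaskAny2AnyRetrieval`. It is an assumption-laden illustration: the real import path depends on the mteb version after #3528 and is intentionally not shown, and the class body is elided.

```python
# Placeholders only: these are NOT mteb's real abstract task classes.
class AbsTaskAny2AnyRetrieval:
    """Stand-in for the older any-to-any retrieval base class."""


class AbsTaskRetrieval:
    """Stand-in for the unified retrieval base class."""


# Before (as in the reviewed diff):
#   class GoogleSVQA2TRetrieval(AbsTaskAny2AnyRetrieval):
# After, per the review comment:
class GoogleSVQA2TRetrieval(AbsTaskRetrieval):
    """Audio-to-text retrieval on Google SVQ (metadata and loading elided)."""


print(GoogleSVQA2TRetrieval.__mro__[1].__name__)
```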

silky1708 and others added 7 commits March 14, 2025 23:50

add task subtype

1e0e4b9

add voxlingua107-top10 dataset

58b477c

updates

a546ae0

Merge branch 'embeddings-benchmark:maeb' into maeb

b3c9573

rename audio reranking

db70816

add google svq dataset

3b93da7

rename audio reranking

8108372

silky1708 self-assigned this Nov 6, 2025

silky1708 added the labels "new dataset" (Issues related to adding a new task or dataset) and "maeb" (Audio extension) Nov 6, 2025

make lint

f0fb1bd

KennethEnevoldsen marked this pull request as draft November 6, 2025 08:15

Samoed reviewed Nov 13, 2025

2 participants