17 Feb 23:21

KennethEnevoldsen

c436cbb

2.8.1 Latest

2.8.1 (2026-02-17)

Fix

fix: Remove duplicate citations and add test to prevent it going forward (#4032)
test: add test to detect duplicate citations
quality
move changes to task file
fix false positives
update citations
add models and benchmarks
search close titles
fix maeb

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (377f1b4)
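
A duplicate-citation check of the kind this fix describes might look roughly like the sketch below. It is not the test added in #4032: the helper names, the simplified single-line BibTeX parsing, the use of `difflib` to find close titles, and the 0.9 threshold are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical helpers for illustration; not part of mteb.
_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
_TITLE_RE = re.compile(r'\btitle\s*=\s*[{"](.*?)[}"]\s*,?\s*$', re.IGNORECASE | re.MULTILINE)


def citation_key(bibtex: str) -> Optional[str]:
    """Extract the citation key, e.g. 'smith2020' from '@article{smith2020,'."""
    match = _KEY_RE.search(bibtex)
    return match.group(1) if match else None


def citation_title(bibtex: str) -> Optional[str]:
    """Extract a normalized title (single-line titles only in this sketch)."""
    match = _TITLE_RE.search(bibtex)
    if match is None:
        return None
    # Drop protective braces, lowercase, and collapse whitespace.
    return " ".join(re.sub(r"[{}]", "", match.group(1)).lower().split())


def find_close_titles(citations: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return pairs of distinct citation keys whose titles are near-identical."""
    titles: dict[str, str] = {}
    for bibtex in citations:
        key, title = citation_key(bibtex), citation_title(bibtex)
        if key and title:
            titles[key] = title
    keys = sorted(titles)
    pairs = []
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            ratio = SequenceMatcher(None, titles[first], titles[second]).ratio()
            if ratio >= threshold:
                pairs.append((first, second))
    return pairs
```

A test built on this would fail whenever `find_close_titles` returns a non-empty list, which flags the same paper being cited under two different keys.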

Unknown

benchmark: add 6 new VisRAG retrieval tasks and corresponding stats (#4059)
dataset: add 6 new VisRAG retrieval tasks and corresponding stats

Introduced VisRAGRetArxivQA, VisRAGRetChartQA, VisRAGRetInfoVQA, VisRAGRetMPDocVQA, VisRAGRetPlotQA, and VisRAGRetSlideVQA classes for various retrieval tasks.
Added JSON files containing descriptive statistics for each task, including sample counts, image dimensions, and query statistics.
Updated the retrieval module's __init__.py to include the new tasks in the module exports.

fix a linter error
dataset: introduce VisRAG Retrieval Benchmark
fix: metadata update of VisRAG
Update benchmark metadata
Update VisRAG datasets metadata, including one-line description and the domains
Update slideVQA domain
Add Aliases for VisRAG
Fix bibtex format
Update dataset metadata to point to the mteb versions
Remove redundant data loading

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (7a9e653)

fix: Remove MAEB+ and MAEB(extended) from leaderboard and add "beta" to all MAEB (#4103)
fix: Remove MAEB+ and MAEB(extended)

related to #3470

We could consider keeping the two benchmarks (they would still need to be beta, as the paper is under review) so they could change.

fix: Remove MAEB+ and MAEB(extended) from leaderboard and add "beta" to all MAEB

related to #3470

Remove MAEB+ and MAEB(extended) from leaderboard
Added the beta tag to denote that these might change

Currently implemented it as keeping the two temporary benchmarks. We could consider removing them as well (I am unsure how much of a burden it is for us to maintain them), but I would probably not add them to the leaderboard.

All of these changes should be backward compatible

docs: Added whatsnew
implement fixes
update description to explain beta status (77ac52b)

12 Feb 16:15

KennethEnevoldsen

2.7.30

7b6c20f

2.7.30 (2026-02-12)

Fix

fix: correct reference link for MIRACLVisionRetrieval task (#4092) (d3e9b06)

12 Feb 14:55

KennethEnevoldsen

2.7.29

7d4cc63

2.7.29 (2026-02-12)

Documentation

docs: Improved adding a benchmark docs (#4087)
docs: Improved adding a benchmark docs

Expanded the explanation to provide more information about the process.

Update docs/contributing/adding_a_benchmark.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix
fix
minor heading change
Apply suggestions from code review

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (8149d4a)

Fix

fix: constrain the transformers library version for jina-clip (#4061)
fix: constrain the transformers library version for jina-clip to avoid compatibility issue
add require package
add to conflicts
try to run again
upd lock
tmp
try
fix: pylate dependency on outdated version of transformers

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (eee82cf)

Unknown

dataset: Add MTEB(spa) Spanish language benchmark (#4053)
dataset: Add MTEB(spa) Spanish language benchmark

Define MTEB(spa, v1) benchmark grouping 23 existing Spanish tasks
across 6 task types: Classification (8), Clustering (3),
PairClassification (2), Reranking (1), Retrieval (5), and STS (4).

fix: Replace MIRACLRetrieval with HardNegatives.v2 per review
fix: Remove tasks with known issues, add contact, reduce to 16 tasks
Apply suggestion from @KennethEnevoldsen

Co-authored-by: Clemente <clemente@Clementes-MacBook-Pro.local>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (2507bec)

Move MIEB datasets to mteb HuggingFace org (#4070)
Move 15 MIEB datasets to mteb HuggingFace org

Update dataset paths and revisions for tasks that now use datasets
forked to the mteb org:

MMSoc_HatefulMemes, MMSoc_Memotion (Ahren09 -> mteb)
blink-it2i, blink-it2i-multi, blink-it2t, blink-it2t-multi (JamieSJS -> mteb)
gld-v2-i2t, imagecode, imagecode-multi (JamieSJS -> mteb)
imagenet-10, imagenet-dog-15, met (JamieSJS -> mteb)
r-oxford-easy-multi, r-oxford-medium-multi, r-oxford-hard-multi (JamieSJS -> mteb)

Part of #4049

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

transfer mrbench
Move 8 isaacchung MIEB datasets to mteb HuggingFace org

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move 12 MIEB datasets to mteb HuggingFace org

Datasets moved:

dpdl-benchmark/sun397
ethz/food101
floschne/xflickrco
floschne/xm3600
flwrlabs/ucf101
nyu-visionx/CV-Bench
tanganke/dtd
tanganke/stl10
timm/eurosat-rgb
timm/resisc45
uoft-cs/cifar10
uoft-cs/cifar100

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move 15 MIEB datasets to mteb HuggingFace org

Datasets moved:

JamieSJS/r-paris-easy-multi
JamieSJS/r-paris-medium-multi
JamieSJS/r-paris-hard-multi
JamieSJS/rp2k
JamieSJS/sketchy
JamieSJS/stanford-online-products
JamieSJS/vizwiz
JamieSJS/vqa-2
Pixel-Linguist/rendered-sts12
Pixel-Linguist/rendered-sts13
Pixel-Linguist/rendered-sts14
Pixel-Linguist/rendered-sts15
Pixel-Linguist/rendered-sts16
ylecun/mnist
zh-plus/tiny-imagenet

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move 7 clip-benchmark datasets to mteb HuggingFace org

Datasets moved:

clip-benchmark/wds_country211
clip-benchmark/wds_fer2013
clip-benchmark/wds_gtsrb
clip-benchmark/wds_renderedsst2
clip-benchmark/wds_vtab-clevr_closest_object_distance
clip-benchmark/wds_vtab-clevr_count_all
clip-benchmark/wds_vtab-pcam

Note: wds_imagenet1k failed due to storage limits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move 14 MIEB datasets to mteb HuggingFace org

Datasets migrated:

clip-benchmark/wds_imagenet1k → mteb/wds_imagenet1k
m-a-p/SciMMIR → mteb/SciMMIR
yjkimstats/SUGARCREPE_fmt → mteb/SUGARCREPE_fmt
nelorth/oxford-flowers → mteb/oxford-flowers
vidore/arxivqa_test_subsampled_beir → mteb/arxivqa_test_subsampled_beir
vidore/docvqa_test_subsampled_beir → mteb/docvqa_test_subsampled_beir
vidore/infovqa_test_subsampled_beir → mteb/infovqa_test_subsampled_beir
vidore/shiftproject_test_beir → mteb/shiftproject_test_beir
vidore/syntheticDocQA_artificial_intelligence_test_beir → mteb/syntheticDocQA_artificial_intelligence_test_beir
vidore/syntheticDocQA_energy_test_beir → mteb/syntheticDocQA_energy_test_beir
vidore/syntheticDocQA_government_reports_test_beir → mteb/syntheticDocQA_government_reports_test_beir
vidore/syntheticDocQA_healthcare_industry_test_beir → mteb/syntheticDocQA_healthcare_industry_test_beir
vidore/tabfquad_test_subsampled_beir → mteb/tabfquad_test_subsampled_beir
vidore/tatdqa_test_beir → mteb/tatdqa_test_beir

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
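
For the entries listed above, the migration keeps the dataset name and swaps the owning organisation for `mteb`. A minimal sketch of that mapping follows; `to_mteb_path` is a hypothetical helper, not an mteb function, and the revision pinned in each task's metadata would still need updating separately.

```python
def to_mteb_path(path: str, org: str = "mteb") -> str:
    """Map 'owner/name' to 'mteb/name', keeping the dataset name unchanged."""
    owner, sep, name = path.partition("/")
    if not owner or not sep or not name:
        raise ValueError(f"expected a path of the form 'owner/name', got {path!r}")
    return f"{org}/{name}"


print(to_mteb_path("vidore/tatdqa_test_beir"))  # mteb/tatdqa_test_beir
```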

update rest

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com> (2ef04f8)

model: add voyage-4-nano (#4086)
model: add voyage-4-nano model implementation
Apply suggestion from @Samoed

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (2ce07c4)

Contributors

KennethEnevoldsen and Samoed

11 Feb 16:04

KennethEnevoldsen

2.7.28

f5c5578

2.7.28 (2026-02-11)

Fix

fix: Remove task performance by type tab when there is only one type (#4067)
Remove task performance by type Tab when the Radar plot can't be generated
Apply suggestion (9f95b58)

Unknown

Correct Embedding Dimension for paraphrase-multilingual-MiniLM-L12-v2 (#4089) (334d690)

11 Feb 11:27

KennethEnevoldsen

2.7.27

61bd5b5

2.7.27 (2026-02-11)

Documentation

docs: Outline for adding a task documentation (#4082)
docs: Outline for adding a task documentation

This is a suggested structure, PR is just to get feedback before I finish it up.

fixes #4077

upd docs
install dependencies in ci
add example with retrieval
filled out the missing segments
lint and format
Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix numbering and indent
add missing imports
fix links
add full example for retrieval dataset

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> (42b8058)

docs: Improve docstring for some of the main abstasks (#4083)
docs: fix AbsTaskClassification docstring formatting and improve docstrings for some of the main tasks
format (50bd0fa)

Fix

fix: Add performance per language tab to more benchmarks (#4066)

Add Performance per language Tab to more benchmarks (4ca1922)

Unknown

dataset: add 'law-ir_ko' dataset for IR task (#4052)
law_ir_ko
Update mteb/tasks/retrieval/kor/law_ir_ko.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

law_ir_ko info revision
description
metadata-info rev
metadata-info rev
Update mteb/tasks/retrieval/kor/law_ir_ko.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

statistics(), reference
format citation
author & howpublished rev
make lint
description rev
Update mteb/tasks/retrieval/kor/law_ir_ko.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

Update mteb/tasks/retrieval/kor/law_ir_ko.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (1cdc662)

model: Add ModelMeta for geoffsee/auto-g-embed-st (#4074)

Add ModelMeta for geoffsee/auto-g-embed-st (81540a2)

Add MetaCLIP 2 model integration (#4065)
Add MetaCLIP 2 model integration

Add support for facebook/metaclip-2-mt5-worldwide-b32, a multilingual
vision-language model using mT5 tokenizer for worldwide language support.

254M parameters, 512 embedding dimension
Supports 99 languages (XLMR language set)
Handles MetaCLIP 2's BaseModelOutputWithPooling return format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add n_embedding_parameters for MetaCLIP 2 model

Set n_embedding_parameters to 128,057,344 (mT5 vocab size 250,112 × embed_dim 512)
to fix test_n_embedding_parameters test failure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
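
The embedding-parameter count quoted above is the vocabulary size multiplied by the embedding dimension. A quick check of the arithmetic, using only the two figures given in the commit message (the variable names are illustrative):

```python
vocab_size = 250_112  # mT5 vocabulary size, as stated in the commit message
embed_dim = 512       # embedding dimension, as stated in the commit message

n_embedding_parameters = vocab_size * embed_dim
print(f"{n_embedding_parameters:,}")  # 128,057,344
```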

Add training metadata for MetaCLIP 2 model

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Apply suggestion from @Samoed

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (4280dd9)

Contributors

Samoed

07 Feb 22:36

KennethEnevoldsen

2.7.26

cf18065

2.7.26 (2026-02-07)

Fix

fix: filter corrupted image in Birdsnap (#4068)
fix: filter corrupted image in Birdsnap and drop unused splits in zero-shot tasks

Filter out corrupted/truncated image at index 3854 in Birdsnap train split
Add dataset_transform to AbsTaskZeroShotClassification to keep only eval splits
(zero-shot tasks don't need train splits)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
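
The two steps above (dropping one bad row and keeping only the evaluation splits) can be sketched with plain Python containers. This is an illustration only: a dict of lists stands in for the real dataset object, and `drop_index` and `keep_eval_splits` are hypothetical names, not the methods on `AbsTaskZeroShotClassification`.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def drop_index(rows: list[T], bad_index: int) -> list[T]:
    """Return the rows with the single entry at bad_index removed."""
    return [row for i, row in enumerate(rows) if i != bad_index]


def keep_eval_splits(
    dataset: Optional[dict[str, list[T]]], eval_splits: list[str]
) -> Optional[dict[str, list[T]]]:
    """Keep only the splits used for evaluation; tolerate an unloaded dataset."""
    if dataset is None:  # nothing loaded yet, so there is nothing to filter
        return None
    return {split: rows for split, rows in dataset.items() if split in eval_splits}


toy = {"train": ["a", "b", "c"], "test": ["d", "e"]}
print(drop_index(toy["train"], 1))      # ['a', 'c']
print(keep_eval_splits(toy, ["test"]))  # {'test': ['d', 'e']}
```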

fix: handle BaseModelOutputWithPooling in CLIP model wrapper

In transformers 5.x, get_text_features and get_image_features return
BaseModelOutputWithPooling instead of a tensor directly. Extract the
pooler_output when needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
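
One way to handle both return types is to look for a `pooler_output` attribute and fall back to the value itself. The sketch below uses duck typing so that it runs without transformers installed; `extract_features` and the stand-in output class are assumptions for illustration, not the code in mteb's CLIP wrapper.

```python
from dataclasses import dataclass
from typing import Any


def extract_features(output: Any) -> Any:
    """Return pooled features from a model-output object, or the input unchanged.

    Objects exposing a non-None `pooler_output` attribute are unwrapped;
    anything else (for example a plain tensor) is returned as-is.
    """
    pooled = getattr(output, "pooler_output", None)
    return output if pooled is None else pooled


@dataclass
class FakeModelOutput:
    """Stand-in for a model-output object; used only in this example."""

    pooler_output: Any


features = [0.1, 0.2, 0.3]
print(extract_features(FakeModelOutput(pooler_output=features)))  # [0.1, 0.2, 0.3]
print(extract_features(features))                                 # [0.1, 0.2, 0.3]
```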

fix: add None check for dataset in zeroshot classification transform

Fixes mypy type errors where self.dataset could be None when accessing
.keys() and deleting splits in dataset_transform method.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (6c00506)

Unknown

Backfill missing metadata for historic datasets (#4063)
Backfill missing metadata for historic datasets

Fill in missing TaskMetadata fields for ~90 historic datasets as
described in issue #2502. This includes:

Classification tasks (Polish, Chinese)
Clustering tasks (German, French, Spanish, Swedish, Chinese, Multilingual)
Pair classification tasks (Polish, Chinese)
Reranking tasks (English, French, Chinese)
Retrieval tasks (German, English, Japanese, Korean, Polish, Spanish, Chinese, Multilingual)
STS tasks (German, English, French, Korean, Spanish, Chinese)

Fields filled include: date, domains, task_subtypes, license,
annotations_creators, dialect, sample_creation, and bibtex_citation.

The _HISTORIC_DATASETS list is reduced from ~90 entries to just 4
aggregate tasks whose metadata computation has a separate issue
(the compute* methods return None for single-valued fields).

Closes #2502

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
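
A consistency check between the historic list and the filled-in metadata could be sketched as follows. The field names are the ones listed in the commit message above; the function names and the dict-based metadata are hypothetical and do not mirror mteb's test suite or its `TaskMetadata` class.

```python
# Fields named in the commit message above.
REQUIRED_FIELDS = (
    "date",
    "domains",
    "task_subtypes",
    "license",
    "annotations_creators",
    "dialect",
    "sample_creation",
    "bibtex_citation",
)


def is_filled(metadata: dict) -> bool:
    """True when every required field is present and not None."""
    return all(metadata.get(field) is not None for field in REQUIRED_FIELDS)


def stale_historic_entries(all_metadata: dict[str, dict], historic: list[str]) -> list[str]:
    """Names still on the historic list although their metadata is complete."""
    return sorted(
        name
        for name in historic
        if name in all_metadata and is_filled(all_metadata[name])
    )
```

A test along these lines would assert that `stale_historic_entries(...)` is empty, so a task is dropped from the exemption list as soon as its metadata is backfilled.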

Fix type annotation in _compute_license method

Add StrURL to the return type and set type annotation to match
the license field type (Licenses | StrURL | None).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (b5fb471)

07 Feb 18:14

KennethEnevoldsen

2.7.25

f780270

2.7.25 (2026-02-07)

Fix

fix: mscoco (#4062)
fix mscoco
fix jina clip (c2d1bfe)

Unknown

fix: Remove the hardcoded batch_size=1 when generating text and image embeddings for Nemotron-Colembed-v2 models (#4054)

remove hardcoded batch_size 1 (1682b2f)

Update nemotron v2 citation (#4051)

update nemotron v2 citation (0be1df3)

05 Feb 10:36

KennethEnevoldsen

2.7.24

6c5a782

2.7.24 (2026-02-05)

Fix

fix: leaderboard errors (#3969)
fix leaderboard
fix leaderboard errors
simplify
upd description (fd37337)

Unknown

fix docs deploy command (#4044) (acb3d8c)
fix docs links (#4043) (5d3b1a8)

04 Feb 11:16

KennethEnevoldsen

2.7.23

88296f8

2.7.23 (2026-02-04)

Fix

fix: Fill in embedding and total parameters in ModelMeta (#4031)
Filling Embedding/Total Parameters in ModelMeta
Add parameter for other models
Add parameters for more models
Added exact value for n_parameters
Fix tests
set n_embedding_parameters to None
Add results of some more models
Add tests
Add _HISTORIC_MODELS list in test
Update tests/test_models/test_model_meta.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

fix tests
correct tests
fix _HISTORIC_MODELS list

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (bc6e6cb)

Unknown

dataset: Add ERESS reranking task (#3991)
dataset: Add ERESS reranking task

Add ERESSReranking task for e-commerce product relevance reranking
Dataset: thebajajra/eress with ~72k query-product pairs
Supports graded relevance (0-100 integer scale)
Main metric: nDCG@5
Add E-commerce domain and Product Reranking subtypes to TaskMetadata
Include descriptive statistics
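
To illustrate how graded relevance feeds into nDCG@5, here is a minimal sketch. It assumes the linear-gain form of DCG (gain equal to the relevance label, discount of log2(rank + 1)); the evaluator mteb actually uses may differ in details, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items, linear gain."""
    # enumerate() is 0-based, so rank + 2 gives log2(position + 1) for 1-based positions.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    """nDCG@k for one query; `relevances` are graded labels in ranked order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([100, 50, 0]))  # 1.0 (already in ideal order)
```

Here the ideal ordering is computed from the same list, so the list should contain the labels of every candidate for the query.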

fix: align dataset_transform signature with base class
fix: dataset reuploaded, custom transformation removed
fix: rev updated with title + text combination
description moved away from docstring
Update mteb/tasks/reranking/eng/ecommerce_product_relevance_reranking.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me> (fe67f8e)

03 Feb 12:59

KennethEnevoldsen

2.7.22

363a27e

2.7.22 (2026-02-03)

Documentation

docs: Added changelog (#3741)
docs: Added changelog

Cleaned up the docs to prepare for adding the changelog, by adding missing links and removing references to documentation that does not exist
Added a "what's new" section
Added changes from 2.0 upwards. I might be missing some

I think going forward we can just update this as we go.

minor fix
added autogenerated changelog
rename
add autogenerated workflows
updates
update
update (2082d3e)

Fix

fix: backfilling historic tasks (#4034)
fix: backfilling historic tasks

Backfilled task metadata
extended test to ensure that backfilled tasks are removed from the historic list

addresses #2502

backfill citation, date and task subtypes where only those are missing
Update mteb/tasks/pair_classification/pol/polish_pc.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

add famteb citation

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (e542519)

Releases: embeddings-benchmark/mteb
