Preprint: https://arxiv.org/pdf/2505.11293
This repo contains the code and data for Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining. The model attains state-of-the-art results on the Massive Multimodal Embeddings Benchmark (MMEB). On retrieval in particular, it significantly outperforms other methods, and our 2B model surpasses several existing 7B models. The following figure details our methodology.
Our model ranked first on the MMEB leaderboard as of May 15th, 2025.

| Model Scale | Hugging Face Model Hub |
|---|---|
| 2B | raghavlite/B3_Qwen2_2B |
| 7B | raghavlite/B3_Qwen2_7B |
See the data_processing/ folder for the preprocessing code. This step has already been completed, and the processed data is available in MMEB-train2.
The training command below is configured for 8 GPUs but can easily be modified to run on 4. This exact command was used to train B3++ Qwen2b.
```bash
torchrun --nproc_per_node=8 --master_port=2215 --max_restarts=0 train.py \
--output_dir ./MMEB-trainedmodels/0425_InstuctionP_odibn32_sdibn_20D_HNPS_Metis_bs1024.32bi_30.130_30P.10.5_70.170_qwen2b_2k_dy \
--lora --lora_r 8 \
--model_name Qwen/Qwen2-VL-2B-Instruct \
--bf16 --pooling eos --normalize True --temperature 0.02 \
--dataloader_num_workers 0 \
--dataset_config configs/data_configs/mmeb_new/mmeb20_HNPS_bs32bi_30.130_30P.10.5_70.170_qwen2b.yaml \
--grad_cache True --per_device_train_batch_size 128 \
--gc_q_chunk_size 128 --gc_p_chunk_size 128 --gc_dynamic_limit 64 \
--lr_scheduler_type linear --learning_rate 1e-4 --max_steps 2000 --warmup_steps 200 \
--save_steps 1000 --logging_steps 1 --save_safetensors False --remove_unused_columns False \
--resume_from auto --resize_use_processor True --interleave_batch_size 1 --ddp_timeout 14400 \
--ignore_data_skip true --eval_steps 200 --eval_strategy steps \
--eval_dataset_name TIGER-Lab/MMEB-eval --eval_subset_name MSCOCO_i2t \
--eval_image_dir ../VLM2Vec/MMEB-eval/eval_images --per_device_eval_batch_size 2 \
--sdibn --odibn --chunk_size 32
```

| Argument | Description |
|---|---|
| `--chunk_size 32` | Picks 32-sized clusters from the dataset file. Used later by `--odibn`. |
| `--grad_cache`, `--gc_*` | Enable dynamic gradient cache chunking. The values generally don't need changing unless you are running on low-memory GPUs. |
| `--sdibn` | Mines each batch from a single dataset (e.g., all examples in a batch are mined from MSCOCO_i2t). |
| `--odibn` | Applies B3 clustering: picks `chunk_size`-sized clusters and puts random (batch_size / chunk_size) clusters in the same batch. |
| `--pos_only` | Removes hard negatives and relies only on in-batch negatives. By default, `--odibn` uses hard negatives from the MMEB-train2 datasets. |
| `--dataset_config $path` | Path to the training dataset configuration file. |
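The `--odibn` batch construction described above can be sketched in a few lines. This is a hypothetical illustration, not the repo's actual implementation: the function name, the `clusters` mapping (cluster id to pre-mined example indices), and the sampling policy are all assumptions.

```python
import random

def make_odibn_batches(examples, clusters, batch_size=1024, chunk_size=32, seed=0):
    """Assemble batches from pre-mined clusters (illustrative sketch).

    `clusters` maps a cluster id to a list of `chunk_size` example indices.
    Each batch concatenates batch_size // chunk_size randomly chosen clusters,
    so a batch is built from whole mined clusters rather than from
    independently sampled examples.
    """
    assert batch_size % chunk_size == 0
    clusters_per_batch = batch_size // chunk_size
    rng = random.Random(seed)
    ids = list(clusters)
    rng.shuffle(ids)
    batches = []
    for i in range(0, len(ids) - clusters_per_batch + 1, clusters_per_batch):
        # gather the example indices of the clusters assigned to this batch
        indices = [j for cid in ids[i:i + clusters_per_batch] for j in clusters[cid]]
        batches.append([examples[j] for j in indices])
    return batches
```

With `batch_size=1024` and `chunk_size=32` as in the command above, each batch would contain 32 whole clusters.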
Download the image file zip from Hugging Face:

```bash
wget https://huggingface.co/datasets/TIGER-Lab/MMEB-eval/resolve/main/images.zip
unzip images.zip -d eval_images/
```

- To evaluate our model on an MMEB dataset (e.g., MSCOCO_i2t), run:

```bash
python eval_mmeb.py --model_name raghavlite/B3_Qwen2_7B --encode_output_path ./MMEB-evaloutputs/B2_Qwen2_7B/ --pooling eos --normalize True --lora --lora_r 8 --bf16 --dataset_name TIGER-Lab/MMEB-eval --subset_name MSCOCO_i2t --dataset_split test --per_device_eval_batch_size 4 --image_dir eval_images/ --tgt_prefix_mod
```

To run B3 models on your own dataset, simply repurpose the eval_mmeb.py code. Lines 120-126 create a query dataset and lines 127-138 create a target dataset; make sure your data is in the same format as a reference query and target dataset (e.g., MSCOCO_i2t or MSCOCO_t2i). Lines 159-160 extract query embeddings and line 172 extracts target embeddings. We will soon release a much simpler script for general-purpose usage.
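Once query and target embeddings are extracted, retrieval reduces to a similarity search over the vectors. A minimal sketch of that final step, assuming L2-normalized embeddings as produced with `--normalize True` (the function and variable names here are illustrative, not eval_mmeb.py's actual code):

```python
import numpy as np

def retrieve(query_emb, target_emb):
    """Rank targets for each query by cosine similarity (illustrative sketch).

    Both inputs are (n, d) arrays. Rows are L2-normalized first, so the
    dot product between a query row and a target row equals their cosine
    similarity.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    scores = q @ t.T                        # (n_queries, n_targets)
    return scores.argsort(axis=1)[:, ::-1]  # target indices, best first

# toy check: each query should rank its own target first
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
targets = np.array([[0.9, 0.1], [0.1, 0.9]])
ranks = retrieve(queries, targets)
```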
- We have adapted code from VLM2Vec.
