Running RTEB with Docker

This document explains how to run the RTEB (Retrieval Embedding Benchmark) application using Docker.

Prerequisites

Run with custom arguments:

./run_rteb.sh --gpus 2 --batch_size 32 --save_embds

All arguments supported by the RTEB application can be passed directly to the Docker container. The most common ones are:

--gpus <num>: Number of GPUs to use (default: 0; values above 0 require the NVIDIA Docker runtime)
--cpus <num>: Number of CPUs to use (default: 1)
--batch_size <num>: Batch size for encoding (default: 16)
--data_path <path>: Path to the dataset (default: /app/data)
--save_path <path>: Path to save output (default: /app/output)
--save_embds: Save embeddings
--load_embds: Load pre-computed embeddings
--overwrite: Overwrite existing results
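
As a rough illustration of how the flags above might combine, the sketch below saves embeddings in one run and reuses them in a later one. It assumes that --load_embds picks up what an earlier --save_embds run wrote, which this document does not state. The run helper and the DRY_RUN variable are hypothetical; they only print each command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# First pass: encode on two GPUs and save the embeddings.
run ./run_rteb.sh --gpus 2 --batch_size 32 --save_embds

# Later pass: load the saved embeddings and overwrite earlier results
# (assumes the same default data and output paths as the first pass).
run ./run_rteb.sh --load_embds --overwrite
```

Running the sketch as-is only prints the two commands, so they can be reviewed before being executed with DRY_RUN=0.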

For a complete list of arguments, run:

./run_rteb.sh --help

The Docker setup includes:

A Docker image with all necessary dependencies
Volume mounts for data and output
Optional GPU support for accelerated processing (requires NVIDIA Docker runtime)
Memory limits that cap the container's memory use, so a run cannot exhaust the host's memory
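
Because GPU support depends on the NVIDIA Docker runtime, it can help to check for that runtime before passing --gpus. The sketch below is one possible check, not part of the RTEB tooling: the has_nvidia_runtime function is a hypothetical name, and it assumes the runtime is registered with Docker under the name "nvidia". If Docker needs sudo on the host, the docker info call would need it too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if Docker lists a runtime named "nvidia".
has_nvidia_runtime() {
  # Fail early when the docker CLI is not installed.
  command -v docker >/dev/null 2>&1 || return 1
  # "docker info" reports the registered runtimes; look for an "nvidia" key.
  docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}

if has_nvidia_runtime; then
  echo "NVIDIA runtime found: --gpus can be set above 0"
else
  echo "NVIDIA runtime not found: keep --gpus at its default of 0"
fi
```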

To modify the Docker environment:

After making changes, rebuild the Docker image:

sudo docker-compose build
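
Before rebuilding, the Compose file can be validated so that syntax errors surface without waiting for a build. The following is a minimal sketch under the assumption that the Compose file lives in the current directory; check_compose is a hypothetical helper name, not something shipped with RTEB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the Compose file in a directory without building.
check_compose() {
  local dir="${1:-.}"
  if ! command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose not found on PATH"
    return 1
  fi
  # "config --quiet" only validates; it prints nothing when the file is valid.
  (cd "$dir" && docker-compose config --quiet)
}

if check_compose .; then
  echo "Compose file is valid; run: sudo docker-compose build"
else
  echo "Fix the problem above before rebuilding"
fi
```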