Commit 6aedd39

Merge pull request #31 from Bobbins228/unify-benchmarks
Make modular benchmarking script
2 parents fc630f4 + 1464393 commit 6aedd39

File tree

12 files changed: +1851 −1720 lines changed


benchmarks/beir-benchmarks/README.md

Lines changed: 90 additions & 0 deletions

# Benchmarking Llama Stack with the BEIR Framework

## Purpose
The purpose of this script is to give users a variety of benchmarks they can run against Llama Stack, built on the standardized information retrieval benchmarks of the [BEIR](https://github.com/beir-cellar/beir) framework.

## Available Benchmarks
Currently there is only one benchmark available (a sketch of the data it operates on follows this list):

1. [Benchmarking embedding models with BEIR Datasets and Llama Stack](benchmarking_embedding_models.md)
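
To make that concrete: a BEIR dataset bundles a document corpus, a set of queries, and relevance judgments (qrels). The sketch below loads one the way BEIR's own examples do; it illustrates the data shapes only, and `beir_benchmarks.py` may organize this differently.

```python
# Minimal sketch of loading a BEIR dataset (here: scifact), following the
# upstream BEIR examples; not an excerpt of beir_benchmarks.py.
from beir import util
from beir.datasets.data_loader import GenericDataLoader

dataset = "scifact"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
data_path = util.download_and_unzip(url, "datasets")  # extracts under ./datasets

# corpus:  doc_id   -> {"title": ..., "text": ...}
# queries: query_id -> query text
# qrels:   query_id -> {doc_id: relevance}
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
print(f"{len(corpus)} documents, {len(queries)} queries, {len(qrels)} judged queries")
```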

## Prerequisites
* [Python](https://www.python.org/downloads/) v3.12 or later
* [uv](https://github.com/astral-sh/uv?tab=readme-ov-file#installation) installed
* [ollama](https://ollama.com/) set up on your system and running the `meta-llama/Llama-3.2-3B-Instruct` model (a quick connectivity check is sketched after the note below)

> [!NOTE]
> Ollama can be replaced with an [inference provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/index.html) of your choice.
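
Before running anything, it can help to confirm that ollama is actually reachable. A check against ollama's local HTTP API (it listens on `http://localhost:11434` by default) might look like this; the `requests` dependency is an assumption, not part of `requirements.txt` necessarily:

```python
# Health check sketch: lists the models the local ollama server has pulled.
# Assumes ollama's default endpoint and the `requests` package.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Models available to ollama:", models)
```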

## Installation

Initialize a virtual environment:

```bash
uv venv .venv --python 3.12 --seed
source .venv/bin/activate
```

Install the required dependencies:

```bash
uv pip install -r requirements.txt
```

Prepare your environment by running:

```bash
# The run.yaml file is based on the starter template: https://github.com/meta-llama/llama-stack/tree/main/llama_stack/templates/starter
# We run a build here to install all of the dependencies for the starter template
llama stack build --template starter --image-type venv
```
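
The build step installs the provider dependencies; at run time, `run.yaml` describes the stack itself, with the `ENABLE_OLLAMA`/`ENABLE_MILVUS` variables in the commands below selecting which providers it enables. One plausible way a Python script can consume such a configuration is Llama Stack's library-client mode, sketched here; treat the import path and `initialize()` call as assumptions about the llama-stack version in use:

```python
# Sketch only: drive Llama Stack in-process from run.yaml instead of a
# separate server. The import path and API are assumptions, not an
# excerpt of beir_benchmarks.py.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("run.yaml")
client.initialize()             # instantiate the providers declared in run.yaml
print(client.models.list())    # the client then exposes the usual stack APIs
```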

## Quick Start

1. **Run a basic benchmark**:

   ```bash
   # Runs the embedding models benchmark by default
   ENABLE_OLLAMA=ollama ENABLE_MILVUS=milvus OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" uv run python beir_benchmarks.py --dataset-names scifact --embedding-models granite-embedding-125m
   ```

2. **View results**: Results will be saved in the `results/` directory with detailed evaluation metrics.
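
The `--embedding-models` flag above selects which embedding model the benchmark exercises. An embedding request through a Llama Stack client looks roughly like the following; the `model_id` is the one from the Quick Start command, while the client construction, default port, and method signature are assumptions about the llama-stack-client API rather than a verbatim excerpt of the script:

```python
# Rough sketch of the kind of embedding call the benchmark issues.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # port is an assumption
response = client.inference.embeddings(
    model_id="granite-embedding-125m",
    contents=["BEIR is a benchmark for information retrieval."],
)
print(len(response.embeddings[0]))  # dimensionality of the embedding vector
```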

## File Structure

```
beir-benchmarks/
├── README.md                        # This file
├── beir_benchmarks.py               # Main benchmarking script for multiple benchmarks
├── benchmarking_embedding_models.md # Detailed documentation and guide
├── requirements.txt                 # Python dependencies
└── run.yaml                         # Llama Stack configuration
```

## Usage Examples

### Basic Usage

```bash
# Run the benchmark with default settings
ENABLE_OLLAMA=ollama ENABLE_MILVUS=milvus OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" uv run python beir_benchmarks.py

# Specify a custom dataset and model
ENABLE_OLLAMA=ollama ENABLE_MILVUS=milvus OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" uv run python beir_benchmarks.py --dataset-names scifact --embedding-models granite-embedding-125m

# Run with a custom batch size
ENABLE_OLLAMA=ollama ENABLE_MILVUS=milvus OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" uv run python beir_benchmarks.py --batch-size 100
```
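
The flags exercised above (`--dataset-names`, `--embedding-models`, `--batch-size`) suggest a CLI surface along these lines. This is a reconstruction for illustration only; the defaults shown are placeholders, not the script's actual values:

```python
# Hypothetical reconstruction of the CLI flags used in the examples above.
import argparse

parser = argparse.ArgumentParser(description="Run BEIR benchmarks against Llama Stack")
parser.add_argument("--dataset-names", nargs="+", default=["scifact"],
                    help="BEIR datasets to benchmark")
parser.add_argument("--embedding-models", nargs="+", default=["granite-embedding-125m"],
                    help="Embedding models to evaluate")
parser.add_argument("--batch-size", type=int, default=100,
                    help="Number of documents embedded per request")
args = parser.parse_args()
```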

### Advanced Configuration
For advanced configuration options and detailed setup instructions, see [benchmarking_embedding_models.md](benchmarking_embedding_models.md).

## Results

Benchmark results are automatically saved in the `results/` directory in TREC evaluation format. Each result file contains:

- Query-document relevance scores
- Ranking information for retrieval evaluation
- Timestamp and model information in the filename
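
TREC run files are plain text with one ranked document per line, in the standard six-field layout `query_id Q0 doc_id rank score run_name`. A small reader like the one below is enough to load a result file back into Python for inspection; the filename is a placeholder:

```python
# Load a TREC-format run file into {query_id: {doc_id: score}}.
# "results/example_run.txt" is a placeholder, not an actual output name.
from collections import defaultdict

run = defaultdict(dict)
with open("results/example_run.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, doc_id, rank, score, run_name = parts
        run[query_id][doc_id] = float(score)

print(f"Loaded rankings for {len(run)} queries")
```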

## Support

For detailed technical documentation, refer to:

- [benchmarking_embedding_models.md](benchmarking_embedding_models.md) - Comprehensive guide for the embedding models benchmark
- [BEIR Documentation](https://github.com/beir-cellar/beir) - Official BEIR framework docs
- [Llama Stack Documentation](https://llama-stack.readthedocs.io/) - Llama Stack API reference
