diff --git a/public_rerank_benchmarks/bm25_with_rerank.md b/public_rerank_benchmarks/bm25_with_rerank.md
new file mode 100644
index 00000000..f44b8f32
--- /dev/null
+++ b/public_rerank_benchmarks/bm25_with_rerank.md
@@ -0,0 +1,5 @@
+| Model | First Stage | BEIR Evaluation (14 Datasets) | Code Evaluation (6 Datasets) | Long Context Evaluation (7 Datasets) | Multilingual (18 Datasets) | Semi-Structured Data Evaluation (5 Datasets) |
+| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+|N/A|BM25|43.7|34.0|54.6|36.5|47.5|
+|Rerank v2|BM25|49.4|37.6|63.1|62.8|60.3|
+|Rerank v3|BM25|53.0|51.7|69.0|70.8|62.7|
\ No newline at end of file
diff --git a/public_rerank_benchmarks/embed_with_rerank.md b/public_rerank_benchmarks/embed_with_rerank.md
new file mode 100644
index 00000000..5d226539
--- /dev/null
+++ b/public_rerank_benchmarks/embed_with_rerank.md
@@ -0,0 +1,7 @@
+| Model | First Stage | Code Evaluation (6 Datasets) | Long Context Evaluation (7 Datasets) | Multilingual (18 Datasets) | Semi-Structured Data Evaluation (5 Datasets) |
+| -------- | -------- | -------- | -------- | -------- | -------- |
+|N/A|embed-v3.0|65.0|56.8|TBU|47.8|
+|Rerank v2|embed-v3.0|77.6|62.8|TBU|TBU|
+|Rerank v3|embed-v3.0|64.5|69.3|TBU|62.7|
+
+Note: For Multilingual, we used `embed-multilingual-v3.0`, `rerank-multilingual-v2.0`, and `rerank-multilingual-v3.0`. All other evaluations were done with the English model variants.
\ No newline at end of file
diff --git a/public_rerank_benchmarks/miracl.md b/public_rerank_benchmarks/miracl.md
new file mode 100644
index 00000000..abdd6268
--- /dev/null
+++ b/public_rerank_benchmarks/miracl.md
@@ -0,0 +1,7 @@
+| Model | First Stage | Arabic (ar) | Bengali (bn) | English (en) | Spanish (es) | Persian (fa) | Finnish (fi) | French (fr) | Hindi (hi) | Indonesian (id) | Japanese (ja) | Korean (ko) | Russian (ru) | Swahili (sw) | Telugu (te) | Thai (th) | Chinese (zh) | German (de) | Yoruba (yo) | Avg (18 datasets) | Avg (excl. de and yo) |
+| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+| BM25 | N/A | 48.1 | 50.8 | 35.1 | 31.9 | 33.3 | 55.1 | 18.3 | 45.8 | 44.9 | 36.9 | 41.9 | 33.4 | 38.3 | 49.4 | 48.8 | 18.0 | N/A | N/A | N/A | 39.4 |
+| embed-multilingual-v3.0 | N/A | 76.5 | 75.8 | 57.0 | 55.1 | 57.5 | 77.1 | 57.4 | 61.7 | 52.5 | 69.6 | 66.0 | 68.8 | 75.7 | 83.3 | 79.5 | 58.9 | 58.7 | 61.8 | 66.3 | 67.0 |
+| Rerank 3 | BM25 | 75.3 | 78.6 | 60.7 | 55.1 | 56.1 | 77.5 | 47.2 | 62.2 | 58.1 | 67.4 | 44.0 | 55.9 | 70.3 | 72.6 | 77.3 | 50.9 | 45.0 | 76.5 | 62.8 | 63.1 |
+| Rerank 3 | embed-multilingual-v3.0 | 80.4 | 82.5 | 61.4 | 57.0 | 62.4 | 80.8 | 58.4 | 62.9 | 57.5 | 75.1 | 74.5 | 67.3 | 77.2 | 83.8 | 82.8 | 65.7 | 60.8 | 83.1 | 70.8 | 70.6 |
+
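
The two `Avg` columns in `miracl.md` appear to be unweighted means over the per-language scores. That is an assumption, since the patch does not state how they were computed, but it reproduces the reported values. A minimal sketch using the `embed-multilingual-v3.0` row (no reranker); the helper name `macro_average` is hypothetical:

```python
# Per-language scores copied from the embed-multilingual-v3.0 row of miracl.md.
scores = {
    "ar": 76.5, "bn": 75.8, "en": 57.0, "es": 55.1, "fa": 57.5, "fi": 77.1,
    "fr": 57.4, "hi": 61.7, "id": 52.5, "ja": 69.6, "ko": 66.0, "ru": 68.8,
    "sw": 75.7, "te": 83.3, "th": 79.5, "zh": 58.9, "de": 58.7, "yo": 61.8,
}


def macro_average(values):
    """Unweighted mean rounded to one decimal (assumed aggregation method)."""
    values = list(values)
    return round(sum(values) / len(values), 1)


# Average over all 18 languages.
avg_all = macro_average(scores.values())

# Average over the 16 languages that remain after dropping German and Yoruba.
avg_excl = macro_average(v for lang, v in scores.items() if lang not in {"de", "yo"})

print(avg_all, avg_excl)  # 66.3 67.0, matching the table
```

The same check can be repeated for the other rows by swapping in their scores. For the BM25 row only the 16-language average applies, since `de` and `yo` are listed as N/A.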