This repository demonstrates a multi-scale retrieval approach for RAG (Retrieval-Augmented Generation) systems, showing that chunk size is query-dependent and that aggregating results across multiple chunk sizes improves retrieval robustness.
Instead of committing to a single chunk size, we:
- Index the same corpus multiple times with different chunk sizes (100, 200, 500 tokens)
- Query all indices in parallel at inference time
- Aggregate results using Reciprocal Rank Fusion (RRF) to produce final document rankings
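The RRF step above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's exact code; the document IDs, ranked lists, and the `k = 60` smoothing constant are assumptions for the example.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by several chunk sizes rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from three indices (100-, 200-, 500-token chunks)
fused = rrf_fuse([
    ["S05E02", "S03E07", "S01E04"],
    ["S03E07", "S05E02", "S09E01"],
    ["S05E02", "S09E01", "S03E07"],
])
# → ['S05E02', 'S03E07', 'S09E01', 'S01E04']
```

Note that `S09E01`, ranked only second and third, still overtakes `S01E04`, which a single index ranked highest of the remaining items: agreement across indices outweighs one strong placement.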
├── multi-window-chunk-size.ipynb   # Main notebook demonstrating the approach
├── seinfeld_trivia/
│   ├── data.json                   # Dataset with trivia questions and gold documents
│   └── documents_content/          # Markdown files for each Seinfeld episode
│       ├── S01E00.md
│       ├── S01E01.md
│       └── ...                     # 174 episode summaries
└── README.md
The `seinfeld_trivia/` directory contains:

- `documents_content/`: 174 markdown files, each containing a summary of a Seinfeld episode (e.g., `S05E14.md` for Season 5, Episode 14)
- `data.json`: A dataset of trivia questions, where each entry has:
  - `query`: The trivia question
  - `targets`: The gold document(s) containing the answer
  - `answer`: The expected answer
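A dataset entry with the fields above can be loaded like this. The top-level list structure and the specific values are assumptions for illustration; only the field names come from the description above.

```python
import json

# One hypothetical entry in the shape described above; the top-level
# list structure and the episode filename are illustrative assumptions.
sample = json.loads("""
[
  {
    "query": "What is Kramer's first name?",
    "targets": ["S06E11.md"],
    "answer": "Cosmo"
  }
]
""")

for item in sample:
    print(item["query"], "->", item["answer"])
```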
The multi-window-chunk-size.ipynb notebook demonstrates:
- Corpus Loading: Reading markdown documents from the dataset
- Vector Store Creation: Creating OpenAI vector stores with different chunk sizes
- Retrieval: Querying each vector store and comparing results
- RRF Aggregation: Combining rankings across chunk sizes
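The vector store creation step can be sketched as follows. This is a hedged outline, not the notebook's code: the helper names are made up, nothing runs at import time (an API key and uploaded file IDs would be required), and depending on SDK version the namespace may be `client.beta.vector_stores` rather than `client.vector_stores`.

```python
def chunking_strategy(max_tokens: int) -> dict:
    """Static chunking config for attaching a file to an OpenAI vector store.

    The API caps chunk_overlap_tokens at half of max_chunk_size_tokens,
    so half the chunk size is used here as a simple default.
    """
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_tokens,
            "chunk_overlap_tokens": max_tokens // 2,
        },
    }


def create_stores(client, name_prefix, file_ids, sizes=(100, 200, 500)):
    """Create one vector store per chunk size and attach every corpus file.

    `client` is an `openai.OpenAI` instance (hypothetical wiring; this
    function is defined but never called here).
    """
    stores = []
    for size in sizes:
        store = client.vector_stores.create(name=f"{name_prefix}-{size}")
        for file_id in file_ids:
            client.vector_stores.files.create(
                vector_store_id=store.id,
                file_id=file_id,
                chunking_strategy=chunking_strategy(size),
            )
        stores.append(store)
    return stores
```

Indexing the same corpus three times trades storage for robustness: at query time each store is searched independently and the three rankings are fused with RRF.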
The notebook includes three examples showing how different queries benefit from different chunk sizes:
| Example | Query | Best Chunk Size |
|---|---|---|
| 1 | "What's the name for Jerry's favorite shirt?" | Small (100-200 tokens) |
| 2 | "What is Kramer's first name?" | Large (500 tokens) |
| 3 | "Where did George Costanza famously pull out a golf ball from?" | Medium (200 tokens) |
RRF aggregation consistently matches or exceeds the best individual chunk size performance.
1. Install dependencies: `pip install openai`
2. Set your OpenAI API key as an environment variable: `export OPENAI_API_KEY=your_key_here`
3. Open and run `multi-window-chunk-size.ipynb`. The notebook will create vector stores (or reuse existing ones) and demonstrate retrieval across different chunk sizes.
- Chunk size is query-dependent: Fine-grained factual queries benefit from smaller chunks; contextual queries benefit from larger chunks
- No single size is optimal: What works for one query may fail for another
- RRF provides robustness: By aggregating multiple rank signals, we typically match or exceed the best individual configuration
- Simple implementation: No retraining or query classification needed—just parallel retrieval and rank aggregation

