This repository demonstrates a multi-scale retrieval approach for RAG (Retrieval-Augmented Generation) systems, showing that chunk size is query-dependent and that aggregating results across multiple chunk sizes improves retrieval robustness.
Instead of committing to a single chunk size, we:
- Index the same corpus multiple times with different chunk sizes (100, 200, 500 tokens)
- Query all indices in parallel at inference time
- Aggregate results using Reciprocal Rank Fusion (RRF) to produce final document rankings
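The RRF step above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's exact code; the document IDs, ranked lists, and the `k = 60` smoothing constant are assumptions for the example.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by several chunk sizes rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from three indices (100-, 200-, 500-token chunks)
fused = rrf_fuse([
    ["S05E02", "S03E07", "S01E04"],
    ["S03E07", "S05E02", "S09E01"],
    ["S05E02", "S09E01", "S03E07"],
])
# → ['S05E02', 'S03E07', 'S09E01', 'S01E04']
```

Note that `S09E01`, ranked only second and third, still overtakes `S01E04`, which a single index ranked highest of the remaining items: agreement across indices outweighs one strong placement.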
├── multi-window-chunk-size.ipynb   # Main notebook demonstrating the approach
├── seinfeld_trivia/
│   ├── data.json                   # Dataset with trivia questions and gold documents
│   └── documents_content/          # Markdown files for each Seinfeld episode
│       ├── S01E00.md
│       ├── S01E01.md
│       └── ...                     # 174 episode summaries
└── README.md
The `seinfeld_trivia/` directory contains:

- `documents_content/`: 174 markdown files, each containing a summary of a Seinfeld episode (e.g., `S05E14.md` for Season 5, Episode 14)
- `data.json`: A dataset of trivia questions, where each entry has:
  - `query`: The trivia question
  - `targets`: The gold document(s) containing the answer
  - `answer`: The expected answer
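A dataset entry with the fields above can be loaded like this. The top-level list structure and the specific values are assumptions for illustration; only the field names come from the description above.

```python
import json

# One hypothetical entry in the shape described above; the top-level
# list structure and the episode filename are illustrative assumptions.
sample = json.loads("""
[
  {
    "query": "What is Kramer's first name?",
    "targets": ["S06E11.md"],
    "answer": "Cosmo"
  }
]
""")

for item in sample:
    print(item["query"], "->", item["answer"])
```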
The multi-window-chunk-size.ipynb notebook demonstrates:
- Corpus Loading: Reading markdown documents from the dataset
- Vector Store Creation: Creating OpenAI vector stores with different chunk sizes
- Retrieval: Querying each vector store and comparing results
- RRF Aggregation: Combining rankings across chunk sizes
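The vector store creation step can be sketched as follows. This is a hedged outline, not the notebook's code: the helper names are made up, nothing runs at import time (an API key and uploaded file IDs would be required), and depending on SDK version the namespace may be `client.beta.vector_stores` rather than `client.vector_stores`.

```python
def chunking_strategy(max_tokens: int) -> dict:
    """Static chunking config for attaching a file to an OpenAI vector store.

    The API caps chunk_overlap_tokens at half of max_chunk_size_tokens,
    so half the chunk size is used here as a simple default.
    """
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_tokens,
            "chunk_overlap_tokens": max_tokens // 2,
        },
    }


def create_stores(client, name_prefix, file_ids, sizes=(100, 200, 500)):
    """Create one vector store per chunk size and attach every corpus file.

    `client` is an `openai.OpenAI` instance (hypothetical wiring; this
    function is defined but never called here).
    """
    stores = []
    for size in sizes:
        store = client.vector_stores.create(name=f"{name_prefix}-{size}")
        for file_id in file_ids:
            client.vector_stores.files.create(
                vector_store_id=store.id,
                file_id=file_id,
                chunking_strategy=chunking_strategy(size),
            )
        stores.append(store)
    return stores
```

Indexing the same corpus three times trades storage for robustness: at query time each store is searched independently and the three rankings are fused with RRF.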
The notebook includes three examples showing how different queries benefit from different chunk sizes:
| Example | Query | Best Chunk Size |
|---|---|---|
| 1 | "What's the name for Jerry's favorite shirt?" | Small (100-200 tokens) |
| 2 | "What is Kramer's first name?" | Large (500 tokens) |
| 3 | "Where did George Costanza famously pull out a golf ball from?" | Medium (200 tokens) |
RRF aggregation consistently matches or exceeds the best individual chunk size performance.
1. Install dependencies: `pip install openai`
2. Set your OpenAI API key as an environment variable: `export OPENAI_API_KEY=your_key_here`
3. Open and run `multi-window-chunk-size.ipynb`. The notebook will create vector stores (or reuse existing ones) and demonstrate retrieval across different chunk sizes.
- Chunk size is query-dependent: Fine-grained factual queries benefit from smaller chunks; contextual queries benefit from larger chunks
- No single size is optimal: What works for one query may fail for another
- RRF provides robustness: By aggregating multiple rank signals, we typically match or exceed the best individual configuration
- Simple implementation: No retraining or query classification needed—just parallel retrieval and rank aggregation

