Commit 21cfcc6

Authored by salgado, codefromthecrypt, and JessicaGarson

source code: Add Multimodal RAG with Elasticsearch Gotham City tutorial (#390)

Signed-off-by: Adrian Cole <[email protected]>
Co-authored-by: Adrian Cole <[email protected]>
Co-authored-by: Jess Garson <[email protected]>
1 parent 24c2e81 · 29 files changed: +1377 −0 lines
# Building a Multimodal RAG Pipeline with Elasticsearch: The Story of Gotham City
This repository contains the code for a Multimodal Retrieval-Augmented Generation (RAG) system built on Elasticsearch. The system processes and analyzes four types of evidence (images, audio, text, and depth maps) to solve a crime in Gotham City.

## Overview

The pipeline demonstrates how to:

- Generate unified embeddings for multiple modalities using ImageBind
- Store and search vectors efficiently in Elasticsearch
- Analyze retrieved evidence using GPT-4 to generate forensic reports
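ImageBind maps every modality into one shared embedding space, which is what makes cross-modal retrieval possible: an audio clip can be scored against an image with plain vector math. A minimal sketch of that idea, using stdlib-only cosine similarity (the 1024-dimension size matches ImageBind's output, but the vectors below are made-up toy values, not real embeddings):

```python
import math

DIMS = 1024  # ImageBind embeddings are 1024-dimensional


def cosine_similarity(a, b):
    """Score two embeddings, regardless of which modality produced them."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings of different evidence items in the shared space.
image_vec = [1.0 if i % 2 == 0 else 0.0 for i in range(DIMS)]
audio_vec = [1.0 if i % 2 == 0 else 0.1 for i in range(DIMS)]
text_vec = [0.0 if i % 2 == 0 else 1.0 for i in range(DIMS)]

# The toy audio vector is closer to the image than the text vector is.
print(cosine_similarity(image_vec, audio_vec) > cosine_similarity(image_vec, text_vec))  # → True
```

In the real pipeline this scoring happens inside Elasticsearch rather than in Python, but the geometry is the same.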
## Prerequisites

- Python 3.x
- An Elasticsearch cluster (cloud or local)
- An OpenAI API key: set up an OpenAI account and create a [secret key](https://platform.openai.com/docs/quickstart)
- 8 GB+ RAM
- A GPU (optional but recommended)
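Before running the pipeline, it can help to confirm the basics are in place. A minimal sketch, assuming the credentials are passed via environment variables; the names `OPENAI_API_KEY`, `ELASTICSEARCH_URL`, and `ELASTICSEARCH_API_KEY` are illustrative assumptions, not names mandated by this repository:

```python
import os
import sys

# Assumed variable names for this sketch; adjust to however you store credentials.
REQUIRED_ENV_VARS = (
    "OPENAI_API_KEY",         # OpenAI secret key
    "ELASTICSEARCH_URL",      # Elasticsearch cluster endpoint
    "ELASTICSEARCH_API_KEY",  # Elasticsearch credentials
)


def check_environment():
    """Return a list of problems; an empty list means the basics look good."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ recommended, found {sys.version.split()[0]}")
    for name in REQUIRED_ENV_VARS:
        if not os.environ.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```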
## Code execution

We provide a Google Colab notebook that lets you explore the entire pipeline interactively:

- [Open the Multimodal RAG Pipeline Notebook](notebook/01-mmrag-blog-quick-start.ipynb)
- The notebook includes step-by-step instructions and explanations for each stage of the pipeline.
## Project Structure

```
├── README.md
├── requirements.txt
├── notebook/
│   └── 01-mmrag-blog-quick-start.ipynb  # Jupyter notebook execution
├── src/
│   ├── embedding_generator.py           # ImageBind wrapper
│   ├── elastic_manager.py               # Elasticsearch operations
│   └── llm_analyzer.py                  # GPT-4 integration
├── stages/
│   ├── 01-stage/                        # File organization
│   ├── 02-stage/                        # Embedding generation
│   ├── 03-stage/                        # Elasticsearch indexing/search
│   └── 04-stage/                        # Evidence analysis
└── data/                                # Sample data
    ├── images/
    ├── audios/
    ├── texts/
    └── depths/
```
## Sample Data

The repository includes sample evidence files:

- Images: crime scene photos and security camera footage
- Audio: suspicious sound recordings
- Text: mysterious notes and riddles
- Depth maps: 3D scene captures
## How It Works

1. **Evidence Collection**: Files are organized by modality in the `data/` directory
2. **Embedding Generation**: ImageBind converts each piece of evidence into a 1024-dimensional vector
3. **Vector Storage**: Elasticsearch stores the embeddings with metadata for efficient retrieval
4. **Similarity Search**: New evidence is compared against the database using k-NN search
5. **Analysis**: GPT-4 analyzes the connections between evidence to identify suspects
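The storage and search steps above can be sketched as the request bodies sent to Elasticsearch. This is a sketch under assumptions: the field names and k-NN parameters are illustrative, not necessarily the exact values used in `src/elastic_manager.py`. In the real pipeline these dictionaries would be passed to the `elasticsearch-py` client, e.g. `es.indices.create(...)` and `es.search(...)`:

```python
# Index mapping for multimodal evidence: one 1024-d dense_vector per document.
evidence_mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 1024,  # ImageBind embedding size
                "index": True,
                "similarity": "cosine",
            },
            "modality": {"type": "keyword"},  # image / audio / text / depth
            "file_path": {"type": "keyword"},
        }
    }
}


def build_knn_query(query_vector, k=5, num_candidates=50):
    """Build a k-NN search body comparing new evidence against stored embeddings."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["modality", "file_path"],
    }


query = build_knn_query([0.0] * 1024, k=3)
print(query["knn"]["k"])  # → 3
```

Because all modalities share one embedding field, a single k-NN query over `embedding` retrieves the nearest evidence regardless of whether it was an image, a recording, a note, or a depth map.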
