
Commit 76475fa

polish-and-docker
Signed-off-by: Adrian Cole <[email protected]>
1 parent 4182f10 commit 76475fa

File tree

18 files changed, +359 -512 lines changed

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Ignore everything
+**
+
+# Allow specific files and directories
+!requirements.txt
+!data/
+!src/
+!stages/
Lines changed: 11 additions & 7 deletions
@@ -1,13 +1,17 @@
-# Elasticsearch Configuration
-ELASTIC_API_KEY=your_api_key_here
-ELASTICSEARCH_ENDPOINT=your_elastic_endpoint
+# Make a copy of this file with the name .env and assign values to variables
+
+# How you connect to Elasticsearch: change details to your instance
+ELASTICSEARCH_URL=
+ELASTICSEARCH_API_KEY=
+# If not using API key, uncomment these and fill them in:
+# ELASTICSEARCH_USER=elastic
+# ELASTICSEARCH_PASSWORD=elastic
 
 # OpenAI Configuration
-OPENAI_API_KEY=your_openai_api_key_here
+OPENAI_API_KEY=
 
 # Model Configuration
-MODEL_PATH=~/.cache/torch/checkpoints/imagebind_huge.pth
 
 # Optional Configuration
-#LOG_LEVEL=INFO
-#DEBUG=False
+# LOG_LEVEL=INFO
+# DEBUG=False
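
A minimal way to create the runtime configuration from this template, assuming it is checked in as `env.example` at the project root (the name the README below uses):

```bash
# Copy the template, then edit .env to add your Elasticsearch and OpenAI details.
cp env.example .env
```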

supporting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/Dockerfile

Lines changed: 33 additions & 17 deletions
@@ -1,20 +1,36 @@
-FROM ubuntu:24.04
+# Use non-slim image due to OS dependencies of python packages. This gives us
+# git, build-essential, libglib2 (opencv) and gomp (torchaudio).
+FROM python:3.12
 
-# Install necessary packages
-RUN apt update && apt install -y --no-install-recommends \
-    python3 \
-    python3-pip \
-    python3-venv \
-    g++ \
-    gcc \
-    python3.12-dev
+COPY /requirements.txt .
 
-# Create and activate a virtual environment
-RUN python3 -m venv /opt/venv
-ENV PATH="/opt/venv/bin:$PATH"
+# Our python requirements have some OS dependencies beyond the base layer:
+#
+# * imagebind pulls in cartopy which has OS dependencies on geos and proj
+# * opencv has a runtime OS dependency on libgl1-mesa-glx
+#
+# The dev dependencies are installed temporarily to compile the wheels.
+# We leave only the runtime dependencies, to keep the image smaller.
+RUN apt-get update && \
+    # install build and runtime dependencies
+    apt-get install -y --no-install-recommends \
+    libgeos-dev \
+    libproj-dev \
+    libgeos-c1v5 \
+    libproj25 \
+    libgl1-mesa-glx && \
+    # Install everything except xformers first
+    grep -v "\bxformers\b" requirements.txt > /tmp/r.txt && pip install -r /tmp/r.txt && \
+    # Now, install xformers, as it should be able to see torch now
+    grep "\bxformers\b" requirements.txt > /tmp/r.txt && pip install -r /tmp/r.txt && \
+    # remove build dependencies
+    apt-get purge -y libgeos-dev libproj-dev && \
+    apt-get autoremove -y && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+RUN mkdir -p ./data ./src ./stages
+COPY ./data ./data
+COPY ./src ./src
+COPY ./stages ./stages
 
-# Install Python packages in the virtual environment
-RUN pip install --upgrade pip
-RUN pip install torch
-RUN pip install wheel setuptools
-RUN pip install transformers xformers

supporting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/README.md

Lines changed: 17 additions & 50 deletions
@@ -11,67 +11,34 @@ The pipeline demonstrates how to:
 
 ## Prerequisites
 
-- Python 3.10+
+- A Docker runtime with 8GB+ free RAM
+- GPU is optional, but recommended
 - Elasticsearch cluster (cloud or local)
 - OpenAI API key - Setup an OpenAI account and create a [secret key](https://platform.openai.com/docs/quickstart)
-- 8GB+ RAM
-- GPU (optional but recommended)
 
 ## Quick Start
 
-1. **Setup Environment**
-```bash
-rm -rf .venv requirements.txt
-python3 -m venv .venv
-source .venv/bin/activate
-pip install pip-tools
-# Recreate requirements.txt
-pip-compile
-# Install main dependencies
-pip install -r requirements.txt
-
-
-
-python3 -m venv .venv
-source .venv/bin/activate
-pip install "python-dotenv[cli]"
-pip install -r requirements-torch.txt
-pip install -r requirements.txt
-
-# Make sure you have pytorch installed and Python 3.10+
-pip install torch torchvision torchaudio
-
-# Create and activate virtual environment
-python -m venv env_mmrag
-source env_mmrag/bin/activate  # Unix/MacOS
-# or
-.\env_mmrag\Scripts\activate  # Windows
-
-# Install dependencies
-pip install -r requirements.txt
-```
+This example runs four stages as docker compose services:
 
-2. **Configure Credentials**
-Create a `.env` file:
-```env
-ELASTICSEARCH_ENDPOINT="your-elasticsearch-endpoint"
-ELASTIC_API_KEY="your-elastic-api-key"
-OPENAI_API_KEY="your-openai-api-key"
+```mermaid
+graph TD
+verify-file-structure --> generate-embeddings
+generate-embeddings --> index-content
+index-content --> search-and-analyze
 ```
 
-3. **Run the Demo**
-```bash
-# Verify file structure
-python stages/01-stage/files_check.py
+First, copy [env.example](env.example) to `.env` and fill in values noted inside.
 
-# Generate embeddings
-python stages/02-stage/test_embedding_generation.py
+Now, enter below to run the pipeline:
+```bash
+docker compose run --build --rm search-and-analyze
+```
 
-# Index content
-python stages/03-stage/index_all_modalities.py
+The first time takes a while to build the image and download ImageBind weights.
 
-# Search and analyze
-python stages/04-stage/rag_crime_analyze.py
+If you want to re-run just one stage, add `--no-deps` like this:
+```bash
+docker compose run --no-deps --build --rm search-and-analyze
 ```
 
 ## Project Structure
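
The `--no-deps` form works for any single service, not just the last one; for example, to regenerate embeddings without re-running the file check:

```bash
docker compose run --no-deps --build --rm generate-embeddings
```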
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+name: gotham-city-crime-analysis
+
+services:
+  verify-file-structure:
+    build:
+      context: .
+    container_name: verify-file-structure
+    restart: 'no' # no need to re-verify file structure
+    env_file:
+      - .env
+    command: python stages/01-stage/files_check.py
+    extra_hosts: # send localhost traffic to the docker host, e.g. your laptop
+      - "localhost:host-gateway"
+
+  generate-embeddings:
+    depends_on:
+      verify-file-structure:
+        condition: service_completed_successfully
+    build:
+      context: .
+    container_name: generate-embeddings
+    restart: 'no' # no need to re-generate embeddings
+    env_file:
+      - .env
+    command: python stages/02-stage/test_embedding_generation.py
+    extra_hosts: # send localhost traffic to the docker host, e.g. your laptop
+      - "localhost:host-gateway"
+    volumes:
+      - torch-checkpoints:/root/cache/torch/checkpoints/
+
+  index-content:
+    depends_on:
+      generate-embeddings:
+        condition: service_completed_successfully
+    build:
+      context: .
+    container_name: index-content
+    restart: 'no' # no need to re-index content
+    env_file:
+      - .env
+    command: python stages/03-stage/index_all_modalities.py
+    extra_hosts: # send localhost traffic to the docker host, e.g. your laptop
+      - "localhost:host-gateway"
+
+  search-and-analyze:
+    depends_on:
+      index-content:
+        condition: service_completed_successfully
+    build:
+      context: .
+    container_name: search-and-analyze
+    restart: 'no' # no need to re-run analysis
+    env_file:
+      - .env
+    command: python stages/04-stage/rag_crime_analyze.py
+    extra_hosts: # send localhost traffic to the docker host, e.g. your laptop
+      - "localhost:host-gateway"
+
+volumes:
+  # Avoid re-downloading a >4GB model checkpoint
+  torch-checkpoints:
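
If the cached ImageBind checkpoint ever needs to be cleared, the named volume can be removed; a sketch assuming compose's default volume naming (`<project>_<volume>`):

```bash
docker volume inspect gotham-city-crime-analysis_torch-checkpoints
# Removing it forces the next run to download the checkpoint again.
docker volume rm gotham-city-crime-analysis_torch-checkpoints
```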

supporting-blog-content/building-multimodal-rag-with-elasticsearch-gotham/requirements.in

Lines changed: 0 additions & 15 deletions
This file was deleted.
