Commit 7c698ba

Merge pull request #20 from feast-dev/rag
uploading rag demo
2 parents 443c130 + cd4b28b commit 7c698ba

23 files changed: 4809 additions, 5 deletions

.github/workflows/feast_apply_aws.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
         id: setup-python
         uses: actions/setup-python@v2
         with:
-          python-version: "3.7"
+          python-version: "3.9"
           architecture: x64
       - name: Set up AWS SDK
         uses: aws-actions/configure-aws-credentials@v1

.github/workflows/feast_apply_gcp.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
         id: setup-python
         uses: actions/setup-python@v2
         with:
-          python-version: "3.7"
+          python-version: "3.9"
           architecture: x64
       - name: Set up Cloud SDK
         uses: google-github-actions/setup-gcloud@v0

.github/workflows/feast_plan_aws.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ jobs:
         id: setup-python
         uses: actions/setup-python@v2
         with:
-          python-version: "3.7"
+          python-version: "3.9"
           architecture: x64
       - name: Set up AWS SDK
         uses: aws-actions/configure-aws-credentials@v1

.github/workflows/feast_plan_gcp.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ jobs:
         id: setup-python
         uses: actions/setup-python@v2
         with:
-          python-version: "3.7"
+          python-version: "3.9"
           architecture: x64
       - name: Set up Cloud SDK
         uses: google-github-actions/setup-gcloud@v0

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -11,4 +11,5 @@ terraform.tfstate.backup
 .vscode/*
 **/derby.log
 **/metastore_db/*
-.env
+.env
+.idea

module_4_rag/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+data/*

module_4_rag/.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.9

module_4_rag/Dockerfile

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+FROM python:3.9
+
+# Set environment variables
+ENV PYTHONDONTWRITEBYTECODE 1
+ENV PYTHONUNBUFFERED 1
+
+# Set work directory
+WORKDIR /code
+
+
+# Install dependencies
+ENV LIBMEMCACHED=/opt/local
+RUN apt-get update && apt-get install -y \
+    libmemcached11 \
+    libmemcachedutil2 \
+    libmemcached-dev \
+    libz-dev \
+    curl \
+    gettext
+
+ENV PYTHONHASHSEED=random \
+    PIP_NO_CACHE_DIR=off \
+    PIP_DISABLE_PIP_VERSION_CHECK=on \
+    PIP_DEFAULT_TIMEOUT=100 \
+    # Poetry's configuration: \
+    POETRY_NO_INTERACTION=1 \
+    POETRY_VIRTUALENVS_CREATE=false \
+    POETRY_CACHE_DIR='/var/cache/pypoetry' \
+    POETRY_HOME='/usr/local' \
+    POETRY_VERSION=1.4.1
+
+RUN curl -sSL https://install.python-poetry.org | python3 - --version $POETRY_VERSION
+
+COPY pyproject.toml poetry.lock /code/
+RUN poetry install --no-interaction --no-ansi --no-root
+
+COPY . /code/

module_4_rag/README.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+This demo shows how to use Feast for retrieval-augmented generation (RAG).
+
+## Installation via PyEnv and Poetry
+
+This demo assumes you have Pyenv (2.3.10) and Poetry (1.4.1) installed on your machine, as well as Python 3.9.
+
+```bash
+pyenv local 3.9
+poetry shell
+poetry install
+```
+## Setting up the data and Feast
+
+To fetch the data, run:
+```bash
+python pull_states.py
+```
+This writes a file called `city_wikipedia_summaries.csv`.
+
+Then run:
+```bash
+python batch_score_documents.py
+```
+This writes the data to `data/city_wikipedia_summaries_with_embeddings.parquet`.
+
+Next, move the data into the feature repo created by
+Feast.
+
+## Feast
+
+To get started, make sure Feast and PostgreSQL are installed.
+
+First run:
+```bash
+cp -r ./data feature_repo/
+```
+
+Then open the `module_4.ipynb` notebook and follow its instructions.
+
+The notebook walks through a short tutorial that retrieves the top `k` most similar
+documents using PGVector.
+
+# Overview
+
+The goal is to define an architecture
+that supports the following flow:
+
+```mermaid
+flowchart TD;
+    A[Pull Data] --> B[Batch Score Embeddings];
+    B[Batch Score Embeddings] --> C[Materialize Online];
+    C[Materialize Online] --> D[Retrieval Augmented Generation];
+```
+
+# Results
+
+The demo runs the code below and displays the retrieved data.
+
+```python
+import pandas as pd
+
+from feast import FeatureStore
+from batch_score_documents import run_model, TOKENIZER, MODEL
+from transformers import AutoTokenizer, AutoModel
+
+df = pd.read_parquet("./feature_repo/data/city_wikipedia_summaries_with_embeddings.parquet")
+
+store = FeatureStore(repo_path=".")
+
+# Prepare a query vector
+question = "the most populous city in the U.S. state of Texas?"
+
+tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)
+model = AutoModel.from_pretrained(MODEL)
+query_embedding = run_model(question, tokenizer, model)
+query = query_embedding.detach().cpu().numpy().tolist()[0]
+
+# Retrieve top k documents
+features_df = store.retrieve_online_documents(
+    feature="city_embeddings:Embeddings",
+    query=query,
+    top_k=3
+).to_df()
+```
+Running `features_df.head()` will show:
+
+```
+features_df.head()
+                                          Embeddings  distance
+0  [0.11749928444623947, -0.04684492573142052, 0....  0.935567
+1  [0.10329511761665344, -0.07897591590881348, 0....  0.939936
+2  [0.11634305864572525, -0.10321836173534393, -0...  0.983343
+```
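
Note on the retrieval step in the README above: the call to `retrieve_online_documents` returns the `top_k` rows whose stored embeddings are closest to the query vector. The diff does not say which distance metric PGVector is configured with, so the following is only a minimal sketch that assumes Euclidean (L2) distance and uses an in-memory dictionary in place of the online store. The names `l2_distance`, `top_k_documents`, and the sample vectors are hypothetical and are not part of this commit.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_documents(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (document id, distance) pairs closest to the query."""
    scored = ((doc_id, l2_distance(query, emb)) for doc_id, emb in embeddings.items())
    # nsmallest keeps only the k best candidates, ordered by ascending distance
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy 2-dimensional embeddings, made up for illustration only
sample_embeddings = {
    "houston": [0.9, 0.1],
    "austin": [0.7, 0.3],
    "newark": [0.0, 1.0],
}
result = top_k_documents([1.0, 0.0], sample_embeddings, k=2)
print(result)
```

A smaller distance means a closer match, which is consistent with the ascending `distance` column in the README output. A vector database performs the same ranking with an index rather than a full scan.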

module_4_rag/app.py

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+from flask import Flask, render_template, request
+from flasgger import Swagger
+from feast import FeatureStore
+from transformers import AutoModel, AutoTokenizer
+
+from batch_score_documents import run_model, TOKENIZER, MODEL
+
+app = Flask(__name__)
+swagger = Swagger(app)
+
+# Load the feature store and the embedding model once at startup
+store = FeatureStore(repo_path=".")
+tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)
+model = AutoModel.from_pretrained(MODEL)
+
+
+@app.route("/get_documents")
+def get_documents():
+    """Example endpoint returning the documents most similar to a question
+    This is using docstrings for specifications.
+    ---
+    parameters:
+      - name: question
+        type: string
+        in: query
+        required: true
+    responses:
+      200:
+        description: An HTML page listing the retrieved documents
+    """
+    # Embed the question the same way the README example does
+    question = request.args["question"]
+    query_embedding = run_model(question, tokenizer, model)
+    query = query_embedding.detach().cpu().numpy().tolist()[0]
+    documents = store.retrieve_online_documents(
+        feature="city_embeddings:Embeddings",
+        query=query,
+        top_k=3,
+    ).to_dict()
+    return render_template("documents.html", documents=documents)
+
+
+@app.route("/")
+def home():
+    return render_template("home.html")
+
+
+if __name__ == "__main__":
+    app.run(debug=True)
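
Note on the last stage of the README flowchart: the "Retrieval Augmented Generation" step is not implemented in the files shown here, which stop at returning the retrieved documents. As a rough sketch of how retrieved passages are commonly combined with the user's question before being sent to a language model, the hypothetical helper below builds a prompt string. The function name `build_prompt`, the `max_chars` budget, the prompt wording, and the sample passages are all assumptions made for illustration.

```python
from typing import List, Sequence


def build_prompt(question: str, documents: Sequence[str], max_chars: int = 2000) -> str:
    """Combine retrieved passages and a question into one prompt (hypothetical helper)."""
    context_parts: List[str] = []
    used = 0
    for index, doc in enumerate(documents, start=1):
        snippet = doc.strip()
        if used + len(snippet) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(f"[{index}] {snippet}")
        used += len(snippet)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Sample passages written for this sketch, not taken from the demo dataset
prompt = build_prompt(
    "What is the most populous city in Texas?",
    [
        "Houston is the most populous city in the U.S. state of Texas.",
        "Austin is the capital of Texas.",
    ],
)
print(prompt)
```

Passages are added in retrieval order, so the closest matches survive when the budget is tight.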
