
Commit c3a0117

add notebook
Signed-off-by: cmuhao <[email protected]>
1 parent e8c3882 commit c3a0117


10 files changed, 3785 lines added, 5 lines removed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -11,4 +11,5 @@ terraform.tfstate.backup
 .vscode/*
 **/derby.log
 **/metastore_db/*
-.env
+.env
+.idea

module_4_rag/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+data/*

module_4_rag/.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.9

module_4_rag/batch_score_documents.py

Lines changed: 5 additions & 3 deletions
@@ -4,8 +4,8 @@
 import torch
 import torch.nn.functional as F
 
-INPUT_FILENAME = "city_wikipedia_summaries.csv"
-EXPORT_FILENAME = "city_wikipedia_summaries_with_embeddings.csv"
+INPUT_FILENAME = "./data/city_wikipedia_summaries.csv"
+EXPORT_FILENAME = "./data/city_wikipedia_summaries_with_embeddings.parquet"
 TOKENIZER = 'sentence-transformers/all-MiniLM-L6-v2'
 MODEL = 'sentence-transformers/all-MiniLM-L6-v2'
 
@@ -35,8 +35,10 @@ def score_data() -> None:
         print('shape = ', df.shape)
         df['Embeddings'] = list(embeddings.detach().cpu().numpy())
         print("embeddings generated...")
+        df['event_timestamp'] = pd.to_datetime('today')
+        df["item_id"] = df.index
         print(df.head())
-        df.to_csv(EXPORT_FILENAME, index=False)
+        df.to_parquet(EXPORT_FILENAME, index=False)
         print("...data exported. job complete")
     else:
         print("scored data found...skipping generating embeddings.")

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from feast import Entity
+
+item = Entity(name="item_id")

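`Entity(name="item_id")` sets no explicit `join_keys`. In that case Feast uses the entity name as the join key, so feature rows are matched on the `item_id` column that `score_data()` now writes. The hypothetical dict-backed store below is only a rough mental model of a lookup by that join key. It is not Feast's API or its storage layout.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory "online store": feature values keyed by item_id.
ONLINE_ROWS: Dict[int, Dict[str, Any]] = {
    0: {"Embeddings": [0.1, 0.2, 0.3]},
    1: {"Embeddings": [0.4, 0.5, 0.6]},
}


def lookup(entity_rows: List[Dict[str, int]], feature: str) -> List[Optional[Any]]:
    """Return one feature value per entity row, or None when the key is unknown."""
    values: List[Optional[Any]] = []
    for entity_row in entity_rows:
        stored = ONLINE_ROWS.get(entity_row["item_id"])  # join on the entity key
        values.append(None if stored is None else stored[feature])
    return values


print(lookup([{"item_id": 1}, {"item_id": 7}], "Embeddings"))
# prints [[0.4, 0.5, 0.6], None]
```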
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+project: feast_demo_local
+provider: local
+registry:
+  registry_type: sql
+  path: postgresql://@localhost:5432/feast
+online_store:
+  type: postgres
+  pgvector_enabled: true
+  vector_len: 384
+  host: 127.0.0.1
+  port: 5432
+  database: feast
+  user: ""
+  password: ""
+
+
+offline_store:
+  type: file
+entity_key_serialization_version: 2
+

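The online store config enables pgvector and fixes `vector_len: 384`, which matches the 384-dimensional output of `sentence-transformers/all-MiniLM-L6-v2`. The brute-force sketch below approximates the nearest-neighbour query that a vector-enabled store answers. It ranks stored embeddings by cosine distance, the quantity pgvector's `<=>` operator returns, and rejects vectors of the wrong length. It is a conceptual stand-in rather than a description of how Feast's Postgres online store issues queries, and the `top_k` and `one_hot` helpers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

VECTOR_LEN = 384  # mirrors vector_len in the online_store config


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: List[float], stored: Dict[int, List[float]], k: int) -> List[Tuple[int, float]]:
    """Brute-force nearest neighbours: (item_id, distance) pairs, closest first."""
    if len(query) != VECTOR_LEN:
        raise ValueError(f"query has {len(query)} dims, expected {VECTOR_LEN}")
    scored = []
    for item_id, vec in stored.items():
        if len(vec) != VECTOR_LEN:
            raise ValueError(f"item {item_id} has {len(vec)} dims, expected {VECTOR_LEN}")
        scored.append((item_id, cosine_distance(query, vec)))
    scored.sort(key=lambda pair: pair[1])  # stable sort keeps insertion order on ties
    return scored[:k]


def one_hot(i: int) -> List[float]:
    """Toy unit vector standing in for a real sentence embedding."""
    v = [0.0] * VECTOR_LEN
    v[i] = 1.0
    return v


stored = {0: one_hot(0), 1: one_hot(1), 2: one_hot(2)}
print(top_k(one_hot(1), stored, k=2))  # prints [(1, 0.0), (0, 1.0)]
```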
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+from datetime import timedelta
+
+from feast import (
+    FeatureView,
+    Field, FileSource,
+)
+from feast.data_format import ParquetFormat
+from feast.types import Float32, Array
+from entities import item
+
+
+parquet_file_path = "../data/city_wikipedia_summaries_with_embeddings.parquet"
+
+source = FileSource(
+    file_format=ParquetFormat(),
+    path=parquet_file_path,
+    timestamp_field="event_timestamp",
+)
+
+city_embeddings_feature_view = FeatureView(
+    name="city_embeddings",
+    entities=[item],
+    schema=[
+        Field(name="Embeddings", dtype=Array(Float32)),
+    ],
+    source=source,
+    ttl=timedelta(hours=2),
+)

module_4_rag/feature_repo/module_1.ipynb

Lines changed: 354 additions & 0 deletions
Large diffs are not rendered by default.

module_4_rag/poetry.lock

Lines changed: 3370 additions & 0 deletions
Some generated files are not rendered by default.

module_4_rag/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ packages = [{include = "feast_rag"}]
 
 [tool.poetry.dependencies]
 python = "^3.9"
-feast = "^0.35.0"
+feast = "^0.37.0"
 torch = "^2.2.0"
 flasgger = "^0.9.7.1"
 wikipedia = "^1.4.0"
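The only dependency change bumps Feast from `^0.35.0` to `^0.37.0`. Poetry's caret requirement allows any update that leaves the left-most non-zero version component unchanged. For a 0.x release like Feast, that pins the minor version. The small helper below, which is hypothetical and written only for illustration, computes the resulting bounds for the specs in this `pyproject.toml`.

```python
from typing import List, Tuple


def caret_bounds(spec: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """(inclusive lower, exclusive upper) bounds for a caret spec like "^0.37.0"."""
    parts: List[int] = [int(p) for p in spec.lstrip("^").split(".")]
    nonzero = [i for i, p in enumerate(parts) if p != 0]
    # Bump the left-most non-zero component; if all are zero, bump the last one given.
    idx = nonzero[0] if nonzero else len(parts) - 1
    upper = parts[:idx] + [parts[idx] + 1] + [0] * (len(parts) - idx - 1)
    return tuple(parts), tuple(upper)


def fmt(version: Tuple[int, ...]) -> str:
    return ".".join(str(p) for p in version)


for dep, spec in [("python", "^3.9"), ("feast", "^0.37.0"), ("torch", "^2.2.0"),
                  ("flasgger", "^0.9.7.1"), ("wikipedia", "^1.4.0")]:
    lower, upper = caret_bounds(spec)
    print(f"{dep}: >={fmt(lower)}, <{fmt(upper)}")
```

Under this rule, `^0.37.0` resolves to `>=0.37.0, <0.38.0`, so moving past Feast 0.37 would need another explicit bump.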
