Commit a1388a5

feat: Add Milvus tutorial with Feast integration (feast-dev#5292)
* feat: Add Milvus tutorial with Feast integration

  Signed-off-by: Yassin Nouh <[email protected]>

* Update examples/online_store/milvus_tutorial/README.md

  Co-authored-by: Francisco Arceo <[email protected]>
  Signed-off-by: Yassin Nouh <[email protected]>

---------

Signed-off-by: Yassin Nouh <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>
1 parent d8f1c97 commit a1388a5

4 files changed: +304 -0 lines changed

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Milvus Tutorial with Feast

This tutorial demonstrates how to use Milvus as a vector database backend for Feast. You'll learn how to set up Milvus, create embeddings, store them in Feast, and perform similarity searches.

## Prerequisites

- Python 3.10+
- Docker (for running Milvus)
- Feast installed (`pip install 'feast[milvus]'`)

## Setup

1. Start Milvus containers with Docker Compose:

```bash
docker compose up -d
```

This starts the three services defined in `docker-compose.yml`:
- `milvus`: The Milvus server (standalone mode)
- `etcd`: For metadata storage
- `minio`: For object storage

2. Wait until all containers are up (this may take a minute or two):

```bash
docker ps
```
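
`docker ps` shows container state, but not whether Milvus is already accepting requests. A minimal sketch of a readiness poll in Python, assuming the Milvus container serves a health endpoint at `/healthz` on the published port 9091; the endpoint path and the helper names `milvus_is_healthy` and `wait_for_milvus` are assumptions for illustration, not part of this commit:

```python
import time
import urllib.error
import urllib.request


def milvus_is_healthy(url: str = "http://localhost:9091/healthz", timeout: float = 2.0) -> bool:
    """Return True if the (assumed) health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


def wait_for_milvus(probe=milvus_is_healthy, attempts: int = 30, delay: float = 2.0) -> bool:
    """Call `probe` until it returns True or `attempts` calls have been made."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_milvus()` before running the example retries up to 30 times with the defaults and returns `False` if the server never answers. Passing a different `probe` keeps the retry loop usable without a running server.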

## Project Structure

```
milvus_tutorial/
├── README.md
├── feature_store.yaml       # Feast configuration
├── docker-compose.yml       # Docker Compose configuration for Milvus
├── data/                    # Data directory
│   └── sample_data.parquet  # Sample data with embeddings (generated by the script)
└── milvus_example.py        # Example script
```

## Tutorial Steps

1. Configure Feast with Milvus
2. Generate sample data with embeddings
3. Define feature views
4. Register and apply feature definitions
5. Perform vector similarity search

Run the complete example:

```bash
python milvus_example.py
```
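
Steps 1 and 2 have to agree on the vector size: `feature_store.yaml` sets `embedding_dim: 384`, and the script's embedding model is described as producing 384-dimensional vectors. A minimal sketch of a pre-flight check, assuming the embeddings are held as plain Python lists; `check_embedding_dim` is a hypothetical helper, not part of Feast or of the tutorial script:

```python
from typing import Sequence


def check_embedding_dim(embeddings: Sequence[Sequence[float]], expected_dim: int = 384) -> None:
    """Raise ValueError if any embedding's length differs from expected_dim."""
    for row_index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding at row {row_index} has {len(vector)} dimensions, "
                f"expected {expected_dim}"
            )
```

In the script, such a check could run on `df['embedding']` just before the parquet file is written, so that a model swap that changes the dimension fails early rather than at write time.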

## How It Works

This tutorial demonstrates:

- Setting up Milvus as a vector database
- Configuring Feast to use Milvus as the online store
- Generating embeddings for text data
- Storing embeddings in Feast feature views
- Performing vector similarity searches using Feast's retrieval API

Milvus is a vector database built for similarity search, which makes it a good fit for applications such as semantic search and recommendation systems.
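
With `index_type: "FLAT"` and `metric_type: "L2"`, the similarity search is conceptually an exhaustive scan: compute the Euclidean distance from the query vector to every stored vector and keep the closest `top_k`. A minimal pure-Python sketch of that idea on toy 2-dimensional vectors; `l2_distance` and `top_k_nearest` are illustrative names, not Feast or Milvus APIs, and the distance values a real Milvus query reports may be scaled differently (for example squared), although the ranking is the same:

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_nearest(
    query: Sequence[float],
    vectors: Dict[int, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and return the closest top_k."""
    scored = [(key, l2_distance(query, vector)) for key, vector in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Toy data: three "products" keyed by id, embedded in 2 dimensions.
toy_vectors = {1: [0, 0], 2: [3, 4], 3: [1, 0]}
print(top_k_nearest([0, 0], toy_vectors, top_k=2))  # [(1, 0.0), (3, 1.0)]
```

An exhaustive scan is exact but costs one distance computation per stored vector, which is fine for the ten products in this tutorial.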
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
version: "3.9"

services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.18
    command: >
      etcd -advertise-client-urls=http://etcd:2379
      -listen-client-urls http://0.0.0.0:2379
    volumes: ["./volumes/etcd:/etcd"]
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s

  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    command: server /data --console-address ":9001"
    volumes: ["./volumes/minio:/data"]
    ports: ["9000:9000", "9001:9001"]

  milvus:
    image: milvusdb/milvus:v2.5.10
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    depends_on: [etcd, minio]
    volumes: ["./volumes/milvus:/var/lib/milvus"]
    ports: ["19530:19530", "9091:9091"]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
project: milvus_tutorial
provider: local
registry: data/registry.db
online_store:
  type: milvus
  host: localhost
  port: 19530
  vector_enabled: true
  embedding_dim: 384
  index_type: "FLAT"
  metric_type: "L2"

offline_store:
  type: file

entity_key_serialization_version: 3
Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
# Milvus Tutorial with Feast
#
# This example demonstrates how to use Milvus
# as a vector database backend for Feast.

import os
import subprocess
import sys
from datetime import datetime, timedelta

import pandas as pd

# For generating embeddings
try:
    from sentence_transformers import SentenceTransformer
except ImportError:
    print("Installing sentence_transformers...")
    # Install with the interpreter running this script, so the package
    # lands in the same environment that imports it.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "sentence-transformers"])
    from sentence_transformers import SentenceTransformer

from feast import FeatureStore, Entity, FeatureView, Field, FileSource
from feast.data_format import ParquetFormat
from feast.types import Float32, Array, String
from feast.value_type import ValueType

# Create data directory if it doesn't exist
os.makedirs("data", exist_ok=True)


# Step 1: Generate sample data with embeddings
def generate_sample_data():
    print("Generating sample data with embeddings...")

    # Sample product data
    products = [
        {"id": 1, "name": "Smartphone",
         "description": "A high-end smartphone with advanced camera features and long battery life."},
        {"id": 2, "name": "Laptop",
         "description": "Powerful laptop with fast processor and high-resolution display for professional use."},
        {"id": 3, "name": "Headphones",
         "description": "Wireless noise-cancelling headphones with premium sound quality."},
        {"id": 4, "name": "Smartwatch",
         "description": "Fitness tracking smartwatch with heart rate monitoring and sleep analysis."},
        {"id": 5, "name": "Tablet",
         "description": "Lightweight tablet with vibrant display perfect for reading and browsing."},
        {"id": 6, "name": "Camera",
         "description": "Professional digital camera with high-resolution sensor and interchangeable lenses."},
        {"id": 7, "name": "Speaker",
         "description": "Bluetooth speaker with rich bass and long battery life for outdoor use."},
        {"id": 8, "name": "Gaming Console",
         "description": "Next-generation gaming console with 4K graphics and fast loading times."},
        {"id": 9, "name": "E-reader",
         "description": "E-ink display reader with backlight for comfortable reading in any lighting condition."},
        {"id": 10, "name": "Smart TV",
         "description": "4K smart television with built-in streaming apps and voice control."}
    ]

    # Create DataFrame
    df = pd.DataFrame(products)

    # Generate embeddings using sentence-transformers
    model = SentenceTransformer('all-MiniLM-L6-v2')  # Small, fast model with 384-dim embeddings
    embeddings = model.encode(df['description'].tolist())

    # Add embeddings and timestamp to DataFrame
    df['embedding'] = embeddings.tolist()
    df['event_timestamp'] = datetime.now() - timedelta(days=1)
    df['created_timestamp'] = datetime.now() - timedelta(days=1)

    # Save to parquet file
    parquet_path = "data/sample_data.parquet"
    df.to_parquet(parquet_path, index=False)

    print(f"Sample data saved to {parquet_path}")
    return parquet_path


# Step 2: Define feature repository
def create_feature_definitions(data_path):
    print("Creating feature definitions...")

    product = Entity(
        name="product_id",
        description="Product ID",
        join_keys=["id"],
        value_type=ValueType.INT64,
    )

    source = FileSource(
        file_format=ParquetFormat(),
        path=data_path,
        timestamp_field="event_timestamp",
        created_timestamp_column="created_timestamp",
    )

    # Define feature view with vector embeddings
    product_embeddings = FeatureView(
        name="product_embeddings",
        entities=[product],
        ttl=timedelta(days=30),
        schema=[
            Field(
                name="embedding",
                dtype=Array(Float32),
                vector_index=True,  # Mark as vector field
            ),
            Field(name="name", dtype=String),
            Field(name="description", dtype=String),
        ],
        source=source,
        online=True,
    )

    return product, product_embeddings


# Step 3: Apply feature definitions and materialize to the online store
def setup_feature_store(product, product_embeddings):
    print("Setting up feature store...")

    store = FeatureStore(repo_path=".")

    store.apply([product, product_embeddings])

    # Materialize features to online store
    store.materialize(
        start_date=datetime.now() - timedelta(days=2),
        end_date=datetime.now(),
    )

    print("Feature store setup complete")
    return store


# Step 4: Perform vector similarity search
def perform_similarity_search(store, query_text: str, top_k: int = 3):
    print(f"\nPerforming similarity search for: '{query_text}'")

    # Generate embedding for query text
    model = SentenceTransformer('all-MiniLM-L6-v2')
    query_embedding = model.encode(query_text).tolist()

    # Perform similarity search using vector embeddings with version 2 API
    try:
        results = store.retrieve_online_documents_v2(
            features=["product_embeddings:embedding", "product_embeddings:name", "product_embeddings:description"],
            query=query_embedding,
            top_k=top_k,
            distance_metric="L2"
        ).to_df()

        # Print results
        print(f"\nTop {top_k} similar products:")
        for i, row in results.iterrows():
            print(f"\n{i + 1}. Name: {row['product_embeddings__name']}")
            print(f"   Description: {row['product_embeddings__description']}")
            print(f"   Distance: {row['distance']}")

        return results
    except Exception as e:
        print(f"Error performing search: {e}")
        return None


# Main function to run the example
def main():
    print("=== Milvus Tutorial with Feast ===")

    # Remind the user to start Milvus (the script does not check the connection itself)
    print("\nEnsure Milvus is running:")
    print("docker compose up -d")

    input("\nPress Enter to continue once Milvus is ready...")

    # Generate sample data
    data_path = generate_sample_data()

    # Create feature definitions
    product, product_embeddings = create_feature_definitions(data_path)

    # Setup feature store
    store = setup_feature_store(product, product_embeddings)

    # Perform similarity searches
    perform_similarity_search(store, "wireless audio device with good sound", top_k=3)
    perform_similarity_search(store, "portable computing device for work", top_k=3)

    print("\n=== Tutorial Complete ===")
    print("You've successfully set up Milvus with Feast and performed vector similarity searches!")


if __name__ == "__main__":
    main()
