Commit d183c4b

feat: Add support for data labeling in UI (feast-dev#5409)
* Add GenAI documentation page to Introduction section
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
* Move GenAI page to getting-started directory and update SUMMARY.md
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
* Update SUMMARY.md
* Update genai.md
* Add unstructured data transformation and Spark integration details to GenAI documentation
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
* Update genai.md
* Add document labeling functionality to Feast UI
  - Create dedicated document labeling page for RAG text chunk annotation
  - Add DocumentLabelingPage.tsx with text selection and highlighting
  - Implement backend endpoint for reading document files
  - Add document labeling infrastructure with DocumentLabel class
  - Support relevant/irrelevant labeling for RAG retrieval improvement
  - Include navigation integration and proper UI routing
  - Follow existing Feast UI patterns and design conventions
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Apply Python code formatting for document labeling files
  - Format Python files according to ruff standards
  - Fix whitespace and import ordering issues
  - Ensure compliance with Feast coding standards
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Fix Python test file formatting for CI lint check
  - Format 9 test files according to ruff standards
  - Resolve lint-python CI failure in PR #27
  - Ensure all Python code meets formatting requirements
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Add light blue highlighting for selected text before labeling
  - Show light blue background when text is selected for labeling
  - Clean up temporary highlights when label is applied
  - Improve user experience with visual feedback for text selection
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Fix formatting issues for CI checks
  - Apply ruff formatting to feast/type_map.py
  - Apply prettier formatting to DocumentLabelingPage.tsx
  - Ensure all code follows project formatting standards
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Fix Python formatting issues for CI compliance
  - Apply ruff formatting to 21 Python files
  - Resolve lint-python CI failure by ensuring all files meet formatting standards
  - Files reformatted: feast/feature_store.py, feast/feature_view.py, and 19 others
  - Maintain code quality and consistency across the codebase
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Fix text selection errors by replacing DOM manipulation with React state management
  - Remove problematic range.surroundContents() logic that conflicted with React's virtual DOM
  - Replace manual DOM manipulation with pure React state management approach
  - Add light blue highlighting for temporary text selection using conditional rendering
  - Change file path from absolute to relative path (./src/test-document.txt)
  - Improve text selection reliability and follow React best practices
  - Resolve 'Failed to execute removeChild' errors during text selection
  Signed-off-by: Devin AI <devin-ai-integration[bot]@users.noreply.github.com>
  Co-Authored-By: Francisco Javier Arceo <[email protected]>
  Co-Authored-By: Francisco Javier Arceo <[email protected]>

---------

Signed-off-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
1 parent 6b05153 commit d183c4b

File tree

16 files changed

+656
-2
lines changed

docs/getting-started/genai.md

Lines changed: 93 additions & 2 deletions
@@ -56,7 +56,6 @@ The transformation workflow typically involves:
 3. **Chunking**: Split documents into smaller, semantically meaningful chunks
 4. **Embedding Generation**: Convert text chunks into vector embeddings
 5. **Storage**: Store embeddings and metadata in Feast's feature store
-
 ### Feature Transformation for LLMs

 Feast supports transformations that can be used to:
@@ -66,6 +65,99 @@ Feast supports transformations that can be used to:
 * Normalize and preprocess features before serving to LLMs
 * Apply custom transformations to adapt features for specific LLM requirements

+## Getting Started with Feast for GenAI
+
+### Installation
+
+To use Feast with vector database support, install with the appropriate extras:
+
+```bash
+# For Milvus support
+pip install feast[milvus,nlp]
+
+# For Elasticsearch support
+pip install feast[elasticsearch]
+
+# For Qdrant support
+pip install feast[qdrant]
+
+# For SQLite support (Python 3.10 only)
+pip install feast[sqlite_vec]
+```
+
+### Configuration
+
+Configure your feature store to use a vector database as the online store:
+
+```yaml
+project: genai-project
+provider: local
+registry: data/registry.db
+online_store:
+  type: milvus
+  path: data/online_store.db
+  vector_enabled: true
+  embedding_dim: 384 # Adjust based on your embedding model
+  index_type: "IVF_FLAT"
+
+offline_store:
+  type: file
+entity_key_serialization_version: 3
+```
+
+### Defining Vector Features
+
+Create feature views with vector index support:
+
+```python
+from feast import FeatureView, Field, Entity
+from feast.types import Array, Float32, String
+
+document = Entity(
+    name="document_id",
+    description="Document identifier",
+    join_keys=["document_id"],
+)
+
+document_embeddings = FeatureView(
+    name="document_embeddings",
+    entities=[document],
+    schema=[
+        Field(
+            name="vector",
+            dtype=Array(Float32),
+            vector_index=True, # Enable vector search
+            vector_search_metric="COSINE", # Similarity metric
+        ),
+        Field(name="document_id", dtype=String),
+        Field(name="content", dtype=String),
+    ],
+    source=document_source,
+    ttl=timedelta(days=30),
+)
+```
+
+### Retrieving Similar Documents
+
+Use the `retrieve_online_documents_v2` method to find similar documents:
+
+```python
+# Generate query embedding
+query = "How does Feast support vector databases?"
+query_embedding = embed_text(query) # Your embedding function
+
+# Retrieve similar documents
+context_data = store.retrieve_online_documents_v2(
+    features=[
+        "document_embeddings:vector",
+        "document_embeddings:document_id",
+        "document_embeddings:content",
+    ],
+    query=query_embedding,
+    top_k=3,
+    distance_metric='COSINE',
+).to_df()
+```
 ## Use Cases

 ### Document Question-Answering
@@ -104,7 +196,6 @@ This integration enables:
 - Generating embeddings for millions of text chunks
 - Efficiently materializing features to vector databases
 - Scaling RAG applications to enterprise-level document repositories
-
 ## Learn More

 For more detailed information and examples:
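
The retrieval example added to `genai.md` asks for the `top_k=3` nearest documents under the `COSINE` metric. As a rough illustration of what that ranking means, here is a brute-force sketch in plain Python. It is not how Feast or Milvus compute results (a vector database uses an index such as `IVF_FLAT` rather than scanning every vector); the helper names `cosine_similarity` and `top_k_by_cosine` and the toy embeddings are hypothetical and exist only for this example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query: List[float], documents: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (document_id, similarity) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings; real embedding models emit many more dimensions.
toy_documents = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
results = top_k_by_cosine([1.0, 0.1, 0.0], toy_documents, top_k=2)
print(results)
```

The query vector points almost entirely along the first axis, so `doc_a` ranks first, `doc_b` second, and `doc_c` (orthogonal to the query) is cut off by `top_k=2`.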

sdk/python/feast/document_labeling.py

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
+import json
+from typing import Any, Dict, List, Optional
+
+from feast.feature import Feature
+
+
+class DocumentLabel:
+    def __init__(
+        self,
+        chunk_id: str,
+        document_id: str,
+        label: str,
+        confidence: Optional[float] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ):
+        self.chunk_id = chunk_id
+        self.document_id = document_id
+        self.label = label
+        self.confidence = confidence
+        self.metadata = metadata or {}
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "chunk_id": self.chunk_id,
+            "document_id": self.document_id,
+            "label": self.label,
+            "confidence": self.confidence,
+            "metadata": self.metadata,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "DocumentLabel":
+        return cls(
+            chunk_id=data["chunk_id"],
+            document_id=data["document_id"],
+            label=data["label"],
+            confidence=data.get("confidence"),
+            metadata=data.get("metadata", {}),
+        )
+
+
+def store_document_label(feature: Feature, label: DocumentLabel) -> None:
+    if not hasattr(feature, "labels") or feature.labels is None:
+        if hasattr(feature, "_labels"):
+            feature._labels = {}
+        else:
+            return
+
+    labels_dict = feature.labels if hasattr(feature, "labels") else feature._labels
+    labels_key = "document_labels"
+    if labels_key not in labels_dict:
+        labels_dict[labels_key] = "[]"
+
+    existing_labels = json.loads(labels_dict[labels_key])
+    existing_labels.append(label.to_dict())
+    labels_dict[labels_key] = json.dumps(existing_labels)
+
+
+def get_document_labels(feature: Feature) -> List[DocumentLabel]:
+    labels_dict = None
+    if hasattr(feature, "labels") and feature.labels:
+        labels_dict = feature.labels
+    elif hasattr(feature, "_labels") and feature._labels:
+        labels_dict = feature._labels
+
+    if not labels_dict or "document_labels" not in labels_dict:
+        return []
+
+    labels_data = json.loads(labels_dict["document_labels"])
+    return [DocumentLabel.from_dict(label_dict) for label_dict in labels_data]
+
+
+def remove_document_label(feature: Feature, chunk_id: str, document_id: str) -> bool:
+    labels_dict = None
+    if hasattr(feature, "labels") and feature.labels:
+        labels_dict = feature.labels
+    elif hasattr(feature, "_labels") and feature._labels:
+        labels_dict = feature._labels
+
+    if not labels_dict or "document_labels" not in labels_dict:
+        return False
+
+    existing_labels = json.loads(labels_dict["document_labels"])
+    original_length = len(existing_labels)
+
+    filtered_labels = [
+        label
+        for label in existing_labels
+        if not (label["chunk_id"] == chunk_id and label["document_id"] == document_id)
+    ]
+
+    if len(filtered_labels) < original_length:
+        labels_dict["document_labels"] = json.dumps(filtered_labels)
+        return True
+
+    return False
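
The module above keeps every label for a feature as one JSON-encoded list under the `"document_labels"` key of a string-valued labels dict. A minimal sketch of that store, read, and remove round trip follows, written so it runs without Feast installed. `FakeFeature`, `store_label`, `get_labels`, and `remove_label` are hypothetical stand-ins, and `DocumentLabel` is re-declared here as a dataclass with the same fields; the stand-in always carries a dict-valued `labels`, so it does not exercise the `_labels` fallback branch in the committed code.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional

LABELS_KEY = "document_labels"


@dataclass
class DocumentLabel:
    """Stand-in with the same fields as the DocumentLabel class in this commit."""

    chunk_id: str
    document_id: str
    label: str
    confidence: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeFeature:
    """Hypothetical stand-in for a Feast feature: only a string-valued labels dict."""

    labels: Dict[str, str] = field(default_factory=dict)


def store_label(feature: FakeFeature, label: DocumentLabel) -> None:
    """Append one label to the JSON list stored under LABELS_KEY."""
    existing = json.loads(feature.labels.get(LABELS_KEY, "[]"))
    existing.append(asdict(label))
    feature.labels[LABELS_KEY] = json.dumps(existing)


def get_labels(feature: FakeFeature) -> List[DocumentLabel]:
    """Decode every stored label back into a DocumentLabel."""
    raw = json.loads(feature.labels.get(LABELS_KEY, "[]"))
    return [DocumentLabel(**item) for item in raw]


def remove_label(feature: FakeFeature, chunk_id: str, document_id: str) -> bool:
    """Drop labels matching both ids; report whether anything was removed."""
    existing = json.loads(feature.labels.get(LABELS_KEY, "[]"))
    kept = [
        item
        for item in existing
        if not (item["chunk_id"] == chunk_id and item["document_id"] == document_id)
    ]
    if len(kept) < len(existing):
        feature.labels[LABELS_KEY] = json.dumps(kept)
        return True
    return False


feature = FakeFeature()
store_label(feature, DocumentLabel("chunk-1", "doc-1", "relevant", confidence=0.9))
store_label(feature, DocumentLabel("chunk-2", "doc-1", "irrelevant"))
removed = remove_label(feature, "chunk-2", "doc-1")
remaining = get_labels(feature)
print(removed, [(item.chunk_id, item.label) for item in remaining])
```

Because the whole list is re-serialized on every write, each store or remove costs time proportional to the number of labels already attached to the feature.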

sdk/python/feast/feature_server.py

Lines changed: 19 additions & 0 deletions
@@ -101,6 +101,10 @@ class ChatRequest(BaseModel):
     messages: List[ChatMessage]


+class ReadDocumentRequest(BaseModel):
+    file_path: str
+
+
 def _get_features(request: GetOnlineFeaturesRequest, store: "feast.FeatureStore"):
     if request.feature_service:
         feature_service = store.get_feature_service(
@@ -356,6 +360,21 @@ async def chat(request: ChatRequest):
         # For now, just return dummy text
         return {"response": "This is a dummy response from the Feast feature server."}

+    @app.post("/read-document")
+    async def read_document_endpoint(request: ReadDocumentRequest):
+        try:
+            import os
+
+            if not os.path.exists(request.file_path):
+                return {"error": f"File not found: {request.file_path}"}
+
+            with open(request.file_path, "r", encoding="utf-8") as file:
+                content = file.read()
+
+            return {"content": content, "file_path": request.file_path}
+        except Exception as e:
+            return {"error": str(e)}
+
     @app.get("/chat")
     async def chat_ui():
         # Serve the chat UI
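
To make the two response shapes of the new `/read-document` endpoint concrete, here is a sketch that lifts the handler body into a plain function and runs it against a temporary file. The function name `read_document` and the file names are hypothetical; the logic follows the diff above, but this is an illustration and not the server code itself.

```python
import os
import tempfile
from typing import Dict


def read_document(file_path: str) -> Dict[str, str]:
    """Mirror of the endpoint body: content on success, an "error" key otherwise."""
    try:
        if not os.path.exists(file_path):
            return {"error": f"File not found: {file_path}"}
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()
        return {"content": content, "file_path": file_path}
    except Exception as e:  # e.g. a directory path or bytes that are not valid UTF-8
        return {"error": str(e)}


with tempfile.TemporaryDirectory() as tmp_dir:
    doc_path = os.path.join(tmp_dir, "example.txt")
    with open(doc_path, "w", encoding="utf-8") as handle:
        handle.write("Feast chunk text")
    found = read_document(doc_path)
    missing = read_document(os.path.join(tmp_dir, "absent.txt"))

print(found["content"])
print(missing["error"])
```

Both outcomes come back as ordinary dictionaries, so a client of the endpoint as written has to check for the `"error"` key rather than rely on an HTTP error status. The handler also opens whatever path the request supplies.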

ui/src/FeastUISansProviders.tsx

Lines changed: 5 additions & 0 deletions
@@ -22,6 +22,7 @@ import FeatureServiceInstance from "./pages/feature-services/FeatureServiceInsta
 import DataSourceInstance from "./pages/data-sources/DataSourceInstance";
 import RootProjectSelectionPage from "./pages/RootProjectSelectionPage";
 import DatasetInstance from "./pages/saved-data-sets/DatasetInstance";
+import DocumentLabelingPage from "./pages/document-labeling/DocumentLabelingPage";
 import PermissionsIndex from "./pages/permissions/Index";
 import LineageIndex from "./pages/lineage/Index";
 import NoProjectGuard from "./components/NoProjectGuard";
@@ -145,6 +146,10 @@ const FeastUISansProvidersInner = ({
                   path="data-set/:datasetName/*"
                   element={<DatasetInstance />}
                 />
+                <Route
+                  path="document-labeling/"
+                  element={<DocumentLabelingPage />}
+                />
                 <Route path="permissions/" element={<PermissionsIndex />} />
                 <Route path="lineage/" element={<LineageIndex />} />
               </Route>

ui/src/custom-tabs/TabsRegistryContext.tsx

Lines changed: 1 addition & 0 deletions
@@ -289,6 +289,7 @@ export {
   useDataSourceCustomTabs,
   useEntityCustomTabs,
   useDatasetCustomTabs,
+
   // Routes
   useRegularFeatureViewCustomTabRoutes,
   useOnDemandFeatureViewCustomTabRoutes,

ui/src/custom-tabs/document-labeling-tab/DocumentLabelingTab.tsx

Whitespace-only changes.

ui/src/custom-tabs/document-labeling-tab/example-config.ts

Whitespace-only changes.

ui/src/custom-tabs/document-labeling-tab/index.ts

Whitespace-only changes.

ui/src/custom-tabs/document-labeling-tab/useDocumentLabelingQuery.tsx

Whitespace-only changes.

ui/src/custom-tabs/types.ts

Lines changed: 16 additions & 0 deletions
@@ -136,6 +136,20 @@ interface DatasetCustomTabRegistrationInterface
   }: DatasetCustomTabProps) => JSX.Element;
 }

+// Type for Document Labeling Custom Tabs
+interface DocumentLabelingCustomTabProps {
+  id: string | undefined;
+  feastObjectQuery: RegularFeatureViewQueryReturnType;
+}
+interface DocumentLabelingCustomTabRegistrationInterface
+  extends CustomTabRegistrationInterface {
+  Component: ({
+    id,
+    feastObjectQuery,
+    ...args
+  }: DocumentLabelingCustomTabProps) => JSX.Element;
+}
+
 export type {
   CustomTabRegistrationInterface,
   RegularFeatureViewQueryReturnType,
@@ -157,4 +171,6 @@ export type {
   FeatureCustomTabProps,
   DatasetCustomTabRegistrationInterface,
   DatasetCustomTabProps,
+  DocumentLabelingCustomTabRegistrationInterface,
+  DocumentLabelingCustomTabProps,
 };
