A production-ready Retrieval-Augmented Generation (RAG) pipeline built on the Feast feature store, the Milvus-lite vector database, and an Ollama-served LLM for document processing and intelligent question answering.
Screenshots (see `sample_docs/ui_screenshots/`):

- Main dashboard: query results with context-aware responses, source citations, and relevance scoring
- Document processing workflow: upload progress, chunking, and embedding generation
- System statistics dashboard: real-time metrics, document counts, and performance indicators
- Question-and-answer interface: query input, response display, and document source references
- Primary query interface: question processing and real-time response generation
```
              Web Interface (localhost:8000)
                             │
             ┌───────────────┴───────────────┐
             │         FastAPI Server        │
             │     (Feast RAG Pipeline)      │
             └───────────────┬───────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│   Ollama LLM    │ │   Feast Store   │ │   Milvus-Lite   │
│                 │ │                 │ │                 │
│ • llama3.2:3b   │ │ • Feature Mgmt  │ │ • Vector Store  │
│ • Embeddings    │ │ • Online Store  │ │ • Similarity    │
│ • Generation    │ │ • Registry      │ │ • Collections   │
│ Port: 11434     │ │ • Milvus Backend│ │ File-based DB   │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
```
                         Document Upload
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                 Document Processing Pipeline                   │
│                                                                │
│  1. Parse Document → 2. Chunk Text → 3. Generate Embeddings   │
│  4. Store in Feast → 5. Sync to Milvus → 6. Index Vectors     │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                        Query Processing                        │
│                                                                │
│  1. User Question → 2. Query Embedding → 3. Vector Search     │
│  4. Retrieve Context → 5. LLM Generation → 6. Return Answer   │
└───────────────────────────────────────────────────────────────┘
```
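For orientation, here is the query path condensed into plain Python. This is a minimal sketch, not the pipeline's actual code: the collection name comes from the `/stats` output shown later, while the `text` output field and the prompt format are assumptions.

```python
# Sketch of the six query steps above; NOT the pipeline's actual code.
# Assumed: collection name from the /stats output, a "text" output field,
# and a locally running Ollama with llama3.2:3b pulled.
import requests
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# 1.-2. User question -> query embedding (all-MiniLM-L6-v2, 384 dims)
question = "What are the key features of this system?"
model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode(question).tolist()

# 3. Vector search against the file-based Milvus-lite store
client = MilvusClient("feast_feature_repo/data/online_store.db")
hits = client.search(
    collection_name="rag_document_embeddings",  # per /stats; field names assumed
    data=[query_vec],
    limit=5,
    output_fields=["text"],
)

# 4. Retrieve context from the top hits
context = "\n\n".join(hit["entity"]["text"] for hit in hits[0])

# 5. LLM generation via Ollama's REST API
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",
        "prompt": f"Answer using only this context:\n\n{context}\n\nQuestion: {question}",
        "stream": False,
    },
)

# 6. Return the answer
print(resp.json()["response"])
```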
- Enterprise Feature Store - Feast for advanced feature management & serving
- High-Performance Vector DB - Milvus-lite for scalable similarity search
- Advanced LLM - Ollama with llama3.2:3b (3B parameters)
- Smart Embeddings - all-MiniLM-L6-v2 (384 dimensions)
- 100% Local Processing - No data leaves your machine
- Modern Web UI - Responsive interface with real-time updates
- Multi-format Support - PDF, Markdown, Text, and Word documents
- Smart Chunking - Intelligent text segmentation with overlap (see the sketch after these feature lists)
- Original Filename Preservation - Maintains document identity
- Real-time Processing - Live feedback during upload
- Seamless Clear Operations - PyMilvus-based collection management
- Semantic Search - Advanced vector similarity retrieval
- Context-aware Responses - LLM with retrieved document context
- Source Attribution - Detailed citations with relevance scores
- Flexible Context Limits - Configurable result count
- Real-time Stats - Live document count and system metrics
- Refresh Stats - Real-time system status updates
- Clear All Documents - Complete collection reset with PyMilvus
- Health Monitoring - Comprehensive system health checks
- Performance Metrics - Document count, chunk statistics
- Error Handling - Graceful failure recovery
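The Smart Chunking feature above boils down to a sliding window with overlap, so a sentence cut at one chunk boundary still appears intact in the neighboring chunk. A minimal sketch follows; the chunk size and overlap values are assumptions, since the pipeline's actual settings are not documented here.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Slide a fixed-size window over the text so consecutive chunks
    share `overlap` characters. The defaults are illustrative, not the
    pipeline's actual configuration."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1,200-character document with these defaults yields chunks starting at
# offsets 0, 450, and 900, each sharing 50 characters with its neighbor.
```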
| Component | Technology | Version | Purpose |
|---|---|---|---|
| API Framework | FastAPI | 0.104.1+ | REST API & Web UI |
| Feature Store | Feast | 0.51.0+ | Feature management & registry |
| Vector Database | Milvus-lite | 2.3.0+ | File-based vector storage |
| LLM Engine | Ollama | Latest | Local language model serving |
| Language Model | llama3.2:3b | 3B params | Text generation & reasoning |
| Embedding Model | all-MiniLM-L6-v2 | 384 dims | Document & query embeddings |
| Container Engine | Podman/Docker | Latest | Optional containerization |
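A quick way to confirm the embedding dimensionality listed in the table (and the `embedding_dim: 384` that the Milvus configuration expects later):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("sanity check")
print(vector.shape)  # (384,) -- matches embedding_dim: 384 in feature_store.yaml
```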
- Python 3.12+ (required)
- Poetry (recommended) or pip (alternative)
- Ollama (for LLM serving)
- At least 8GB RAM (16GB recommended for optimal performance)
- 5GB+ disk space (for models and data)
```bash
# Clone the repository
git clone <repo-url>
cd rag-project

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies with Poetry
poetry install --with=test,lint

# Activate the virtual environment
poetry shell
```
```bash
# Clone the repository
git clone <repo-url>
cd rag-project

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt
```
```bash
# Initialize the Feast feature store
cd feast_feature_repo
feast apply
cd ..

# Start Ollama (in a separate terminal)
ollama serve

# Pull the required model
ollama pull llama3.2:3b
```
```bash
# Development mode (auto-reload)
make dev
# or
poetry run uvicorn src.api:app --host 0.0.0.0 --port 8000 --reload

# Production mode
make run
# or
poetry run uvicorn src.api:app --host 0.0.0.0 --port 8000
```
```bash
# Start the FastAPI server
uvicorn src.api:app --host 0.0.0.0 --port 8000
```
- Web UI: http://localhost:8000
- API Docs: http://localhost:8000/docs
curl -X GET "http://localhost:8000/health"
Response:
{
"status": "healthy",
"feast_store": "True",
"milvus_connection": "False",
"embedding_model": "True",
"message": "Feast RAG pipeline is running with unified Milvus backend"
}
curl -X GET "http://localhost:8000/stats"
Response:
{
"pipeline_status": "ready",
"vector_store_stats": {
"collection_name": "rag_document_embeddings",
"document_count": 3,
"chunk_count": 15,
"backend": "feast_milvus_lite"
},
"embedding_model": "all-MiniLM-L6-v2",
"llm_model": "llama3.2:3b"
}
curl -X POST "http://localhost:8000/ingest" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@sample_docs/sample_document.md"
Response:
{
"message": "Successfully ingested sample_document.md with 5 chunks using feast_official",
"chunks_created": 5,
"source": "sample_document.md",
"metadata": {
"storage_method": "feast_official",
"status": "success",
"file_name": "sample_document.md",
"document_id": "feast_sample_document.md_5"
}
}
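The same ingest call is easy to script from Python, e.g. for bulk uploads; a minimal sketch using `requests` against the endpoint and form field shown above:

```python
import requests

# Upload one document to the /ingest endpoint (multipart form field "file")
with open("sample_docs/sample_document.md", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/ingest",
        files={"file": ("sample_document.md", f, "text/markdown")},
    )
resp.raise_for_status()
print(resp.json()["chunks_created"])  # e.g. 5
```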
curl -X POST "http://localhost:8000/query" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"question": "What are the key features of this system?",
"context_limit": 5
}'
Response:
{
"answer": "Based on the provided documents, the key features include...",
"sources": [
{
"text": "Feature store capabilities with Feast...",
"metadata": {
"document_title": "sample_document.md",
"chunk_index": 0,
"file_path": "/path/to/document.md"
},
"similarity_score": 0.92
}
],
"context_used": 3,
"relevance_scores": [0.92, 0.87, 0.84]
}
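Equivalently from Python, using only the request and response fields documented above:

```python
import requests

# Ask a question and print the answer plus its cited sources
resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What are the key features of this system?", "context_limit": 5},
)
body = resp.json()
print(body["answer"])
for source in body["sources"]:
    meta = source["metadata"]
    print(f'- {meta["document_title"]} (score: {source["similarity_score"]:.2f})')
```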
curl -X GET "http://localhost:8000/documents"
Response:
{
"documents": [
{
"title": "sample_document.md",
"chunks": 5,
"status": "processed"
}
],
"total_count": 1,
"backend": "feast_milvus"
}
```bash
curl -X DELETE "http://localhost:8000/documents"
```

Response:

```json
{
  "status": "success",
  "message": "Successfully cleared all documents from Feast Milvus database",
  "backend": "feast_milvus"
}
```
The web interface provides:
- Document Upload - Drag & drop interface supporting PDF, MD, TXT, DOCX
- Intelligent Query - Natural language questions with context-aware responses
- System Dashboard - Real-time monitoring and statistics
- Document Management - List, view, and clear uploaded documents
- Refresh Stats - Live system status updates
```
rag-project/
├── src/                        # Core application code
│   ├── api.py                  # FastAPI server & endpoints
│   ├── feast_rag_pipeline.py   # Main RAG pipeline with Feast
│   ├── feast_rag_retriever.py  # Feast-based document retrieval
│   └── __init__.py
├── feast_feature_repo/         # Feast feature store configuration
│   ├── feature_store.yaml      # Feast configuration
│   ├── feature_definitions.py  # Feature views & entities
│   └── data/                   # Feature store data (excluded from git)
├── static/                     # Web interface files
│   ├── index.html              # Main web UI
│   ├── script.js               # Frontend JavaScript
│   └── style.css               # UI styling
├── sample_docs/                # Example documents & screenshots
│   └── ui_screenshots/         # Web interface screenshots
├── requirements.txt            # Python dependencies
├── requirements-dev.txt        # Development dependencies
└── README.md                   # This file
```
```yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "FLAT"
  metric_type: "COSINE"
offline_store:
  type: file
entity_key_serialization_version: 3
auth:
  type: no_auth
```
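`feature_definitions.py` pairs this configuration with an entity and a feature view whose vector field the Milvus online store indexes. The repository's actual definitions are not reproduced here; the sketch below follows the general pattern of Feast's vector-search support, and all entity, field, and source names are hypothetical:

```python
# Hypothetical sketch; the real definitions live in
# feast_feature_repo/feature_definitions.py.
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Array, Float32, String

chunk = Entity(name="chunk_id", join_keys=["chunk_id"])

source = FileSource(
    path="data/document_chunks.parquet",  # assumed offline source
    timestamp_field="event_timestamp",
)

document_embeddings = FeatureView(
    name="rag_document_embeddings",
    entities=[chunk],
    schema=[
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,              # index this field in Milvus
            vector_search_metric="COSINE",  # matches metric_type above
        ),
        Field(name="text", dtype=String),
        Field(name="document_title", dtype=String),
    ],
    source=source,
)
```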
```bash
# Optional configuration
export FEAST_REPO_PATH="feast_feature_repo"
export OLLAMA_HOST="localhost"
export OLLAMA_PORT="11434"
export LLM_MODEL="llama3.2:3b"
export EMBEDDING_MODEL="all-MiniLM-L6-v2"
```
- Feast repository not found

```bash
cd feast_feature_repo && feast apply
```

- Ollama model not available

```bash
# Pull required models
ollama pull llama3.2:3b
ollama list  # Verify models are installed
```

- Collection not found after clear

```bash
cd feast_feature_repo && feast apply
# Restart the server to pick up the recreated collection;
# the system handles collection recreation automatically
```

- Port conflicts

```bash
# Use a different port
uvicorn src.api:app --host 0.0.0.0 --port 9000
```
```bash
# Check service status
curl http://localhost:8000/health
curl http://localhost:8000/stats

# Check Ollama models
curl http://localhost:11434/api/tags

# Verify Feast setup
cd feast_feature_repo
feast entities list
feast feature-views list
```
- Minimum: 8GB RAM, 4 CPU cores, 5GB storage
- Recommended: 16GB RAM, 8 CPU cores, 20GB storage
- Optimal: 32GB RAM, 16 CPU cores, 50GB SSD
```bash
# For better quality (requires more resources)
ollama pull llama3.2:3b

# For faster performance (lower quality)
ollama pull llama3.2:1b

# To update the model in configuration, edit
# src/feast_rag_pipeline.py (the line setting model_name)
```
- ✅ Simplified deployment: no external containers required
- ✅ Single-file database: everything in `feast_feature_repo/data/online_store.db`
- ✅ Production ready: proven integration with Feast
- ✅ Portable: easy to back up and version control
- ✅ Fast startup: no complex container orchestration
```bash
# Using the deploy directory
cd deploy
./run.sh

# Or manually with docker-compose
docker-compose up --build -d
```

```bash
# Apply Kubernetes manifests
kubectl apply -f deploy/k8s-deployment.yaml

# Check deployment status
kubectl get pods -n feast-rag-pipeline
```
Set environment variables for customization:
```bash
export RAG_API_PORT=9000
export RAG_LLM_MODEL=llama3.2:7b
export RAG_DEBUG_MODE=true
```
- Configure persistent volumes for Feast data
- Set appropriate resource limits (CPU/Memory)
- Configure Ollama models for your use case
- Set up monitoring and logging
- Configure backup for Milvus database
Full deployment guide: deploy/README.md
Run the test suite to verify your setup:
```bash
# Run all tests with verbose output
make test
# or
poetry run pytest -v

# Run tests with coverage
make test-cov
# or
poetry run pytest --cov=src --cov-report=html

# Run a specific test class
poetry run pytest tests/test_rag_pipeline.py::TestFeastRAGPipeline -v

# Stop on first failure
poetry run pytest -x

# Format code before testing
make format

# Run all linting checks
make lint
```
```bash
# Run all tests with verbose output
python -m pytest tests/test_rag_pipeline.py -v

# Run tests with coverage (install pytest-cov first)
pip install pytest-cov
python -m pytest tests/test_rag_pipeline.py --cov=src

# Run a specific test class
python -m pytest tests/test_rag_pipeline.py::TestFeastRAGPipeline -v

# Stop on first failure
python -m pytest tests/test_rag_pipeline.py -x
```
Test Coverage:
- ✅ Pipeline initialization & error handling
- ✅ Document processing with Feast integration
- ✅ Query processing and retrieval
- ✅ Collection clearing operations
- ✅ Embedding generation functionality
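Beyond the suite above, a minimal smoke test can hit the documented endpoints in-process with FastAPI's `TestClient`. This sketch assumes only that the app object lives at `src.api:app` (as the run commands show) and that `/health` returns a JSON body with a `status` key:

```python
from fastapi.testclient import TestClient

from src.api import app

client = TestClient(app)

def test_health_endpoint_reports_status():
    # Exercise the /health endpoint without starting a server
    resp = client.get("/health")
    assert resp.status_code == 200
    assert "status" in resp.json()
```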
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Feast - Feature store for ML
- Milvus - Vector database for AI
- Ollama - Local LLM serving
- FastAPI - Modern Python web framework
- Sentence Transformers - Embedding models