πŸš€ Advanced RAG Pipeline with Feast Feature Store + Milvus Vector DB

License: MIT · Python 3.11+ · FastAPI · Ollama · Feast · Milvus

A production-ready Retrieval-Augmented Generation (RAG) pipeline with advanced feature store capabilities. Built with Feast feature store, Milvus-lite vector database, and Ollama LLM for document processing and intelligent question answering.

πŸ“Έ Web Interface

Main Dashboard

Main dashboard showing query results with context-aware responses, source citations, and relevance scoring

Document Processing Pipeline

Document processing workflow showing upload progress, chunking, and embedding generation

System Dashboard Statistics & Monitoring

System statistics dashboard displaying real-time metrics, document counts, and performance indicators

QnA

Question and answer interface with query input, response display, and document source references

Query Interface

Primary query interface with smart question processing and real-time response generation

πŸ—οΈ System Architecture

                    🌐 Web Interface (localhost:8000)
                                    β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚        FastAPI Server        β”‚
                    β”‚     (Feast RAG Pipeline)     β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                    β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚                           β”‚                           β”‚
        β–Ό                           β–Ό                           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  πŸ€– Ollama LLM  β”‚    β”‚ 🧠 Feast Store  β”‚     β”‚ πŸ—„οΈ Milvus-Lite  β”‚
β”‚                 β”‚    β”‚                 β”‚    β”‚                 β”‚
β”‚ β€’ llama3.2:3b   β”‚    β”‚ β€’ Feature Mgmt  β”‚    β”‚ β€’ Vector Store  β”‚
β”‚ β€’ Embeddings    β”‚    β”‚ β€’ Online Store  β”‚    β”‚ β€’ Similarity    β”‚
β”‚ β€’ Generation    β”‚    β”‚ β€’ Registry      β”‚    β”‚ β€’ Collections   β”‚
β”‚ Port: 11434     β”‚    β”‚ β€’ Milvus Backendβ”‚    β”‚  File-based DB  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ Data Flow

πŸ“„ Document Upload
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                Document Processing Pipeline                 β”‚
β”‚                                                             β”‚
β”‚ 1. Parse Document β†’ 2. Chunk Text β†’ 3. Generate Embeddings β”‚
β”‚ 4. Store in Feast β†’ 5. Sync to Milvus β†’ 6. Index Vectors   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Query Processing                         β”‚
β”‚                                                             β”‚
β”‚ 1. User Question β†’ 2. Query Embedding β†’ 3. Vector Search    β”‚
β”‚ 4. Retrieve Context β†’ 5. LLM Generation β†’ 6. Return Answer  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
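
The query path above can also be exercised outside the web UI. Below is a minimal sketch in Python, assuming the Milvus-lite file at feast_feature_repo/data/online_store.db, the rag_document_embeddings collection reported by /stats, and Ollama's standard /api/generate endpoint; the actual pipeline code in src/ may differ.

# Illustrative query flow: embed the question, search vectors, generate an answer
import requests
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")                 # 384-dim embeddings
milvus = MilvusClient("feast_feature_repo/data/online_store.db")   # Milvus-lite file

question = "What are the key features of this system?"
query_vector = embedder.encode(question).tolist()

# Vector similarity search over the ingested chunks
hits = milvus.search(
    collection_name="rag_document_embeddings",
    data=[query_vector],
    limit=5,
)
context = "\n".join(str(hit) for hit in hits[0])  # in practice, extract the chunk text field

# Generate a context-aware answer with the locally served model
answer = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2:3b", "prompt": f"Context:\n{context}\n\nQuestion: {question}", "stream": False},
).json()["response"]
print(answer)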

✨ Key Features

🎯 Core Capabilities

  • πŸͺ Enterprise Feature Store - Feast for advanced feature management & serving
  • πŸš€ High-Performance Vector DB - Milvus-lite for scalable similarity search
  • πŸ€– Advanced LLM - Ollama with llama3.2:3b (3B parameters)
  • 🧠 Smart Embeddings - all-MiniLM-L6-v2 (384 dimensions)
  • πŸ”’ 100% Local Processing - No data leaves your machine
  • 🌐 Modern Web UI - Responsive interface with real-time updates

πŸ“„ Document Management

  • Multi-format Support - PDF, Markdown, Text, and Word documents
  • Smart Chunking - Intelligent text segmentation with overlap
  • Original Filename Preservation - Maintains document identity
  • Real-time Processing - Live feedback during upload
  • Seamless Clear Operations - PyMilvus-based collection management

πŸ” Query & Retrieval

  • Semantic Search - Advanced vector similarity retrieval
  • Context-aware Responses - LLM with retrieved document context
  • Source Attribution - Detailed citations with relevance scores
  • Flexible Context Limits - Configurable result count
  • Real-time Stats - Live document count and system metrics

πŸ› οΈ System Management

  • Refresh Stats - Real-time system status updates
  • Clear All Documents - Complete collection reset with PyMilvus
  • Health Monitoring - Comprehensive system health checks
  • Performance Metrics - Document count, chunk statistics
  • Error Handling - Graceful failure recovery

πŸ› οΈ Tech Stack

Component           Technology          Version      Purpose
API Framework       FastAPI             0.104.1+     REST API & Web UI
Feature Store       Feast               0.51.0+      Feature management & registry
Vector Database     Milvus-lite         2.3.0+       File-based vector storage
LLM Engine          Ollama              Latest       Local language model serving
Language Model      llama3.2:3b         3B params    Text generation & reasoning
Embedding Model     all-MiniLM-L6-v2    384 dims     Document & query embeddings
Container Engine    Podman/Docker       Latest       Optional containerization

πŸ“‹ Prerequisites

  • Python 3.12+ (required)
  • Poetry (recommended) or pip (alternative)
  • Ollama (for LLM serving)
  • At least 8GB RAM (16GB recommended for optimal performance)
  • 5GB+ disk space (for models and data)

πŸš€ Quick Start

1. Clone and Setup

Option A: Poetry (Recommended)

# Clone the repository
git clone <repo-url>
cd rag-project

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies with Poetry
poetry install --with=test,lint

# Activate virtual environment
poetry shell

Option B: Traditional pip

# Clone the repository
git clone <repo-url>
cd rag-project

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

2. Initialize Feast Feature Store

# Initialize Feast feature store
cd feast_feature_repo
feast apply
cd ..

3. Start Ollama and Pull Models

# Start Ollama (in a separate terminal)
ollama serve

# Pull required models
ollama pull llama3.2:3b

4. Start the RAG Pipeline

With Poetry:

# Development mode (auto-reload)
make dev
# or
poetry run uvicorn src.api:app --host 0.0.0.0 --port 8000 --reload

# Production mode
make run
# or
poetry run uvicorn src.api:app --host 0.0.0.0 --port 8000

With pip:

# Start the FastAPI server
uvicorn src.api:app --host 0.0.0.0 --port 8000

5. Access the Application

Open http://localhost:8000 in your browser to use the web interface.

πŸ”§ API Reference

Health Check

curl -X GET "http://localhost:8000/health"

Response:

{
  "status": "healthy",
  "feast_store": "True",
  "milvus_connection": "False",
  "embedding_model": "True",
  "message": "Feast RAG pipeline is running with unified Milvus backend"
}

System Statistics

curl -X GET "http://localhost:8000/stats"

Response:

{
  "pipeline_status": "ready",
  "vector_store_stats": {
    "collection_name": "rag_document_embeddings",
    "document_count": 3,
    "chunk_count": 15,
    "backend": "feast_milvus_lite"
  },
  "embedding_model": "all-MiniLM-L6-v2",
  "llm_model": "llama3.2:3b"
}

Document Ingestion

curl -X POST "http://localhost:8000/ingest" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@sample_docs/sample_document.md"

Response:

{
  "message": "Successfully ingested sample_document.md with 5 chunks using feast_official",
  "chunks_created": 5,
  "source": "sample_document.md",
  "metadata": {
    "storage_method": "feast_official",
    "status": "success",
    "file_name": "sample_document.md",
    "document_id": "feast_sample_document.md_5"
  }
}
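
The same upload can be done from Python with the requests library, mirroring the curl call above:

# Upload a document to /ingest (Python equivalent of the curl example above)
import requests

with open("sample_docs/sample_document.md", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/ingest",
        files={"file": ("sample_document.md", f, "text/markdown")},
    )
print(resp.json()["chunks_created"])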

Query Documents

curl -X POST "http://localhost:8000/query" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the key features of this system?",
    "context_limit": 5
  }'

Response:

{
  "answer": "Based on the provided documents, the key features include...",
  "sources": [
    {
      "text": "Feature store capabilities with Feast...",
      "metadata": {
        "document_title": "sample_document.md",
        "chunk_index": 0,
        "file_path": "/path/to/document.md"
      },
      "similarity_score": 0.92
    }
  ],
  "context_used": 3,
  "relevance_scores": [0.92, 0.87, 0.84]
}
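
The same query can be issued from Python:

# Query the pipeline from Python (same payload as the curl example above)
import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What are the key features of this system?", "context_limit": 5},
)
result = resp.json()
print(result["answer"])
for source in result["sources"]:
    print(source["metadata"]["document_title"], source["similarity_score"])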

List Documents

curl -X GET "http://localhost:8000/documents"

Response:

{
  "documents": [
    {
      "title": "sample_document.md",
      "chunks": 5,
      "status": "processed"
    }
  ],
  "total_count": 1,
  "backend": "feast_milvus"
}

Clear All Documents

curl -X DELETE "http://localhost:8000/documents"

Response:

{
  "status": "success",
  "message": "Successfully cleared all documents from Feast Milvus database",
  "backend": "feast_milvus"
}
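
Clearing is implemented with PyMilvus against the Milvus-lite file. A hedged sketch of the equivalent manual operation (the path and collection name come from the configuration and /stats output shown in this README; the code in src/ may handle this differently):

# Drop the embeddings collection directly with PyMilvus (destructive: deletes all vectors)
from pymilvus import MilvusClient

client = MilvusClient("feast_feature_repo/data/online_store.db")
if "rag_document_embeddings" in client.list_collections():
    client.drop_collection("rag_document_embeddings")
# Re-run `feast apply` afterwards so the collection is recreated (see Troubleshooting)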

🌐 Web Interface Features

The web interface provides:

  • πŸ“€ Document Upload - Drag & drop interface supporting PDF, MD, TXT, DOCX
  • πŸ” Intelligent Query - Natural language questions with context-aware responses
  • πŸ“Š System Dashboard - Real-time monitoring and statistics
  • πŸ—‚οΈ Document Management - List, view, and clear uploaded documents
  • πŸ”„ Refresh Stats - Live system status updates

πŸ“ Project Structure

rag-project/
β”œβ”€β”€ src/                              # Core application code
β”‚   β”œβ”€β”€ api.py                       # FastAPI server & endpoints
β”‚   β”œβ”€β”€ feast_rag_pipeline.py        # Main RAG pipeline with Feast
β”‚   β”œβ”€β”€ feast_rag_retriever.py       # Feast-based document retrieval
β”‚   └── __init__.py
β”œβ”€β”€ feast_feature_repo/               # Feast feature store configuration
β”‚   β”œβ”€β”€ feature_store.yaml           # Feast configuration
β”‚   β”œβ”€β”€ feature_definitions.py       # Feature views & entities
β”‚   └── data/                        # Feature store data (excluded from git)
β”œβ”€β”€ static/                           # Web interface files
β”‚   β”œβ”€β”€ index.html                   # Main web UI
β”‚   β”œβ”€β”€ script.js                    # Frontend JavaScript
β”‚   └── style.css                    # UI styling
β”œβ”€β”€ sample_docs/                      # Example documents & screenshots
β”‚   └── ui_screenshots/              # Web interface screenshots
β”œβ”€β”€ requirements.txt                  # Python dependencies
β”œβ”€β”€ requirements-dev.txt              # Development dependencies
└── README.md                        # This file

πŸ”§ Configuration

Feast Feature Store (feast_feature_repo/feature_store.yaml)

project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "FLAT"
  metric_type: "COSINE"
offline_store:
  type: file
entity_key_serialization_version: 3
auth:
  type: no_auth
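
feast apply registers the entities and feature views defined in feast_feature_repo/feature_definitions.py against this store. The repository's own definitions are the source of truth; the sketch below only illustrates, following Feast's documented Milvus integration, what a vector-enabled feature view of this shape can look like (field names, the source path, and the exact Field arguments are assumptions and vary between Feast versions).

# Illustrative vector-enabled feature view (names, paths, and arguments are assumptions)
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Array, Float32, String

chunk = Entity(name="chunk_id", join_keys=["chunk_id"])

chunk_source = FileSource(
    path="data/document_chunks.parquet",      # assumed offline source
    timestamp_field="event_timestamp",
)

document_embeddings = FeatureView(
    name="rag_document_embeddings",
    entities=[chunk],
    ttl=timedelta(days=1),
    schema=[
        Field(name="vector", dtype=Array(Float32),
              vector_index=True, vector_search_metric="COSINE"),  # 384 dims per feature_store.yaml
        Field(name="chunk_text", dtype=String),
        Field(name="document_title", dtype=String),
    ],
    source=chunk_source,
)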

Environment Variables

# Optional configuration
export FEAST_REPO_PATH="feast_feature_repo"
export OLLAMA_HOST="localhost"
export OLLAMA_PORT="11434"
export LLM_MODEL="llama3.2:3b"
export EMBEDDING_MODEL="all-MiniLM-L6-v2"
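
These variables are optional; the sketch below shows how they might be consumed at startup (the defaults simply echo the values listed above, not necessarily the code's actual defaults):

# Illustrative settings loader (defaults mirror the export lines above)
import os

FEAST_REPO_PATH = os.getenv("FEAST_REPO_PATH", "feast_feature_repo")
OLLAMA_URL = f'http://{os.getenv("OLLAMA_HOST", "localhost")}:{os.getenv("OLLAMA_PORT", "11434")}'
LLM_MODEL = os.getenv("LLM_MODEL", "llama3.2:3b")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "all-MiniLM-L6-v2")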

πŸ” Troubleshooting

Common Issues

  1. Feast repository not found

    cd feast_feature_repo
    feast apply
  2. Ollama model not available

    # Pull required models
    ollama pull llama3.2:3b
    ollama list  # Verify models are installed
  3. Collection not found after clear

    cd feast_feature_repo && feast apply
    # Restart the server to pick up recreated collection
    # The system automatically handles collection recreation
  4. Port conflicts

    # Use a different port
    uvicorn src.api:app --host 0.0.0.0 --port 8001

Debug Commands

# Check service status
curl http://localhost:8000/health
curl http://localhost:8000/stats

# Check Ollama models
curl http://localhost:11434/api/tags

# Verify Feast setup
cd feast_feature_repo
feast entities list
feast feature-views list

πŸš€ Performance Optimization

Resource Requirements

  • Minimum: 8GB RAM, 4 CPU cores, 5GB storage
  • Recommended: 16GB RAM, 8 CPU cores, 20GB storage
  • Optimal: 32GB RAM, 16 CPU cores, 50GB SSD

Model Selection

# For better quality (requires more resources)
ollama pull llama3.2:3b

# For faster performance (lower quality)
ollama pull llama3.2:1b

# Update model in configuration
# Edit src/feast_rag_pipeline.py, line with model_name

File-based Milvus-lite Benefits

  • βœ… Simplified deployment: No external containers required
  • βœ… Single file database: Everything in feast_feature_repo/data/online_store.db
  • βœ… Production ready: Proven integration with Feast
  • βœ… Portable: Easy to backup and version control
  • βœ… Fast startup: No complex container orchestration

πŸš€ Deployment

🐳 Docker Compose (Recommended)

# Using the deploy directory
cd deploy
./run.sh

# Or manually with docker-compose
docker-compose up --build -d

☸️ Kubernetes

# Apply Kubernetes manifests
kubectl apply -f deploy/k8s-deployment.yaml

# Check deployment status
kubectl get pods -n feast-rag-pipeline

πŸ”§ Environment Configuration

Set environment variables for customization:

export RAG_API_PORT=9000
export RAG_LLM_MODEL=llama3.2:1b
export RAG_DEBUG_MODE=true

πŸ“‹ Production Checklist

  • Configure persistent volumes for Feast data
  • Set appropriate resource limits (CPU/Memory)
  • Configure Ollama models for your use case
  • Set up monitoring and logging
  • Configure backup for Milvus database

πŸ“– Full deployment guide: deploy/README.md

πŸ§ͺ Testing

Run the test suite to verify your setup:

With Poetry (Recommended):

# Run all tests with verbose output
make test
# or
poetry run pytest -v

# Run tests with coverage
make test-cov
# or
poetry run pytest --cov=src --cov-report=html

# Run specific test class
poetry run pytest tests/test_rag_pipeline.py::TestFeastRAGPipeline -v

# Stop on first failure
poetry run pytest -x

# Format code before testing
make format

# Run all linting checks
make lint

With pip:

# Run all tests with verbose output
python -m pytest tests/test_rag_pipeline.py -v

# Run tests with coverage (install pytest-cov first)
pip install pytest-cov
python -m pytest tests/test_rag_pipeline.py --cov=src

# Run specific test class
python -m pytest tests/test_rag_pipeline.py::TestFeastRAGPipeline -v

# Stop on first failure
python -m pytest tests/test_rag_pipeline.py -x

Test Coverage:

  • βœ… Pipeline initialization & error handling
  • βœ… Document processing with Feast integration
  • βœ… Query processing and retrieval
  • βœ… Collection clearing operations
  • βœ… Embedding generation functionality

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

