RAG Demo - Document Q&A System (Track B)

A Retrieval-Augmented Generation (RAG) application that allows users to upload documents and ask questions, receiving answers with source citations.

📄 Resume: View Resume

🔗 Live Demo: View Live Demo

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Streamlit     │    │   Document       │    │   Pinecone      │
│   Frontend      │───▶│   Processor      │───▶│   Vector DB     │
│                 │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                                               │
         ▼                                               ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│   Retrieval      │◀───│   Query Similar │
│                 │    │   Pipeline       │    │   Documents     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │   Cohere         │
                       │   Reranker       │
                       └──────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │   Groq LLM       │
                       │   Answer Gen     │
                       └──────────────────┘

🚀 Features

Document Upload: Support for PDF and TXT files
Text Chunking: Smart chunking with 15% overlap (1000-1200 tokens)
Vector Search: Pinecone cloud-hosted vector database with local embeddings
Reranking: Cohere Rerank API for improved relevance
LLM Integration: Groq Cloud for fast answer generation
Source Citations: Inline citations with source mapping
Performance Metrics: Query timing and token usage tracking

📋 Requirements Met

✅ Vector Database (Hosted)

Service: Pinecone cloud vector database
Model: sentence-transformers/all-mpnet-base-v2 (local embedding)
Dimension: 768
Strategy: Local embedding with SentenceTransformers, stored in Pinecone

✅ Embeddings & Chunking

Chunking: 1000-1200 tokens with 15% overlap
Metadata: Source, title, section, position stored for precise citations
Section Detection: Automatic section identification from headers and structure
Title Extraction: Smart title detection from document content
Embedding: SentenceTransformers all-mpnet-base-v2 (768D local embeddings)

✅ Retriever + Reranker

Retrieval: Top-20 MMR from Pinecone vector DB
Reranker: Cohere Rerank API (rerank-english-v3.0)
Fallback: Score-based reranker if Cohere unavailable

✅ LLM & Answering

Provider: Groq Cloud (fast inference)
Model: llama-3.1-8b-instant
Citations: Inline citations [1], [2] mapped to sources
Graceful handling: No-answer cases handled properly

✅ Frontend

Framework: Streamlit
Features: Upload/paste, query box, answers with citations
Metrics: Request timing and token cost estimates

✅ Hosting & Docs

Deployment: Ready for free hosting (Streamlit Cloud, Render, etc.)
Documentation: Architecture diagram, configuration details
Quick start: Complete setup instructions

🛠️ Setup Instructions

1. Clone and Install Dependencies

git clone <repository-url>
cd predusk
pip install -r requirements.txt

2. Environment Configuration

Create a .env file with your API keys:

cp .env.example .env

Edit .env with your credentials:

PINECONE_API_KEY=your_pinecone_api_key_here
GROQ_API_KEY=your_groq_api_key_here  
COHERE_API_KEY=your_cohere_api_key_here
PINECONE_INDEX_NAME=predusk-demo-v2

3. Pinecone Index Setup

Create a Pinecone index with these settings:

Name: predusk-demo-v2
Dimension: 768
Metric: cosine
Cloud: AWS
Region: us-east-1
Note: No auto-embedding needed (using local SentenceTransformers)

Index Details & Current Status

Index Name: predusk-demo-v2
Vector Count: ~1700 vectors (from Cisco SDMH24.pdf documentation)
Storage Used: ~5.08 MB estimated
Embedding Model: sentence-transformers/all-mpnet-base-v2
Vector Dimension: 768
Metadata Fields: source, title, section, position, text
Created: Ready for demo with pre-loaded Cisco documentation
Backup Content: Sample AI/ML content available for direct text input testing

4. Run the Application

streamlit run app.py

⚙️ Configuration

Chunking Parameters

Chunk Size: 1000 tokens
Overlap: 150 tokens (15%)
Strategy: Sentence-aware chunking with section detection
Metadata: Source, title, section, position for enhanced citations

Retrieval Settings

Initial Retrieval: Top-20 documents
Reranking: Top-5 after rerank
Similarity Metric: Cosine similarity

LLM Settings

Model: llama-3.1-8b-instant (Groq)
Max Tokens: 1000
Temperature: 0.1 (focused responses)

📚 Example Q&A Pairs

Note: Current vector database contains Cisco Router documentation. Use these questions for testing:

1. Cisco SDM Overview

Q: "What is the Cisco Router and Security Device Manager?" Expected: Explanation of SDM as a web-based configuration tool for Cisco routers

2. VPN Configuration

Q: "How do you configure VPN settings using SDM?" Expected: Description of VPN configuration steps and available options

3. Security Features

Q: "What security features are available in Cisco SDM?" Expected: Information about firewall, VPN, intrusion prevention, and security auditing

4. System Requirements

Q: "What are the system requirements for running SDM?" Expected: Browser requirements, Java versions, and supported operating systems

5. Alternative Content Testing

Q: Use the sample AI/ML content in sample_document.md via Direct Text Input Expected: Test with questions like "What are the types of AI?" after adding content

🔧 Architecture Details

Document Processing Pipeline

Upload: PDF/TXT file processing
Extraction: Text extraction with metadata
Chunking: Overlapping chunks with context preservation
Embedding: Local embedding via SentenceTransformers (all-mpnet-base-v2)
Storage: Vector storage with metadata in Pinecone

Query Processing Pipeline

Query: User question input
Retrieval: Vector similarity search (top-20)
Reranking: Cohere rerank to top-5
Generation: LLM answer generation with citations
Display: Formatted response with source mapping

📊 Performance Characteristics

Retrieval Speed: ~200-500ms (Pinecone)
Reranking: ~300-800ms (Cohere API)
Generation: ~1-3s (Groq LLM)
Total Response: ~2-4s end-to-end

🚨 Limitations & Trade-offs

Current Limitations

File Format: Only PDF and TXT supported
Language: English-optimized (Cohere rerank model)
Context Window: 1000 token chunks may break long concepts
Rate Limits: Subject to API provider rate limits

Design Trade-offs

Local vs Cloud Embedding: Chose local SentenceTransformers for consistency and control
Chunk Size: Balanced context vs. precision
Reranker: API dependency vs. local model performance
LLM Provider: Groq speed vs. OpenAI quality

🛡️ Error Handling

API Failures: Graceful fallbacks for each service
File Processing: Clear error messages for unsupported formats
No Results: Proper "no information found" responses
Rate Limiting: Automatic retries with backoff

📈 Future Enhancements

Possible Improvements

Multi-format Support: Word docs, PowerPoint, etc.
Advanced Chunking: Semantic chunking strategies
Multi-language: Support for non-English content
Hybrid Search: Combine dense + sparse retrieval
Conversation Memory: Multi-turn conversation support

Scalability Considerations

Caching: Query result caching
Batch Processing: Bulk document upload
Index Management: Multiple specialized indices
Load Balancing: API request distribution

📞 Support & Deployment

Quick Deploy to Streamlit Cloud (Recommended)

Push to GitHub: Commit your code to a GitHub repository
Connect to Streamlit Cloud:
- Visit share.streamlit.io
- Connect your GitHub account
- Select your repository and branch
- Set main file path to app.py

Configure Secrets:

In Streamlit Cloud dashboard, go to "Secrets"
Copy contents from .env.example and add your actual API keys:

PINECONE_API_KEY = "your_actual_pinecone_key"
GROQ_API_KEY = "your_actual_groq_key"
COHERE_API_KEY = "your_actual_cohere_key"
PINECONE_INDEX_NAME = "predusk-demo-v2"

Deploy: Click "Deploy" - your app will be live in minutes!

Alternative Deployment Options

Render: Container deployment with Dockerfile
Railway: Fast deployment platform
Heroku: Container deployment (may require paid dyno)

Local Development

streamlit run app.py

Environment Variables Required

PINECONE_API_KEY=<required>
GROQ_API_KEY=<required>
COHERE_API_KEY=<optional - fallback reranker used>
PINECONE_INDEX_NAME=<default: predusk-demo-v2>

Health Checks

The application includes built-in connectivity checks for:

Pinecone vector database connection
Groq LLM API availability
Cohere reranker service (with fallback)

📝 Remarks

Provider Limitations & Considerations

Pinecone (Vector Database)

Free Tier: 1 index, 100K vectors, 10MB storage
Performance: ~200-500ms query latency
Scalability: Excellent for production use
Cost: Usage-based pricing after free tier

Groq (LLM Provider)

Speed: Fastest inference (~1-3s for responses)
Rate Limits: 30 requests/minute on free tier
Model: llama-3.1-8b-instant optimized for speed
Quality: Good balance of speed vs. accuracy

Cohere (Reranker)

Free Tier: 1000 API calls/month
Fallback: Local score-based reranker when unavailable
Language: Optimized for English content
Performance: ~300-800ms reranking latency

SentenceTransformers (Embeddings)

Model: all-mpnet-base-v2 (768 dimensions)
Advantages: Consistent, local processing, no API costs
Performance: Fast local embedding generation
Quality: High-quality semantic representations

Architecture Decisions

Local Embeddings: Chose SentenceTransformers over cloud embedding APIs for consistency, cost control, and reduced external dependencies.
Groq LLM: Selected for fastest inference time, crucial for interactive demo experience.
Hybrid Reranking: Implemented fallback reranker to ensure system works even without Cohere API access.
Streamlit Frontend: Rapid prototyping and deployment, perfect for demo applications with built-in hosting support.

Built for: AI Engineer Assessment - Mini RAG System (Track B)
Tech Stack: Streamlit + Pinecone + Cohere + Groq + SentenceTransformers
Deployment Ready: ✅ All requirements met
Resume: View Resume

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.devcontainer		.devcontainer
.streamlit		.streamlit
.env.example		.env.example
.gitignore		.gitignore
DEPLOYMENT.md		DEPLOYMENT.md
README.md		README.md
app.py		app.py
config.py		config.py
document_processor.py		document_processor.py
llm_service.py		llm_service.py
requirements.txt		requirements.txt
reranker.py		reranker.py
sample_document.md		sample_document.md
start.sh		start.sh
test_setup.py		test_setup.py
vector_store.py		vector_store.py

Folders and files

Latest commit

History

Repository files navigation

RAG Demo - Document Q&A System (Track B)

🏗️ Architecture

🚀 Features

📋 Requirements Met

✅ Vector Database (Hosted)

✅ Embeddings & Chunking

✅ Retriever + Reranker

✅ LLM & Answering

✅ Frontend

✅ Hosting & Docs

🛠️ Setup Instructions

1. Clone and Install Dependencies

2. Environment Configuration

3. Pinecone Index Setup

Index Details & Current Status

4. Run the Application

⚙️ Configuration

Chunking Parameters

Retrieval Settings

LLM Settings

📚 Example Q&A Pairs

1. Cisco SDM Overview

2. VPN Configuration

3. Security Features

4. System Requirements

5. Alternative Content Testing

🔧 Architecture Details

Document Processing Pipeline

Query Processing Pipeline

📊 Performance Characteristics

🚨 Limitations & Trade-offs

Current Limitations

Design Trade-offs

🛡️ Error Handling

📈 Future Enhancements

Possible Improvements

Scalability Considerations

📞 Support & Deployment

Quick Deploy to Streamlit Cloud (Recommended)

Alternative Deployment Options

Local Development

Environment Variables Required

Health Checks

📝 Remarks

Provider Limitations & Considerations

Pinecone (Vector Database)

Groq (LLM Provider)

Cohere (Reranker)

SentenceTransformers (Embeddings)

Architecture Decisions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages