Skip to content

Latest commit

 

History

History
196 lines (142 loc) · 6.24 KB

File metadata and controls

196 lines (142 loc) · 6.24 KB

Quick Start Guide for Agentic RAG Chatbot

✅ System Status

Your Agentic RAG system is 95% ready! Here's what's working:

✅ Working Components:

  • ✅ FastAPI backend with all dependencies installed
  • ✅ React frontend with TypeScript
  • ✅ PDF processing with OCR (PyMuPDF + Tesseract)
  • ✅ Whisper STT for voice queries
  • ✅ Web search agent (DuckDuckGo - no API key needed)
  • ✅ Google Drive MCP (with mock fallback - works without credentials)
  • ✅ Vector database (ChromaDB)
  • ✅ Citation system with image modals

🔑 API Requirements:

  • Required: Google API Key (for Gemini LLM) - Only 1 step needed!
  • Optional: Google Drive credentials (currently using mock responses)
  • Optional: SerpAPI key (currently using free DuckDuckGo)

⚠️ Required Setup

🔑 Essential Setup (Required)

1. Get Your Google API Key (5 minutes) - REQUIRED

  1. Go to Google AI Studio: https://makersuite.google.com/app/apikey
  2. Click "Create API Key"
  3. Copy the key
  4. Edit backend/.env and replace:
    GOOGLE_API_KEY=your_google_api_key_here
    
    with:
    GOOGLE_API_KEY=YOUR_ACTUAL_KEY_HERE
    

🌐 Optional Enhancements

2. Google Drive MCP Setup (Optional)

Current Status: ✅ Working with mock fallback (simulated responses) For Real Google Drive: Follow these steps to access your actual Google Drive files:

  1. Go to Google Cloud Console: https://console.cloud.google.com/
  2. Create a new project or select existing one
  3. Enable Google Drive API
  4. Create OAuth 2.0 credentials
  5. Download credentials.json → Save to backend/credentials.json

Note: Without real credentials, system uses mock Google Drive responses (works perfectly for demo)

3. Web Search API Enhancement (Optional)

Current Status: ✅ Working with DuckDuckGo (free, no API key needed) For Enhanced Web Search: You can optionally use SerpAPI for better results:

  1. Get SerpAPI key: https://serpapi.com/users/sign_up
  2. Add to .env:
    SERPAPI_API_KEY=your_serpapi_key_here
    
  3. Uncomment SerpAPI code in backend/agents/web_search_agent.py

Note: DuckDuckGo works great for most queries (no API key required)

🚀 Start the System

Terminal 1 - Start Backend:

cd backend
python main.py

Server starts at: http://localhost:8001

Terminal 2 - Start Frontend:

cd frontend
npm run dev

Frontend starts at: http://localhost:5173

🧪 Test All Features

  1. Upload PDF: Click "Choose File" and upload a PDF with images
  2. Text Query: Type a question and click "Ask"
  3. Voice Query: Click "🎙️ Voice Query" and speak
  4. View Citations: Click on citation numbers to see sources
  5. Image Modal: Click on PDF page images to view full size

📋 All Features Implemented

1. ✅ Streaming Speech-to-Text (STT)

  • Implementation: OpenAI Whisper model (base)
  • How it works: Click voice button → speak for 10 seconds → automatic transcription
  • File: backend/stt/streaming_stt.py

2. ✅ MultiModal RAG

  • PDF Text Extraction: PyMuPDF for clean text extraction
  • Image Processing: Automatic page screenshots saved as PNG
  • OCR: Tesseract OCR on images for text in graphics/charts
  • File: backend/rag/pdf_processor.py

3. ✅ Agentic Query System

a) ✅ RAG + Web Search + MCP Google Drive

  • RAG: ChromaDB vector search on uploaded PDFs
  • Web Search: DuckDuckGo search for recent/external info
  • Google Drive MCP: Searches Google Drive docs (with mock fallback)
  • File: backend/rag/query_engine.py

b) ✅ Citations & Grounding

  • Smart Citations: [1], [2], [3] format in responses
  • Source Tracking: Shows PDF pages, Google Drive docs, web results
  • Source Summary: Displays count of each source type used

c) ✅ Click-to-View Images

  • PDF Images: Click citation images to view full-size page screenshots
  • Image Modal: Beautiful overlay with close button
  • Web Links: Clickable links to Google Drive and web sources

🎯 System Architecture

Frontend (React + TypeScript)
├── ChatBox.tsx      # Main chat interface with citations
├── UploadPDF.tsx    # PDF upload with progress
├── VoiceMic.tsx     # Voice recording and submission
└── App.tsx          # Main app with status indicators

Backend (FastAPI + Python)
├── main.py          # REST API server with CORS
├── rag/
│   ├── pdf_processor.py    # PDF text + image extraction
│   ├── chroma_store.py     # Vector database
│   ├── query_engine.py     # Multi-source query processing
│   └── embedder.py         # Text embeddings
├── stt/
│   └── streaming_stt.py    # Whisper voice transcription
├── agents/
│   └── web_search_agent.py # DuckDuckGo search
└── mcp/
    └── google_drive_client.py # Google Drive integration

🔧 Technical Details

Voice Processing Flow:

  1. Record: Frontend captures audio via MediaRecorder API
  2. Upload: Audio sent to /voice-query/ endpoint
  3. Transcribe: Whisper converts speech to text
  4. Query: Text processed through full RAG pipeline
  5. Response: Returns transcription + answer + citations

PDF Processing Flow:

  1. Upload: PDF sent to /upload-pdf/ endpoint
  2. Extract: PyMuPDF extracts text and renders page images
  3. OCR: Tesseract processes images for additional text
  4. Embed: Text chunks vectorized with ChromaDB
  5. Store: Vectors saved for similarity search

Query Processing Flow:

  1. Input: Text or transcribed voice query
  2. RAG Search: Vector similarity search on PDF content
  3. Web Search: DuckDuckGo for recent/external information
  4. Google Drive: MCP search of user's Google Drive
  5. Generate: LLM combines all sources with citations
  6. Format: Response with clickable citations and images

🚀 You're Ready!

Your system implements all requested features:

  • ✅ Streaming STT for voice queries
  • ✅ MultiModal RAG with PDF images & graphs
  • ✅ Agentic search (RAG + Web + Google Drive MCP)
  • ✅ Citation/grounding with source tracking
  • ✅ Click-to-view images and content

Just add your Google API key and start the servers! 🎉