Quick Start Guide for Agentic RAG Chatbot

✅ System Status

Your Agentic RAG system is 95% ready! Here's what's working:

✅ Working Components:

✅ FastAPI backend with all dependencies installed
✅ React frontend with TypeScript
✅ PDF processing with OCR (PyMuPDF + Tesseract)
✅ Whisper STT for voice queries
✅ Web search agent (DuckDuckGo - no API key needed)
✅ Google Drive MCP (with mock fallback - works without credentials)
✅ Vector database (ChromaDB)
✅ Citation system with image modals

🔑 API Requirements:

Required: Google API Key (for Gemini LLM) - Only 1 step needed!
Optional: Google Drive credentials (currently using mock responses)
Optional: SerpAPI key (currently using free DuckDuckGo)

⚠️ Required Setup

🔑 Essential Setup (Required)

1. Get Your Google API Key (5 minutes) - REQUIRED

Go to Google AI Studio: https://makersuite.google.com/app/apikey
Click "Create API Key"
Copy the key

Edit backend/.env and replace:

GOOGLE_API_KEY=your_google_api_key_here

with:

GOOGLE_API_KEY=YOUR_ACTUAL_KEY_HERE

🌐 Optional Enhancements

2. Google Drive MCP Setup (Optional)

Current Status: ✅ Working with mock fallback (simulated responses) For Real Google Drive: Follow these steps to access your actual Google Drive files:

Go to Google Cloud Console: https://console.cloud.google.com/
Create a new project or select existing one
Enable Google Drive API
Create OAuth 2.0 credentials
Download credentials.json → Save to backend/credentials.json

Note: Without real credentials, system uses mock Google Drive responses (works perfectly for demo)

3. Web Search API Enhancement (Optional)

Current Status: ✅ Working with DuckDuckGo (free, no API key needed) For Enhanced Web Search: You can optionally use SerpAPI for better results:

Get SerpAPI key: https://serpapi.com/users/sign_up
Add to .env:
```
SERPAPI_API_KEY=your_serpapi_key_here
```
Uncomment SerpAPI code in backend/agents/web_search_agent.py

Note: DuckDuckGo works great for most queries (no API key required)

🚀 Start the System

Terminal 1 - Start Backend:

cd backend
python main.py

Server starts at: http://localhost:8001

Terminal 2 - Start Frontend:

cd frontend
npm run dev

Frontend starts at: http://localhost:5173

🧪 Test All Features

Upload PDF: Click "Choose File" and upload a PDF with images
Text Query: Type a question and click "Ask"
Voice Query: Click "🎙️ Voice Query" and speak
View Citations: Click on citation numbers to see sources
Image Modal: Click on PDF page images to view full size

📋 All Features Implemented

1. ✅ Streaming Speech-to-Text (STT)

Implementation: OpenAI Whisper model (base)
How it works: Click voice button → speak for 10 seconds → automatic transcription
File: backend/stt/streaming_stt.py

2. ✅ MultiModal RAG

PDF Text Extraction: PyMuPDF for clean text extraction
Image Processing: Automatic page screenshots saved as PNG
OCR: Tesseract OCR on images for text in graphics/charts
File: backend/rag/pdf_processor.py

3. ✅ Agentic Query System

a) ✅ RAG + Web Search + MCP Google Drive

RAG: ChromaDB vector search on uploaded PDFs
Web Search: DuckDuckGo search for recent/external info
Google Drive MCP: Searches Google Drive docs (with mock fallback)
File: backend/rag/query_engine.py

b) ✅ Citations & Grounding

Smart Citations: [1], [2], [3] format in responses
Source Tracking: Shows PDF pages, Google Drive docs, web results
Source Summary: Displays count of each source type used

c) ✅ Click-to-View Images

PDF Images: Click citation images to view full-size page screenshots
Image Modal: Beautiful overlay with close button
Web Links: Clickable links to Google Drive and web sources

🎯 System Architecture

Frontend (React + TypeScript)
├── ChatBox.tsx      # Main chat interface with citations
├── UploadPDF.tsx    # PDF upload with progress
├── VoiceMic.tsx     # Voice recording and submission
└── App.tsx          # Main app with status indicators

Backend (FastAPI + Python)
├── main.py          # REST API server with CORS
├── rag/
│   ├── pdf_processor.py    # PDF text + image extraction
│   ├── chroma_store.py     # Vector database
│   ├── query_engine.py     # Multi-source query processing
│   └── embedder.py         # Text embeddings
├── stt/
│   └── streaming_stt.py    # Whisper voice transcription
├── agents/
│   └── web_search_agent.py # DuckDuckGo search
└── mcp/
    └── google_drive_client.py # Google Drive integration

🔧 Technical Details

Voice Processing Flow:

Record: Frontend captures audio via MediaRecorder API
Upload: Audio sent to /voice-query/ endpoint
Transcribe: Whisper converts speech to text
Query: Text processed through full RAG pipeline
Response: Returns transcription + answer + citations

PDF Processing Flow:

Upload: PDF sent to /upload-pdf/ endpoint
Extract: PyMuPDF extracts text and renders page images
OCR: Tesseract processes images for additional text
Embed: Text chunks vectorized with ChromaDB
Store: Vectors saved for similarity search

Query Processing Flow:

Input: Text or transcribed voice query
RAG Search: Vector similarity search on PDF content
Web Search: DuckDuckGo for recent/external information
Google Drive: MCP search of user's Google Drive
Generate: LLM combines all sources with citations
Format: Response with clickable citations and images

🚀 You're Ready!

Your system implements all requested features:

✅ Streaming STT for voice queries
✅ MultiModal RAG with PDF images & graphs
✅ Agentic search (RAG + Web + Google Drive MCP)
✅ Citation/grounding with source tracking
✅ Click-to-view images and content

Just add your Google API key and start the servers! 🎉

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quick Start Guide for Agentic RAG Chatbot

✅ System Status

✅ Working Components:

🔑 API Requirements:

⚠️ Required Setup

🔑 Essential Setup (Required)

1. Get Your Google API Key (5 minutes) - REQUIRED

🌐 Optional Enhancements

2. Google Drive MCP Setup (Optional)

3. Web Search API Enhancement (Optional)

🚀 Start the System

Terminal 1 - Start Backend:

Terminal 2 - Start Frontend:

🧪 Test All Features

📋 All Features Implemented

1. ✅ Streaming Speech-to-Text (STT)

2. ✅ MultiModal RAG

3. ✅ Agentic Query System

a) ✅ RAG + Web Search + MCP Google Drive

b) ✅ Citations & Grounding

c) ✅ Click-to-View Images

🎯 System Architecture

🔧 Technical Details

Voice Processing Flow:

PDF Processing Flow:

Query Processing Flow:

🚀 You're Ready!

FilesExpand file tree

QUICK_START.md

Latest commit

History

QUICK_START.md

File metadata and controls

Quick Start Guide for Agentic RAG Chatbot

✅ System Status

✅ Working Components:

🔑 API Requirements:

⚠️ Required Setup

🔑 Essential Setup (Required)

1. Get Your Google API Key (5 minutes) - REQUIRED

🌐 Optional Enhancements

2. Google Drive MCP Setup (Optional)

3. Web Search API Enhancement (Optional)

🚀 Start the System

Terminal 1 - Start Backend:

Terminal 2 - Start Frontend:

🧪 Test All Features

📋 All Features Implemented

1. ✅ Streaming Speech-to-Text (STT)

2. ✅ MultiModal RAG

3. ✅ Agentic Query System

a) ✅ RAG + Web Search + MCP Google Drive

b) ✅ Citations & Grounding

c) ✅ Click-to-View Images

🎯 System Architecture

🔧 Technical Details

Voice Processing Flow:

PDF Processing Flow:

Query Processing Flow:

🚀 You're Ready!