🧠 Agentic RAG Chatbot - Complete Implementation

🎯 Project Overview

A fully functional Agentic RAG system that implements all the requested features:

✅ Feature 1: Streaming Speech-to-Text (STT)

Implementation: OpenAI Whisper model for high-accuracy transcription
User Experience: Click microphone button → speak for up to 10 seconds → automatic processing
Technical: Real-time audio capture via MediaRecorder API → Whisper processing → text query
Files: backend/stt/streaming_stt.py, frontend/src/components/VoiceMic.tsx

✅ Feature 2: MultiModal RAG with PDF Images & Graphs

Text Extraction: PyMuPDF for clean text extraction from PDFs
Image Processing: Automatic page screenshots saved as high-quality PNGs
OCR Integration: Tesseract OCR extracts text from charts, graphs, and images
Vector Storage: ChromaDB stores text embeddings for semantic search
Files: backend/rag/pdf_processor.py, backend/rag/chroma_store.py

✅ Feature 3a: Agentic Query Processing (RAG + Web Search + MCP)

RAG Component: Semantic search through uploaded PDF content
Web Search Agent: DuckDuckGo integration for recent/external information
Google Drive MCP: Model Context Protocol for searching user's Google Drive
Smart Routing: System intelligently decides which sources to query based on the question
Files: backend/rag/query_engine.py, backend/agents/web_search_agent.py, backend/mcp/google_drive_client.py
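
The smart routing described above could be sketched roughly as follows. This is a minimal illustration, not the code in backend/rag/query_engine.py: the `plan_route` function, the `RoutePlan` type, and both keyword sets are hypothetical (the README itself only mentions "latest" and "current" as trigger words).

```python
import re
from dataclasses import dataclass

# Hypothetical keyword sets; the README itself only names "latest" and "current".
RECENCY_WORDS = {"latest", "current", "today", "recent", "news"}
DRIVE_WORDS = {"drive", "gdrive"}


@dataclass(frozen=True)
class RoutePlan:
    use_rag: bool
    use_web: bool
    use_drive: bool


def plan_route(query: str, has_uploaded_pdfs: bool = True) -> RoutePlan:
    """Pick sources by whole-word keyword match (an assumed heuristic)."""
    # Tokenize into words so that "concurrent" does not match "current".
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return RoutePlan(
        use_rag=has_uploaded_pdfs,
        use_web=bool(words & RECENCY_WORDS),
        use_drive=bool(words & DRIVE_WORDS),
    )
```

Matching on whole words rather than substrings keeps the heuristic from firing on unrelated terms; a real router might instead ask the LLM to classify the query.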

✅ Feature 3b: Citation & Grounding System

Smart Citations: Responses include [1], [2], [3] style citations
Source Tracking: Each citation shows exactly where information came from
Multi-Source Display: Clear indication of PDF pages, Google Drive docs, web results
Source Summary: Count of documents consulted from each source type
Files: frontend/src/components/ChatBox.tsx (citation display)
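
As a rough sketch of the data behind the citation display, the numbering and the per-source summary might be produced like this. The `Source` type, the `kind` labels ("pdf", "drive", "web"), and both function names are assumptions made for illustration; the actual rendering lives in the frontend.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Source:
    kind: str   # assumed labels: "pdf", "drive" or "web"
    label: str  # e.g. "report.pdf, page 3" or a URL


def format_citations(sources: list[Source]) -> list[str]:
    """Number sources as [1], [2], ... in the order they were used."""
    return [f"[{i}] {s.label} ({s.kind})" for i, s in enumerate(sources, start=1)]


def summarize_sources(sources: list[Source]) -> dict[str, int]:
    """Count how many documents were consulted per source type."""
    return dict(Counter(s.kind for s in sources))
```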

✅ Feature 3c: Click-to-View Images & Content

PDF Page Images: Click citation images to view full-size page screenshots
Image Modal: Fullscreen overlay with smooth transitions
External Links: Clickable links to Google Drive documents and web sources
Preview System: Thumbnail images show relevant PDF pages in citations
Files: frontend/src/components/ChatBox.tsx (image modal system)

🏗️ System Architecture

🌐 Frontend (React + TypeScript + Tailwind CSS)
├── 💬 ChatBox.tsx      # Main chat interface with citations & modals
├── 📄 UploadPDF.tsx    # PDF upload with progress feedback
├── 🎙️ VoiceMic.tsx     # Voice recording and submission
└── 🚀 App.tsx          # Main application with feature showcase

🔧 Backend (FastAPI + Python)
├── 🌍 main.py                    # REST API server with CORS support
├── 📚 rag/
│   ├── 📄 pdf_processor.py       # PDF text + image extraction + OCR
│   ├── 🗃️ chroma_store.py        # Vector database operations
│   ├── 🧠 query_engine.py        # Multi-source query orchestration
│   └── 🔤 embedder.py            # Text embedding utilities
├── 🎙️ stt/
│   └── 🗣️ streaming_stt.py       # Whisper voice transcription
├── 🤖 agents/
│   └── 🌐 web_search_agent.py    # DuckDuckGo search integration
└── ☁️ mcp/
    └── 📁 google_drive_client.py # Google Drive Model Context Protocol

🔄 Complete Workflow Examples

📄 PDF Processing Workflow:

Upload: User uploads PDF with charts/graphs → POST /upload-pdf/
Extract: PyMuPDF extracts text + renders page images → images/page_X.png
OCR: Tesseract processes images for additional text extraction
Combine: Text from PDF + OCR text from images merged by page
Vectorize: Combined text chunked and embedded in ChromaDB
Ready: System confirms processing complete with stats
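
The "Combine" and "Vectorize" steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the character-based chunking, and the default `size` and `overlap` values are hypothetical, and the project may chunk differently (for example with a LangChain text splitter).

```python
def merge_page_text(pdf_text: dict[int, str], ocr_text: dict[int, str]) -> dict[int, str]:
    """Merge extracted PDF text and OCR text, keyed by page number."""
    merged = {}
    for page in sorted(set(pdf_text) | set(ocr_text)):
        parts = [pdf_text.get(page, "").strip(), ocr_text.get(page, "").strip()]
        # Skip empty parts so pages with only OCR text have no leading newline.
        merged[page] = "\n".join(p for p in parts if p)
    return merged


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop before a start index whose chunk would add no new characters.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.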

🎙️ Voice Query Workflow:

Record: User clicks mic → MediaRecorder captures up to 10 seconds of audio
Upload: Audio blob sent via POST /voice-query/
Transcribe: Whisper converts speech to text
Process: Text query goes through full RAG pipeline
Respond: Returns transcription + answer + citations + images
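
The voice workflow above might be wired together along these lines. This is a sketch, not the project's endpoint code: `handle_voice_query`, `VoiceQueryResponse`, and the two injected callables are hypothetical names, where `transcribe` stands in for the Whisper call and `answer_query` stands in for the RAG pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VoiceQueryResponse:
    transcription: str
    answer: str
    citations: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)


def handle_voice_query(
    audio_path: str,
    transcribe: Callable[[str], str],
    answer_query: Callable[[str], tuple[str, list[str], list[str]]],
) -> VoiceQueryResponse:
    """Transcribe the audio, then run the text through the query pipeline."""
    text = transcribe(audio_path).strip()
    if not text:
        # Nothing was recognized, so skip retrieval entirely.
        return VoiceQueryResponse(transcription="", answer="No speech detected.")
    answer, citations, images = answer_query(text)
    return VoiceQueryResponse(text, answer, citations, images)
```

Passing the transcriber and the pipeline in as callables keeps the flow testable without loading a speech model.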

🧠 Agentic Query Workflow:

Analyze: System analyzes query for keywords ("latest", "current", etc.)
RAG Search: ChromaDB semantic search on uploaded PDFs
Web Search: DuckDuckGo search for recent/external information
Drive Search: Google Drive MCP for user's cloud documents
Synthesize: Gemini LLM combines all sources with proper citations
Format: Response with clickable citations and preview images
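
For the "Synthesize" step, one plausible way to get citable answers is to number the retrieved snippets in the prompt sent to the LLM. The sketch below assumes a hypothetical `build_prompt` helper and prompt wording; the prompt the project actually sends to Gemini is not shown in this README.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Build a grounded prompt from (source_label, text) pairs.

    Each snippet gets a number so the model can cite it inline as [n].
    """
    context = "\n".join(
        f"[{i}] ({label}) {text}"
        for i, (label, text) in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because the numbers are assigned before generation, the same ordering can be reused to map each [n] in the answer back to its PDF page, Drive document, or web result.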

🛠️ Technical Implementation Details

Backend Technologies:

FastAPI: Modern async web framework with automatic OpenAPI docs
LangChain: RAG pipeline orchestration and LLM integration
ChromaDB: High-performance vector database for embeddings
OpenAI Whisper: State-of-the-art speech recognition
PyMuPDF: Fast PDF processing without external dependencies
Tesseract OCR: Optical character recognition for images
Google Gemini: Advanced LLM for response generation

Frontend Technologies:

React 18: Modern component-based UI framework
TypeScript: Type safety and better development experience
Tailwind CSS: Utility-first styling for rapid UI development
Vite: Fast build tool with HMR for development
Axios: HTTP client for API communication

Key Features:

CORS Support: Proper cross-origin resource sharing setup
File Upload: Multipart form handling for PDFs and audio
Static File Serving: Efficient image serving for citations
Error Handling: Comprehensive error handling throughout
Progress Feedback: Real-time upload and processing status
Responsive Design: Works on desktop and mobile devices

📁 File Structure

📦 chatbot-query/
├── 📄 requirements.txt          # Python dependencies
├── 📄 QUICK_START.md           # Quick setup guide
├── 📄 setup_instructions.md    # Detailed setup instructions
├── 🔧 start.bat               # Windows startup script
├── 🧪 test_system.py          # System validation script
├── 🔧 backend/
│   ├── 📄 .env                # Environment variables
│   ├── 🌍 main.py             # FastAPI application
│   ├── 📚 rag/                # RAG system components
│   ├── 🎙️ stt/               # Speech-to-text system
│   ├── 🤖 agents/             # Web search agents
│   ├── ☁️ mcp/                # Google Drive integration
│   ├── 📁 temp/               # Temporary file storage
│   ├── 🖼️ images/             # PDF page images and screenshots
│   └── 📁 uploads/            # Uploaded file storage
└── 🌐 frontend/
    ├── 📄 package.json        # Node.js dependencies
    ├── ⚙️ vite.config.ts      # Vite configuration with proxy
    ├── 🎨 tailwind.config.js  # Tailwind CSS configuration
    └── 📁 src/
        ├── 🚀 App.tsx         # Main application component
        ├── 📁 components/     # React components
        └── 🎨 assets/         # Static assets

🚀 Quick Start (Final Steps)

1. Set Google API Key (Required)

# Edit backend/.env and set:
GOOGLE_API_KEY=your_actual_google_api_key_from_makersuite
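
Before starting the backend, it can help to confirm that the key is actually set. The snippet below is a small standard-library sketch; `read_env_value` is a hypothetical helper, and the project itself may load the file differently (for example with python-dotenv).

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: str, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=VALUE .env file, or None."""
    path = Path(env_path)
    if not path.exists():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Ignore blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Strip optional surrounding quotes; treat an empty value as unset.
            return value.strip().strip('"').strip("'") or None
    return None


if __name__ == "__main__":
    if read_env_value("backend/.env", "GOOGLE_API_KEY") is None:
        print("GOOGLE_API_KEY is missing from backend/.env")
```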

2. Start the System

Option A - Use Startup Script:

# Double-click start.bat (Windows)

Option B - Manual Start:

# Terminal 1 - Backend
cd backend
python main.py

# Terminal 2 - Frontend
cd frontend
npm run dev

3. Access the Application

Frontend: http://localhost:5173
Backend API: http://localhost:8001
API Docs: http://localhost:8001/docs

🎯 Success Metrics

Your system now successfully implements:

✅ Advanced Voice Interface: Click-to-record voice queries with Whisper STT
✅ MultiModal Document Processing: PDFs with text, images, and charts
✅ Intelligent Information Retrieval: RAG + Web Search + Google Drive
✅ Transparent Source Attribution: Smart citations with click-to-view
✅ Modern User Experience: Responsive design with real-time feedback
✅ Production-Ready Architecture: Scalable FastAPI + React stack

🎉 You're Done!

Your Agentic RAG Chatbot is complete and ready for use! All requested features are implemented: voice queries, multimodal PDF processing, retrieval across PDFs, web search, and Google Drive, and cited answers with click-to-view sources.

Enjoy your new AI-powered document analysis and voice query system! 🚀

Repository: Vijaysingh1621/AgenticRAG-AI
