
Vijaysingh1621/AgenticRAG-AI


🧠 Agentic RAG Chatbot - Complete Implementation

🎯 Project Overview

A fully functional Agentic RAG system that implements all of the requested features:

✅ Feature 1: Streaming Speech-to-Text (STT)

  • Implementation: OpenAI Whisper model for high-accuracy transcription
  • User Experience: Click microphone button → speak for up to 10 seconds → automatic processing
  • Technical: Real-time audio capture via MediaRecorder API → Whisper processing → text query
  • Files: backend/stt/streaming_stt.py, frontend/src/components/VoiceMic.tsx

✅ Feature 2: MultiModal RAG with PDF Images & Graphs

  • Text Extraction: PyMuPDF for clean text extraction from PDFs
  • Image Processing: Automatic page screenshots saved as high-quality PNGs
  • OCR Integration: Tesseract OCR extracts text from charts, graphs, and images
  • Vector Storage: ChromaDB stores text embeddings for semantic search
  • Files: backend/rag/pdf_processor.py, backend/rag/chroma_store.py

✅ Feature 3a: Agentic Query Processing (RAG + Web Search + MCP)

  • RAG Component: Semantic search through uploaded PDF content
  • Web Search Agent: DuckDuckGo integration for recent/external information
  • Google Drive MCP: Model Context Protocol for searching user's Google Drive
  • Smart Routing: System intelligently decides which sources to query based on the question
  • Files: backend/rag/query_engine.py, backend/agents/web_search_agent.py, backend/mcp/google_drive_client.py

✅ Feature 3b: Citation & Grounding System

  • Smart Citations: Responses include [1], [2], [3] style citations
  • Source Tracking: Each citation shows exactly where information came from
  • Multi-Source Display: Clear indication of PDF pages, Google Drive docs, web results
  • Source Summary: Count of documents consulted from each source type
  • Files: Frontend citation display in ChatBox.tsx
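As an illustration of the citation scheme described above, here is a minimal Python sketch of how retrieved sources could be numbered and summarized. The `Source` class, its `kind` labels ("pdf", "drive", "web"), and the function names are hypothetical and are not taken from the repository; the actual display logic lives in ChatBox.tsx.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Source:
    kind: str     # hypothetical labels: "pdf", "drive", or "web"
    title: str    # document name or page title
    locator: str  # page number, Drive link, or URL


def render_citations(sources):
    """Return one "[n] KIND: title (locator)" line per source, numbered from 1."""
    lines = []
    for number, source in enumerate(sources, start=1):
        lines.append(f"[{number}] {source.kind.upper()}: {source.title} ({source.locator})")
    return "\n".join(lines)


def source_summary(sources):
    """Count how many sources of each kind were consulted."""
    return dict(Counter(source.kind for source in sources))


if __name__ == "__main__":
    found = [
        Source("pdf", "report.pdf", "page 3"),
        Source("web", "Example article", "https://example.com"),
    ]
    print(render_citations(found))
    print(source_summary(found))
```

Numbering sources in retrieval order keeps the [1], [2], [3] markers in the answer aligned with the citation list shown beneath it.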

✅ Feature 3c: Click-to-View Images & Content

  • PDF Page Images: Click citation images to view full-size page screenshots
  • Image Modal: Beautiful fullscreen overlay with smooth transitions
  • External Links: Clickable links to Google Drive documents and web sources
  • Preview System: Thumbnail images show relevant PDF pages in citations
  • Files: Image modal system in frontend/src/components/ChatBox.tsx

🏗️ System Architecture

🌐 Frontend (React + TypeScript + Tailwind CSS)
├── 💬 ChatBox.tsx      # Main chat interface with citations & modals
├── 📄 UploadPDF.tsx    # PDF upload with progress feedback
├── 🎙️ VoiceMic.tsx     # Voice recording and submission
└── 🚀 App.tsx          # Main application with feature showcase

🔧 Backend (FastAPI + Python)
├── 🌍 main.py                    # REST API server with CORS support
├── 📚 rag/
│   ├── 📄 pdf_processor.py       # PDF text + image extraction + OCR
│   ├── 🗃️ chroma_store.py        # Vector database operations
│   ├── 🧠 query_engine.py        # Multi-source query orchestration
│   └── 🔤 embedder.py            # Text embedding utilities
├── 🎙️ stt/
│   └── 🗣️ streaming_stt.py       # Whisper voice transcription
├── 🤖 agents/
│   └── 🌐 web_search_agent.py    # DuckDuckGo search integration
└── ☁️ mcp/
    └── 📁 google_drive_client.py # Google Drive Model Context Protocol

🔄 Complete Workflow Examples

📄 PDF Processing Workflow:

  1. Upload: User uploads PDF with charts/graphs → POST /upload-pdf/
  2. Extract: PyMuPDF extracts text + renders page images → images/page_X.png
  3. OCR: Tesseract processes images for additional text extraction
  4. Combine: Text from PDF + OCR text from images merged by page
  5. Vectorize: Combined text chunked and embedded in ChromaDB
  6. Ready: System confirms processing complete with stats
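Steps 4 and 5 above (merging per-page text, then chunking it before embedding) can be sketched in plain Python. This is only an illustration under assumptions: the function names, the dictionary-per-page layout, and the `size`/`overlap` defaults are hypothetical and are not taken from backend/rag/pdf_processor.py.

```python
def merge_page_text(pdf_text, ocr_text):
    """Combine extracted PDF text and OCR text page by page.

    Both arguments map a page number to a string. Pages that appear in
    only one of the two mappings are still included in the result.
    """
    merged = {}
    for page in sorted(set(pdf_text) | set(ocr_text)):
        parts = [pdf_text.get(page, "").strip(), ocr_text.get(page, "").strip()]
        merged[page] = "\n".join(part for part in parts if part)
    return merged


def chunk_text(text, size=500, overlap=50):
    """Split text into character chunks of at most `size`, overlapping by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the final chunk already reaches the end of the text
        start += size - overlap
    return chunks


if __name__ == "__main__":
    pages = merge_page_text({1: "Intro"}, {1: "Chart: Q1 revenue", 2: "Axis label"})
    for page, text in pages.items():
        print(page, chunk_text(text, size=10, overlap=2))
```

Overlapping chunks are a common way to avoid cutting a sentence off from its context at a chunk boundary; the resulting chunks would then be embedded and stored in ChromaDB.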

🎙️ Voice Query Workflow:

  1. Record: User clicks mic → MediaRecorder captures up to 10 seconds of audio
  2. Upload: Audio blob sent → POST /voice-query/
  3. Transcribe: Whisper converts speech to text
  4. Process: Text query goes through full RAG pipeline
  5. Respond: Returns transcription + answer + citations + images

🧠 Agentic Query Workflow:

  1. Analyze: System analyzes query for keywords ("latest", "current", etc.)
  2. RAG Search: ChromaDB semantic search on uploaded PDFs
  3. Web Search: DuckDuckGo search for recent/external information
  4. Drive Search: Google Drive MCP for user's cloud documents
  5. Synthesize: Gemini LLM combines all sources with proper citations
  6. Format: Response with clickable citations and preview images
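The keyword analysis in step 1 can be sketched as a small routing function. This is a hedged illustration, not the logic in backend/rag/query_engine.py: only "latest" and "current" come from the text above, while the other keywords, the source labels, and the fallback rule are assumptions.

```python
import re

# "latest" and "current" are named in the workflow; the rest are assumed examples.
RECENCY_PATTERN = re.compile(r"\b(latest|current|today|recent)\b")
# Hypothetical phrases suggesting the user means their own cloud documents.
DRIVE_PATTERN = re.compile(r"\b(my drive|google drive|my documents)\b")


def choose_sources(query, has_uploaded_pdfs=True):
    """Return the list of sources to query for a question.

    Labels are hypothetical: "rag" (uploaded PDFs), "web", and "drive".
    """
    text = query.lower()
    sources = []
    if has_uploaded_pdfs:
        sources.append("rag")
    if RECENCY_PATTERN.search(text):
        sources.append("web")
    if DRIVE_PATTERN.search(text):
        sources.append("drive")
    if not sources:
        sources.append("web")  # assumed fallback when nothing else applies
    return sources


if __name__ == "__main__":
    print(choose_sources("What is the latest revenue figure?"))
    print(choose_sources("Summarize chapter 2"))
```

Matching on word boundaries avoids false positives such as "concurrent" triggering the "current" keyword.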

🛠️ Technical Implementation Details

Backend Technologies:

  • FastAPI: Modern async web framework with automatic OpenAPI docs
  • LangChain: RAG pipeline orchestration and LLM integration
  • ChromaDB: High-performance vector database for embeddings
  • OpenAI Whisper: State-of-the-art speech recognition
  • PyMuPDF: Fast PDF processing without external dependencies
  • Tesseract OCR: Optical character recognition for images
  • Google Gemini: Advanced LLM for response generation

Frontend Technologies:

  • React 18: Modern component-based UI framework
  • TypeScript: Type safety and better development experience
  • Tailwind CSS: Utility-first styling for rapid UI development
  • Vite: Fast build tool with HMR for development
  • Axios: HTTP client for API communication

Key Features:

  • CORS Support: Proper cross-origin resource sharing setup
  • File Upload: Multipart form handling for PDFs and audio
  • Static File Serving: Efficient image serving for citations
  • Error Handling: Comprehensive error handling throughout
  • Progress Feedback: Real-time upload and processing status
  • Responsive Design: Works on desktop and mobile devices

📁 File Structure

📦 chatbot-query/
├── 📄 requirements.txt          # Python dependencies
├── 📄 QUICK_START.md           # Quick setup guide
├── 📄 setup_instructions.md    # Detailed setup instructions
├── 🔧 start.bat               # Windows startup script
├── 🧪 test_system.py          # System validation script
├── 🔧 backend/
│   ├── 📄 .env                # Environment variables
│   ├── 🌍 main.py             # FastAPI application
│   ├── 📚 rag/                # RAG system components
│   ├── 🎙️ stt/               # Speech-to-text system
│   ├── 🤖 agents/             # Web search agents
│   ├── ☁️ mcp/                # Google Drive integration
│   ├── 📁 temp/               # Temporary file storage
│   ├── 🖼️ images/             # PDF page images and screenshots
│   └── 📁 uploads/            # Uploaded file storage
└── 🌐 frontend/
    ├── 📄 package.json        # Node.js dependencies
    ├── ⚙️ vite.config.ts      # Vite configuration with proxy
    ├── 🎨 tailwind.config.js  # Tailwind CSS configuration
    └── 📁 src/
        ├── 🚀 App.tsx         # Main application component
        ├── 📁 components/     # React components
        └── 🎨 assets/         # Static assets

🚀 Quick Start (Final Steps)

1. Set Google API Key (Required)

# Edit backend/.env and set:
GOOGLE_API_KEY=your_actual_google_api_key_from_makersuite
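
A missing or placeholder key is easy to overlook, so a startup check can fail fast with a clear message. The sketch below is an assumption about how such a check might look, not code from the repository; `require_google_api_key` is a hypothetical helper, and it presumes the values in backend/.env have already been loaded into the process environment.

```python
import os

# Placeholder value shown in the setup instructions above.
PLACEHOLDER = "your_actual_google_api_key_from_makersuite"


def require_google_api_key(env=None):
    """Return GOOGLE_API_KEY, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = (env.get("GOOGLE_API_KEY") or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; edit backend/.env before starting the backend."
        )
    return key


if __name__ == "__main__":
    try:
        require_google_api_key()
        print("GOOGLE_API_KEY found.")
    except RuntimeError as error:
        print(error)
```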

2. Start the System

Option A - Use Startup Script:

# Double-click start.bat (Windows)

Option B - Manual Start:

# Terminal 1 - Backend
cd backend
python main.py

# Terminal 2 - Frontend
cd frontend
npm run dev

3. Access the Application

🎯 Success Metrics

Your system now successfully implements:

  • ✅ Advanced Voice Interface: Click-to-record voice queries with Whisper STT
  • ✅ MultiModal Document Processing: PDFs with text, images, and charts
  • ✅ Intelligent Information Retrieval: RAG + Web Search + Google Drive
  • ✅ Transparent Source Attribution: Smart citations with click-to-view
  • ✅ Modern User Experience: Responsive design with real-time feedback
  • ✅ Production-Ready Architecture: Scalable FastAPI + React stack

🎉 You're Done!

Your Agentic RAG Chatbot is complete and ready for use! The system provides enterprise-level functionality with a modern, intuitive interface. All requested features have been implemented with production-quality code and comprehensive documentation.

Enjoy your new AI-powered document analysis and voice query system! 🚀

About

An agentic multimodal RAG assistant with voice queries, PDF/image understanding, web search, and grounded answers.
