A powerful Retrieval-Augmented Generation (RAG) system that allows users to upload various data formats and interact with them through natural language queries. Built with modern technologies and designed for scalability and ease of use.
- Multi-Format Support: Upload and process CSV, Excel, PDF, and text files
- Intelligent Retrieval: Uses sentence transformers for semantic search
- Natural Language Chat: Query your data using conversational AI powered by Google Gemini
- Vector Database: ChromaDB for efficient similarity search and retrieval
- Real-time Processing: Instant file processing and indexing
- Chat History: Persistent conversation history with context awareness
- Modern UI: Clean, responsive interface built with React and Tailwind CSS
- Fast API: High-performance backend with FastAPI and async processing
+--------------------+      +-----------------+      +-----------------+
|   React Frontend   |------|     FastAPI     |------|    ChromaDB     |
|  (Vite + Tailwind) |      |     Backend     |      |  Vector Store   |
+--------------------+      +-----------------+      +-----------------+
                                     |
                            +-----------------+
                            |  Google Gemini  |
                            |    AI Model     |
                            +-----------------+
- Frontend: React 18 with Vite, Tailwind CSS, and Lucide React icons
- Backend: FastAPI with async support, CORS middleware, and structured routing
- AI Model: Google Gemini 2.5 Flash for natural language processing
- Embeddings: Sentence Transformers for semantic understanding
- Vector Database: ChromaDB for efficient similarity search
- File Processing: Support for multiple formats with automatic text extraction
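The way these components cooperate can be sketched as a retrieve-then-generate loop: embed the stored chunks, rank them against the query, and pass the best matches to the model as context. The example below is a toy illustration, not the project's code. To stay runnable with the standard library alone, it swaps the Sentence Transformers embeddings for a bag-of-words vector and the ChromaDB store for an in-memory list. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query.

    An in-memory stand-in for a vector store lookup.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Quarterly sales rose in the north region.",
    "The office cafeteria menu changes weekly.",
    "Sales in the south region fell slightly.",
]
question = "How did sales change by region?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

In a real deployment the prompt would be sent to the Gemini model, and a learned embedding would match on meaning rather than on shared words.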
- FastAPI - Modern, fast web framework for building APIs
- Google Generative AI - Gemini 2.5 Flash model integration
- ChromaDB - Vector database for embeddings and similarity search
- Sentence Transformers - State-of-the-art sentence embeddings
- Pandas - Data manipulation and analysis
- PDFPlumber - PDF text extraction
- OpenPyXL - Excel file processing
- React 18 - Modern React with hooks and functional components
- Vite - Fast build tool and development server
- Tailwind CSS - Utility-first CSS framework
- Lucide React - Beautiful, customizable icons
generic-data-rag-agent/
├── backend/
│   ├── app/
│   │   ├── core/
│   │   │   └── config.py          # Configuration settings
│   │   ├── routers/
│   │   │   ├── chat.py            # Chat endpoints
│   │   │   ├── files.py           # File management endpoints
│   │   │   └── history.py         # History endpoints
│   │   ├── services/
│   │   │   ├── indexer.py         # Document indexing
│   │   │   ├── ingestion.py       # File processing
│   │   │   ├── retriever.py       # Vector search
│   │   │   └── history.py         # Chat history management
│   │   ├── main.py                # FastAPI application
│   │   ├── models.py              # Pydantic models
│   │   └── storage.py             # File storage utilities
│   ├── chroma_db/                 # Vector database storage
│   ├── uploads/                   # Uploaded files storage
│   ├── requirements.txt           # Python dependencies
│   └── start_server.py            # Server startup script
├── frontend/
│   ├── src/
│   │   ├── App.jsx                # Main React component
│   │   ├── main.jsx               # React entry point
│   │   └── index.css              # Tailwind styles
│   ├── index.html                 # HTML template
│   ├── package.json               # Node.js dependencies
│   ├── tailwind.config.js         # Tailwind configuration
│   └── vite.config.js             # Vite configuration
├── start-backend.bat              # Windows backend starter
├── start-frontend.bat             # Windows frontend starter
└── README.md                      # This file
- Python 3.8+
- Node.js 16+
- Google Gemini API key (available from Google AI Studio)
git clone https://github.com/yashdew3/generic-data-rag-agent.git
cd generic-data-rag-agent

# Navigate to backend directory
cd backend
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env

Create a .env file in the backend directory:
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.5-flash
FRONTEND_ORIGIN=http://localhost:5173

# Navigate to frontend directory (new terminal)
cd frontend
# Install dependencies
npm install

# Start backend (from root directory)
start-backend.bat
# Start frontend (from root directory)
start-frontend.bat

# Terminal 1 - Backend
cd backend
python start_server.py
# Terminal 2 - Frontend
cd frontend
npm run dev

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
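The three `.env` values from the setup steps (`GEMINI_API_KEY`, `GEMINI_MODEL`, `FRONTEND_ORIGIN`) could be read by a small settings module along the following lines. This is a minimal sketch under stated assumptions: the contents of `app/core/config.py` are not shown in this README, so `Settings` and `load_settings` are hypothetical names, and the defaults simply mirror the example values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented settings."""
    gemini_api_key: str
    gemini_model: str
    frontend_origin: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Falls back to os.environ when no mapping is given.
    """
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # Fail early instead of sending unauthenticated requests later.
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    return Settings(
        gemini_api_key=api_key,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        frontend_origin=env.get("FRONTEND_ORIGIN", "http://localhost:5173"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"GEMINI_API_KEY": "example-key"})
print(settings.gemini_model, settings.frontend_origin)
```

The sketch does not parse the `.env` file itself. It assumes the variables are already exported into the process environment, for example by the startup script or a loader such as python-dotenv.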
- Click the "Choose Files" button
- Select CSV, Excel, PDF, or text files
- Files are automatically processed and indexed
- Type natural language questions about your uploaded data
- Examples:
  - "What are the main trends in this dataset?"
  - "Summarize the key findings from the uploaded report"
  - "Show me insights about sales performance"
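Before uploaded text can be indexed for questions like these, it is usually split into overlapping chunks so that each embedding covers a manageable span. The sketch below shows one common way to do that. It is an illustration rather than the project's ingestion code: `chunk_text` and its `size` and `overlap` parameters are hypothetical, and the default values are arbitrary.

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in one of the two neighbouring chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


sample = "abcdefghij"
pieces = chunk_text(sample, size=4, overlap=1)
print(pieces)
```

Each chunk would then be embedded and stored in the vector database alongside a reference to its source file.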
- POST /files/upload - Upload and process files
- GET /files/list - List uploaded files
- DELETE /files/{file_id} - Delete a file
- POST /chat/message - Send a chat message
- GET /chat/history/{session_id} - Get chat history
- GET /history/sessions - List all chat sessions
- DELETE /history/sessions/{session_id} - Delete a session
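A client for these endpoints can be sketched with the standard library. The paths and HTTP methods come from the list above. Everything else is an assumption: the JSON field names `session_id` and `message` are guesses at the request schema, and `chat_message_request` and `delete_file_request` are hypothetical helpers. The functions only build the requests, so the example runs without a server.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def chat_message_request(session_id, message):
    """Build (but do not send) a POST /chat/message request.

    The JSON field names are assumptions, not taken from the project.
    """
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return Request(
        f"{BASE_URL}/chat/message",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_file_request(file_id):
    """Build (but do not send) a DELETE /files/{file_id} request."""
    # Percent-encode the id so unusual characters cannot alter the path.
    return Request(f"{BASE_URL}/files/{quote(file_id, safe='')}", method="DELETE")


req = chat_message_request("demo-session", "What are the main trends in this dataset?")
print(req.get_method(), req.full_url)
```

With the backend running, a request built this way can be sent with `urllib.request.urlopen(req)`. The interactive documentation at http://localhost:8000/docs shows the actual request and response schemas.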
cd backend
python test_system.py

cd frontend
npm run lint # ESLint checking
npm run build # Production build
npm run preview # Preview production build

- CORS Protection: Configurable origin restrictions
- File Validation: Secure file type checking
- API Key Management: Environment-based configuration
- Input Sanitization: Secure data processing
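The file validation item above can be illustrated with an extension allow-list plus a filename cleanup step. This is a minimal sketch, not the project's implementation: the exact set of accepted extensions is an assumption based on the supported formats (CSV, Excel, PDF, text), and `is_allowed_upload` and `safe_storage_name` are hypothetical helpers.

```python
from pathlib import PurePosixPath

# Assumed allow-list derived from the supported formats; adjust as needed.
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".pdf", ".txt"}


def safe_storage_name(filename):
    """Return only the final path component of an uploaded filename.

    Directory parts (with either slash style) are dropped so that a name
    such as "../../etc/passwd" cannot point outside the uploads folder.
    """
    name = PurePosixPath(filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError("filename has no usable final component")
    return name


def is_allowed_upload(filename):
    """Check the last extension, case-insensitively, against the allow-list."""
    try:
        name = safe_storage_name(filename)
    except ValueError:
        return False
    return PurePosixPath(name).suffix.lower() in ALLOWED_EXTENSIONS


for candidate in ("report.PDF", "data.csv.exe", "../../notes.txt"):
    print(candidate, is_allowed_upload(candidate))
```

An extension check only filters by name. Verifying the file's actual content (for example, by attempting to parse it) is a separate step that this sketch leaves out.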
Contributions, issues, and feature requests are welcome! Feel free to check the issues page or open a new issue to discuss changes. Pull requests are also appreciated.
This project is licensed under the MIT License © Yash Dewangan
Feel free to connect or suggest improvements!
- Built by Yash Dewangan
- GitHub: YashDewangan
- Email: yashdew06@gmail.com
- LinkedIn: YashDewangan
Built with ❤️ for intelligent data interaction
This project demonstrates modern RAG architecture with production-ready code quality and comprehensive documentation.
