A comprehensive Retrieval-Augmented Generation (RAG) chatbot system for Singapore Institute of Technology (SIT), featuring hybrid search capabilities with FAISS vector similarity and BM25 keyword matching, powered by OpenAI GPT and ElevenLabs voice integration.
- Intelligent RAG System: Hybrid search combining FAISS vector similarity and BM25 keyword matching
- Voice Integration: Speech-to-Text and Text-to-Speech using ElevenLabs APIs
- Interactive Frontend: React-based chat interface with animated otter mascot
- Dual Input Modes: Support for both text and voice interactions
- Real-time Processing: Fast query processing with optimized vector database
- SIT-Specific Knowledge: Trained on SIT course information and institutional data
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Frontend │ │ Backend │ │ RAG System │
│ (React/JS) │◄──►│ (Node.js) │◄──►│ (Python) │
│ Port 3000 │ │ Port 3000 │ │ Port 8000 │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Voice APIs │ │ File Upload │ │ Vector DB │
│ (ElevenLabs) │ │ (Multer) │ │ (LanceDB) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
Before running the system, ensure you have the following installed:
- Node.js (v16 or higher)
- Python (v3.8 or higher)
- FFmpeg (for audio processing)
- OpenAI API Key: For GPT-based response generation
- ElevenLabs API Key: For speech-to-text and text-to-speech functionality
git clone https://github.com/Finance-LLMs/SIT-Chatbot-RAG.git
cd SIT-Chatbot-RAG
- Download the database from Google Drive: LanceDB Database
- Extract the zip file
- Copy the
data
folder from the extracted files - Place it in the root directory
Your structure should look like:
SIT-Chatbot-RAG/
├── data/
│ └── vector-index-lancedb/
│ ├── bm25_index.lance/
│ └── faiss_index.lance/
├── SIT-chatbot-main/
│ ├── backend/
│ ├── src/
│ └── ...
├── SITCHATBOTLLM/
│ ├── server.py
│ ├── requirements.txt
│ └── ...
└── ...
cd SITCHATBOTLLM
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env # Create if doesn't exist
# Edit .env and add your OpenAI API key
cd SIT-chatbot-main
# Install Node.js dependencies
npm install
# Set up environment variables
cd backend
cp .env.example .env # Create if doesn't exist
# Edit .env and add your API keys
Create .env
files with the following structure:
OPENAI_API_KEY=your_openai_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
The system requires three components to run simultaneously:
cd SITCHATBOTLLM
python server.py
- Runs on:
http://localhost:8000
- Provides: Chat completions endpoint with RAG functionality
cd SIT-chatbot-main/backend
node server.js
- Runs on:
http://localhost:3000
- Provides: Frontend serving, API proxy, voice processing
cd SIT-chatbot-main
npm run build # Build the frontend
Open your browser and navigate to: http://localhost:3000
- Type your question in the chat input
- Click "Send" or press Enter
- Receive AI-powered responses based on SIT knowledge base
- Click the microphone icon to start recording
- Speak your question clearly
- The system will transcribe, process, and respond with both text and voice
- "What courses does SIT offer in cloud computing?"
- "Tell me about the admission requirements"
- "What are the fees for the DevOps program?"
- "How long is the Cloud Computing course?"
- Storage: LanceDB with 4,076+ indexed documents
- Search: Hybrid FAISS + BM25 for optimal retrieval
- Content: SIT course catalogs, program information, policies
- STT Model: ElevenLabs Scribe v1 with English language optimization
- TTS Model: ElevenLabs Turbo v2.5 for natural speech synthesis
- Audio Optimization: 16kHz, mono, PCM format using FFmpeg
POST /api/chat
- Chat completions (proxy to RAG system)POST /api/speech-to-text
- Convert audio to textPOST /api/text-to-speech
- Convert text to audio
SIT-Chatbot-RAG/
├── README.md # This file
├── SIT-chatbot-main/ # Frontend & API Server
│ ├── backend/
│ │ ├── server.js # Node.js Express server
│ │ ├── .env # API keys
│ │ └── uploads/ # Temporary audio files
│ ├── src/
│ │ ├── app.js # React frontend logic
│ │ ├── index.html # Main HTML file
│ │ └── styles.css # Styling
│ ├── dist/ # Built frontend files
│ ├── package.json # Node.js dependencies
│ └── webpack.config.js # Build configuration
├── SITCHATBOTLLM/ # RAG Backend
│ ├── server.py # FastAPI RAG server
│ ├── bm25_chunk_search.py # Search implementation
│ ├── requirements.txt # Python dependencies
│ └── .env # OpenAI API key
└── data/ # Additional data files
└── vector-index-lancedb/ # LanceDB storage
This project is licensed under the MIT License.